<codeBook xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:schemaLocation="ddi:codebook:2_5 http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd" xmlns="ddi:codebook:2_5">
  <docDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv">Argumentation sentences 1.0</titl>
        <parTitl xml:lang="en">Argumentation sentences 1.0</parTitl>
        <IDNo agency="SND">doi-10-23695-56t6-rc52-0</IDNo>
        <IDNo agency="DOI">https://doi.org/10.23695/56T6-RC52</IDNo>
      </titlStmt>
      <prodStmt>
        <producer xml:lang="en" abbr="SND">Swedish National Data Service</producer>
        <producer xml:lang="sv" abbr="SND">Svensk nationell datatjänst</producer>
      </prodStmt>
      <holdings URI="https://doi.org/10.23695/56T6-RC52">Landing page</holdings>
    </citation>
  </docDscr>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv">Argumentation sentences 1.0</titl>
        <parTitl xml:lang="en">Argumentation sentences 1.0</parTitl>
        <IDNo agency="SND">doi-10-23695-56t6-rc52-0</IDNo>
        <IDNo agency="DOI">https://doi.org/10.23695/56T6-RC52</IDNo>
      </titlStmt>
      <rspStmt />
      <prodStmt />
      <distStmt>
        <distrbtr xml:lang="en" abbr="SND" URI="https://snd.se">Swedish National Data Service</distrbtr>
        <distrbtr xml:lang="sv" abbr="SND" URI="https://snd.se">Svensk nationell datatjänst</distrbtr>
        <distDate xml:lang="en" date="2024-01-01" />
      </distStmt>
      <verStmt>
        <version elementVersion="0" elementVersionDate="2024-01-01" />
      </verStmt>
      <holdings URI="https://doi.org/10.23695/56T6-RC52">Landing page</holdings>
    </citation>
    <stdyInfo>
      <subject />
      <abstract xml:lang="en" contentType="abstract">I. IDENTIFYING INFORMATION

Title*
Argumentation sentences

Subtitle
A translated corpus for classifying sentence stance in relation to a topic. 

Created by*
Anna Lindahl (anna.lindahl@svenska.gu.se)

Publisher(s)*
Språkbanken Text (sb-info@svenska.gu.se)

Link(s) / permanent identifier(s)*
https://spraakbanken.gu.se/en/resources/superlim

License(s)*
CC BY 4.0

Abstract*
Argumentation sentences is a translated corpus for the task of identifying stance in relation to a topic. It consists of sentences labeled pro, con or non in relation to one of six topics. The original dataset [1] is available at https://github.com/trtm/AURC. The test set consists of manually corrected translations, while the training set is machine translated.

Funded by*
Vinnova (grant no. 2021-04165) 

Cite as

Related datasets
Part of the SuperLim collection (https://spraakbanken.gu.se/en/resources/superlim)

II. USAGE

Key applications
Machine learning, argumentation mining, stance classification

Intended task(s)/usage(s)
Evaluate models on the following task: given a sentence and a topic, determine whether the sentence is for (pro), against (con) or neutral (non) in relation to the topic.

Recommended evaluation measures
Krippendorff’s alpha (the official SuperLim measure), Matthews correlation coefficient (MCC), F-score
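
A minimal sketch of two of the recommended measures in plain Python, assuming gold labels and predictions are parallel lists of the strings pro, con and non. Krippendorff's alpha is computed here for nominal data by treating the gold labels and the model predictions as two coders; this is a common setup, but the official SuperLim evaluation script may compute it differently, so benchmark results should come from that script. The function names mcc and krippendorff_alpha_nominal are illustrative only.

```python
import math
from collections import Counter


def mcc(gold, pred):
    """Multiclass Matthews correlation coefficient (Gorodkin's generalisation)."""
    s = len(gold)                                        # number of sentences
    c = sum(1 for gl, pl in zip(gold, pred) if gl == pl)  # correctly classified
    t = Counter(gold)                                    # true count per label
    p = Counter(pred)                                    # predicted count per label
    labels = set(t).union(p)
    cov_tp = c * s - sum(p[k] * t[k] for k in labels)
    cov_pp = s * s - sum(p[k] ** 2 for k in labels)
    cov_tt = s * s - sum(t[k] ** 2 for k in labels)
    denom = math.sqrt(cov_pp * cov_tt)
    return cov_tp / denom if denom else 0.0  # 0.0 by convention when undefined


def krippendorff_alpha_nominal(gold, pred):
    """Krippendorff's alpha for nominal labels, with gold and pred as two coders."""
    # Coincidence matrix: each sentence with values (a, b) adds to o[a, b] and o[b, a].
    o = Counter()
    for gl, pl in zip(gold, pred):
        o[(gl, pl)] += 1
        o[(pl, gl)] += 1
    n_c = Counter()                                      # marginal total per label
    for (a, _b), count in o.items():
        n_c[a] += count
    n = sum(n_c.values())                                # equals 2 * number of sentences
    observed = sum(count for (a, b), count in o.items() if a != b)
    expected = sum(n_c[a] * n_c[b] for a in n_c for b in n_c if a != b)
    if expected == 0:
        return float("nan")  # only one label in use: alpha is undefined
    return 1.0 - (n - 1) * observed / expected
```

Both functions use only the standard library. MCC returns 0.0 when it is undefined (for example, when every prediction has the same label), a common convention; alpha returns NaN when only a single label occurs in gold and predictions combined.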

Dataset function(s)
Training, testing

Recommended split(s)
Train, dev, test (provided)

III. DATA

Primary data*
Text

Language*
Swedish

Dataset in numbers*
5265 sentences over 6 topics: 3450 train, 750 dev and 1065 test
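
As a quick consistency check, the split sizes add up to the stated total. The short Python snippet below uses only the figures given above (the SPLITS dictionary is an illustrative name, not part of the distributed files) and also prints the approximate share of each split.

```python
# Split sizes as stated under "Dataset in numbers".
SPLITS = {"train": 3450, "dev": 750, "test": 1065}

total = sum(SPLITS.values())
assert total == 5265, "split sizes should add up to the stated total"

for name, size in SPLITS.items():
    # Share of the whole corpus, rounded to one decimal place.
    print(f"{name}: {size} sentences ({size / total:.1%})")
```

This prints shares of 65.5% (train), 14.2% (dev) and 20.2% (test).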

Nature of the content*
Topics: Abortion, Death penalty, Nuclear power, Marijuana legalization, Minimum wage, Cloning. Each topic has a set of associated sentences, each labeled pro, con or non in relation to the topic.

Format*
JSONL with the following keys: sentence_id (the ID of the sentence), topic (the topic of the sentence), label (the stance label: pro, con or non) and sentence (the sentence itself).

Tab-separated with 4 columns: the sentence ID, the topic, the label (pro, con or non) and the sentence itself.
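
A minimal loader sketch for both formats, assuming UTF-8 files, the TSV column order as listed above (ID, topic, label, sentence) and no TSV header row unless has_header=True is passed. The helper names load_jsonl, load_tsv and Example are hypothetical, as are any file paths you pass in; check them against the distributed files.

```python
import json
from dataclasses import dataclass

VALID_LABELS = {"pro", "con", "non"}


@dataclass
class Example:
    sentence_id: str
    topic: str
    label: str
    sentence: str


def _check(ex):
    """Reject rows whose label is not one of pro, con, non."""
    if ex.label not in VALID_LABELS:
        raise ValueError(f"unexpected label {ex.label!r} for sentence {ex.sentence_id}")
    return ex


def load_jsonl(path):
    """One JSON object per line with keys sentence_id, topic, label, sentence."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines, e.g. a trailing newline
            obj = json.loads(line)
            examples.append(_check(Example(
                str(obj["sentence_id"]),  # IDs are normalised to strings
                obj["topic"], obj["label"], obj["sentence"])))
    return examples


def load_tsv(path, has_header=False):
    """Four tab-separated columns: sentence ID, topic, label, sentence (order assumed)."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if (has_header and i == 0) or not line.strip():
                continue
            # maxsplit=3 keeps any tab inside the sentence text in the last field
            sid, topic, label, sentence = line.rstrip("\n").split("\t", 3)
            examples.append(_check(Example(sid, topic, label, sentence)))
    return examples
```

Splitting each TSV line manually, rather than with csv.reader and its default quote handling, avoids misreading sentences that begin with a quotation mark.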

Data source(s)*
The original data comes from the AURC dataset [1] (https://github.com/trtm/AURC). For this corpus, only the in-domain topics were used.

Data collection method(s)*
Collected from the Common Crawl archive; see [1].

Data selection and filtering*
A subset of the original data: only the in-domain topics are used.

Data preprocessing*
The sentences were machine translated into Swedish. The test set was then manually corrected.

Data labeling*
The sentences are labeled with pro, con or non, signifying their stance in relation to a topic.

Annotator characteristics

IV. ETHICS AND CAVEATS

Ethical considerations

Things to watch out for

V. ABOUT DOCUMENTATION

Data last updated*
20221215

Which changes have been made, compared to the previous version*
First version

Access to previous versions

This document created*
20221215 by Anna Lindahl

This document last updated*
20220203 by Anna Lindahl

Where to look for further details

Documentation template version*
v1.1

VI. OTHER

Related projects

References
[1] Trautmann, D., Daxenberger, J., Stab, C., Schütze, H., &amp; Gurevych, I. (2020, April). Fine-grained argument unit recognition and classification. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 34, No. 05, pp. 9048-9056).</abstract>
      <sumDscr />
    </stdyInfo>
    <method>
      <dataColl />
    </method>
    <dataAccs>
      <useStmt>
        <restrctn xml:lang="en">Access to data through an external actor. </restrctn>
        <restrctn xml:lang="sv">Åtkomst till data via extern aktör. </restrctn>
      </useStmt>
    </dataAccs>
    <othrStdyMat />
  </stdyDscr>
</codeBook>