<ddi:DDIInstance xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="ddi:instance:3_3 http://ddialliance.org/Specification/DDI-Lifecycle/3.3/XMLSchema/instance.xsd" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:ddi="ddi:instance:3_3" xmlns:r="ddi:reusable:3_3" xmlns:s="ddi:studyunit:3_3" xmlns:d="ddi:datacollection:3_3" xmlns:a="ddi:archive:3_3" xmlns:c="ddi:conceptualcomponent:3_3" xmlns:cm="ddi:comparative:3_3" xmlns:g="ddi:group:3_3" xmlns:l="ddi:logicalproduct:3_3" xmlns:p="ddi:physicaldataproduct:3_3" xmlns:pi="ddi:physicalinstance:3_3" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:xml="http://www.w3.org/XML/1998/namespace" isMaintainable="true" scopeOfUniqueness="Agency">
  <r:URN>urn:ddi:se.researchdata:doi-10-23695-56t6-rc52:0</r:URN>
  <r:Agency>SND</r:Agency>
  <r:ID>doi-10-23695-56t6-rc52</r:ID>
  <r:Version>0</r:Version>
  <g:ResourcePackage>
    <r:URN>urn:ddi:se.researchdata:doi-10-23695-56t6-rc52.ResourcePackage:2.0</r:URN>
    <r:OtherMaterialScheme>
      <r:URN>urn:ddi:se.researchdata:doi-10-23695-56t6-rc52.OtherMaterialScheme:2.0</r:URN>
    </r:OtherMaterialScheme>
    <a:OrganizationScheme>
      <r:URN>urn:ddi:se.researchdata:doi-10-23695-56t6-rc52.OrganizationScheme-0:2.0</r:URN>
      <a:Individual>
        <r:URN>urn:ddi:se.researchdata:doi-10-23695-56t6-rc52.Individual-0:2.0</r:URN>
        <a:IndividualIdentification>
          <a:IndividualName>
            <a:FullName>
              <r:String>Lindahl, Anna</r:String>
            </a:FullName>
          </a:IndividualName>
        </a:IndividualIdentification>
      </a:Individual>
    </a:OrganizationScheme>
  </g:ResourcePackage>
  <s:StudyUnit>
    <r:URN>urn:ddi:se.researchdata:doi-10-23695-56t6-rc52.StudyUnit:2.0</r:URN>
    <r:UserID typeOfUserID="datasetIdentifier">doi-10-23695-56t6-rc52</r:UserID>
    <r:Citation>
      <r:Title>
        <r:String xml:lang="sv">Argumentation sentences 1.0</r:String>
        <r:String xml:lang="en">Argumentation sentences 1.0</r:String>
      </r:Title>
      <r:Creator>
        <r:CreatorReference>
          <r:URN>urn:ddi:se.researchdata:doi-10-23695-56t6-rc52.Individual-0:2.0</r:URN>
          <r:TypeOfObject>Individual</r:TypeOfObject>
        </r:CreatorReference>
      </r:Creator>
      <r:Publisher>
        <r:PublisherName>
          <r:String xml:lang="sv">Göteborgs universitet</r:String>
          <r:String xml:lang="en">University of Gothenburg</r:String>
        </r:PublisherName>
      </r:Publisher>
      <r:PublicationDate>
        <r:SimpleDate>2024-01-01</r:SimpleDate>
      </r:PublicationDate>
      <r:InternationalIdentifier>
        <r:IdentifierContent>10.23695/56T6-RC52</r:IdentifierContent>
        <r:ManagingAgency controlledVocabularyAgencyName="DOI">DOI</r:ManagingAgency>
      </r:InternationalIdentifier>
    </r:Citation>
    <r:Abstract>
      <r:Content xml:lang="sv">I. IDENTIFYING INFORMATION

Title*
Argumentation sentences

Subtitle
A translated corpus for classifying sentence stance in relation to a topic. 

Created by*
Anna Lindahl (anna.lindahl@svenska.gu.se)

Publisher(s)*
Språkbanken Text (sb-info@svenska.gu.se)

Link(s) / permanent identifier(s)*
https://spraakbanken.gu.se/en/resources/superlim

License(s)*
CC BY 4.0

Abstract*
Argumentation sentences is a translated corpus for the task of identifying stance in relation to a topic. It consists of sentences labeled with pro, con or non in relation to one of six topics. The original dataset [1] is available at https://github.com/trtm/AURC. The test set consists of manually corrected translations; the training set is machine translated.

Funded by*
Vinnova (grant no. 2021-04165) 

Cite as

Related datasets
Part of the SuperLim collection (https://spraakbanken.gu.se/en/resources/superlim)

II. USAGE

Key applications
Machine learning, argumentation mining, stance classification

Intended task(s)/usage(s)
Evaluate models on the following task: Given a sentence and a topic, determine if the sentence is for, against or neutral in relation to the topic.

Recommended evaluation measures
Krippendorff’s alpha (the official SuperLim measure), MCC, F

Dataset function(s)
Training, testing

Recommended split(s)
Train, dev, test (provided)

III. DATA

Primary data*
Text

Language*
Swedish

Dataset in numbers*
5265 sentences split over 6 topics, 3450 train, 750 dev and 1065 test

Nature of the content*
Topics: Abortion, Death penalty, Nuclear power, Marijuana legalization, Minimum wage, Cloning. Each topic has a set of associated sentences, labeled with pro, con or non in relation to the topic.

Format*
JSONL with the following keys: sentence_id = the id for each sentence; topic = the topic for each sentence; label = the label for each sentence (pro, con or non); sentence = the sentence itself

Tab-separated with 4 columns: the id for each sentence; topic = the topic for each sentence; label = the label for each sentence (pro, con or non); sentence = the sentence itself

Data source(s)*
The original data comes from the AURC dataset [1] (https://github.com/trtm/AURC). For this corpus, only the in-domain topics were used.

Data collection method(s)*
Collected from the Common Crawl archive. See [1]

Data selection and filtering*
A subset of the original data: only the in-domain topics are used.

Data preprocessing*
Sentences were machine translated. The test set was then manually corrected. 

Data labeling*
The sentences are labeled with pro, con or non, signifying their stance in relation to a topic.

Annotator characteristics

IV. ETHICS AND CAVEATS

Ethical considerations

Things to watch out for

V. ABOUT DOCUMENTATION

Data last updated*
20221215

Which changes have been made, compared to the previous version*
First version

Access to previous versions

This document created*
20221215 by Anna Lindahl

This document last updated*
20220203 by Anna Lindahl

Where to look for further details

Documentation template version*
v1.1

VI. OTHER

Related projects

References
[1] Trautmann, D., Daxenberger, J., Stab, C., Schütze, H., &amp; Gurevych, I. (2020, April). Fine-grained argument unit recognition and classification. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 34, No. 05, pp. 9048-9056).</r:Content>
      <r:Content xml:lang="en">I. IDENTIFYING INFORMATION

Title*
Argumentation sentences

Subtitle
A translated corpus for classifying sentence stance in relation to a topic. 

Created by*
Anna Lindahl (anna.lindahl@svenska.gu.se)

Publisher(s)*
Språkbanken Text (sb-info@svenska.gu.se)

Link(s) / permanent identifier(s)*
https://spraakbanken.gu.se/en/resources/superlim

License(s)*
CC BY 4.0

Abstract*
Argumentation sentences is a translated corpus for the task of identifying stance in relation to a topic. It consists of sentences labeled with pro, con or non in relation to one of six topics. The original dataset [1] is available at https://github.com/trtm/AURC. The test set consists of manually corrected translations; the training set is machine translated.

Funded by*
Vinnova (grant no. 2021-04165) 

Cite as

Related datasets
Part of the SuperLim collection (https://spraakbanken.gu.se/en/resources/superlim)

II. USAGE

Key applications
Machine learning, argumentation mining, stance classification

Intended task(s)/usage(s)
Evaluate models on the following task: Given a sentence and a topic, determine if the sentence is for, against or neutral in relation to the topic.

Recommended evaluation measures
Krippendorff’s alpha (the official SuperLim measure), MCC, F

Dataset function(s)
Training, testing

Recommended split(s)
Train, dev, test (provided)

III. DATA

Primary data*
Text

Language*
Swedish

Dataset in numbers*
5265 sentences split over 6 topics, 3450 train, 750 dev and 1065 test

Nature of the content*
Topics: Abortion, Death penalty, Nuclear power, Marijuana legalization, Minimum wage, Cloning. Each topic has a set of associated sentences, labeled with pro, con or non in relation to the topic.

Format*
JSONL with the following keys: sentence_id = the id for each sentence; topic = the topic for each sentence; label = the label for each sentence (pro, con or non); sentence = the sentence itself

Tab-separated with 4 columns: the id for each sentence; topic = the topic for each sentence; label = the label for each sentence (pro, con or non); sentence = the sentence itself

Data source(s)*
The original data comes from the AURC dataset [1] (https://github.com/trtm/AURC). For this corpus, only the in-domain topics were used.

Data collection method(s)*
Collected from the Common Crawl archive. See [1]

Data selection and filtering*
A subset of the original data: only the in-domain topics are used.

Data preprocessing*
Sentences were machine translated. The test set was then manually corrected. 

Data labeling*
The sentences are labeled with pro, con or non, signifying their stance in relation to a topic.

Annotator characteristics

IV. ETHICS AND CAVEATS

Ethical considerations

Things to watch out for

V. ABOUT DOCUMENTATION

Data last updated*
20221215

Which changes have been made, compared to the previous version*
First version

Access to previous versions

This document created*
20221215 by Anna Lindahl

This document last updated*
20220203 by Anna Lindahl

Where to look for further details

Documentation template version*
v1.1

VI. OTHER

Related projects

References
[1] Trautmann, D., Daxenberger, J., Stab, C., Schütze, H., &amp; Gurevych, I. (2020, April). Fine-grained argument unit recognition and classification. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 34, No. 05, pp. 9048-9056).</r:Content>
    </r:Abstract>
    <r:Coverage>
      <r:TopicalCoverage>
        <r:URN>urn:ddi:se.researchdata:doi-10-23695-56t6-rc52.TopicalCoverage:2.0</r:URN>
        <r:Subject xml:lang="en" controlledVocabularyID="10208" controlledVocabularyName="Standard för svensk indelning av forskningsämnen 2025">Natural Language Processing</r:Subject>
        <r:Subject xml:lang="sv" controlledVocabularyID="10208" controlledVocabularyName="Standard för svensk indelning av forskningsämnen 2025">Språkbehandling och datorlingvistik</r:Subject>
      </r:TopicalCoverage>
      <r:SpatialCoverage />
    </r:Coverage>
    <a:Archive>
      <r:URN>urn:ddi:se.researchdata:doi-10-23695-56t6-rc52.Archive:2.0</r:URN>
      <a:ArchiveSpecific>
        <a:Item>
          <a:Access>
            <r:URN>urn:ddi:se.researchdata:doi-10-23695-56t6-rc52.Archive-ArchiveSpecificType-AccessType:2.0</r:URN>
            <a:TypeOfAccess controlledVocabularyName="info:eu-repo-Access-Terms vocabulary"></a:TypeOfAccess>
          </a:Access>
          <a:DataFileQuantity>0</a:DataFileQuantity>
        </a:Item>
      </a:ArchiveSpecific>
    </a:Archive>
  </s:StudyUnit>
</ddi:DDIInstance>