<ddi:DDIInstance xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="ddi:instance:3_3 http://ddialliance.org/Specification/DDI-Lifecycle/3.3/XMLSchema/instance.xsd" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:ddi="ddi:instance:3_3" xmlns:r="ddi:reusable:3_3" xmlns:s="ddi:studyunit:3_3" xmlns:d="ddi:datacollection:3_3" xmlns:a="ddi:archive:3_3" xmlns:c="ddi:conceptualcomponent:3_3" xmlns:cm="ddi:comparative:3_3" xmlns:g="ddi:group:3_3" xmlns:l="ddi:logicalproduct:3_3" xmlns:p="ddi:physicaldataproduct:3_3" xmlns:pi="ddi:physicalinstance:3_3" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:xml="http://www.w3.org/XML/1998/namespace" isMaintainable="true" scopeOfUniqueness="Agency">
  <r:URN>urn:ddi:se.researchdata:doi-10-23695-yepn-se26:0</r:URN>
  <r:Agency>SND</r:Agency>
  <r:ID>doi-10-23695-yepn-se26</r:ID>
  <r:Version>0</r:Version>
  <g:ResourcePackage>
    <r:URN>urn:ddi:se.researchdata:doi-10-23695-yepn-se26.ResourcePackage:2.0</r:URN>
    <r:OtherMaterialScheme>
      <r:URN>urn:ddi:se.researchdata:doi-10-23695-yepn-se26.OtherMaterialScheme:2.0</r:URN>
    </r:OtherMaterialScheme>
    <a:OrganizationScheme>
      <r:URN>urn:ddi:se.researchdata:doi-10-23695-yepn-se26.OrganizationScheme-0:2.0</r:URN>
      <a:Individual>
        <r:URN>urn:ddi:se.researchdata:doi-10-23695-yepn-se26.Individual-0:2.0</r:URN>
        <a:IndividualIdentification>
          <a:IndividualName>
            <a:FullName>
              <r:String>Morger, Felix</r:String>
            </a:FullName>
          </a:IndividualName>
        </a:IndividualIdentification>
      </a:Individual>
    </a:OrganizationScheme>
  </g:ResourcePackage>
  <s:StudyUnit>
    <r:URN>urn:ddi:se.researchdata:doi-10-23695-yepn-se26.StudyUnit:2.0</r:URN>
    <r:UserID typeOfUserID="datasetIdentifier">doi-10-23695-yepn-se26</r:UserID>
    <r:Citation>
      <r:Title>
        <r:String xml:lang="sv">SweDiagnostics</r:String>
        <r:String xml:lang="en">SweDiagnostics</r:String>
      </r:Title>
      <r:Creator>
        <r:CreatorReference>
          <r:URN>urn:ddi:se.researchdata:doi-10-23695-yepn-se26.Individual-0:2.0</r:URN>
          <r:TypeOfObject>Individual</r:TypeOfObject>
        </r:CreatorReference>
      </r:Creator>
      <r:Publisher>
        <r:PublisherName>
          <r:String xml:lang="sv">Göteborgs universitet</r:String>
          <r:String xml:lang="en">University of Gothenburg</r:String>
        </r:PublisherName>
      </r:Publisher>
      <r:PublicationDate>
        <r:SimpleDate>2024-01-01</r:SimpleDate>
      </r:PublicationDate>
      <r:InternationalIdentifier>
        <r:IdentifierContent>10.23695/YEPN-SE26</r:IdentifierContent>
        <r:ManagingAgency controlledVocabularyAgencyName="DOI">DOI</r:ManagingAgency>
      </r:InternationalIdentifier>
    </r:Citation>
    <r:Abstract>
      <r:Content xml:lang="sv">I. IDENTIFYING INFORMATION

Title*
SuperLim Diagnostic Dataset, v1.1

Subtitle

Created by*
Felix Morger, Gothenburg University (felix.morger@gu.se)

Publisher(s)*
Språkbanken Text (sb-info@svenska.gu.se)

Link(s) / permanent identifier(s)*
https://spraakbanken.gu.se/en/resources/superlim 

License(s)*
CC BY 4.0

Abstract*
Manual Swedish translation of all 1106 sentence pairs of the SuperGLUE diagnostic dataset.

Funded by*
Vinnova (grants no. 2020-02523, 2021-04165)

Cite as

Related datasets
SuperLim, SuperGLUE diagnostic dataset, FraCaS test suite

II. USAGE

Key applications
Fine-grained analysis of system performance on a broad range of linguistic phenomena.

Intended task(s)/usage(s)
Natural language inference.

Recommended evaluation measures
Krippendorff's alpha (the official SuperLim measure), Matthews' correlation coefficient.

Dataset function(s)
Diagnostics

Recommended split(s)
Test only

III. DATA

Primary data*
Text

Language*
Swedish

Dataset in numbers*
1106

Nature of the content*
Pairs of sentences annotated with their inference relation and the linguistic phenomena that account for their differences

Format*
JSONL and TSV. Nine columns/objects: id; four columns with information about the relevant linguistic phenomena; domain; label; premise; hypothesis

Data source(s)*
SuperGLUE Diagnostic Dataset: Wang, Alex &amp; Pruksachatkun, Yada &amp; Nangia, Nikita &amp; Singh, Amanpreet &amp; Michael, Julian &amp; Hill, Felix &amp; Levy, Omer &amp; Bowman, Samuel. (2019). SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems.

Data collection method(s)*
See original source. 

Data selection and filtering*
See original source. 

Data preprocessing*
See original source. 

Data labeling*
Some data labels (annotations) were changed to fit the Swedish examples, but in general the aim was to keep such changes to a minimum.

Annotator characteristics

IV. ETHICS AND CAVEATS

Ethical considerations
See original data source.

Things to watch out for
See original data source.

V. ABOUT DOCUMENTATION

Data last updated*
2023-03-01, v1.1

Which changes have been made, compared to the previous version*
Minor format changes

Access to previous versions

This document created*
2021-06-04, Felix Morger.

This document last updated*
2023-04-02, Aleksandrs Berdicevskis.

Where to look for further details

Documentation template version*
v1.1

VI. OTHER

Related projects

References</r:Content>
      <r:Content xml:lang="en">I. IDENTIFYING INFORMATION

Title*
SuperLim Diagnostic Dataset, v1.1

Subtitle

Created by*
Felix Morger, Gothenburg University (felix.morger@gu.se)

Publisher(s)*
Språkbanken Text (sb-info@svenska.gu.se)

Link(s) / permanent identifier(s)*
https://spraakbanken.gu.se/en/resources/superlim 

License(s)*
CC BY 4.0

Abstract*
Manual Swedish translation of all 1106 sentence pairs of the SuperGLUE diagnostic dataset.

Funded by*
Vinnova (grants no. 2020-02523, 2021-04165)

Cite as

Related datasets
SuperLim, SuperGLUE diagnostic dataset, FraCaS test suite

II. USAGE

Key applications
Fine-grained analysis of system performance on a broad range of linguistic phenomena.

Intended task(s)/usage(s)
Natural language inference.

Recommended evaluation measures
Krippendorff's alpha (the official SuperLim measure), Matthews' correlation coefficient.

Dataset function(s)
Diagnostics

Recommended split(s)
Test only

III. DATA

Primary data*
Text

Language*
Swedish

Dataset in numbers*
1106

Nature of the content*
Pairs of sentences annotated with their inference relation and the linguistic phenomena that account for their differences

Format*
JSONL and TSV. Nine columns/objects: id; four columns with information about the relevant linguistic phenomena; domain; label; premise; hypothesis

Data source(s)*
SuperGLUE Diagnostic Dataset: Wang, Alex &amp; Pruksachatkun, Yada &amp; Nangia, Nikita &amp; Singh, Amanpreet &amp; Michael, Julian &amp; Hill, Felix &amp; Levy, Omer &amp; Bowman, Samuel. (2019). SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems.

Data collection method(s)*
See original source. 

Data selection and filtering*
See original source. 

Data preprocessing*
See original source. 

Data labeling*
Some data labels (annotations) were changed to fit the Swedish examples, but in general the aim was to keep such changes to a minimum.

Annotator characteristics

IV. ETHICS AND CAVEATS

Ethical considerations
See original data source.

Things to watch out for
See original data source.

V. ABOUT DOCUMENTATION

Data last updated*
2023-03-01, v1.1

Which changes have been made, compared to the previous version*
Minor format changes

Access to previous versions

This document created*
2021-06-04, Felix Morger.

This document last updated*
2023-04-02, Aleksandrs Berdicevskis.

Where to look for further details

Documentation template version*
v1.1

VI. OTHER

Related projects

References</r:Content>
    </r:Abstract>
    <r:Coverage>
      <r:TopicalCoverage>
        <r:URN>urn:ddi:se.researchdata:doi-10-23695-yepn-se26.TopicalCoverage:2.0</r:URN>
        <r:Subject xml:lang="en" controlledVocabularyID="10208" controlledVocabularyName="Standard för svensk indelning av forskningsämnen 2025">Natural Language Processing</r:Subject>
        <r:Subject xml:lang="sv" controlledVocabularyID="10208" controlledVocabularyName="Standard för svensk indelning av forskningsämnen 2025">Språkbehandling och datorlingvistik</r:Subject>
      </r:TopicalCoverage>
      <r:SpatialCoverage />
    </r:Coverage>
    <a:Archive>
      <r:URN>urn:ddi:se.researchdata:doi-10-23695-yepn-se26.Archive:2.0</r:URN>
      <a:ArchiveSpecific>
        <a:Item>
          <a:Access>
            <r:URN>urn:ddi:se.researchdata:doi-10-23695-yepn-se26.Archive-ArchiveSpecificType-AccessType:2.0</r:URN>
            <a:TypeOfAccess controlledVocabularyName="info:eu-repo-Access-Terms vocabulary"></a:TypeOfAccess>
          </a:Access>
          <a:DataFileQuantity>0</a:DataFileQuantity>
        </a:Item>
      </a:ArchiveSpecific>
    </a:Archive>
  </s:StudyUnit>
</ddi:DDIInstance>