<codeBook xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:schemaLocation="ddi:codebook:2_5 http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd" xmlns="ddi:codebook:2_5">
  <docDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv">SweFraCas 1.0</titl>
        <parTitl xml:lang="en">SweFraCas 1.0</parTitl>
        <IDNo agency="SND">doi-10-23695-gfwn-qk37-0</IDNo>
        <IDNo agency="DOI">https://doi.org/10.23695/GFWN-QK37</IDNo>
      </titlStmt>
      <prodStmt>
        <producer xml:lang="en" abbr="SND">Swedish National Data Service</producer>
        <producer xml:lang="sv" abbr="SND">Svensk nationell datatjänst</producer>
      </prodStmt>
      <holdings URI="https://doi.org/10.23695/GFWN-QK37">Landing page</holdings>
    </citation>
  </docDscr>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv">SweFraCas 1.0</titl>
        <parTitl xml:lang="en">SweFraCas 1.0</parTitl>
        <IDNo agency="SND">doi-10-23695-gfwn-qk37-0</IDNo>
        <IDNo agency="DOI">https://doi.org/10.23695/GFWN-QK37</IDNo>
      </titlStmt>
      <rspStmt>
        <AuthEnty xml:lang="en" affiliation="">Språkbanken Text</AuthEnty>
      </rspStmt>
      <prodStmt />
      <distStmt>
        <distrbtr xml:lang="en" abbr="SND" URI="https://snd.se">Swedish National Data Service</distrbtr>
        <distrbtr xml:lang="sv" abbr="SND" URI="https://snd.se">Svensk nationell datatjänst</distrbtr>
        <distDate xml:lang="en" date="2024-01-01" />
      </distStmt>
      <verStmt>
        <version elementVersion="0" elementVersionDate="2024-01-01" />
      </verStmt>
      <holdings URI="https://doi.org/10.23695/GFWN-QK37">Landing page</holdings>
    </citation>
    <stdyInfo>
      <subject />
      <abstract xml:lang="en" contentType="abstract">I. IDENTIFYING INFORMATION

Title*
SweFraCas v1.0

Subtitle
A Swedish version of the FraCaS inference/entailment dataset

Created by*
Lars Borin (lars.borin@gu.se)

Publisher(s)*
Språkbanken Text (sb-info@svenska.gu.se)

Link(s) / permanent identifier(s)*
https://spraakbanken.gu.se/en/resources/swefracas

License(s)*
CC BY 4.0

Abstract*
A textual inference/entailment problem set derived from FraCaS. The original English FraCaS test suite [1] was converted to XML and edited by Bill MacCartney [2], and then automatically translated into Swedish by Peter Ljunglöf and Magdalena Siverbo [3]. Aleksandrs Berdicevskis created the current tabular form of the set by merging the Swedish and English versions and removing some of the problems. Finally, Lars Borin went through all the translations, correcting and Swedifying them manually. As a result, many translations are rather liberal and diverge noticeably from the English original.

Funded by*
Vinnova (grant no. 2020-02523)

Cite as

Related datasets
Part of the SuperLim collection (https://spraakbanken.gu.se/en/resources/superlim). See also Abstract

II. USAGE

Key applications
Machine Learning, Inference, Entailment, Evaluation of language models, Diagnostics

Intended task(s)/usage(s)
(1) Evaluate models on the following task: given the question and the premises, choose the suitable answer (Ja 'Yes'; Nej 'No'; Vet ej 'Don't know'; Jo 'Positive answer to a negated question')

Recommended evaluation measures
(1) R4 (Matthews correlation coefficient)
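We read R4 as the four-class generalisation of the Matthews correlation coefficient (Gorodkin's R_K with K = 4). A minimal Python sketch of that computation follows, assuming gold and predicted answers are given as lists of the label strings Ja, Nej, Vet ej and Jo; the function name multiclass_mcc is hypothetical. scikit-learn's sklearn.metrics.matthews_corrcoef computes the same multiclass quantity and can be used instead.

```python
import math
from collections import Counter


def multiclass_mcc(gold, pred):
    """Multiclass Matthews correlation coefficient (R_K).

    gold, pred: equal-length sequences of labels such as "Ja", "Nej", "Vet ej", "Jo".
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    s = len(gold)                                # total number of problems
    c = sum(g == p for g, p in zip(gold, pred))  # correctly answered problems
    t = Counter(gold)                            # true count per label
    p = Counter(pred)                            # predicted count per label
    labels = set(t) | set(p)
    cov_tp = c * s - sum(t[k] * p[k] for k in labels)
    cov_pp = s * s - sum(p[k] ** 2 for k in labels)
    cov_tt = s * s - sum(t[k] ** 2 for k in labels)
    denom = math.sqrt(cov_pp * cov_tt)
    # Same convention as scikit-learn: return 0.0 when the denominator is zero.
    return cov_tp / denom if denom else 0.0
```

With this definition, perfect agreement gives 1.0, and a system that gives the same answer to every problem gets 0.0 regardless of the label distribution.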

Dataset function(s)
Testing

Recommended split(s)
Test data only

III. DATA

Primary data*
Text

Language*
Swedish

Dataset in numbers*
305 problems

Nature of the content*
Inference problems in which a question has to be answered given a number of premises

Format*
Tab-separated, five columns:
 "id" -- unique integer id of the problem (repeated on every row belonging to that problem);
 "original_id" -- the id of the corresponding problem in the original dataset;
 "attribute" -- what the row contains: "premiss" (premise), "fråga" (question), "svar" (answer), "why" or "note". The last two are taken from MacCartney's conversion, refer only to the English data and are kept for information only;
 "value" -- the Swedish sentence ("why" and "note" are always empty for Swedish);
 "original_value" -- the original English sentence, provided for information only. Note that many translations are rather liberal.
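A minimal Python sketch that reads the file with the standard csv module and groups its rows into problems. It assumes a header row with exactly the five column names above, unquoted fields, and premises listed in their original order; the function name read_problems is hypothetical.

```python
import csv


def read_problems(lines):
    """Group SweFraCas rows into problems keyed by the integer "id" column.

    lines: an open text file or any iterable of TSV lines, header row first.
    """
    problems = {}
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        pid = int(row["id"])
        prob = problems.setdefault(pid, {
            "original_id": row["original_id"],
            "premises": [],
            "question": None,
            "answer": None,
        })
        attr = row["attribute"]
        if attr == "premiss":
            prob["premises"].append(row["value"])
        elif attr == "fråga":
            prob["question"] = row["value"]
        elif attr == "svar":
            prob["answer"] = row["value"]
        # "why" and "note" rows are English-only and are skipped here.
    return problems
```

In practice the file would be opened with encoding="utf-8" and newline="" and the file object passed to read_problems; each resulting problem then supplies the premises and question as model input and the svar value as the gold label.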

Data source(s)*
See Abstract

Data collection method(s)*
See Abstract

Data selection and filtering*
41 problems in the original set did not have a definite answer (different answers were possible depending on the interpretation). They were excluded.

Data preprocessing*
None

Data labeling*
Most labels map straightforwardly onto the original English labels (Yes → Ja, Don't know → Vet ej, No → Nej), with three exceptions: problems 97 and 98 (Nej → Jo) and problem 108 (No → Vet ej).
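A small Python sketch of this mapping, for example to sanity-check the Swedish answers against the English ones. The names LABEL_MAP, EXCEPTIONS and swedish_label are hypothetical, and it assumes the English answers are spelled exactly as above. The datasheet does not say whether 97, 98 and 108 are values of "id" or "original_id", so the exceptions are keyed by a generic problem number.

```python
# English label -> Swedish label, as listed in the datasheet.
LABEL_MAP = {"Yes": "Ja", "Don't know": "Vet ej", "No": "Nej"}

# Hypothetical keying: the datasheet does not specify whether these numbers
# refer to the "id" or the "original_id" column.
EXCEPTIONS = {97: "Jo", 98: "Jo", 108: "Vet ej"}


def swedish_label(problem_number, english_label):
    """Return the expected Swedish label for a problem."""
    if problem_number in EXCEPTIONS:
        return EXCEPTIONS[problem_number]
    return LABEL_MAP[english_label]
```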

Annotator characteristics
PhD in linguistics; native speaker of Swedish

IV. ETHICS AND CAVEATS

Ethical considerations

Things to watch out for
In the original dataset, all examples were classified by the linguistic phenomena they represent. The Swedish translations do not necessarily fit exactly the same classification (most of them probably do, but this has not been checked).

V. ABOUT DOCUMENTATION

Data last updated*
2021-06-09, v1.0

Which changes have been made, compared to the previous version*
This is the first official version

Access to previous versions

This document created*
2021-06-09, Aleksandrs Berdicevskis

This document last updated*
2021-06-09, Aleksandrs Berdicevskis

Where to look for further details

Documentation template version*
v1.0

VI. OTHER

Related projects

References
[1] Robin Cooper, Dick Crouch, Jan Van Eijck, Chris Fox, Johan Van Genabith, Jan Jaspars, Hans Kamp, David Milward, Manfred Pinkal, Massimo Poesio, et al. 1996. Using the Framework. Technical Report LRE 62-051 D-16, The FraCaS Consortium. ftp://ftp.cogsci.ed.ac.uk/pub/FRACAS/del16.ps.gz
[2] Bill MacCartney's edited version of the FraCaS test suite. https://nlp.stanford.edu/~wcmac/downloads/fracas.xml
[3] Peter Ljunglöf and Magdalena Siverbo. 2012. A bilingual treebank for the FraCas test suite. In SLTC 2012, page 53. https://gup.ub.gu.se/publication/168965?lang=en</abstract>
      <sumDscr />
    </stdyInfo>
    <method>
      <dataColl />
    </method>
    <dataAccs>
      <useStmt>
        <restrctn xml:lang="en">Access to data through an external actor. </restrctn>
        <restrctn xml:lang="sv">Åtkomst till data via extern aktör. </restrctn>
      </useStmt>
    </dataAccs>
    <othrStdyMat />
  </stdyDscr>
</codeBook>