<codeBook xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:schemaLocation="ddi:codebook:2_5 http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd" xmlns="ddi:codebook:2_5">
  <docDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv">TalbankenSBX</titl>
        <parTitl xml:lang="en">TalbankenSBX</parTitl>
        <IDNo agency="SND">doi-10-23695-6m9r-w377-0</IDNo>
        <IDNo agency="DOI">https://doi.org/10.23695/6M9R-W377</IDNo>
      </titlStmt>
      <prodStmt>
        <producer xml:lang="en" abbr="SND">Swedish National Data Service</producer>
        <producer xml:lang="sv" abbr="SND">Svensk nationell datatjänst</producer>
      </prodStmt>
      <holdings URI="https://doi.org/10.23695/6M9R-W377">Landing page</holdings>
    </citation>
  </docDscr>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv">TalbankenSBX</titl>
        <parTitl xml:lang="en">TalbankenSBX</parTitl>
        <IDNo agency="SND">doi-10-23695-6m9r-w377-0</IDNo>
        <IDNo agency="DOI">https://doi.org/10.23695/6M9R-W377</IDNo>
      </titlStmt>
      <rspStmt>
        <AuthEnty xml:lang="en" affiliation="">Språkbanken Text</AuthEnty>
      </rspStmt>
      <prodStmt />
      <distStmt>
        <distrbtr xml:lang="en" abbr="SND" URI="https://snd.se">Swedish National Data Service</distrbtr>
        <distrbtr xml:lang="sv" abbr="SND" URI="https://snd.se">Svensk nationell datatjänst</distrbtr>
        <distDate xml:lang="en" date="2024-01-01" />
      </distStmt>
      <verStmt>
        <version elementVersion="0" elementVersionDate="2024-01-01" />
      </verStmt>
      <holdings URI="https://doi.org/10.23695/6M9R-W377">Landing page</holdings>
    </citation>
    <stdyInfo>
      <subject />
      <abstract xml:lang="en" contentType="abstract">Talbanken is a widely used Swedish treebank; read more about its history and different versions here. This version originated as a copy of TalbankenSTB, but unlike the STB version, it is open to changes and corrections. It is also the version indexed by our search engine, Korp. The changes we have made are listed in changelog.txt.
Annotation
The following layers of annotation were added (or corrected) manually and can be considered gold data: tokenization, sentence segmentation, POS, MSD, dependency syntax (deprel and dephead).
Tokenization, sentence segmentation, POS and MSD follow the SUC format; syntactic annotation follows the Mamba-Dep format, a conversion to dependency grammar of the MAMBA format used in the original Talbanken76.
Read more about these annotation layers in the documentation for TalbankenSTB or at Joakim Nivre's page: tokenization and sentence segmentation, POS and MSD, dependency syntax.
Formats and splits
TalbankenSBX is provided in our standard XML format and in a (pseudo-)CoNLL-U format, where UPOS is POS in the SUC format, XPOS is POS+MSD, Feats are MSD converted to the UD/CoNLL-U standard, and Deprel is a Mamba-Dep relation. There are currently no text or SpaceAfter attributes.
You may convert our XML to this format yourself using the script in this repository.
We provide two splits of TalbankenSBX. MorphSplit is used for POS-tagging purposes: the treebank is divided into two parts with the same number of sentences (the split is completely random; no blocks are used). One part serves as the development set and the other as the test set (SUC3 is the training set).
You may resplit Talbanken yourself using the script in this repository.
SyntSplit is used for dependency parsing: the treebank is divided into training, development and test sets. The training set is the same as the one in TalbankenSTB, whereas dev and test approximate dev and test in the UD version as closely as possible. SyntSplit is provided only in the CoNLL-U format.</abstract>
      <abstract xml:lang="sv" contentType="abstract">Talbanken is a widely used Swedish treebank; read more about its history and different versions here. This version originated as a copy of TalbankenSTB, but unlike the STB version, it is open to changes and corrections. It is also the version indexed by our search engine, Korp. The changes we have made are listed in changelog.txt.
Annotation
The following layers of annotation were added (or corrected) manually and can be considered gold data: tokenization, sentence segmentation, POS, MSD, dependency syntax (deprel and dephead).
Tokenization, sentence segmentation, POS and MSD follow the SUC format; syntactic annotation follows the Mamba-Dep format, a conversion to dependency grammar of the MAMBA format used in the original Talbanken76.
Read more about these annotation layers in the documentation for TalbankenSTB or at Joakim Nivre's page: tokenization and sentence segmentation, POS and MSD, dependency syntax.
Formats and splits
TalbankenSBX is provided in our standard XML format and in a (pseudo-)CoNLL-U format, where UPOS is POS in the SUC format, XPOS is POS+MSD, Feats are MSD converted to the UD/CoNLL-U standard, and Deprel is a Mamba-Dep relation. There are currently no text or SpaceAfter attributes.
You may convert our XML to this format yourself using the script in this repository.
We provide two splits of TalbankenSBX. MorphSplit is used for POS-tagging purposes: the treebank is divided into two parts with the same number of sentences (the split is completely random; no blocks are used). One part serves as the development set and the other as the test set (SUC3 is the training set).
You may resplit Talbanken yourself using the script in this repository.
SyntSplit is used for dependency parsing: the treebank is divided into training, development and test sets. The training set is the same as the one in TalbankenSTB, whereas dev and test approximate dev and test in the UD version as closely as possible. SyntSplit is provided only in the CoNLL-U format.</abstract>
      <sumDscr />
    </stdyInfo>
    <method>
      <dataColl />
    </method>
    <dataAccs>
      <useStmt>
        <restrctn xml:lang="en">Access to data through an external actor. </restrctn>
        <restrctn xml:lang="sv">Åtkomst till data via extern aktör. </restrctn>
      </useStmt>
    </dataAccs>
    <othrStdyMat />
  </stdyDscr>
</codeBook>