<ddi:DDIInstance xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="ddi:instance:3_3 http://ddialliance.org/Specification/DDI-Lifecycle/3.3/XMLSchema/instance.xsd" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:ddi="ddi:instance:3_3" xmlns:r="ddi:reusable:3_3" xmlns:s="ddi:studyunit:3_3" xmlns:d="ddi:datacollection:3_3" xmlns:a="ddi:archive:3_3" xmlns:c="ddi:conceptualcomponent:3_3" xmlns:cm="ddi:comparative:3_3" xmlns:g="ddi:group:3_3" xmlns:l="ddi:logicalproduct:3_3" xmlns:p="ddi:physicaldataproduct:3_3" xmlns:pi="ddi:physicalinstance:3_3" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:xml="http://www.w3.org/XML/1998/namespace" isMaintainable="true" scopeOfUniqueness="Agency">
  <r:URN>urn:ddi:se.researchdata:doi-10-23695-aryw-nh78:0</r:URN>
  <r:Agency>SND</r:Agency>
  <r:ID>doi-10-23695-aryw-nh78</r:ID>
  <r:Version>0</r:Version>
  <g:ResourcePackage>
    <r:URN>urn:ddi:se.researchdata:doi-10-23695-aryw-nh78.ResourcePackage:2.0</r:URN>
    <r:OtherMaterialScheme>
      <r:URN>urn:ddi:se.researchdata:doi-10-23695-aryw-nh78.OtherMaterialScheme:2.0</r:URN>
    </r:OtherMaterialScheme>
    <a:OrganizationScheme>
      <r:URN>urn:ddi:se.researchdata:doi-10-23695-aryw-nh78.OrganizationScheme-0:2.0</r:URN>
      <a:Organization>
        <r:URN>urn:ddi:se.researchdata:doi-10-23695-aryw-nh78.Organization-0:2.0</r:URN>
        <a:OrganizationIdentification>
          <a:OrganizationName>
            <r:String xml:lang="en">Språkbanken Text</r:String>
          </a:OrganizationName>
        </a:OrganizationIdentification>
      </a:Organization>
    </a:OrganizationScheme>
  </g:ResourcePackage>
  <s:StudyUnit>
    <r:URN>urn:ddi:se.researchdata:doi-10-23695-aryw-nh78.StudyUnit:2.0</r:URN>
    <r:UserID typeOfUserID="datasetIdentifier">doi-10-23695-aryw-nh78</r:UserID>
    <r:Citation>
      <r:Title>
        <r:String xml:lang="sv">Ordklasstaggningsmodell: Marmot</r:String>
        <r:String xml:lang="en">POS-tagging model: Marmot</r:String>
      </r:Title>
      <r:Creator>
        <r:CreatorReference>
          <r:URN>urn:ddi:se.researchdata:doi-10-23695-aryw-nh78.Individual-0:2.0</r:URN>
          <r:TypeOfObject>Individual</r:TypeOfObject>
        </r:CreatorReference>
      </r:Creator>
      <r:Publisher>
        <r:PublisherName>
          <r:String xml:lang="sv">Göteborgs universitet</r:String>
          <r:String xml:lang="en">University of Gothenburg</r:String>
        </r:PublisherName>
      </r:Publisher>
      <r:Publisher>
        <r:PublisherName>
          <r:String xml:lang="sv">Göteborgs universitet</r:String>
          <r:String xml:lang="en">University of Gothenburg</r:String>
        </r:PublisherName>
      </r:Publisher>
      <r:PublicationDate>
        <r:SimpleDate>2024-01-01</r:SimpleDate>
      </r:PublicationDate>
      <r:InternationalIdentifier>
        <r:IdentifierContent>10.23695/ARYW-NH78</r:IdentifierContent>
        <r:ManagingAgency controlledVocabularyAgencyName="DOI">DOI</r:ManagingAgency>
      </r:InternationalIdentifier>
    </r:Citation>
    <r:Abstract>
      <r:Content xml:lang="sv">Model
 Marmot is a part-of-speech tagger that performs for Swedish somewhat worse than state-of-the-art neural models, but still yields very good results, works much faster and does not require a GPU. We provide two models for Marmot.
marmot_eval is trained on SUC3 and Talbanken_SBX_dev, using Saldo as dictionary. The advantage of this model is that it can be evaluated, using Talbanken_SBX_test or SIC2. The evaluation results are reported in the table below.

Test set
Exact match
POS
MSD

Talbanken_SBX_test
0.973
0.982
0.988

SIC2
0.921
0.934
0.958

 Read more about the evaluation here.
marmot_full is trained on SUC3 + Talbanken_SBX_test + Talbanken_SBX_dev + SIC2 (with Saldo as dictionary). We cannot evaluate the performance of this model, but we expect it to perform better than marmot_eval, or at least not worse.
Tagging and training
Download Marmot and the necessary dependencies. Download SALDO (converted to the necessary format) here. Download our scripts from this repository.
The scripts use a tab-separated three-column format: token, POS (without MSD), MSD. Use conllu_to_tab.rb to convert CONLL(U) to the two-column format (install Ruby 1.9+ and run ruby conllu_to_tab 2 n, where n is the number of the column you want to use (if you are converting our CONLLU files, use 4). Run ruby convert_col2_to_marmot.rb to convert the resulting col2 file to Marmot's col3).

Tagging
Use java -cp marmot.jar marmot.morph.cmd.Annotator --model-file model_name.marmot --test-file form-index=0,test_corpus.col1 --pred-file output_name.conll to tag a corpus using a pretrained model. The output corpus will be in a CONLL format with a somewhat unusual order of columns, use convert_marmot_to_conllu.rb to convert it to a usual CONLLU.
Training your own models
Run Marmot: java -Xmx5G -cp marmot.jar marmot.morph.cmd.Trainer -train-file form-index=0,tag-index=1,morph-index=2,corpus.col3 -tag-morph true -model-file model_name.marmot  subtag-separator "." -type-dict saldo_marmot.txt,indexes=[2,3]</r:Content>
      <r:Content xml:lang="en">Model
 Marmot is a part-of-speech tagger that performs for Swedish somewhat worse than state-of-the-art neural models, but still yields very good results, works much faster and does not require a GPU. We provide two models for Marmot.
marmot_eval is trained on SUC3 and Talbanken_SBX_dev, using Saldo as dictionary. The advantage of this model is that it can be evaluated, using Talbanken_SBX_test or SIC2. The evaluation results are reported in the table below.

Test set
Exact match
POS
MSD

Talbanken_SBX_test
0.973
0.982
0.988

SIC2
0.921
0.934
0.958

 Read more about the evaluation here.
marmot_full is trained on SUC3 + Talbanken_SBX_test + Talbanken_SBX_dev + SIC2 (with Saldo as dictionary). We cannot evaluate the performance of this model, but we expect it to perform better than marmot_eval, or at least not worse.
Tagging and training
Download Marmot and the necessary dependencies. Download SALDO (converted to the necessary format) here. Download our scripts from this repository.
The scripts use a tab-separated three-column format: token, POS (without MSD), MSD. Use conllu_to_tab.rb to convert CONLL(U) to the two-column format (install Ruby 1.9+ and run ruby conllu_to_tab 2 n, where n is the number of the column you want to use (if you are converting our CONLLU files, use 4). Run ruby convert_col2_to_marmot.rb to convert the resulting col2 file to Marmot's col3).

Tagging
Use java -cp marmot.jar marmot.morph.cmd.Annotator --model-file model_name.marmot --test-file form-index=0,test_corpus.col1 --pred-file output_name.conll to tag a corpus using a pretrained model. The output corpus will be in a CONLL format with a somewhat unusual order of columns, use convert_marmot_to_conllu.rb to convert it to a usual CONLLU.
Training your own models
Run Marmot: java -Xmx5G -cp marmot.jar marmot.morph.cmd.Trainer -train-file form-index=0,tag-index=1,morph-index=2,corpus.col3 -tag-morph true -model-file model_name.marmot  subtag-separator "." -type-dict saldo_marmot.txt,indexes=[2,3]</r:Content>
    </r:Abstract>
    <r:Coverage>
      <r:TopicalCoverage>
        <r:URN>urn:ddi:se.researchdata:doi-10-23695-aryw-nh78.TopicalCoverage:2.0</r:URN>
        <r:Subject xml:lang="en" controlledVocabularyID="10208" controlledVocabularyName="Standard för svensk indelning av forskningsämnen 2025">Natural Language Processing</r:Subject>
        <r:Subject xml:lang="sv" controlledVocabularyID="10208" controlledVocabularyName="Standard för svensk indelning av forskningsämnen 2025">Språkbehandling och datorlingvistik</r:Subject>
      </r:TopicalCoverage>
      <r:SpatialCoverage />
    </r:Coverage>
    <a:Archive>
      <r:URN>urn:ddi:se.researchdata:doi-10-23695-aryw-nh78.Archive:2.0</r:URN>
      <a:ArchiveSpecific>
        <a:Item>
          <a:Access>
            <r:URN>urn:ddi:se.researchdata:doi-10-23695-aryw-nh78.Archive-ArchiveSpecificType-AccessType:2.0</r:URN>
            <a:TypeOfAccess controlledVocabularyName="info:eu-repo-Access-Terms vocabulary"></a:TypeOfAccess>
          </a:Access>
          <a:DataFileQuantity>0</a:DataFileQuantity>
        </a:Item>
      </a:ArchiveSpecific>
    </a:Archive>
  </s:StudyUnit>
</ddi:DDIInstance>