<ddi:DDIInstance xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="ddi:instance:3_3 http://ddialliance.org/Specification/DDI-Lifecycle/3.3/XMLSchema/instance.xsd" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:ddi="ddi:instance:3_3" xmlns:r="ddi:reusable:3_3" xmlns:s="ddi:studyunit:3_3" xmlns:d="ddi:datacollection:3_3" xmlns:a="ddi:archive:3_3" xmlns:c="ddi:conceptualcomponent:3_3" xmlns:cm="ddi:comparative:3_3" xmlns:g="ddi:group:3_3" xmlns:l="ddi:logicalproduct:3_3" xmlns:p="ddi:physicaldataproduct:3_3" xmlns:pi="ddi:physicalinstance:3_3" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:xml="http://www.w3.org/XML/1998/namespace" isMaintainable="true" scopeOfUniqueness="Agency">
  <r:URN>urn:ddi:se.researchdata:doi-10-23695-ewz4-5030:0</r:URN>
  <r:Agency>SND</r:Agency>
  <r:ID>doi-10-23695-ewz4-5030</r:ID>
  <r:Version>0</r:Version>
  <g:ResourcePackage>
    <r:URN>urn:ddi:se.researchdata:doi-10-23695-ewz4-5030.ResourcePackage:2.0</r:URN>
    <r:OtherMaterialScheme>
      <r:URN>urn:ddi:se.researchdata:doi-10-23695-ewz4-5030.OtherMaterialScheme:2.0</r:URN>
    </r:OtherMaterialScheme>
    <a:OrganizationScheme>
      <r:URN>urn:ddi:se.researchdata:doi-10-23695-ewz4-5030.OrganizationScheme-0:2.0</r:URN>
      <a:Organization>
        <r:URN>urn:ddi:se.researchdata:doi-10-23695-ewz4-5030.Organization-0:2.0</r:URN>
        <a:OrganizationIdentification>
          <a:OrganizationName>
            <r:String xml:lang="en">Språkbanken Text</r:String>
          </a:OrganizationName>
        </a:OrganizationIdentification>
      </a:Organization>
    </a:OrganizationScheme>
  </g:ResourcePackage>
  <s:StudyUnit>
    <r:URN>urn:ddi:se.researchdata:doi-10-23695-ewz4-5030.StudyUnit:2.0</r:URN>
    <r:UserID typeOfUserID="datasetIdentifier">doi-10-23695-ewz4-5030</r:UserID>
    <r:Citation>
      <r:Title>
        <r:String xml:lang="sv">Förtränade inbäddningar</r:String>
        <r:String xml:lang="en">Pretrained embeddings</r:String>
      </r:Title>
      <r:Creator>
        <r:CreatorReference>
          <r:URN>urn:ddi:se.researchdata:doi-10-23695-ewz4-5030.Individual-0:2.0</r:URN>
          <r:TypeOfObject>Individual</r:TypeOfObject>
        </r:CreatorReference>
      </r:Creator>
      <r:Publisher>
        <r:PublisherName>
          <r:String xml:lang="sv">Göteborgs universitet</r:String>
          <r:String xml:lang="en">University of Gothenburg</r:String>
        </r:PublisherName>
      </r:Publisher>
      <r:PublicationDate>
        <r:SimpleDate>2025-01-01</r:SimpleDate>
      </r:PublicationDate>
      <r:InternationalIdentifier>
        <r:IdentifierContent>10.23695/EWZ4-5030</r:IdentifierContent>
        <r:ManagingAgency controlledVocabularyAgencyName="DOI">DOI</r:ManagingAgency>
      </r:InternationalIdentifier>
    </r:Citation>
    <r:Abstract>
      <r:Content xml:lang="en">Embeddings (mappings of linguistic units such as words, sentences, or characters to vectors of real numbers) play a central role in modern language technology. Because training embedding models is often costly, pretrained embeddings are widely used. Below we list pretrained embeddings available for Swedish, as well as studies that evaluate Swedish embeddings. If you have suggestions or comments, please contact us.
Embeddings

Facebook fastText models: Common Crawl + Wikipedia, Wikipedia, and Wikipedia with cross-lingual alignment
NLPL repository: Word2Vec Continuous Skipgram (CoNLL17 corpus); ELMo (CoNLL17 corpus); ELMo (Wikipedia)
NLPLAB at Linköping University: a pretrained Word2Vec model (trained on a Göteborgs-Posten corpus); a script for training both CBOW and SGNS Word2Vec models; a paper comparing Word2Vec and GloVe to SALDO
The National Library of Sweden's (Kungliga bibliotekets) models: BERT, BERT fine-tuned for NER, ALBERT
The Swedish Public Employment Service's (Arbetsförmedlingens) models: BERT
Polyglot
Kyubyong Park's models: Word2Vec and fastText, trained on Wikipedia
Flair models. See also our Flair model for Swedish POS tagging.
Språkbanken Text diachronic embeddings.

Evaluation studies

Sahlgren, Magnus, and Fredrik Olsson. 2019. Gender Bias in Pretrained Swedish Embeddings. Proceedings of the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa).
Fallgren, Per, Jesper Segeblad, and Marco Kuhlmann. 2016. Towards a standard dataset of Swedish word vectors. Sixth Swedish Language Technology Conference (SLTC).
Holmer, Daniel. 2020. Context matters: Classifying Swedish texts using BERT's deep bidirectional word embeddings. Bachelor's thesis, Linköping University.
Adewumi, Tosin, Foteini Liwicki, and Marcus Liwicki. 2020. Exploring Swedish &amp; English fastText Embeddings with the Transformer.
Adewumi, Tosin, Foteini Liwicki, and Marcus Liwicki. 2020. Corpora Compared: The Case of the Swedish Gigaword &amp; Wikipedia Corpora.</r:Content>
    </r:Abstract>
    <r:Coverage>
      <r:TopicalCoverage>
        <r:URN>urn:ddi:se.researchdata:doi-10-23695-ewz4-5030.TopicalCoverage:2.0</r:URN>
        <r:Subject xml:lang="en" controlledVocabularyID="10208" controlledVocabularyName="Standard för svensk indelning av forskningsämnen 2025">Natural Language Processing</r:Subject>
        <r:Subject xml:lang="sv" controlledVocabularyID="10208" controlledVocabularyName="Standard för svensk indelning av forskningsämnen 2025">Språkbehandling och datorlingvistik</r:Subject>
      </r:TopicalCoverage>
      <r:SpatialCoverage />
    </r:Coverage>
    <a:Archive>
      <r:URN>urn:ddi:se.researchdata:doi-10-23695-ewz4-5030.Archive:2.0</r:URN>
      <a:ArchiveSpecific>
        <a:Item>
          <a:Access>
            <r:URN>urn:ddi:se.researchdata:doi-10-23695-ewz4-5030.Archive-ArchiveSpecificType-AccessType:2.0</r:URN>
            <a:TypeOfAccess controlledVocabularyName="info:eu-repo-Access-Terms vocabulary"></a:TypeOfAccess>
          </a:Access>
          <a:DataFileQuantity>0</a:DataFileQuantity>
        </a:Item>
      </a:ArchiveSpecific>
    </a:Archive>
  </s:StudyUnit>
</ddi:DDIInstance>