<codeBook xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:schemaLocation="ddi:codebook:2_5 http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd" xmlns="ddi:codebook:2_5">
  <docDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv">Code and Data for “Classification of Medieval Documents: Determining the Issuer, Place of Issue, and Decade for Old Swedish Charters”</titl>
        <parTitl xml:lang="en">Code and Data for “Classification of Medieval Documents: Determining the Issuer, Place of Issue, and Decade for Old Swedish Charters”</parTitl>
        <IDNo agency="SND">2024-283-1</IDNo>
        <IDNo agency="DOI">https://doi.org/10.57804/e9cs-gh75</IDNo>
      </titlStmt>
      <prodStmt>
        <producer xml:lang="en" abbr="SND">Swedish National Data Service</producer>
        <producer xml:lang="sv" abbr="SND">Svensk nationell datatjänst</producer>
      </prodStmt>
      <holdings URI="https://doi.org/10.57804/e9cs-gh75">Landing page</holdings>
    </citation>
  </docDscr>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv">Code and Data for “Classification of Medieval Documents: Determining the Issuer, Place of Issue, and Decade for Old Swedish Charters”</titl>
        <parTitl xml:lang="en">Code and Data for “Classification of Medieval Documents: Determining the Issuer, Place of Issue, and Decade for Old Swedish Charters”</parTitl>
        <IDNo agency="SND">2024-283-1</IDNo>
        <IDNo agency="DOI">https://doi.org/10.57804/e9cs-gh75</IDNo>
      </titlStmt>
      <rspStmt>
        <AuthEnty xml:lang="en" affiliation="Uppsala University">Dahllöf, Mats</AuthEnty>
        <AuthEnty xml:lang="sv" affiliation="Uppsala universitet">Dahllöf, Mats</AuthEnty>
      </rspStmt>
      <prodStmt>
        <grantNo xml:lang="en" agency="Riksbankens Jubileumsfond">NHS 14-2068:1</grantNo>
      </prodStmt>
      <distStmt>
        <distrbtr xml:lang="en" abbr="SND" URI="https://snd.se">Swedish National Data Service</distrbtr>
        <distrbtr xml:lang="sv" abbr="SND" URI="https://snd.se">Svensk nationell datatjänst</distrbtr>
        <distDate xml:lang="en" date="2020-01-03" />
      </distStmt>
      <verStmt>
        <version elementVersion="1" elementVersionDate="2020-01-03" />
      </verStmt>
      <holdings URI="https://doi.org/10.57804/e9cs-gh75">Landing page</holdings>
    </citation>
    <stdyInfo>
      <subject />
      <abstract xml:lang="en" contentType="abstract">Code and data for the article “Classification of Medieval Documents: Determining the Issuer, Place of Issue, and Decade for Old Swedish Charters” (to appear in DHN2020: Digital Humanities in the Nordic Countries, Riga, 17–20 March 2020).
The zip file contains Python code, an XML data file, and a PDF document.

The study based on this code and dataset is a comparative exploration of different classification tasks for Swedish medieval charters (transcriptions from the SDHK collection) and different classifier setups. In particular, we explore the identification of the issuer, place of issue, and decade of production. The experiments used features based on lowercased words and on character 3- and 4-grams. We evaluated the performance of two learning algorithms: linear discriminant analysis and decision trees. For evaluation, five-fold cross-validation was performed, and we report accuracy and macro-averaged F1 score. The evaluation used six labeled subsets of SDHK, one for each combination of the three tasks and the two languages (Old Swedish and Latin). Issuer identification for the Latin dataset (595 charters from 12 issuers) reached the highest scores, above 0.9, with the decision tree classifier using word features. The best corresponding accuracy for Old Swedish was 0.81. Place and decade identification produced lower scores for both languages. Which classifier design performs best appears to depend on the particular dataset and classification task. The study does, however, support the idea that text classification is also useful for medieval documents characterized by extreme spelling variation.
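
As a rough illustration of the feature types and the evaluation metric named above, the sketch below counts lowercased words and lowercased character 3- and 4-grams and computes a macro-averaged F1 score (the unweighted mean of per-class F1). It is a minimal, hypothetical sketch, not the Python code distributed in the zip file: the function names char_ngrams, word_features, and macro_f1 are illustrative, and splitting words on whitespace and taking character n-grams across the whole text (spaces included) are assumptions.

```python
from collections import Counter


def char_ngrams(text, sizes=(3, 4)):
    """Count lowercased character n-grams of the given sizes (spaces included; an assumption)."""
    text = text.lower()
    grams = Counter()
    for n in sizes:
        for i in range(len(text) - n + 1):
            grams[text[i:i + n]] += 1
    return grams


def word_features(text):
    """Count lowercased words, assuming simple whitespace tokenization."""
    return Counter(text.lower().split())


def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over every label seen in gold or predicted."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    labels = set(gold) | set(predicted)
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, predicted) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, predicted) if g == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because every class counts equally in macro_f1, a small issuer or decade class weighs as much as a large one, which is why this score can diverge from plain accuracy on unbalanced label sets. Feature counts like these would then be vectorized and passed to a classifier such as linear discriminant analysis or a decision tree.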

The dataset was originally published in DiVA and moved to SND in 2024.</abstract>
      <abstract xml:lang="sv" contentType="abstract">Se den engelska versionen av denna katalogsida för information om datasetet.

Datasetet har ursprungligen publicerats i DiVA och flyttades över till SND 2024.</abstract>
      <sumDscr>
        <dataKind xml:lang="en">Numeric</dataKind>
        <dataKind xml:lang="en">Text</dataKind>
      </sumDscr>
    </stdyInfo>
    <method>
      <dataColl />
    </method>
    <dataAccs>
      <useStmt>
        <restrctn xml:lang="en">Access to data through SND. Data are freely accessible.</restrctn>
        <restrctn xml:lang="sv">Åtkomst till data via SND. Data är fritt tillgängliga.</restrctn>
        <conditions elementVersion="info:eu-repo-Access-Terms vocabulary">openAccess</conditions>
      </useStmt>
    </dataAccs>
    <othrStdyMat />
  </stdyDscr>
</codeBook>