<codeBook xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:schemaLocation="ddi:codebook:2_5 http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd" xmlns="ddi:codebook:2_5">
  <docDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv">The Swedish Culturomics Gigaword Corpus</titl>
        <parTitl xml:lang="en">The Swedish Culturomics Gigaword Corpus</parTitl>
        <IDNo agency="SND">doi-10-23695-3wmv-1z09-0</IDNo>
        <IDNo agency="DOI">https://doi.org/10.23695/3WMV-1Z09</IDNo>
      </titlStmt>
      <prodStmt>
        <producer xml:lang="en" abbr="SND">Swedish National Data Service</producer>
        <producer xml:lang="sv" abbr="SND">Svensk nationell datatjänst</producer>
      </prodStmt>
      <holdings URI="https://doi.org/10.23695/3WMV-1Z09">Landing page</holdings>
    </citation>
  </docDscr>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv">The Swedish Culturomics Gigaword Corpus</titl>
        <parTitl xml:lang="en">The Swedish Culturomics Gigaword Corpus</parTitl>
        <IDNo agency="SND">doi-10-23695-3wmv-1z09-0</IDNo>
        <IDNo agency="DOI">https://doi.org/10.23695/3WMV-1Z09</IDNo>
      </titlStmt>
      <rspStmt />
      <prodStmt />
      <distStmt>
        <distrbtr xml:lang="en" abbr="SND" URI="https://snd.se">Swedish National Data Service</distrbtr>
        <distrbtr xml:lang="sv" abbr="SND" URI="https://snd.se">Svensk nationell datatjänst</distrbtr>
        <distDate xml:lang="en" date="2024-01-01" />
      </distStmt>
      <verStmt>
        <version elementVersion="0" elementVersionDate="2024-01-01" />
      </verStmt>
      <holdings URI="https://doi.org/10.23695/3WMV-1Z09">Landing page</holdings>
    </citation>
    <stdyInfo>
      <subject />
      <abstract xml:lang="en" contentType="abstract">One billion Swedish words from 1950 and onwards.
Please cite the dataset using the following reference:
Stian Rødven Eide, Nina Tahmasebi, Lars Borin. 2016. The Swedish Culturomics Gigaword Corpus: A One Billion Word Swedish Reference Dataset for NLP

Code to extract data from the corpus, as well as usage instructions, can be downloaded from https://svn.spraakbanken.gu.se/sb-arkiv/tools/gigaword/

Sentences per year for each genre

year | fiction | government |      news |    science | socialmedia
1950 |       - |    420 413 |         - |          - |           -
1960 |       - |    424 920 |         - |          - |           -
1965 |       - |          - |    53 624 |          - |           -
1970 |       - |    459 867 |         - |          - |           -
1976 |       - |          - |    89 175 |          - |           -
1977 | 499 030 |          - |         - |          - |           -
1980 |       - |    534 194 |         - |          - |           -
1981 | 307 597 |          - |         - |          - |           -
1987 |  97 398 |          - |   364 226 |          - |           -
1990 |       - |    551 988 |         - |          - |           -
1991 | 330 127 |          - |         - |          - |           -
1992 |       - |          - |         - |     44 538 |           -
1994 |       - |    391 882 | 1 538 748 |          - |           -
1995 |       - |          - |   514 797 |          - |           -
1996 |       - |          - |   449 148 |    118 542 |           -
1997 |       - |          - |   980 230 |    125 096 |           -
1998 |       - |          - |   804 178 |    121 895 |       1 638
1999 | 194 699 |          - |         - |    113 568 |      40 099
2000 |       - |          - |         - |    109 289 |      12 945
2001 |       - |          - | 1 393 257 |    115 012 |      20 006
2002 |       - |     41 066 | 2 610 740 |    110 830 |     191 234
2003 |       - |          - | 2 095 700 |     96 778 |      16 382
2004 |       - |          - | 2 094 251 |    103 881 |     487 447
2005 |       - |          - | 3 013 787 |     85 023 |     985 094
2006 |       - |     50 684 | 2 634 386 |          - |     408 425
2007 |       - |          - | 2 530 808 |    523 102 |   1 638 311
2008 |       - |          - | 2 607 657 |          - |     754 801
2009 |       - |          - | 2 795 855 |          - |     605 194
2010 |       - |          - | 2 635 687 |          - |     790 148
2011 |       - |          - | 2 973 928 |          - |     957 017
2012 |       - |          - | 2 681 277 |    673 820 |   1 589 999
2013 |       - |          - | 2 501 426 |          - |     594 982
2014 |       - |          - |         - |          - |     590 146
2015 |       - |          - |         - | 12 293 254 |     187 253</abstract>
      <abstract xml:lang="sv" contentType="abstract">En miljard ord ur svenska korpusar från 1950 och framåt.
Vänligen använd följande artikel som referens för datasetet:
Stian Rødven Eide, Nina Tahmasebi, Lars Borin. 2016. The Swedish Culturomics Gigaword Corpus: A One Billion Word Swedish Reference Dataset for NLP

Kod för att extrahera data från korpusen, samt användningsinstruktioner, kan laddas ner från https://svn.spraakbanken.gu.se/sb-arkiv/tools/gigaword/

Sentences per year for each genre

year | fiction | government |      news |    science | socialmedia
1950 |       - |    420 413 |         - |          - |           -
1960 |       - |    424 920 |         - |          - |           -
1965 |       - |          - |    53 624 |          - |           -
1970 |       - |    459 867 |         - |          - |           -
1976 |       - |          - |    89 175 |          - |           -
1977 | 499 030 |          - |         - |          - |           -
1980 |       - |    534 194 |         - |          - |           -
1981 | 307 597 |          - |         - |          - |           -
1987 |  97 398 |          - |   364 226 |          - |           -
1990 |       - |    551 988 |         - |          - |           -
1991 | 330 127 |          - |         - |          - |           -
1992 |       - |          - |         - |     44 538 |           -
1994 |       - |    391 882 | 1 538 748 |          - |           -
1995 |       - |          - |   514 797 |          - |           -
1996 |       - |          - |   449 148 |    118 542 |           -
1997 |       - |          - |   980 230 |    125 096 |           -
1998 |       - |          - |   804 178 |    121 895 |       1 638
1999 | 194 699 |          - |         - |    113 568 |      40 099
2000 |       - |          - |         - |    109 289 |      12 945
2001 |       - |          - | 1 393 257 |    115 012 |      20 006
2002 |       - |     41 066 | 2 610 740 |    110 830 |     191 234
2003 |       - |          - | 2 095 700 |     96 778 |      16 382
2004 |       - |          - | 2 094 251 |    103 881 |     487 447
2005 |       - |          - | 3 013 787 |     85 023 |     985 094
2006 |       - |     50 684 | 2 634 386 |          - |     408 425
2007 |       - |          - | 2 530 808 |    523 102 |   1 638 311
2008 |       - |          - | 2 607 657 |          - |     754 801
2009 |       - |          - | 2 795 855 |          - |     605 194
2010 |       - |          - | 2 635 687 |          - |     790 148
2011 |       - |          - | 2 973 928 |          - |     957 017
2012 |       - |          - | 2 681 277 |    673 820 |   1 589 999
2013 |       - |          - | 2 501 426 |          - |     594 982
2014 |       - |          - |         - |          - |     590 146
2015 |       - |          - |         - | 12 293 254 |     187 253</abstract>
      <sumDscr />
    </stdyInfo>
    <method>
      <dataColl />
    </method>
    <dataAccs>
      <useStmt>
        <restrctn xml:lang="en">Access to data through an external actor.</restrctn>
        <restrctn xml:lang="sv">Åtkomst till data via extern aktör.</restrctn>
      </useStmt>
    </dataAccs>
    <othrStdyMat />
  </stdyDscr>
</codeBook>