The Swedish Culturomics Gigaword Corpus
https://doi.org/10.23695/3WMV-1Z09
One billion words of Swedish text from 1950 onwards.
Please cite the dataset as follows:
Stian Rødven Eide, Nina Tahmasebi and Lars Borin. 2016. The Swedish Culturomics Gigaword Corpus: A One Billion Word Swedish Reference Dataset for NLP.
Code to extract data from the corpus, as well as usage instructions,
can be downloaded from https://svn.spraakbanken.gu.se/sb-arkiv/tools/gigaword
Sentences per year for each genre (- = no data)

Year  fiction  government       news     science  socialmedia
1950        -     420 413          -           -            -
1960        -     424 920          -           -            -
1965        -           -     53 624           -            -
1970        -     459 867          -           -            -
1976        -           -     89 175           -            -
1977  499 030           -          -           -            -
1980        -     534 194          -           -            -
1981  307 597           -          -           -            -
1987   97 398           -    364 226           -            -
1990        -     551 988          -           -            -
1991  330 127           -          -           -            -
1992        -           -          -      44 538            -
1994        -     391 882  1 538 748           -            -
1995        -           -    514 797           -            -
1996        -           -    449 148     118 542            -
1997        -           -    980 230     125 096            -
1998        -           -    804 178     121 895        1 638
1999  194 699           -          -     113 568       40 099
2000        -           -          -     109 289       12 945
2001        -           -  1 393 257     115 012       20 006
2002        -      41 066  2 610 740     110 830      191 234
2003        -           -  2 095 700      96 778       16 382
2004        -           -  2 094 251     103 881      487 447
2005        -           -  3 013 787      85 023      985 094
2006        -      50 684  2 634 386           -      408 425
2007        -           -  2 530 808     523 102    1 638 311
2008        -           -  2 607 657           -      754 801
2009        -           -  2 795 855           -      605 194
2010        -           -  2 635 687           -      790 148
2011        -           -  2 973 928           -      957 017
2012        -           -  2 681 277     673 820    1 589 999
2013        -           -  2 501 426           -      594 982
2014        -           -          -           -      590 146
2015        -           -          -  12 293 254      187 253
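As a minimal sketch (not part of the official extraction tools linked above), the per-year sentence counts can be loaded and summed per genre in Python. It assumes the counts are kept as fixed-width plain text with columns separated by at least two spaces, single spaces used as thousands separators, and "-" for years with no data; `parse_table`, `genre_totals` and `fmt` are hypothetical helper names chosen for illustration.

```python
import re

# Hypothetical fixed-width copy of the "Sentences per year for each genre"
# figures: cells are separated by 2+ spaces, a single space inside a cell is
# a thousands separator, and "-" means no data for that genre and year.
TABLE = """\
Year  fiction  government       news     science  socialmedia
1950        -     420 413          -           -            -
1960        -     424 920          -           -            -
1965        -           -     53 624           -            -
1970        -     459 867          -           -            -
1976        -           -     89 175           -            -
1977  499 030           -          -           -            -
1980        -     534 194          -           -            -
1981  307 597           -          -           -            -
1987   97 398           -    364 226           -            -
1990        -     551 988          -           -            -
1991  330 127           -          -           -            -
1992        -           -          -      44 538            -
1994        -     391 882  1 538 748           -            -
1995        -           -    514 797           -            -
1996        -           -    449 148     118 542            -
1997        -           -    980 230     125 096            -
1998        -           -    804 178     121 895        1 638
1999  194 699           -          -     113 568       40 099
2000        -           -          -     109 289       12 945
2001        -           -  1 393 257     115 012       20 006
2002        -      41 066  2 610 740     110 830      191 234
2003        -           -  2 095 700      96 778       16 382
2004        -           -  2 094 251     103 881      487 447
2005        -           -  3 013 787      85 023      985 094
2006        -      50 684  2 634 386           -      408 425
2007        -           -  2 530 808     523 102    1 638 311
2008        -           -  2 607 657           -      754 801
2009        -           -  2 795 855           -      605 194
2010        -           -  2 635 687           -      790 148
2011        -           -  2 973 928           -      957 017
2012        -           -  2 681 277     673 820    1 589 999
2013        -           -  2 501 426           -      594 982
2014        -           -          -           -      590 146
2015        -           -          -  12 293 254      187 253
"""


def parse_table(text):
    """Return (genres, {year: {genre: count}}), skipping "-" cells."""
    lines = [line for line in text.splitlines() if line.strip()]
    # Split on runs of 2+ whitespace so "420 413" stays a single cell.
    genres = re.split(r"\s{2,}", lines[0].strip())[1:]
    rows = {}
    for line in lines[1:]:
        cells = re.split(r"\s{2,}", line.strip())
        year, values = int(cells[0]), cells[1:]
        if len(values) != len(genres):
            raise ValueError(f"malformed row: {line!r}")
        rows[year] = {
            genre: int(value.replace(" ", ""))  # drop thousands separators
            for genre, value in zip(genres, values)
            if value != "-"
        }
    return genres, rows


def genre_totals(genres, rows):
    """Sum sentence counts over all years for each genre."""
    totals = dict.fromkeys(genres, 0)
    for counts in rows.values():
        for genre, n in counts.items():
            totals[genre] += n
    return totals


def fmt(n):
    """Format an integer with spaces as thousands separators."""
    return f"{n:,}".replace(",", " ")


if __name__ == "__main__":
    genres, rows = parse_table(TABLE)
    totals = genre_totals(genres, rows)
    for genre in genres:
        print(f"{genre:<12}{fmt(totals[genre]):>12}")
    print(f"{'all':<12}{fmt(sum(totals.values())):>12}")
```

Splitting on two or more spaces, rather than on any whitespace, is what keeps space-grouped numbers such as "1 538 748" intact as single cells.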
Citation and access
Creator/Principal investigator(s):
- Rødven-Eide, Stian
Research principal:
Citation:
Language:
Administrative information
Topic and keywords
Metadata
