Gold standard for English-Swedish Europarl data (GES)
Reference corpus for word linking, divided into training data and test data. The sentences come from the English and Swedish parts of Europarl.
The data are drawn from the English-Swedish part of the Europarl corpus. For each sentence pair in the selected subset, token correspondences are given as pairs of integer token identifiers.
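As an illustration of how such token correspondences might be read, here is a minimal Python sketch. The on-disk format is not described in this record, so the layout assumed here (one sentence pair per line, links written as whitespace-separated `source-target` pairs such as `1-1 2-3`) is purely an assumption, and the function names `parse_links` and `targets_by_source` are hypothetical. Check the data source for the actual file format before relying on this.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_links(line: str, sep: str = "-") -> List[Tuple[int, int]]:
    """Parse one line of token links into (source_id, target_id) integer pairs.

    ASSUMPTION: links are whitespace-separated and each link is written as
    "<source_id><sep><target_id>". The real GES files may use another layout.
    """
    links = []
    for item in line.split():
        src, tgt = item.split(sep)
        links.append((int(src), int(tgt)))
    return links


def targets_by_source(links: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Group target token ids by the source token id they are linked to."""
    grouped: Dict[int, List[int]] = defaultdict(list)
    for src, tgt in links:
        grouped[src].append(tgt)
    return dict(grouped)


# Hypothetical example line: source token 2 is linked to target tokens 3 and 4.
example = parse_links("1-1 2-3 2-4")
grouped = targets_by_source(example)
```

Grouping by source token makes one-to-many links easy to inspect, which can be common when an English word corresponds to several Swedish tokens or the reverse.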
Data source: https://www.ida.liu.se/divisions/hcs/nlplab/resources/ges/
Citation and access
Data access level:
Creator/Principal investigator(s):
- Lars Ahrenberg - Linköping University - Department of Computer and Information Science
- Maria Holmqvist - Linköping University - Department of Computer and Information Science
Research principal:
Citation:
Corpus
Foreseen use:
NLP application
Text part
Linguality:
Bilingual
Language:
English (eng)
Swedish (swe)
Modality:
Written Language
Size:
Sentences: 1164
Annotation:
Alignment
Manual
Original source:
Link to other media:
Administrative information
Responsible department/unit:
Department of Computer and Information Science
Topic and keywords
Standard for the Swedish classification of research subjects 2025:
Keywords:
Relations
Publications
Citation:
Maria Holmqvist and Lars Ahrenberg (2011). A Gold Standard for English-Swedish Word Alignment. In Proceedings of the 18th Nordic Conference on Computational Linguistics, Riga, Latvia, May 11-13, 2011.
