Gold standard for English-Swedish Europarl data (GES)

Lars Ahrenberg; Maria Holmqvist

Reference corpus for word linking, divided into training data and test data. The sentences are drawn from the English-Swedish part of the Europarl corpus. For each sentence pair in the selected subset, token correspondences are given as pairs of integer token identifiers.
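To illustrate how token correspondences expressed as pairs of integer identifiers can be consumed, here is a minimal Python sketch. The record does not specify the file layout, so the `i-j` text notation, the 1-based numbering, the function names `parse_links` and `linked_words`, and the example sentence pair are all assumptions made for illustration; none of them is taken from the GES distribution.

```python
from typing import List, Set, Tuple

# A link pairs an English token identifier with a Swedish token identifier.
Link = Tuple[int, int]


def parse_links(line: str) -> Set[Link]:
    """Parse whitespace-separated 'i-j' items into a set of integer pairs.

    The 'i-j' notation is an assumed layout, not the documented GES format.
    """
    links: Set[Link] = set()
    for item in line.split():
        src, tgt = item.split("-")
        links.add((int(src), int(tgt)))
    return links


def linked_words(english: List[str], swedish: List[str],
                 links: Set[Link]) -> List[Tuple[str, str]]:
    """Resolve identifier pairs to word pairs, assuming 1-based identifiers."""
    return [(english[i - 1], swedish[j - 1]) for i, j in sorted(links)]


# Invented sentence pair and links, for illustration only.
english = "I agree with you".split()
swedish = "Jag håller med dig".split()
links = parse_links("1-1 2-2 2-3 3-3 4-4")
pairs = linked_words(english, swedish, links)

for en_word, sv_word in pairs:
    print(en_word, "<->", sv_word)
```

Storing the links as a set of pairs keeps many-to-many correspondences (one English token linked to several Swedish tokens, and vice versa) without any special handling.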

Data source:

https://www.ida.liu.se/divisions/hcs/nlplab/resources/ges/

Citation and access

Data access level:

Data are openly accessible

Creator/Principal investigator(s):

Lars Ahrenberg – Linköping University - Department of Computer and Information Science
Maria Holmqvist – Linköping University - Department of Computer and Information Science

Research principal:

Linköping University

License:

Creative Commons Attribution 4.0 International (CC BY 4.0)

Corpus

Foreseen use:

NLP application

Text part

Linguality:

Bilingual

Language:

English (eng)
Swedish (swe)

Modality:

Written Language

Size:

Sentences: 1164

Annotation:

Alignment
Manual

Administrative information

Responsible department/unit:

Department of Computer and Information Science

Relations

Website:

A Gold Standard Word Alignment for English-Swedish

Publications

Citation:

Maria Holmqvist and Lars Ahrenberg (2011). A Gold Standard for English-Swedish Word Alignment. In Proceedings of the 18th Nordic Conference on Computational Linguistics, Riga, Latvia, May 11-13, 2011.

Contact

Lars Ahrenberg, lars.ahrenberg@liu.se

Metadata

Version 1

Linköping University
