English-Swedish-Turkish Corpus

Beáta Megyesi; Éva Csató Johanson; Bengt Dahlqvist; Joakim Nivre; Eva Pettersson

We describe a syntactically annotated parallel corpus of three languages that differ partly in typology: English, Swedish and Turkish. The corpus consists of approximately 300 000 tokens in Swedish, 160 000 in Turkish and 150 000 in English, and contains both fiction and technical documents. We build the corpus using the Uplug toolkit for automatic structural markup, such as tokenization and sentence segmentation, as well as for sentence and word alignment. In addition, we use basic language resource kits for the linguistic analysis of the languages involved. The annotation is carried out on several layers, from morphological and part-of-speech analysis to dependency structures. The tools used for linguistic annotation, e.g., the HunPos tagger and MaltParser, are freely available data-driven resources, trained on existing corpora and treebanks for each language. The parallel treebank is used in teaching and linguistic research to study the relationship between the structurally different languages. To support study of the treebank, several tools have been developed for visualizing the annotation and alignment and for searching for linguistic patterns.

Purpose: The main goal of the project is to promote research and teaching in the Turkish language. More specifically, the aim is to build a language resource for Turkish, Swedish and English that allows contrastive studies between these languages.

Go to data source

https://web.archive.org/web/20161227013750/http://stp.lingfil.uu.se/~bea/turkiska/index.html

Citation and access

Data access level:

Access to data is restricted

Creator/Principal investigator(s):

Beáta Megyesi - Uppsala University - Department of Linguistics and Philology
Éva Csató Johanson - Uppsala University - Department of Linguistics and Philology
Bengt Dahlqvist - Uppsala University - Department of Linguistics and Philology
Joakim Nivre - Uppsala University - Department of Linguistics and Philology
Eva Pettersson - Uppsala University - Department of Linguistics and Philology

Research principal:

Uppsala University

Data contains personal data:

No

Citation:

Language:

Method and outcome

Data format/data structure:

Administrative information

Responsible department/unit:

Department of Linguistics and Philology

PURL:

https://web.archive.org/web/20161227013750/http://stp.lingfil.uu.se/~bea/turkiska/index.html

Topic and keywords

CESSDA topic classification:

Language and linguistics

Swedish Standard Classification of Research Subjects 2025:

Languages and literature

Keywords:

Relations

Homepage:

Link to description and demo of the corpus.

Publications

Citation:

Megyesi, Beáta, Dahlqvist, Bengt, Csató, Éva Á. & Nivre, Joakim, 'The English-Swedish-Turkish Parallel Treebank', Proceedings of Language Resources and Evaluation (LREC 2010), 2010

Links:

URN:
urn:nbn:se:uu:diva-121758

Contact

Beáta Megyesi, beata.megyesi@lingfil.uu.se

Metadata

Version 1
