Researchdata.se

MultiGEC

https://doi.org/10.23695/H9F5-8143

Dataset description

MultiGEC is a dataset for Multilingual Grammatical Error Correction in 12 European languages (Czech, English, Estonian, German, Greek, Icelandic, Italian, Latvian, Russian, Slovene, Swedish and Ukrainian), compiled by the CompSLA working group and over 20 external data providers in the context of MultiGEC-2025, the first text-level GEC shared task. The dataset is divided into 17 subcorpora covering different languages, domains and correction styles, summarized below. More detailed information about each subcorpus is available as machine-readable metadata, whose format is described separately.

sprakbanken-text
University of Gothenburg