Researchdata.se

MARB

https://doi.org/10.23695/V3WP-6C64
Reporting bias (the human tendency to not mention obvious or redundant information) and social bias (societal attitudes toward specific demographic groups) have both been shown to propagate from human text data to language models trained on such data. However, the two phenomena have not previously been studied in combination. The MARB dataset was developed to begin to fill this gap by studying the interaction between social biases and reporting bias in language models. Unlike many existing benchmark datasets, MARB does not rely on artificially constructed templates or crowdworkers to create contrasting examples. Instead, the templates used in MARB are based on naturally occurring written language from the 2021 version of the enTenTen corpus (Jakubíček et al., 2013).


sprakbanken-textgu_en