The Arabic E-Book Corpus
https://doi.org/10.5878/7rbh-gy93
The Arabic E-Book Corpus is a freely available collection of 1,745 books (81.5 million words) published by the Hindawi Foundation between 2008 and 2024. The books span various genres, including non-fiction, novels, children's literature, poetry, and plays. The corpus is provided in two versions: HTML and unformatted plain text. The plain-text version will be appropriate for most purposes. For additional detail, see Hallberg, A. (2025). An 81-million-word multi-genre corpus of Arabic books. Data in Brief, 60, 111456. https://doi.org/10.1016/j.dib.2025.111456
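As a rough illustration of working with the plain-text version, the sketch below counts files and whitespace-delimited tokens in a local copy of the corpus. The directory layout, the `.txt` extension, the UTF-8 encoding, and the function name `count_words` are all assumptions made for this example; they are not taken from the corpus documentation.

```python
from pathlib import Path


def count_words(corpus_dir):
    """Return (number of files, total whitespace-delimited tokens).

    Assumes one book per UTF-8 encoded .txt file somewhere under
    corpus_dir (hypothetical layout, check the documentation files).
    """
    n_files = 0
    n_words = 0
    # rglob walks subdirectories too, in case books are grouped in folders
    for path in sorted(Path(corpus_dir).rglob("*.txt")):
        text = path.read_text(encoding="utf-8")
        n_files += 1
        # str.split() with no argument splits on any run of whitespace
        n_words += len(text.split())
    return n_files, n_words
```

Calling `count_words("path/to/plain-text-version")` on an extracted copy would give a simple size check. The totals may differ from the figures reported above, since whitespace splitting is only one of several possible ways to count words in Arabic text.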
Data files
Documentation files
Citation and access
Corpus
Method and outcome
Geographic coverage
Administrative information
Topic and keywords
Relations
Publications
Metadata
Version 1

University of Gothenburg