The Corpus of Spoken isiXhosa
https://doi.org/10.23695/XRSG-MP07
Swedish National Data Service (Svensk nationell datatjänst)

The Corpus of Spoken isiXhosa consists of transcribed and annotated recordings of spoken Xhosa [xho]. The recordings have been made in the Eastern Cape, South Africa, from 2015 onwards. The transcribed texts are annotated with morpheme-by-morpheme glosses, part-of-speech tags, and free English translations.

The recordings and annotations were made as part of three research projects led by senior lecturer Eva-Marie Bloom Ström at the University of Gothenburg. All three projects, including the ongoing ‘How do words get in order? The role of speaker-hearer interaction in languages of southern Africa’, were funded by the Swedish Research Council. The corpus has been developed in collaboration with Språkbanken Text.

A user guide and more extensive information about the corpus data can be found in the Corpus of Spoken isiXhosa Manual [PDF].

For more on annotation, preparation of data, and acknowledgements, see:
Bloom Ström, E.-M., Slater, O., Zahran, A., Berdicevskis, A., & Schumacher, A. (2023). Preparing a corpus of spoken Xhosa. Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD), 62–67. https://aclanthology.org/2023.clasp-1.7

For questions about the corpus, contact Eva-Marie Bloom Ström (eva-marie.strom@gu.se). If you notice any errors or inconsistencies in the annotations, please report them to the same email address.
Main contributors:
Eva-Marie Bloom Ström, Senior Lecturer, University of Gothenburg
Onelisa Slater, MA, Rhodes University
Aron Zahran, PhD, Inalco/Llacan (CNRS) & Ghent University

Access to data through an external actor.