Macedonian Dialect Corpus

The Macedonian spoken corpus comprises transcriptions of audio recordings collected during a series of field research trips to the Prespa, Bitola and Debar regions in 2012, 2014, 2016 and 2019.

Most of the texts were gathered in semi-directed interviews. These were based on a questionnaire covering local traditions (weddings, the celebration of calendar holidays, etc.), local mythology and folklore, and also included questions designed to elicit biographical narratives from the informants.

Some of the informants are bilingual (Aromanian-Macedonian or Albanian-Macedonian) and belong to different religious communities (Islam, Orthodoxy). In some cases the informants' speech can be characterized as dialectal, while in others it is closer to a regional standard, i.e. standard Macedonian with several dialectal features. In addition, the corpus contains several texts previously published by Macedonian dialectologists.

The informants are anonymized.

The corpus includes part-of-speech annotation, minimal morphological annotation (noun gender and verb aspect) and lemmatization. It is currently distributed as a continuously expanding collection of XML files, available on request from anastasia.escher@uzh.ch.
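The XML schema of the corpus files is not documented in this README. The following is only a rough sketch, assuming a hypothetical layout in which each token is a `<w>` element with `lemma` and `pos` attributes plus optional `gender` (nouns) and `aspect` (verbs) attributes. All element names, attribute names, tag values and the sample sentence are illustrative assumptions, not the corpus's actual format; adapt them to the files you receive. The sketch reads the annotation layers with Python's standard library.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# HYPOTHETICAL sample file content: the real corpus schema may differ.
SAMPLE = """<text id="t1">
  <s>
    <w lemma="таа" pos="PRON">Таа</w>
    <w lemma="пее" pos="VERB" aspect="ipf">пееше</w>
    <w lemma="песна" pos="NOUN" gender="f">песна</w>
  </s>
</text>"""


def read_tokens(xml_string):
    """Return one dict per <w> token with its surface form and annotations."""
    root = ET.fromstring(xml_string)
    tokens = []
    for w in root.iter("w"):  # walks all <w> elements in document order
        tokens.append({
            "form": (w.text or "").strip(),
            "lemma": w.get("lemma"),
            "pos": w.get("pos"),
            "gender": w.get("gender"),  # assumed: present on nouns only, else None
            "aspect": w.get("aspect"),  # assumed: present on verbs only, else None
        })
    return tokens


if __name__ == "__main__":
    tokens = read_tokens(SAMPLE)
    # Frequency of part-of-speech tags in the sample
    print(Counter(t["pos"] for t in tokens))
    for t in tokens:
        print(t["form"], t["lemma"], t["pos"], t["gender"], t["aspect"])
```

Returning plain dictionaries keeps the sketch independent of any particular schema; for a real file you would pass the result of reading it from disk, or use `ET.parse(path).getroot()` instead of `ET.fromstring`.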

A demo of the corpus metadata is available at https://anastasia-escher.shinyapps.io/mac_corpus/