The corpus includes various Balkan Slavic texts from the 15th to 20th centuries. The selection aims to cover all major dialect areas and schools of literature. The annotated section includes 20 shorter texts with full lemmatization, morphological and syntactic annotation, and translation. These files were published at Clarin.si as a corpus in tabular text and CoNLL-U formats (ca. 33k tokens). A frequently updated version in original Excel format (now ca. 48k tokens) is available at Switch. The raw section contains 14 sources digitized in full, manually or automatically (ca. 1M tokens).
The corpus includes various Balkan Slavic texts from the 15th to 20th centuries. The selection aims to cover all major dialect areas and schools of literature. The annotated section includes 23 shorter texts with full lemmatization, morphological and syntactic annotation, and translation. These files were published at Clarin.si as a corpus in tabular text and CoNLL-U formats (ca. 53k tokens). A frequently updated version in original Excel format is available at Switch. The raw section contains 14 sources digitized in full, manually or automatically (ca. 1M tokens).
The corpus contains multiple versions of the Life of St. Petka by Patriarch Euthymius (†1393). Four of these were published online as interactive texts viewable in a browser, together with facsimiles and a collated view that includes further editions.
...
...
@@ -8,7 +8,7 @@ Keywords: Damaskini, Church Slavonic, Early Modern Bulgarian
## Repository location
annotated texts in .tsv/.conllu format - [Clarin.si](https://www.clarin.si/repository/xmlui/handle/11356/1368)\
annotated texts in .tsv/.conllu format - [Clarin.si](https://www.clarin.si/repository/xmlui/handle/11356/1441)\
annotated texts in .xlsx format - [SWITCHdrive](https://drive.switch.ch/index.php/s/5FyOXMTHYBn4OZa)\
raw texts - [SWITCHdrive](https://drive.switch.ch/index.php/s/mMhOq1hNwF4ymxL)