update

Commit fa0f8822 authored 2 years ago by Noah Bubenhofer
--- a/corpora/arthistv2.md
+++ b/corpora/arthistv2.md
-# ArtHist v2
-
-## Grunddaten
-
-* URL: <https://korpuspragmatik.ds.uzh.ch/korpora/arthistv2/>
-* Name: Arthist Mailinglist V2 (2023-02) 
-* Sources: ArtHist.net Mailingliste https://arthist.net
-* Date Range: 2001 (January) to 2022 (December)
-* Creators: Niclas Bodenmann, Xenia Bojarski, Noah Bubenhofer, Daniel Burckhardt (Datencrawling)
-* Funding: Tristan Weddigen, Noah Bubenhofer
-* Usage Rights: CC BY-SA 3.0 
-
-## Corpus Metadata
-
-* https://korpuspragmatik.ds.uzh.ch/korpora/arthistv2/index.php?thisQ=corpusMetadata&uT=y
-
-## Annotation
-
-<https://korpuspragmatik.ds.uzh.ch/korpora/arthistv2/index.php?thisQ=corpusMetadata&uT=y>
-
-## Short Description
-
-The corpus contains all newsletters from arthist.net from 2001 to 2022. A automatic language detection on sentence level
-has been made.
-
-...
-
-
-## Publications
\ No newline at end of file
+# ArtHist v2



## Basic Data



* URL: <https://korpuspragmatik.ds.uzh.ch/korpora/arthistv2/>

* Name: Arthist Mailinglist V2 (2023-02) 

* Sources: ArtHist.net mailing list, <https://arthist.net>

* Date Range: 2001 (January) to 2022 (December)

* Creators: Niclas Bodenmann, Xenia Bojarski, Noah Bubenhofer, Daniel Burckhardt (data crawling)

* Funding: Tristan Weddigen, Noah Bubenhofer

* Usage Rights: CC BY-SA 3.0 



## Corpus Metadata



* https://korpuspragmatik.ds.uzh.ch/korpora/arthistv2/index.php?thisQ=corpusMetadata&uT=y



## Annotation



<https://korpuspragmatik.ds.uzh.ch/korpora/arthistv2/index.php?thisQ=corpusMetadata&uT=y>



## Short Description



The corpus contains all newsletters from arthist.net from 2001 to 2022. Language was detected automatically at sentence level, which enables POS tagging for each language.
POS tags follow the Universal POS tagset (UPOS).
Named entities are tagged at a level of detail that depends on the language, so the resulting tags can differ between languages.
The corpus consists of texts in the following languages (approximate shares):

* 46 % English
* 44 % German
* 6 % French
* 2 % Italian
* 2 % Spanish
* 0.2 % Portuguese

The corpus contains 20,183,463 tokens in 31,190 texts. 
Pictures are not included in this corpus.
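
As a quick sanity check on the figures above, the following minimal Python sketch computes the mean number of tokens per text and sums the reported language shares. The variable names are illustrative only and not part of the corpus tooling. The description does not say whether the shares refer to tokens, sentences, or texts, so the sketch deliberately does not derive per-language token counts.

```python
# Figures copied from the corpus description; nothing here queries the corpus.
TOTAL_TOKENS = 20_183_463
TOTAL_TEXTS = 31_190

# Reported language shares in percent (assumed to be rounded values).
LANGUAGE_SHARES = {
    "English": 46.0,
    "German": 44.0,
    "French": 6.0,
    "Italian": 2.0,
    "Spanish": 2.0,
    "Portuguese": 0.2,
}

# Average text length in tokens across the whole corpus.
mean_tokens_per_text = TOTAL_TOKENS / TOTAL_TEXTS

# Sum of the reported shares; a small deviation from 100 is expected
# if the individual values were rounded.
share_total = sum(LANGUAGE_SHARES.values())

print(f"Mean tokens per text: {mean_tokens_per_text:.1f}")
print(f"Sum of reported shares: {share_total:.1f} %")
```

The shares add up to slightly more than 100 %, which is presumably a rounding effect in the reported values.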

## Publications
\ No newline at end of file