Commit fa0f8822 authored by Noah Bubenhofer
update

parent ad412dab
# ArtHist v2
## Basic Data
* URL: <https://korpuspragmatik.ds.uzh.ch/korpora/arthistv2/>
* Name: Arthist Mailinglist V2 (2023-02)
* Sources: ArtHist.net mailing list https://arthist.net
* Date Range: 2001 (January) to 2022 (December)
* Creators: Niclas Bodenmann, Xenia Bojarski, Noah Bubenhofer, Daniel Burckhardt (data crawling)
* Funding: Tristan Weddigen, Noah Bubenhofer
* Usage Rights: CC BY-SA 3.0
## Corpus Metadata
* https://korpuspragmatik.ds.uzh.ch/korpora/arthistv2/index.php?thisQ=corpusMetadata&uT=y
## Annotation
<https://korpuspragmatik.ds.uzh.ch/korpora/arthistv2/index.php?thisQ=corpusMetadata&uT=y>
## Short Description
The corpus contains all newsletters from arthist.net from 2001 to 2022. Language was detected automatically at sentence level, which enables POS tagging for each language. POS tags follow the Universal POS tagset (UPOS). Named entities are tagged at a level of detail that depends on the language, so the named-entity tags can differ between languages.

The corpus consists of texts in the following languages:

* 46 % English
* 44 % German
* 6 % French
* 2 % Italian
* 2 % Spanish
* 0.2 % Portuguese

The corpus contains 20,183,463 tokens in 31,190 texts. Pictures are not included in this corpus.
## Publications