LICENCE
Creative Commons Attribution License 3.0
DOMAIN
Science and Technology
COVERAGE
Slovak Republic
FORMATS
TXT
PERSONAL DATA PROTECTION
No personal data
* Please note that the classification is taken from the original source
This corpus is an attempt to create a representative sample of the contemporary Slovak language from various domains, designed for easy searching and automated processing. It contains a selection of news articles processed by our NLP tools. The corpus consists of two parts. The first part contains text files and the following annotations:
- Token boundary identification
- Sentence boundary identification
- Stop words
- Morphological analysis
- Named entity recognition
- Named entity transcription
- Lemmas
The second part contains an evaluation for information retrieval.
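To illustrate what three of the annotation layers named in the description involve (token boundaries, sentence boundaries, and stop-word flags), here is a minimal Python sketch. It does not reflect the corpus's actual file format or the NLP tools used to build it: the function name `annotate`, the regular expressions, the sample sentences, and the tiny stop-word set are all illustrative assumptions.

```python
import re

# Hypothetical, tiny stop-word set for illustration only;
# this is NOT the stop-word list shipped with the corpus.
STOP_WORDS = {"a", "je", "v", "na", "sa", "to"}


def annotate(text):
    """Split text into sentences, then tokens, and flag stop words.

    Returns a list of sentences; each sentence is a list of
    (token, is_stop_word) pairs.
    """
    # Naive sentence boundary: split on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    result = []
    for sentence in sentences:
        # Naive token boundary: a run of word characters,
        # or a single punctuation character.
        tokens = re.findall(r"\w+|[^\w\s]", sentence)
        result.append([(tok, tok.lower() in STOP_WORDS) for tok in tokens])
    return result


# Made-up sample text, not taken from the corpus.
annotated = annotate("Bratislava je hlavné mesto. Leží na Dunaji.")
for sentence in annotated:
    print(sentence)
```

Real sentence and token segmentation for Slovak needs to handle abbreviations, ordinal numbers, and similar cases that these two regular expressions ignore, so the sketch only conveys the shape of the annotations, not their quality.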