Creative Commons Attribution License 3.0
Science and Technology
PERSONAL DATA PROTECTION
No personal data
* Please note that the classification is taken from the original source
This corpus is an attempt to create a representative sample of the contemporary Slovak language from various domains, designed for easy searching and automated processing. It contains a selection of news articles processed by our NLP tools. The corpus consists of two parts.
The first part contains text files and the following annotations:
- Token boundary identification
- Sentence boundary identification
- Stop-Words
- Morphological Analysis
- Named Entity Recognition
- Named Entity Transcription
- Lemma
The second part contains an evaluation for information retrieval.
Disclaimer: This data is provided by a third party. The DIH identifying this data has no responsibility for its content. Please check the provided link to the data for license terms and potential usage restrictions. If personal data is included in the dataset, the third party who provides the dataset is the data controller of that personal data. Please note that if you use the datasets for your own purposes, you become an independent data controller and are solely responsible for your compliance with relevant data protection laws relating to the processing and security of personal data, including, but not limited to, the provisions of the General Data Protection Regulation (GDPR), as applicable to the personal data included in the data.