Slovak Web Discussion Corpus

LICENCE
Creative Commons Attribution License 3.0
DOMAIN
Science and Technology
COVERAGE
Slovak Republic
FORMATS
TXT
PERSONAL DATA PROTECTION
No personal data
* Please note that the classification is taken from the original source.
The corpus contains a complete set of web discussions on various topics, all collected from a single site. Each discussion is marked with its topic and speaker and is assigned to a specific section. The corpus includes an index that supports searching with regular expressions. The discussion text has been processed with our tools for word tokenization, sentence boundary detection, and morphological analysis. Token annotations include a corrected word form proposed by a statistical correction system.
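As a rough illustration of regular-expression searching over plain-text data such as this, the following is a minimal sketch only. It does not use the corpus's own index, whose format is not described here. The function name `search_lines` and the sample sentences are hypothetical, and the sketch assumes the text is available as one unit of text per line.

```python
import re
from typing import Iterable, Iterator, Tuple


def search_lines(lines: Iterable[str], pattern: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for every line that matches the pattern."""
    # Case-insensitive matching so that "Telefón" and "telefón" both match.
    regex = re.compile(pattern, re.IGNORECASE)
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            yield number, line.rstrip("\n")


# Hypothetical sample lines standing in for corpus text (not taken from the dataset).
sample = [
    "Dobrý deň, mám otázku k novému telefónu.\n",
    "Aký procesor odporúčate?\n",
    "Telefón funguje bez problémov.\n",
]

# Collect every line that mentions "telefon" or "telefón" in any inflected form.
matches = list(search_lines(sample, r"telef[oó]n"))
for number, text in matches:
    print(number, text)
```

To run the same search over a real file, an open file handle can be passed in place of `sample`, since iterating over a text file yields its lines. The file name and encoding would need to be checked against the downloaded data.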
Disclaimer: This data is provided by a third party. The DIH identifying this data has no responsibility for its content. Please check the provided link to the data for license terms and potential usage restrictions. In case personal data is included in the dataset, the third party who provides the dataset is the data controller of such personal data. Please note that if you use the datasets for your own purposes, you become an independent data controller and are solely responsible for your compliance with relevant data protection laws relating to the processing and security of personal data, with particular reference, but not limited to, the provisions of the General Data Protection Regulation (GDPR), as applicable to the personal data included in the data.

DATA IDENTIFIED/OFFERED BY

MEMBER
DIH TECHNICOM
TYPE
DIH
COUNTRY
Slovakia
