LICENCE
Creative Commons Attribution License 3.0
DOMAIN
Science and Technology
COVERAGE
Slovak Republic
FORMATS
TXT
PERSONAL DATA PROTECTION
No personal data
* Please note that the classification is taken from the original source.
The corpus contains a complete set of web discussions on various topics, all taken from a single site. Each discussion is labeled with its topic and author and is assigned to a specific section. The corpus includes an index that supports searching with regular expressions. The text of the discussions has been processed with our tools for word tokenization, sentence boundary detection and morphological analysis. Token annotations also include a corrected word form proposed by a statistical correction system.