LICENCE
Creative Commons Attribution License 3.0
DOMAIN
Science and Technology
COVERAGE
Slovak Republic
FORMATS
STM, WAV
PERSONAL DATA PROTECTION
Personal data: Data made publicly available by data subjects
* Please note that the classification is taken from the original source
TEDxSK and JumpSK is a new Slovak spoken-language resource built from TEDx and Jump Slovensko lectures. The speech corpus consists of 220 lectures with a total duration of 58 hours. The annotated corpus was generated automatically, in an unsupervised manner, using acoustic speech segmentation based on principal component analysis and automatic speech transcription with two complementary speech recognition systems. To evaluate the quality of the automatic transcription, an evaluation set of 50 lectures with manual transcriptions, totaling 12 hours, was created.
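
Since the corpus is distributed as STM and WAV files, the following minimal Python sketch shows one way to read segment annotations and sum their duration. It assumes the conventional NIST-style STM layout (recording name, channel, speaker, start time, end time, an optional `<...>` label, then the transcript); the exact layout used in this corpus is not stated here and should be checked against the files. The names `Segment`, `parse_stm`, and `total_hours`, and the sample lines, are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Segment:
    recording: str
    channel: str
    speaker: str
    start: float  # seconds
    end: float    # seconds
    label: str    # optional "<...>" field, empty string if absent
    text: str


def parse_stm(lines: Iterable[str]) -> List[Segment]:
    """Parse STM lines, assuming the conventional NIST-style field order."""
    segments = []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and ";;" comment lines.
        if not line or line.startswith(";;"):
            continue
        fields = line.split()
        if len(fields) < 5:
            raise ValueError(f"malformed STM line: {raw!r}")
        recording, channel, speaker = fields[:3]
        start, end = float(fields[3]), float(fields[4])
        rest = fields[5:]
        label = ""
        # Assumption: a leading "<...>" token is a segment label, not speech.
        if rest and rest[0].startswith("<") and rest[0].endswith(">"):
            label = rest.pop(0)
        segments.append(
            Segment(recording, channel, speaker, start, end, label, " ".join(rest))
        )
    return segments


def total_hours(segments: Iterable[Segment]) -> float:
    """Sum segment durations and convert seconds to hours."""
    return sum(s.end - s.start for s in segments) / 3600.0


# Hypothetical sample lines, not taken from the corpus.
sample = [
    ";; recording channel speaker start end [label] transcript",
    "lecture_001 1 spk_001 0.00 4.50 <o,f0,male> dobrý deň",
    "lecture_001 1 spk_001 4.50 10.00 vitajte na prednáške",
]
segments = parse_stm(sample)
print(len(segments), "segments,", total_hours(segments), "hours")
```

For a real file, the lines could be read with `open(path, encoding="utf-8")` and passed to `parse_stm`; the UTF-8 encoding is likewise an assumption to verify against the distributed data.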