TEDxSK and JumpSK Lecture Speech Corpus

LICENCE
Creative Commons Attribution License 3.0
DOMAIN
Science and Technology
COVERAGE
Slovak Republic
FORMATS
STM, WAV
PERSONAL DATA PROTECTION
Personal data: Data made publicly available by data subjects
* Please note that the classification is taken from the original source
TEDxSK and JumpSK is a new Slovak spoken-language resource built from TEDx and Jump Slovensko lectures. The corpus comprises 220 lectures with a total duration of 58 hours. The annotated corpus was generated automatically, in an unsupervised manner, using acoustic speech segmentation based on principal component analysis and automatic transcription by two complementary speech recognition systems. To evaluate the quality of the automatic transcription, an evaluation set of 50 manually transcribed lectures, with a total duration of 12 hours, was created.
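Since the corpus is distributed as STM transcripts alongside WAV audio, a short sketch of reading STM segments may help when checking figures such as the total duration. It assumes the conventional NIST STM line layout (file, channel, speaker, start time, end time, optional `<...>` label, transcript) with `;;` comment lines; the exact layout of this corpus's files is not confirmed here. The names `StmSegment`, `parse_stm` and `total_duration_seconds` are hypothetical and chosen for illustration only.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class StmSegment:
    filename: str
    channel: str
    speaker: str
    start: float           # segment start time in seconds
    end: float             # segment end time in seconds
    label: Optional[str]   # optional "<...>" label field, if present
    transcript: str


def parse_stm(lines: Iterable[str]) -> List[StmSegment]:
    """Parse STM lines, assuming the conventional NIST field order."""
    segments: List[StmSegment] = []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and ";;" comment lines.
        if not line or line.startswith(";;"):
            continue
        fields = line.split(maxsplit=5)
        if len(fields) < 5:
            raise ValueError(f"Malformed STM line: {line!r}")
        filename, channel, speaker, start, end = fields[:5]
        rest = fields[5] if len(fields) > 5 else ""
        # Assumption: a leading "<...>" token is a label, not transcript text.
        label = None
        match = re.match(r"^(<[^>]*>)\s*(.*)$", rest)
        if match:
            label, rest = match.group(1), match.group(2)
        segments.append(
            StmSegment(filename, channel, speaker,
                       float(start), float(end), label, rest)
        )
    return segments


def total_duration_seconds(segments: Iterable[StmSegment]) -> float:
    """Sum the lengths of all segments, in seconds."""
    return sum(seg.end - seg.start for seg in segments)
```

To use it on a real file, pass an open file handle, for example `parse_stm(open(path, encoding="utf-8"))`, and divide the result of `total_duration_seconds` by 3600 to get hours. Note that the summed segment time covers only transcribed speech, so it can be shorter than the duration of the underlying audio.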
Disclaimer: This data is provided by a third party. The DIH identifying this data has no responsibility for its content. Please check the provided link to the data for license terms and potential usage restrictions. In case personal data is included in the dataset, the third party who provides the dataset is the data controller of such personal data. Please note that if you use the datasets for your own purposes, you become an independent data controller and are solely responsible for your compliance with relevant data protection laws relating to the processing and security of personal data, with particular reference, but not limited to, the provisions of the General Data Protection Regulation (GDPR), as applicable to the personal data included in the data.

DATA IDENTIFIED/OFFERED BY

MEMBER
DIH TECHNICOM
TYPE
DIH
COUNTRY
Slovakia
