Translation Augmented LibriSpeech Corpus
General information
Contributor: Laurent Besacier
Other contributors: Laurent Arnaud
Institution: Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LIG
Description: A large-scale (>200h), publicly available corpus of read audiobooks. It augments the LibriSpeech ASR Corpus (1000h) and contains English utterances (from audiobooks) automatically aligned with French text, offering ~236h of speech aligned to translated text. The speech recordings and source texts originally come from Project Gutenberg, a digital library of public domain books read by volunteers. The augmentation of LibriSpeech is straightforward: we gathered open domain e-books in French, extracted the individual chapters that are available in the LibriSpeech Corpus, and then automatically aligned these French chapters with the English utterances of LibriSpeech, yielding a corpus of speech recordings aligned with their translations.
Readme file
Archive files (8683 downloads)
DS91.zip
├ train_100h_txt.zip
├ audio_files.zip
├ train_100h.zip
├ dev.zip
├ alignments.zip
├ train130h_additional_txt.zip
├ dev_txt.zip
├ database.zip
├ test.zip
├ Interface.zip
├ test_txt.zip
└ train_130h_additional.zip
Details
External identifier: Unavailable
Subjects: Computer science, Linguistics
Keywords: machine translation, Corpus Linguistics, Linguistics, speech translation, libris speech, natural language processing, Alignments, corpus, Multimodal Corpus, audiobooks
Encoding format: sqlite3-wav-txt-pdf
Citation
Ali Can Kocabiyikoglu, Laurent Besacier, Olivier Kraif. (2017). Translation Augmented LibriSpeech Corpus. Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG. [data set] Published 2017 via Perscido-Grenoble-Alpes.