Translation Augmented LibriSpeech Corpus

15 Dec 2017 Open Speech data

General information

Contributor: Laurent Besacier

Other contributors: Laurent Arnaud

Institution: Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LIG

Description: A large-scale (>200h), publicly available corpus of read audiobook speech. It augments the LibriSpeech ASR Corpus (1000h) and contains English utterances (from audiobooks) automatically aligned with French text, offering ~236h of speech aligned to translated text. Speech recordings and source texts originally come from Project Gutenberg, a digital library of public domain books read by volunteers. The augmentation is straightforward: we gathered open-domain e-books in French, extracted the individual chapters that are also available in the LibriSpeech Corpus, and automatically aligned those French chapters with the English utterances of LibriSpeech. The result is a corpus of speech recordings aligned with their translations.

Archive files 8683 downloads

DS91.zip

  •   train_100h_txt.zip
  •   audio_files.zip
  •   train_100h.zip
  •   dev.zip
  •   alignments.zip
  •   train130h_additional_txt.zip
  •   dev_txt.zip
  •   database.zip
  •   test.zip
  •   Interface.zip
  •   test_txt.zip
  •   train_130h_additional.zip
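
This page does not describe the internal layout of the archives, so it may help to list an archive's members before extracting anything. The following is a minimal Python sketch, assuming an archive such as `dev.zip` has already been downloaded into the working directory. The helper name `list_members` is hypothetical and not part of the dataset.

```python
import zipfile
from pathlib import Path


def list_members(archive_path):
    """Return (member name, uncompressed size in bytes) for each file in a zip archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()  # skip explicit directory entries
        ]


if __name__ == "__main__":
    # Assumption: dev.zip was downloaded manually into the current directory.
    path = Path("dev.zip")
    if path.exists():
        for name, size in list_members(path):
            print(f"{size:>12}  {name}")
```

Listing members first avoids guessing at directory or file names inside the archive, none of which are documented here.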

Details

External identifier:
Unavailable

Subjects:
Computer science, Linguistics

Keywords:
machine translation, Corpus Linguistics, Linguistics, speech translation, libris speech, natural language processing, Alignments, corpus, Multimodal Corpus, audiobooks

Encoding format:
sqlite3-wav-txt-pdf
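
Since the encoding format includes sqlite3 and the archive list contains `database.zip`, the schema can be inspected before writing any queries. The sketch below uses only Python's standard `sqlite3` module. The file name `corpus.db` is a placeholder assumption, because this page does not name the database file, and no table or column names are assumed. The helper name `describe_schema` is likewise hypothetical.

```python
import os
import sqlite3


def describe_schema(db_path):
    """Return {table name: [column names]} for every table in a SQLite file."""
    schema = {}
    conn = sqlite3.connect(db_path)
    try:
        tables = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        for (table,) in tables:
            # PRAGMA table_info yields one row per column; index 1 is the column name.
            columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
            schema[table] = [column[1] for column in columns]
    finally:
        conn.close()
    return schema


if __name__ == "__main__":
    # Assumption: "corpus.db" stands in for the file extracted from database.zip.
    db_file = "corpus.db"
    # Check first: sqlite3.connect would create an empty file if it were missing.
    if os.path.exists(db_file):
        for table, columns in describe_schema(db_file).items():
            print(table, columns)
```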

Citation

Ali Can Kocabiyikoglu, Laurent Besacier, Olivier Kraif. (2017). Translation Augmented LibriSpeech Corpus. Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG. [data set]. Published 2017 via Perscido-Grenoble-Alpes.
