• Media type: Electronic thesis; Dissertation; E-book
  • Title: Robust Speech Recognition via Adaptation for German Oral History Interviews
  • Contributors: Gref, Michael [Author]
  • Published: Universitäts- und Landesbibliothek Bonn, 2022-10-20
  • Language: English
  • DOI: https://doi.org/20.500.11811/10373; https://doi.org/10.1109/ICME.2019.00142
  • Keywords: acoustic model adaptation ; oral history ; ASR ; transcription ; Zeitzeugeninterviews ; Domänenanpassung ; Transkription ; robust speech recognition ; automatische Spracherkennung ; automatic speech recognition ; robuste Spracherkennung ; domain adaptation ; akustisches Modell Anpassung
  • Origin:
  • Notes: This data source also contains holdings records that do not lead to a full text.
  • Description: Automatic speech recognition systems often achieve remarkable performance when trained on thousands of hours of manually annotated and time-aligned speech. However, when applied in conditions and domains other than those they were trained on, the systems' recognition quality often deteriorates, substantially limiting their real-world application. One of these applications is the automatic transcription of oral history interviews, i.e., interviews with witnesses of historical events. For the past twenty years, oral history interviews have been among the most challenging use cases for speech recognition due to a lack of representative training data, diverse and often poor recording conditions, and the spontaneous and occasionally colloquial nature of the speech. This thesis proposes and studies the combination of different domain adaptation approaches to overcome the lack of representative training data and cope with the unpredictability of oral history interviews. We employ and investigate data augmentation to adapt broadcast training data to cover the challenging recording conditions of oral history interviews. We compare data augmentation approaches to conventional speech enhancement. To improve the system's performance further, we study domain adaptation via fine-tuning, adapting acoustic models trained robustly on thousands of hours of annotated speech with a minimal amount of manually transcribed oral history interviews. We employ automatic transcript alignment to generate adaptation data from transcribed but not time-aligned interviews and investigate the influence of different adaptation data sizes on domain overfitting and generalization. We reduce domain overfitting and improve the generalization of the adapted models by employing cross-lingual adaptation in a multi-staged setup to leverage the vast availability of English speech corpora.
Additionally, in this thesis, a human word error rate for German oral history interviews recorded under clean conditions is experimentally estimated to study and highlight ...
  • Access status: Open access
  • Rights/usage notes: Attribution-ShareAlike (CC BY-SA)