Robust speech recognition using articulatory information

Media type: Text; Doctoral Thesis; Electronic Thesis; E-Book

Title: Robust speech recognition using articulatory information

Contributor: Kirchhoff, Katrin [Author]

Published: noah.nrw, 1999

Language: English

Keywords: Automatic classification ; Acoustic signal ; Robustness ; Articulation ; Automatic speech recognition ; Pattern recognition ; Speech recognition

Footnote: This data source also contains holdings records that do not lead to a full text.

Description: Current automatic speech recognition systems make use of a single source of information about their input, viz. a preprocessed form of the acoustic speech signal, which encodes the time-frequency distribution of signal energy. The goal of this thesis is to investigate the benefits of integrating articulatory information into state-of-the-art speech recognizers, either as a genuine alternative to standard acoustic representations or as an additional source of information. Articulatory information is represented in terms of abstract articulatory classes or "features", which are extracted from the speech signal by means of statistical classifiers. A higher-level classifier then combines the scores for these features and maps them to standard subword unit probabilities. The main motivation for this approach is to improve the robustness of speech recognition systems in adverse acoustic environments, such as those with background noise. Typically, recognition systems show a sharp decline in performance under these conditions. We argue and demonstrate empirically that the articulatory feature approach can lead to greater robustness by enhancing the accuracy of the bottom-up acoustic modeling component in a speech recognition system. The second focus of this thesis is to provide detailed analyses of the different types of information provided by the acoustic and the articulatory representations, respectively, and to develop strategies to optimally combine them. To this end we investigate combination methods at the levels of feature extraction, subword unit probability estimation, and word recognition. The feasibility of this approach is demonstrated with respect to two different speech recognition tasks. The first of these is an American English corpus of telephone-bandwidth speech; the recognition domain is continuous numbers. The second is a German database of studio-quality speech consisting of spontaneous dialogues.
In both cases recognition performance will be tested not only under clean acoustic conditions but also ...

Access State: Open Access
