• Media type: Doctoral Thesis; Electronic Thesis; E-Book
  • Title: Robust Bidirectional Processing for Speech-controlled Robotic Scenarios ; Robuste Bidirektionale Verarbeitung für sprachgesteuerte Robotikszenarien
  • Contributor: Twiefel, Johannes [Author]
  • Imprint: Staats- und Universitätsbibliothek Hamburg Carl von Ossietzky, 2020-01-01
  • Language: English
  • Keywords: Speech Processing ; Automatic Speech Recognition ; Natural Language Processing ; Machine Learning ; 54.72 Artificial Intelligence ; 54.75 Speech Processing ; Artificial Intelligence ; Robotics
  • Footnote: This data source also contains holdings records that do not lead to a full text.
  • Description: Automatic Speech Recognition (ASR) is often employed for applications like dictation, where the aim is to cover a broad vocabulary. ASR is also a central interface through which humans communicate with or control a system. Such systems can perform a fixed set of actions and follow a well-defined goal. Audio is recorded using a microphone, the ASR system produces text hypotheses, and a natural language processing (NLP) system derives machine-readable representations from the text. These representations are then used to instruct the system to perform a defined action that achieves a goal. At first glance, this approach of orchestrating a unidirectional processing pipeline appears reasonable and is often followed in practice. In this thesis, we demonstrate that there are better approaches to this kind of task and present a more suitable one. A well-known issue of ASR systems is that a growing vocabulary of recognizable words leads to a higher word error rate (WER). For applications like dictation, this issue is hard to address, but for the aforementioned problem of controlling a system, we are able to address it. Usually, the number of goals and possible actions of the system is limited, so the set of possible text instructions is limited as well. This leads to a smaller vocabulary, which improves the performance of the ASR system. Another limitation of the unidirectional processing chain is the assumption that NLP systems receive correct text input. Even though these systems are trained on clean text, deriving a correct natural language representation from such text remains a challenge. Because the processed text is produced by an ASR system, it may be incorrect, which makes it hard for the NLP system to recover the correct meaning. If a spoken command then cannot be executed by the system, it is rejected, and the user needs to repeat the instruction.
In this thesis, we present a self-trained ASR system that performs better than Google’s ...
  • Access State: Open Access
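
The word error rate (WER) mentioned in the description is conventionally computed as the word-level edit distance between a reference transcript and an ASR hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used in the thesis; the function name `word_error_rate` and the example command are hypothetical and chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch of the conventional WER definition; the name and
    whitespace tokenization are assumptions, not taken from the thesis.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical robot command and a hypothesis with two substituted words.
print(word_error_rate("pick up the red ball", "pick up a red bowl"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.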