• Media type: Report; E-book
  • Title: Scribosermo: fast speech-to-text models for German and other languages
  • Contributors: Bermuth, Daniel [Author]; Poeppel, Alexander [Author]; Reif, Wolfgang [Author]
  • Published: Augsburg University Publication Server (OPUS), 2022-11-10
  • Language: English
  • DOI: https://doi.org/10.48550/arXiv.2110.07982
  • Origin:
  • Notes: This data source also contains holdings records that do not lead to a full text.
  • Description: Recent Speech-to-Text models often require a large amount of hardware resources and are mostly trained in English. This paper presents Speech-to-Text models for German, as well as for Spanish and French with special features: (a) They are small and run in real-time on microcontrollers like a RaspberryPi. (b) Using a pretrained English model, they can be trained on consumer-grade hardware with a relatively small dataset. (c) The models are competitive with other solutions and outperform them in German. In this respect, the models combine advantages of other approaches, which only include a subset of the presented features. Furthermore, the paper provides a new library for handling datasets, which is focused on easy extension with additional datasets and shows an optimized way for transfer-learning new languages using a pretrained model from another language with a similar alphabet.
  • Access status: Open access