• Media type: Doctoral Thesis; Electronic Thesis; E-Book
  • Title: Temporal Segmentation of Human Actions in Videos
  • Contributor: Richard, Alexander [Author]
  • Published: Universitäts- und Landesbibliothek Bonn, 2019-10-29
  • Language: English
  • Handle: https://hdl.handle.net/20.500.11811/8085
  • Keywords: Viterbi decoding; action detection; weakly supervised learning; action segmentation
  • Footnote: This data source also contains holdings records that do not lead to a full text.
  • Description: Understanding human actions in videos is of great interest in various scenarios, ranging from surveillance through quality control in production processes to content-based video search. Algorithms for automatic temporal action segmentation need to overcome severe difficulties in order to be reliable and provide sufficiently good quality. Not only can human actions occur in different scenes and surroundings, the definition of an action itself is also inherently fuzzy, leading to a significant amount of intra-class variation. Moreover, besides finding the correct action label for a pre-defined temporal segment in a video, localizing an action in the first place is anything but trivial. Different actions not only vary in their appearance and duration but can also have long-range temporal dependencies that span the complete video. Further, obtaining reliable annotations for large amounts of video data is time-consuming and expensive. The goal of this thesis is to advance current approaches to temporal action segmentation. We therefore propose a generic framework that models the three components of the task explicitly: long-range temporal dependencies are handled by a context model, variations in segment durations are represented by a length model, and short-term appearance and motion of actions are addressed with a visual model. While the inspiration for the context model mainly comes from word sequence models in natural language processing, the visual model builds upon recent advances in the classification of pre-segmented action clips. Considering that long-range temporal context is crucial, we avoid local segmentation decisions and find the globally optimal temporal segmentation of a video under the explicit models. Throughout the thesis, we provide explicit formulations and training strategies for the proposed generic action segmentation framework under different supervision conditions. First, we address the task of fully supervised temporal action segmentation, where frame-level annotations are available ...
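  A minimal sketch of the kind of decoding the description alludes to: given frame-wise log-probabilities from a visual model, a bigram context model over action labels, and a length model, a Viterbi-style dynamic program finds the globally optimal segmentation rather than making local decisions. This is not the thesis implementation; the Poisson length model, the uniform transition matrix, and all names below are illustrative assumptions.

    import math
    import numpy as np

    def poisson_logpmf(length, lam):
        # illustrative length model: log p(length) under a Poisson with mean lam
        return length * math.log(lam) - lam - math.lgamma(length + 1)

    def segment(frame_scores, transition, length_model, max_len):
        """Viterbi-style search for the globally optimal segmentation.

        frame_scores : (T, C) frame-wise log-probabilities (visual model)
        transition   : (C, C) log-probabilities of a bigram context model
        length_model : callable (length, class) -> log-probability (length model)
        max_len      : longest segment length considered
        Returns a list of (start, end, class) triples covering frames [0, T).
        """
        T, C = frame_scores.shape
        cum = np.vstack([np.zeros(C), np.cumsum(frame_scores, axis=0)])  # prefix sums
        best = np.full((T + 1, C), -np.inf)  # best[t, c]: score of frames [0, t) ending in class c
        back = {}                            # backpointers: (t, c) -> (segment start, previous class)
        best[0, :] = 0.0
        for t in range(1, T + 1):
            for length in range(1, min(max_len, t) + 1):
                s = t - length
                seg = cum[t] - cum[s]        # summed visual scores of segment [s, t) per class
                for c in range(C):
                    score = seg[c] + length_model(length, c)
                    if s == 0:
                        cand, prev = score, (0, -1)   # first segment: no context yet
                    else:
                        prev_scores = best[s] + transition[:, c]
                        p = int(np.argmax(prev_scores))
                        cand, prev = prev_scores[p] + score, (s, p)
                    if cand > best[t, c]:
                        best[t, c] = cand
                        back[(t, c)] = prev
        # backtrack from the best final class
        c = int(np.argmax(best[T]))
        t, segments = T, []
        while t > 0:
            s, p = back[(t, c)]
            segments.append((s, t, c))
            t, c = s, p
        return segments[::-1]

    # toy usage: 40 frames, 3 action classes, uniform context model
    rng = np.random.default_rng(0)
    scores = np.log(rng.dirichlet(np.ones(3), size=40))
    trans = np.log(np.full((3, 3), 1.0 / 3.0))
    print(segment(scores, trans, lambda l, c: poisson_logpmf(l, 10.0), max_len=40))

  The search runs in O(T · max_len · C²) time, so the explicit length model comes at the cost of an extra factor of max_len over a plain frame-level Viterbi pass.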
  • Access State: Open Access
  • Rights information: In Copyright