Description:
A scene graph is a graph structure in which nodes represent the entities in a scene and edges represent the relationships between them. It is regarded as a promising route to holistic scene understanding, as well as a tool for bridging the domains of vision and language. Despite this potential, the field lacks a comprehensive, systematic analysis of scene graphs and their practical applications. This dissertation fills that gap with significant contributions to both image-based and video-based scene graphs.

For image-based scene graphs, a high-performing two-stage scene graph generation method is proposed first. The approach performs scene graph generation by solving a neural variant of ordinary differential equations. To further reduce the time complexity and inference time of two-stage approaches, image-based scene graph generation is then formulated as a set prediction problem, and a Transformer-based model is proposed that infers visual relationships without requiring object proposals. In the course of studying image-based scene graph generation, we find that the existing evaluation metrics fail to capture the overall semantic difference between a scene graph and an image. To overcome this limitation, we propose a contrastive learning framework that measures the similarity between scene graphs and images; the framework can also serve as a scene graph encoder for downstream applications.

For video-based scene graphs, a Transformer-based dynamic scene graph generation method is proposed to capture spatial context and temporal dependencies; it has become a popular baseline for this task. Moreover, to broaden the range of video scene graph applications, a semantic scene-graph-to-video synthesis framework is proposed that synthesizes a fixed-length video from an initial scene image and discrete semantic video scene graphs. The video and graph representations are modeled by a GPT-like Transformer with an auto-regressive prior.
These methods have demonstrated state-of-the-art ...
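To make the opening definition concrete, the following is a minimal, hypothetical sketch of a scene graph data structure in Python: entities become nodes and each relationship is a (subject, predicate, object) triplet. The class and method names are illustrative assumptions, not part of the dissertation's methods.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """A minimal scene graph: nodes are entity labels and each edge
    is a (subject, predicate, object) relationship triplet."""
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)

    def add_relation(self, subject: str, predicate: str, obj: str) -> None:
        # Adding a relationship implicitly registers both entities as nodes.
        self.nodes.update({subject, obj})
        self.edges.append((subject, predicate, obj))


# Example: a scene of a person riding a horse on grass.
g = SceneGraph()
g.add_relation("person", "riding", "horse")
g.add_relation("horse", "on", "grass")
```

In this toy form, `g.nodes` holds the three entities and `g.edges` the two relationship triplets; real scene graph generation models predict exactly such triplets (plus bounding boxes) from an image or video frame.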