• Media type: Doctoral Thesis; E-Book; Electronic Thesis
  • Title: Automatic population of knowledge bases with multimodal data about named entities
  • Contributor: Taneva, Bilyana [Author]
  • Imprint: Scientific publications of the Saarland University (UdS), 2013
  • Language: English
  • DOI: https://doi.org/10.22028/D291-26530
  • Keywords: multimodal data ; multimedia ; knowledge extraction ; knowledge base ; information extraction ; knowledge bases
  • Footnote: This data source also contains holdings records that do not lead to a full text.
  • Description: Knowledge bases are of great importance for Web search, recommendations, and many Information Retrieval tasks. However, maintaining them for less popular entities is often a bottleneck. Typically, such entities have limited textual coverage and only a few ontological facts. Moreover, these entities are not well populated with multimodal data, such as images, videos, or audio recordings. The goals of this thesis are (1) to populate a given knowledge base with multimodal data about entities, such as images or audio recordings, and (2) to ease the task of maintaining and expanding the textual knowledge about a given entity by recommending valuable text excerpts to the contributors of knowledge bases. The thesis makes three main contributions. The first two contributions concentrate on finding images of named entities with high precision, high recall, and high visual diversity. Our main focus is on less popular entities, for which image search engines fail to retrieve good results. Our methods use background knowledge about the entity, such as ontological facts or a short description, together with a visual image similarity measure to rank and diversify a set of candidate images. Our third contribution is an approach for extracting text content related to a given entity. It leverages a language-model-based similarity between a short description of the entity and the text sources, and solves a budget-constrained optimization program without any assumptions about the text structure. Moreover, our approach is also able to reliably extract entity-related audio excerpts from news podcasts. We derive the time boundaries from the audio transcriptions, which are usually very noisy.
  • Access State: Open Access