• Media type: E-Article; Text
  • Title: Retrieval, Crawling and Fusion of Entity-centric Data on the Web
  • Contributor: Dietze, Stefan [Author]; Calì, Andrea [Author]; Gorgan, Dorian [Author]; Ugarte, Martín [Author]
  • Published: Heidelberg : Springer Verlag, 2017
  • Published in: Semantic keyword-based search on structured data sources ; Lecture notes in computer science ; 10151
  • Issue: accepted version
  • Language: English
  • DOI: https://doi.org/10.15488/1258; https://doi.org/10.1007/978-3-319-53640-8_1
  • ISSN: 0302-9743
  • Keywords: Conference publication ; Semantics ; Web crawler ; Semantic Web ; Schema.org ; Dataset recommendation ; Knowledge based systems ; Knowledge graphs ; Data fusion ; Entity retrieval ; Arches ; Markup
  • Footnote: This data source also contains holdings records that do not link to a full text.
  • Description: While the Web of (entity-centric) data has seen tremendous growth over the past years, take-up and re-use are still limited. Data vary heavily with respect to their scale, quality, coverage or dynamics, which poses challenges for tasks such as entity retrieval or search. This chapter provides an overview of approaches to deal with the increasing heterogeneity of Web data. On the one hand, recommendation, linking, profiling and retrieval can provide efficient means to enable discovery and search of entity-centric data, specifically when dealing with traditional knowledge graphs and linked data. On the other hand, embedded markup such as Microdata and RDFa has emerged as a novel, Web-scale source of entity-centric knowledge. Driven by initiatives such as schema.org, markup has seen increasing adoption over the last few years and constitutes an increasingly important source of entity-centric data on the Web, being of the same order of magnitude as the Web itself with regard to dynamics and scale. Markup data therefore lends itself as a data source for aiding tasks such as knowledge base augmentation, where data fusion techniques are required to address its inherent characteristics, such as redundancy, heterogeneity and lack of links. Future directions are concerned with exploiting the complementary nature of markup data and traditional knowledge graphs. The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-53640-8_1.
  • Access State: Open Access