• Medientyp: Elektronische Hochschulschrift; E-Book; Dissertation
  • Titel: Efficient Frequent Subtree Mining Beyond Forests
  • Beteiligte: Welke, Pascal [VerfasserIn]
  • Erschienen: Universitäts- und Landesbibliothek Bonn, 2019-04-30
  • Sprache: Englisch
  • DOI: https://doi.org/20.500.11811/7893
  • Schlagwörter: computer science ; Data Mining ; Spannbaum ; graph theory ; Häufige Teilgraph Suche ; spanning tree ; frequent subgraph mining ; Graphentheorie ; subgraph isomorphism ; Informatik ; Subgraphisomorphismus
  • Entstehung:
  • Anmerkungen: Diese Datenquelle enthält auch Bestandsnachweise, die nicht zu einem Volltext führen.
  • Beschreibung: A common paradigm in distance-based learning is to embed the instance space into some appropriately chosen feature space equipped with a metric and to define the dissimilarity between instances by the distance of their images in the feature space. If the instances are graphs, then frequent connected subgraphs are a well-suited pattern language to define such feature spaces. Identifying the set of frequent connected subgraphs and subsequently computing embeddings for graph instances, however, is computationally intractable. As a result, existing frequent subgraph mining algorithms either restrict the structural complexity of the instance graphs or require exponential delay between the output of subsequent patterns. Hence distance-based learners lack an efficient way to operate on arbitrary graph data. To resolve this problem, in this thesis we present a mining system that gives up the demand on the completeness of the pattern set to instead guarantee a polynomial delay between subsequent patterns. Complementing this, we devise efficient methods to compute the embedding of arbitrary graphs into the Hamming space spanned by our pattern set. As a result, we present a system that allows to efficiently apply distance-based learning methods to arbitrary graph databases. To overcome the computational intractability of the mining step, we consider only frequent subtrees for arbitrary graph databases. This restriction alone, however, does not suffice to make the problem tractable. We reduce the mining problem from arbitrary graphs to forests by replacing each graph by a polynomially sized forest obtained from a random sample of its spanning trees. This results in an incomplete mining algorithm. However, we prove that the probability of missing a frequent subtree pattern is low. We show empirically that this is true in practice even for very small sized forests. As a result, our algorithm is able to mine frequent subtrees in a range of graph databases where state-of-the-art exact frequent subgraph mining systems fail to ...
  • Zugangsstatus: Freier Zugang
  • Rechte-/Nutzungshinweise: Urheberrechtsschutz