• Medientyp: Sonstige Veröffentlichung; Elektronischer Konferenzbericht
  • Titel: Optimal Grid-Clustering : Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering
  • Beteiligte: Hinneburg, Alexander [Verfasser:in]; Keim, Daniel A. [Verfasser:in]
  • Erschienen: KOPS - The Institutional Repository of the University of Konstanz, 1999
  • Sprache: Englisch
  • Entstehung:
  • Anmerkungen: Diese Datenquelle enthält auch Bestandsnachweise, die nicht zu einem Volltext führen.
  • Beschreibung: Many applications require the clustering of large amounts of high-dimensional data. Most clustering algorithms, however, do not work effectively and efficiently in high-dimensional space, which is due to the so-called "curse of dimensionality". In addition, the high-dimensional data often contains a significant amount of noise which causes additional effectiveness problems. In this paper, we review and compare the existing algorithms for clustering high-dimensional data and show the impact of the curse of dimensionality on their effectiveness and efficiency. The comparison reveals that condensation-based approaches (such as BIRCH or STING) are the most promising candidates for achieving the necessary efficiency, but it also shows that basically all condensation-based approaches have severe weaknesses with respect to their effectiveness in high-dimensional space. To overcome these problems, we develop a new clustering technique called OptiGrid which is based on constructing an optimal grid-partitioning of the data. The optimal grid-partitioning is determined by calculating the best partitioning hyperplanes for each dimension (if such a partitioning exists) using certain projections of the data. The advantages of our new approach are (1) it has a firm mathematical basis (2) it is by far more effective than existing clustering algorithms for high-dimensional data (3) it is very efficient even for large data sets of high dimensionality. To demonstrate the effectiveness and efficiency of our new approach, we perform a series of experiments on a number of different data sets including real data sets from CAD and molecular biology. A comparison with one of the best known algorithms (BIRCH) shows the superiority of our new approach. ; published
  • Zugangsstatus: Freier Zugang
  • Rechte-/Nutzungshinweise: Namensnennung - Nicht-kommerziell - Keine Bearbeitung (CC BY-NC-ND)