Published in: MIT Sloan Research Paper ; No. 4586-06
Extent:
1 online resource (9 p.)
Language:
English
DOI:
10.2139/ssrn.882115
Identifier:
Origin:
Notes:
According to SSRN, the original version of the document was created in January 2006
Description:
This paper develops theory and algorithms concerning a new metric for clustering data. The metric minimizes the total volume of clusters, where the volume of a cluster is defined as the volume of the minimum volume ellipsoid (MVE) enclosing all data points in the cluster. This metric is scale-invariant; that is, the optimal clusters are invariant under an affine transformation of the data space. We introduce the concept of outliers in the new metric and show that the proposed method of treating outliers asymptotically recovers the data distribution when the data come from a single multivariate Gaussian distribution. Two heuristic algorithms are presented that attempt to optimize the new metric. In a series of empirical studies on real and simulated data sets, we show that volume-based clustering outperforms the k-means algorithm.
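
The volume-based objective described in the abstract can be illustrated with a short Python sketch. This is not the paper's method: the record does not say how the MVE is computed, so the sketch assumes Khachiyan's iterative algorithm as a stand-in, and the names `mve_volume` and `total_volume` are hypothetical. It only evaluates the objective for a given cluster assignment; it does not implement either of the two heuristic algorithms or the outlier treatment.

```python
import numpy as np
from math import gamma, pi


def mve_volume(points, tol=1e-4, max_iter=10000):
    """Approximate volume of the minimum volume ellipsoid enclosing `points`.

    Assumption: uses Khachiyan's algorithm, not necessarily the paper's solver.
    Requires at least d+1 affinely independent points in d dimensions.
    """
    P = np.asarray(points, dtype=float)
    n, d = P.shape
    Q = np.vstack([P.T, np.ones(n)])      # lift points to d+1 dimensions
    u = np.full(n, 1.0 / n)               # weights on the data points
    for _ in range(max_iter):
        X = (Q * u) @ Q.T
        # M[i] = q_i^T X^{-1} q_i for every lifted point q_i
        M = np.einsum("ij,ij->j", Q, np.linalg.solve(X, Q))
        j = int(np.argmax(M))
        step = (M[j] - d - 1.0) / ((d + 1.0) * (M[j] - 1.0))
        new_u = (1.0 - step) * u
        new_u[j] += step
        converged = np.linalg.norm(new_u - u) < tol
        u = new_u
        if converged:
            break
    c = P.T @ u                           # ellipsoid center
    # Ellipsoid is {x : (x - c)^T A (x - c) <= 1}
    A = np.linalg.inv(P.T @ (P * u[:, None]) - np.outer(c, c)) / d
    unit_ball = pi ** (d / 2.0) / gamma(d / 2.0 + 1.0)
    return unit_ball / np.sqrt(np.linalg.det(A))


def total_volume(points, labels):
    """Sum of MVE volumes over clusters: the quantity the metric minimizes."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    return sum(mve_volume(points[labels == k]) for k in np.unique(labels))
```

Under an invertible affine map x -> Tx + b, every cluster's MVE volume is multiplied by the same factor |det T|, so the ranking of candidate clusterings by `total_volume` is unchanged. That is the invariance property the abstract refers to.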