Published:
Springer Science and Business Media LLC, 2022
Published in:
The VLDB Journal, 31 (2022) 4, pages 797-821
Language:
English
DOI:
10.1007/s00778-022-00729-1
ISSN:
0949-877X;
1066-8888
Description:
Abstract: In many fields, e.g., data mining and machine learning, distance-based outlier detection (DOD) is widely employed to remove noise and find abnormal phenomena, because DOD is unsupervised, can be employed in any metric space, and makes no assumptions about the data distribution. Data mining and machine learning applications now face the challenge of dealing with large datasets, which requires efficient DOD algorithms. We address the DOD problem under two different definitions. Our key idea for solving both problems is to exploit an in-memory proximity graph. For each problem, we propose a new algorithm that exploits a proximity graph, and we analyze which type of proximity graph is appropriate for that algorithm. Our empirical study using real datasets confirms that our DOD algorithms are significantly faster than state-of-the-art ones.