Dynamic Document Clustering Using Singular Value Decomposition

Media type: E-article

Title: Dynamic Document Clustering Using Singular Value Decomposition

Contributors: Nadubeediramesh, Rashmi; Gangopadhyay, Aryya

Published: IGI Global, 2012

Language: English

DOI: 10.4018/jcmam.2012070103

ISSN: 1947-3141; 1947-3133

Keywords: Marketing ; Organizational Behavior and Human Resource Management ; Strategy and Management ; Drug Discovery ; Pharmaceutical Science ; Pharmacology

Description: Incremental document clustering is important in many applications, particularly in healthcare, where text data is abundant and ranges from published journal research to day-to-day clinical records such as discharge summaries and nursing notes. In such dynamic environments, new documents are constantly added to the set used for the initial cluster formation, so clusters must be updated incrementally at low computational cost. In this paper the authors describe a novel, low-cost approach to incremental document clustering based on performing singular value decomposition (SVD) incrementally. They dynamically fold new documents into the existing term-document space and assign them to pre-defined clusters based on intra-cluster similarity. This avoids re-computing the SVD on the entire document set every time an update occurs. The authors also provide a way to retrieve documents based on different window sizes with high scalability and good clustering accuracy. They tested the method experimentally on 960 medical abstracts retrieved from the PubMed medical library, comparing it with the default approach in which the SVD is completely re-computed whenever new documents are added to the initial set. The results show minor decreases in cluster quality but much larger gains in computational throughput.
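
The folding-in step mentioned in the description can be illustrated with a minimal sketch. The abstract does not give the authors' exact procedure, so this uses the standard latent semantic indexing fold-in formula (a new term vector d is projected as d^T U_k S_k^{-1}) and assumes, purely for illustration, that cluster assignment picks the centroid with the highest cosine similarity. The function names, the toy matrix, and the labels are all hypothetical.

```python
import numpy as np

def truncated_svd(A, k):
    """Rank-k SVD of a term-document matrix A (rows = terms, columns = documents)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Return U_k, the top-k singular values, and document coordinates (docs x k).
    return U[:, :k], s[:k], Vt[:k].T

def fold_in(d, U_k, s_k):
    """Project a new document's term vector into the existing k-dim space.

    Standard fold-in: d_hat = d^T U_k S_k^{-1}. The SVD itself is not recomputed.
    """
    return (d @ U_k) / s_k

def assign_cluster(d_hat, centroids):
    """Assumed rule: return the index of the centroid most cosine-similar to d_hat."""
    denom = np.linalg.norm(centroids, axis=1) * np.linalg.norm(d_hat)
    sims = (centroids @ d_hat) / np.where(denom == 0, 1.0, denom)
    return int(np.argmax(sims))

# Toy corpus (hypothetical): documents 0-1 use terms 0-1, documents 2-3 use terms 2-3.
A = np.array([[3.0, 1.0, 0.0, 0.0],
              [1.0, 3.0, 0.0, 0.0],
              [0.0, 0.0, 2.0, 1.0],
              [0.0, 0.0, 1.0, 2.0]])
labels = np.array([0, 0, 1, 1])  # initial cluster labels, assumed given

U_k, s_k, V_k = truncated_svd(A, k=2)
centroids = np.vstack([V_k[labels == c].mean(axis=0) for c in (0, 1)])

# A new document arrives that uses only terms 0 and 1.
new_doc = np.array([1.0, 1.0, 0.0, 0.0])
cluster = assign_cluster(fold_in(new_doc, U_k, s_k), centroids)
print(cluster)
```

Folding in costs one matrix-vector product per new document, which is where the computational savings over a full re-decomposition come from. The trade-off is that the basis U_k is never refreshed, which is consistent with the minor loss in cluster quality the description reports.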
