• Media type: E-article
  • Title: Dynamic Document Clustering Using Singular Value Decomposition
  • Contributors: Nadubeediramesh, Rashmi; Gangopadhyay, Aryya
  • Published: IGI Global, 2012
  • Published in: International Journal of Computational Models and Algorithms in Medicine
  • Language: English
  • DOI: 10.4018/jcmam.2012070103
  • ISSN: 1947-3141; 1947-3133
  • Keywords: Marketing ; Organizational Behavior and Human Resource Management ; Strategy and Management ; Drug Discovery ; Pharmaceutical Science ; Pharmacology
  • Origin:
  • Notes:
  • Description: Incremental document clustering is important in many applications, but particularly so in healthcare, where text data is abundant, ranging from published journal research to day-to-day records such as discharge summaries and nursing notes. In such dynamic environments, new documents are constantly added to the set of documents used in the initial cluster formation. It is therefore important to be able to update the clusters incrementally, at low computational cost, as new documents arrive. In this paper the authors describe a novel, low-cost approach to incremental document clustering. Their method is based on conducting singular value decomposition (SVD) incrementally. They fold new documents into the existing term-document space and assign them to pre-defined clusters based on intra-cluster similarity. This saves the cost of re-computing the SVD on the entire document set every time an update occurs. The authors also provide a way to retrieve documents based on different window sizes, with high scalability and good clustering accuracy. They tested the proposed method experimentally on 960 medical abstracts retrieved from the PubMed medical library. The incremental method is compared with the default approach, in which the SVD is completely re-computed whenever new documents are added to the initial set. The results show minor decreases in the quality of the cluster formation but much larger gains in computational throughput.
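
The folding-in idea summarized in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the toy term-document matrix, the helper names (`fit_lsi`, `fold_in`, `assign_cluster`), the initial cluster labels, and the use of cosine similarity to cluster centroids as the assignment rule are all assumptions made for illustration. The sketch computes a truncated SVD once, then projects each new document into the existing k-dimensional space with the standard fold-in formula d_hat = d^T U_k S_k^{-1}, so the SVD is not recomputed.

```python
import numpy as np

def fit_lsi(A, k):
    """Truncated SVD of a term-document matrix A (terms x documents)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep the k largest singular triplets; rows of V_k are document coordinates.
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in(d, U_k, s_k):
    """Project a new term vector d into the existing space: d^T U_k S_k^{-1}."""
    return (d @ U_k) / s_k

def assign_cluster(d_hat, centroids):
    """Return the index of the centroid with the highest cosine similarity."""
    sims = (centroids @ d_hat) / (
        np.linalg.norm(centroids, axis=1) * np.linalg.norm(d_hat)
    )
    return int(np.argmax(sims))

# Hypothetical toy corpus: 6 terms x 4 documents, two topics.
# Terms 0-2 occur only in documents 0-1, terms 3-5 only in documents 2-3.
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 3],
    [0, 0, 1, 1],
], dtype=float)

U_k, s_k, V_k = fit_lsi(A, k=2)

# Assumed initial clustering of the four documents (hypothetical labels).
labels = np.array([0, 0, 1, 1])
centroids = np.vstack([V_k[labels == c].mean(axis=0) for c in (0, 1)])

# New documents arrive as raw term vectors and are folded in one at a time.
new_doc_a = np.array([1, 1, 0, 0, 0, 0], dtype=float)  # uses topic-0 terms
new_doc_b = np.array([0, 0, 0, 1, 1, 0], dtype=float)  # uses topic-1 terms
cluster_a = assign_cluster(fold_in(new_doc_a, U_k, s_k), centroids)
cluster_b = assign_cluster(fold_in(new_doc_b, U_k, s_k), centroids)
```

Folding in costs one matrix-vector product per new document, whereas a full recomputation repeats the SVD over the whole collection. The trade-off is that the basis `U_k` is never updated, so the representation can drift as more documents are added, which is consistent with the minor quality decrease the description reports.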