• Media type: E-book
  • Title: Hybrid Intelligent Techniques for Text Categorization
  • Contributors: Sadiq, Dr. Ahmed [author]; Abdullah, Sura [author]
  • Published: [S.l.]: SSRN, 2014
  • Extent: 1 online resource (18 p.)
  • Language: English
  • Origin:
  • Notes: In: International Journal of Advanced Computer Science and Information Technology (IJACSIT), Vol. 2, No. 2, April 2013, pp. 23-40
    According to SSRN, the original version of the document was created on January 8, 2014
  • Description: Text categorization is the task of classifying text documents into one or more predefined categories based on their content. The proposed system consists of three main steps: text document representation, classifier construction, and performance evaluation. In the first step, a set of pre-classified text documents is provided. Each document is preprocessed to split it into features; the features are weighted by their frequency in that document, and non-informative features are eliminated. The remaining features are then standardized by reducing each one to its root through stemming. Because the number of features remains large even after non-informative feature removal and stemming, the system applies specific thresholds to extract the distinct features that represent the document. In the second step, the text categorization model (classifier) is built by learning the distinct features that represent all the pre-classified documents for each sub-category of the main categories, using a supervised categorization technique, rough set theory. The model then uses a pair of precise concepts from this theory, the lower and upper approximations, to classify any test document into one or more main categories and sub-categories. In the final step, the performance of the proposed system is evaluated. It achieved good results, up to 96%, when applied to a number of test text documents for each sub-category of the main categories.
  • Access status: Free access
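
The description mentions frequency-based feature thresholds and the lower and upper approximations of rough set theory. The following is a minimal Python sketch of those two ideas, not the authors' implementation: the stop-word list, the `min_count` threshold, the function names, and the toy documents are all assumptions made for illustration, and stemming is omitted. Documents with identical feature sets are treated as indiscernible; the lower approximation of a category collects the indiscernibility classes that lie entirely inside it, and the upper approximation collects the classes that overlap it.

```python
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "of", "and", "is", "in", "to"}


def distinct_features(text, min_count=2):
    """Split text into terms, drop stop words, and keep the terms whose
    frequency in this document reaches the (assumed) threshold."""
    tokens = [t for t in text.lower().split()
              if t.isalpha() and t not in STOP_WORDS]
    counts = Counter(tokens)
    return frozenset(term for term, n in counts.items() if n >= min_count)


def approximations(descriptions, target):
    """Return the rough-set (lower, upper) approximations of `target`.

    descriptions: dict mapping document id -> hashable feature description
    target: set of document ids labelled with the category of interest
    """
    # Documents with the same description fall into one indiscernibility class.
    classes = defaultdict(set)
    for doc_id, desc in descriptions.items():
        classes[desc].add(doc_id)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # class lies entirely inside the category
            lower |= block
        if block & target:    # class overlaps the category
            upper |= block
    return lower, upper


# Toy data: d3 and d4 share a description but only d3 is labelled "sports".
descriptions = {
    "d1": frozenset({"goal", "match"}),
    "d2": frozenset({"goal", "match"}),
    "d3": frozenset({"market"}),
    "d4": frozenset({"market"}),
    "d5": frozenset({"vote"}),
}
sports = {"d1", "d2", "d3"}
lower, upper = approximations(descriptions, sports)
features = distinct_features("goal goal match the match vote")
```

In this toy data, `lower` holds the documents that certainly belong to the category and `upper` holds those that possibly belong; the difference between the two is the boundary region where the features alone cannot decide.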