• Media type: E-Article
  • Title: On setting the hyper-parameters of term frequency normalization for information retrieval
  • Contributor: He, Ben; Ounis, Iadh
  • Published: Association for Computing Machinery (ACM), 2007
  • Published in: ACM Transactions on Information Systems, 25 (2007) 3, p. 13
  • Language: English
  • DOI: 10.1145/1247715.1247719
  • ISSN: 1046-8188; 1558-2868
  • Keywords: Computer Science Applications ; General Business, Management and Accounting ; Information Systems
  • Description: The setting of the term frequency normalization hyper-parameter suffers from the query dependence and collection dependence problems, which remarkably hurt the robustness of the retrieval performance. Our study in this article investigates three term frequency normalization methods, namely normalization 2, BM25's normalization and the Dirichlet Priors normalization. We tackle the query dependence problem by modifying the query term weight using a Divergence From Randomness term weighting model, and tackle the collection dependence problem by measuring the correlation of the normalized term frequency with the document length. Our research hypotheses for the two problems, as well as an automatic hyper-parameter setting methodology, are extensively validated and evaluated on four Text REtrieval Conference (TREC) collections.