On setting the hyper-parameters of term frequency normalization for information retrieval

Media type: E-Article
Title: On setting the hyper-parameters of term frequency normalization for information retrieval
Contributor: He, Ben; Ounis, Iadh
Published: Association for Computing Machinery (ACM), 2007
Published in: ACM Transactions on Information Systems, 25 (2007) 3, page 13
Language: English
DOI: 10.1145/1247715.1247719
ISSN: 1046-8188; 1558-2868
Keywords: Computer Science Applications ; General Business, Management and Accounting ; Information Systems
Description: The setting of the term frequency normalization hyper-parameter suffers from the query dependence and collection dependence problems, which remarkably hurt the robustness of the retrieval performance. Our study in this article investigates three term frequency normalization methods, namely normalization 2, BM25's normalization and the Dirichlet Priors normalization. We tackle the query dependence problem by modifying the query term weight using a Divergence From Randomness term weighting model, and tackle the collection dependence problem by measuring the correlation of the normalized term frequency with the document length. Our research hypotheses for the two problems, as well as an automatic hyper-parameter setting methodology, are extensively validated and evaluated on four Text REtrieval Conference (TREC) collections.