Word document density and relevance scoring

Martin Franz; S. McCarley

SIGIR Forum (ACM Special Interest Group on Information Retrieval)

11 Dec 2000

Abstract

Previous work on the distribution of words in documents has shown that word repetitiveness is an important indicator of a word's content-bearing characteristics. In this paper we propose a simple method that uses a measure of the tendency of words to repeat within a document to distinguish words that have similar document frequencies but different topic-discriminating characteristics. We describe the application of the new measure to query-document relevance scoring. Experiments on the TREC Ad Hoc and Spoken Document Retrieval tasks show useful performance improvements.
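The abstract does not define the repetition measure itself, so the sketch below uses a common proxy as an assumption: a word's "density" is its collection frequency divided by its document frequency, i.e. its average count within the documents that contain it. The function name `word_density` and the toy corpus are hypothetical and only illustrate the idea of separating words with equal document frequency; this is not presented as the authors' exact formula or scoring function.

```python
from collections import Counter


def word_density(docs):
    """Return {word: (df, density)} for a list of whitespace-tokenized docs.

    Assumption (illustrative only): density = cf / df, the mean within-document
    frequency of a word over the documents that contain it.
    """
    df = Counter()  # number of documents containing each word
    cf = Counter()  # total occurrences of each word across the collection
    for doc in docs:
        counts = Counter(doc.lower().split())
        cf.update(counts)
        df.update(counts.keys())
    return {w: (df[w], cf[w] / df[w]) for w in df}


# Hypothetical toy corpus: "reactor" and "the" each occur in 3 documents,
# but "reactor" tends to repeat within a document while "the" does not.
docs = [
    "the reactor reactor reactor was shut down",
    "the report was filed",
    "reactor safety reactor inspection",
    "the report mentions a reactor",
]

stats = word_density(docs)
print(stats["reactor"])  # (3, 2.0)
print(stats["the"])      # (3, 1.0)
```

Both words share a document frequency of 3, so an IDF-style weight alone cannot tell them apart; the density value does, which is the kind of distinction the abstract describes. How such a value is combined with a relevance score is left open here.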

Conference paper