Publication
CIKM 2011
Conference paper

Statistical source expansion for question answering

View publication

Abstract

A source expansion algorithm automatically extends a given text corpus with related content from large external sources such as the Web. The expanded corpus is not intended for human consumption but can be used in question answering (QA) and other information retrieval or extraction tasks to find more relevant information and supporting evidence. We propose an algorithm that extends a corpus of seed documents with web content, using a statistical model to select text passages that are both relevant to the topics of the seeds and complement existing information. In an evaluation on 1,500 hand-labeled web pages, our algorithm ranked text passages by relevance with 81% MAP, compared to 43% when relying on web search engine ranks alone and 75% when using a multi-document summarization algorithm. Applied to QA, the proposed method yields consistent and significant performance gains. We evaluated the impact of source expansion on over 6,000 questions from the Jeopardy! quiz show and TREC evaluations using Watson, a state-of-the-art QA system. Accuracy increased from 66% to 71% on Jeopardy! questions and from 59% to 64% on TREC questions. © 2011 ACM.

Date

Publication

CIKM 2011