Fact-based question decomposition in DeepQA
Abstract
Factoid questions often contain more than one fact or assertion about their answers. Question-answering (QA) systems, however, typically do not exploit such fine-grained distinctions, because identifying and separating the facts requires a deep understanding of the question. We argue that decomposing complex factoid questions is beneficial to QA systems, because the more facts that support an answer candidate, the more likely it is to be the correct answer. We broadly categorize decomposable questions into two types: parallel and nested. Parallel decomposable questions contain subquestions that can be evaluated independently of one another. Nested questions require decompositions to be processed in sequence, with the answer to an "inner" subquestion plugged into an "outer" subquestion. In this paper, we present a novel question decomposition framework capable of handling both decomposition types, built on top of the base IBM Watson™ QA system for Jeopardy!™. The framework contains a suite of decomposition rules that use predominantly lexico-syntactic features to identify facts within complex questions. It also contains a question-rewriting component and a candidate re-ranker, which uses machine learning and heuristic selection strategies to generate a final ranked answer list, taking into account answer confidences from the base QA system. We apply our decomposition framework to the particularly challenging domain of Final Jeopardy! questions, which are difficult even for qualified Jeopardy! players, and we show a statistically significant improvement in the performance of our baseline QA system. © 1957-2012 IBM.
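The difference between the two decomposition types can be illustrated with a minimal Python sketch. It is not the Watson implementation: the `QA` callable, the functions `answer_parallel` and `answer_nested`, the lookup table `toy_qa`, and all confidence values are hypothetical, and summing confidences is a simple stand-in for the machine-learned re-ranker described in the abstract.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a base QA system: maps a question string to
# candidate answers with confidence scores. This is NOT the Watson API.
QA = Callable[[str], Dict[str, float]]


def answer_parallel(subquestions: List[str], qa: QA) -> Dict[str, float]:
    """Evaluate each subquestion independently and pool the evidence.

    Assumption: confidences are simply summed per candidate, so a
    candidate supported by more facts ends up with a higher score.
    """
    combined: Dict[str, float] = {}
    for sub in subquestions:
        for candidate, conf in qa(sub).items():
            combined[candidate] = combined.get(candidate, 0.0) + conf
    return combined


def answer_nested(inner: str, outer_template: str, qa: QA) -> Dict[str, float]:
    """Answer the inner subquestion first, then plug its top answer
    into the outer subquestion and answer that."""
    inner_candidates = qa(inner)
    if not inner_candidates:
        return {}
    best_inner = max(inner_candidates, key=inner_candidates.get)
    return qa(outer_template.format(inner=best_inner))


def toy_qa(question: str) -> Dict[str, float]:
    """Toy lookup table with illustrative, made-up confidence values."""
    table = {
        "Which author was born in Dublin?": {"James Joyce": 0.5, "Oscar Wilde": 0.5},
        "Which author wrote Ulysses?": {"James Joyce": 0.75},
        "What is the capital of France?": {"Paris": 0.75},
        "Which river flows through Paris?": {"Seine": 0.75},
    }
    return table.get(question, {})


if __name__ == "__main__":
    parallel = answer_parallel(
        ["Which author was born in Dublin?", "Which author wrote Ulysses?"],
        toy_qa,
    )
    print(max(parallel, key=parallel.get))

    nested = answer_nested(
        "What is the capital of France?",
        "Which river flows through {inner}?",
        toy_qa,
    )
    print(nested)
```

In the parallel case the two subquestions are answered without reference to each other, and the candidate supported by both facts accumulates the most evidence. In the nested case the outer question cannot even be posed until the inner one has been answered.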