HyperSum: Hypergraph based semi-supervised sentence ranking for query-oriented summarization
Abstract
Graph based sentence ranking algorithms such as PageRank and HITS have been successfully used in query-oriented summarization. With these algorithms, the documents to be summarized are often modeled as a text graph where nodes represent sentences and edges represent pairwise similarity relationships between two sentences. A deficiency of conventional graph modeling is its incapability of naturally and effectively representing complex group relationships shared among multiple objects. Simply squeezing complex relationships into pairwise ones will inevitably lead to loss of information which can be useful for ranking and learning. In this paper, we propose to take advantage of hypergraph, i.e. a generalization of graph, to remedy this defect. In a text hypergraph, nodes still represent sentences, yet hyperedges are allowed to connect more than two sentences. With a text hypergraph, we are thus able to integrate both group relationships formulated among multiple sentences and pairwise relationships formulated between two sentences in a unified framework. As essential work, it is first addressed in the paper that how a text hypergraph can be built for summarization by applying clustering techniques. Then, a hypergraph based semi-supervised sentence ranking algorithm is developed for query-oriented extractive summarization, where the influence of query is propagated to sentences through the structure of the constructed text hypergraph. When evaluated on DUC data sets, performance of the proposed approach is remarkable. Copyright 2009 ACM.