Ziv Bar-Yossef, T.S. Jayram, et al.
Journal of Computer and System Sciences
We develop an algorithmic framework to decompose a collection of time-stamped text documents into semantically coherent threads. Our formulation leads to a graph decomposition problem on directed acyclic graphs, for which we obtain three algorithms - an exact algorithm that is based on minimum cost flow and two more efficient algorithms based on maximum matching and dynamic programming that solve specific versions of the graph decomposition problem. Applications of our algorithms include superior summarization of news search results, improved browsing paradigms for large collections of text-intensive corpora, and integration of time-stamped documents from a variety of sources. Experimental results based on over 250,000 news articles from a major newspaper over a period of four years demonstrate that our algorithms efficiently identify robust threads of varying lengths and time-spans. Copyright 2005 ACM.
Ziv Bar-Yossef, T.S. Jayram, et al.
Journal of Computer and System Sciences
Ronald Fagin, Ravi Kumar, et al.
SIAM Journal on Discrete Mathematics
Ravi Kumar, Jasmine Novak, et al.
World Wide Web
David Gondek, Thomas Hofmann
KDD 2005