STORY SEGMENTATION AND TOPIC DETECTION FOR RECOGNIZED SPEECH
Abstract
In this paper we present algorithms for story segmentation, topic detection, and topic tracking. The algorithms combine machine learning, statistical natural language processing, and information retrieval techniques. The story segmentation algorithm has two stages: the first uses a decision-tree-based probabilistic model, and the second refines its output with an information-retrieval-based scheme that incorporates aspects of our topic detection system. The topic detection and tracking algorithm is an incremental clustering algorithm that employs a novel dynamic, cluster-dependent similarity measure between documents and clusters. The performance of these algorithms is measured on the 1998 DARPA-sponsored Topic Detection and Tracking Phase 2 (TDT2) evaluation task.
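
To make the incremental clustering idea concrete, the following is a minimal sketch of single-pass clustering over a stream of documents. It is an illustration under stated assumptions, not the system evaluated here: the names `cosine` and `incremental_cluster`, the whitespace tokenization, and the fixed `threshold` value are all hypothetical, and plain cosine similarity against summed term counts stands in for the dynamic, cluster-dependent similarity measure, which this abstract does not specify.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def incremental_cluster(docs, threshold=0.3):
    """Assign each document, in arrival order, to a cluster.

    A document joins the most similar existing cluster if the score
    reaches `threshold` (an assumed, fixed value); otherwise it starts
    a new cluster. Earlier assignments are never revisited.
    """
    centroids = []  # one Counter of summed term counts per cluster
    labels = []
    for text in docs:
        vec = Counter(text.lower().split())  # assumed: whitespace tokens
        scores = [cosine(vec, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            centroids[best].update(vec)  # fold the document into the cluster
            labels.append(best)
        else:
            centroids.append(Counter(vec))  # start a new topic cluster
            labels.append(len(centroids) - 1)
    return labels


docs = [
    "earthquake hits city",
    "city earthquake damage",
    "election results announced",
    "election vote results",
]
labels = incremental_cluster(docs)
```

In this sketch, topic detection corresponds to the creation of a new cluster, and tracking corresponds to later documents being assigned to an existing one. A cluster-dependent measure would replace the single global `threshold` and `cosine` call with a score that varies per cluster.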