A clustering algorithm for asymmetrically related data with applications to text mining
Abstract
Clustering techniques find a collection of subsets of a data set such that the collection satisfies a criterion that is dependent on a relation defined on the data set. The underlying relation is traditionally assumed to be symmetric. However, there exist many practical scenarios where the underlying relation is asymmetric. One example of an asymmetric relation in text analysis is the inclusion relation, i.e., the inclusion of the meaning of a block of text in the meaning of another block. In this paper, we consider the general problem of clustering of asymmetrically related data and propose an algorithm to cluster such data. To demonstrate its usefulness, we consider two applications in text mining: (1) summarization of short documents, and (2) generation of a concept hierarchy from a set of documents. Our experiments show that the performance of the proposed algorithm is superior to that of more traditional algorithms.