Sparse representations for text categorization
Abstract
Sparse representations (SRs) are often used to characterize a test signal in terms of a small number of supporting training examples, and they allow the number of supports to be adapted to the specific signal being categorized. Given the good performance of SRs relative to other classifiers for both image classification and phonetic classification, in this paper we extend the use of SRs to text classification, a domain in which they have thus far not been explored. Specifically, we demonstrate how sparse representations can be used for text classification and how their performance varies with the vocabulary size of the documents. In addition, we show that this method offers promising results over the Naive Bayes (NB) classifier, a standard baseline for text categorization, thus introducing an alternative class of methods for this task. © 2010 ISCA.
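The abstract does not specify the authors' exact solver, but the general sparse-representation classification (SRC) idea it describes can be sketched as follows: represent a test document vector as a sparse combination of training document vectors (here found with a simple greedy orthogonal matching pursuit, an assumption on our part), then assign the class whose training columns best reconstruct the test vector. The function names `omp` and `src_classify` are illustrative, not from the paper.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Greedy orthogonal matching pursuit: pick the dictionary atom most
    correlated with the current residual, then refit on the chosen support."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        # least-squares refit on the current support
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

def src_classify(D, labels, y, n_nonzero=3):
    """Sparse-representation classification: solve for a sparse code of the
    test vector y over the training matrix D (columns = training documents,
    e.g. tf-idf vectors with unit norm), then pick the class whose atoms
    give the smallest reconstruction residual."""
    x = omp(D, y, n_nonzero)
    best_class, best_err = None, np.inf
    for c in set(labels):
        mask = np.array([lab == c for lab in labels])
        xc = np.where(mask, x, 0.0)       # keep only class-c coefficients
        err = np.linalg.norm(y - D @ xc)  # class-wise reconstruction error
        if err < best_err:
            best_class, best_err = c, err
    return best_class
```

In a text setting, each column of `D` would be a term-frequency (or tf-idf) vector over the vocabulary, normalized to unit length; the abstract's observation about vocabulary size corresponds to varying the number of rows of `D`.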