Abstract
Sentiment has traditionally been considered a "deep" attribute of writing, one that often requires interpreting figurative language to uncover the writer's intention. The natural language processing community has become increasingly interested in automatically detecting the expression of opinions and measuring the intensity of the emotions held by the writer. Despite the depth and abstraction often associated with expressions of sentiment, we apply strictly lexical analysis to opinions expressed about books and find that machine learning techniques can resolve even fine-grained distinctions between opinions. Using an averaged perceptron classifier trained with a word subsequence kernel, we achieve an accuracy of 89% when distinguishing between 1- and 5-star reviews. Further, the same model yields significant separation when scoring intermediate reviews, making distinctions that even human annotators find difficult. We detail the collection of data for supervised training, present the results of our sentiment classifier, and discuss why we believe this approach to be effective. © 2007 IEEE.