Learning to predict readability using diverse linguistic features

Rohit J. Kate; Xiaoqiang Luo; Siddharth Patwardhan; Martin Franz; Radu Florian; Raymond J. Mooney; Salim Roukos; Chris Welty

COLING 2010

Conference paper

01 Dec 2010

Abstract

In this paper we consider the problem of building a system to predict the readability of natural-language documents. Our system is trained using diverse features, based on syntax and language models, that are generally indicative of readability. Experimental results on a dataset of documents from a mix of genres show that the predictions of the learned system are more accurate than those of naive human judges when both are compared against the predictions of linguistically trained expert human judges. The experiments also compare the performance of different learning algorithms and different types of feature sets when used for predicting readability.
