Abstract
We previously discussed how classifiers based on logistic regression and decision trees can be used to predict the class of an observation. Unfortunately, when such classifiers are trained on a dataset in which one of the response classes is rare, they can underestimate the probability of observing the rare event; the greater the imbalance, the greater this small-sample bias. This month, we illustrate how to mitigate the negative effect of class imbalance on the training of classifiers.