AUSUM: Approach for unsupervised bug report summarization
Abstract
In most software projects, resolved bugs are archived for future reference. These bug reports contain valuable information on the reported problem, its investigation, and its resolution. When triaging bugs, developers look for how similar problems were resolved in the past. A search over the bug repository gives the developer a set of recommended bugs to look into. However, the developer still needs to manually peruse the contents of the recommended bugs, which may vary in size from a couple of lines to thousands of lines. Automatic summarization of bug reports is one way to reduce the amount of data a developer needs to go through. Prior work has presented learning-based approaches for bug summarization. These approaches have the disadvantage of requiring a large training set and of being biased towards the data on which the model was trained. In fact, maximum efficacy was reported when the model was trained and tested on bug reports from the same project. In this paper, we present the results of applying four unsupervised summarization techniques to bug summarization. Industrial bug reports typically contain a large amount of noise (email dumps, chat transcripts, core dumps), that is, sentences that are useless from the perspective of summarization. This noise derails the unsupervised approaches, which are optimized to work on more well-formed documents. We present an approach for noise reduction that improves the precision of summarization over the base technique (by 4% to 24% across subjects and base techniques). Importantly, with noise reduction applied, two of the unsupervised techniques became scalable to large bug reports. © 2012 ACM.