Discovery of protein-protein interactions using a combination of linguistic, statistical and graphical information
Abstract
Background: The rapid publication of important research in the biomedical literature makes it increasingly difficult for researchers to keep current with significant work in their area of interest. Results: This paper reports a scalable method for the discovery of protein-protein interactions in Medline abstracts, using a combination of text analytics, statistical and graphical analysis, and a set of easily implemented rules. Applying these techniques to 12,300 abstracts, a precision of 0.61 and a recall of 0.97 were obtained, (f = 0.74) and when allowing for two-hop and three-hop relations discovered by graphical analysis, the precision was 0.74 (f = 0.83). Conclusions: This combination of linguistic and statistical approaches appears to provide the highest precision and recall thus far reported in detecting protein-protein relations using text analytic approaches. © 2005 Cooper and Kershenbaum, licensee BioMed Central Ltd.