Conference paper

Sequence homology detection through large scale pattern discovery


We describe a new approach for identifying sequence similarity between a query sequence and a database of proteins. The central idea is the use of a set of patterns obtained from the underlying database through a one-time computation. These patterns are subsequently searched for in every query sequence presented to the system. A pattern matched by a region of the query points to a potential local similarity between that region and all the database sequences that also match that pattern. By using a set of carefully chosen patterns, the tool presented in this work is able to discover weak but biologically important similarities.
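
The two-phase scheme described above (a one-time pattern computation over the database, followed by pattern lookup in each query) can be illustrated with a minimal sketch. It is not the tool's implementation: it assumes the patterns are already given and are written as regular expressions in which `.` stands for any residue, and the names `index_patterns` and `search_query`, along with the toy sequences, are hypothetical.

```python
import re
from collections import defaultdict


def index_patterns(patterns, database):
    """One-time step: record where each pattern occurs in the database.

    `patterns` is a list of regex strings (assumed format, '.' = any residue);
    `database` maps a sequence id to its amino-acid string.
    """
    index = defaultdict(list)
    for pat in patterns:
        rx = re.compile(pat)
        for seq_id, seq in database.items():
            # finditer reports non-overlapping matches only.
            for m in rx.finditer(seq):
                index[pat].append((seq_id, m.start()))
    return index


def search_query(query, index):
    """Per-query step: each pattern matched by the query suggests a local
    similarity with every database sequence that matches the same pattern."""
    hits = []
    for pat, occurrences in index.items():
        for m in re.finditer(pat, query):
            for seq_id, db_pos in occurrences:
                hits.append((m.start(), pat, seq_id, db_pos))
    return sorted(hits)


# Toy data, invented for illustration only.
database = {"P1": "MKAVLGHTKQ", "P2": "TTAIRGHSKW"}
index = index_patterns(["A..GH.K"], database)
for hit in search_query("GGAPMGHDKLL", index):
    print(hit)
```

Each reported tuple pairs a query offset with a database sequence and offset sharing the same pattern. In a real system such candidate regions would presumably be examined further before a similarity is reported.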
