Retrieving information from full text using linguistic knowledge
Abstract
Recent trends in text searching, such as the availability of full-text databases and the introduction of natural-language queries, underline the role that linguistic analysis can play in information retrieval. In this paper we examine how techniques known and used in the field of natural-language processing can be applied to the analysis of text in information retrieval. State-of-the-art text searching programs cannot distinguish, for example, between occurrences of the disease AIDS and aids as tools, or between library school and school library. Conversely, they fail to equate terms such as online and on-line, which are variants of the same form. To make these distinctions, systems must incorporate knowledge about the meaning of words in context. Research in natural-language processing in the last two or three decades has concentrated on the automatic 'understanding' of language: how to analyze the grammatical structure and the meaning of text. Although many aspects of this research remain experimental, we describe how we apply some of these techniques to recognize spelling variants, names, acronyms, and abbreviations in ways that are useful to text searching.
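As a rough illustration of the two examples above, the following Python sketch maps hyphenation variants such as online and on-line to one index key while keeping an all-uppercase token such as AIDS distinct from the lowercase word aids. The function names (variant_key, looks_like_acronym, index_key) and the uppercase heuristic are assumptions made for this illustration, not the method described in the paper, which relies on richer linguistic knowledge.

```python
import re


def variant_key(term: str) -> str:
    """Lowercase a term and drop hyphens and spaces, so that
    'on-line', 'on line', and 'online' all share one key."""
    return re.sub(r"[-\s]+", "", term.lower())


def looks_like_acronym(token: str) -> bool:
    """Crude heuristic (an assumption of this sketch): a token of two or
    more letters written entirely in uppercase is treated as an acronym."""
    return len(token) >= 2 and token.isalpha() and token.isupper()


def index_key(token: str) -> str:
    """Return the key under which a token would be indexed.

    Acronyms keep their case, so 'AIDS' and 'aids' stay distinct;
    every other token is reduced to its variant key.
    """
    if looks_like_acronym(token):
        return token
    return variant_key(token)


if __name__ == "__main__":
    for word in ["online", "on-line", "AIDS", "aids"]:
        print(word, "->", index_key(word))
```

A surface heuristic like this would mishandle text set entirely in capitals, and it says nothing about word-order distinctions such as library school versus school library, which require analysis of the surrounding context.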