Domain cartridge: Unsupervised framework for shallow domain ontology construction from corpus
Abstract
In this work we propose an unsupervised framework to construct a shallow domain ontology from a corpus. Information Retrieval systems, Question-Answering systems, dialogue systems and similar applications need to identify the important concepts in a domain and the relationships between them. We first identify important domain terms, of which multi-words form an important component. We show that incorporating multi-words improves parser output, which in turn improves the performance of an existing Question-Answering system by up to 7%. On a manually annotated smartphone dataset, the proposed system identifies 40.87% of the domain terms, compared to a recall of 22% obtained using WordNet, 43.77% using Yago and 53.74% using BabelNet. Unlike the compared systems, however, it does not use any manually annotated resource. Thereafter, we propose a framework to construct a shallow ontology from the discovered domain terms by identifying four domain relations, namely Synonyms ('similar-to'), Type-Of ('isa'), Action-On ('methods') and Feature-Of ('attributes'), where we achieve significant performance improvement over WordNet, BabelNet and Yago without using any mode of supervision or manual annotation.
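As an illustration only, the shallow ontology described above can be pictured as a set of (term, relation, term) triples over the discovered domain terms, and the reported recall as the fraction of manually annotated domain terms that the system recovers. The following is a minimal sketch under those assumptions, not the framework itself: the class and function names (`Relation`, `ShallowOntology`, `recall`), the example terms, and the choice to treat synonymy as symmetric are all hypothetical and do not come from the paper.

```python
from collections import defaultdict
from enum import Enum


class Relation(Enum):
    # The four domain relations named in the abstract, with their labels.
    SYNONYM = "similar-to"
    TYPE_OF = "isa"
    ACTION_ON = "methods"
    FEATURE_OF = "attributes"


class ShallowOntology:
    """Hypothetical container for (term, relation, term) triples."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, source, relation, target):
        self._edges[(source, relation)].add(target)
        # Assumption of this sketch: synonymy is stored in both directions.
        if relation is Relation.SYNONYM:
            self._edges[(target, relation)].add(source)

    def related(self, term, relation):
        # Use .get so that lookups do not create empty entries.
        return sorted(self._edges.get((term, relation), set()))


def recall(discovered, gold):
    """Fraction of gold-standard domain terms present in `discovered`."""
    gold = set(gold)
    if not gold:
        return 0.0
    return len(gold & set(discovered)) / len(gold)


# Invented smartphone-domain terms, used purely for illustration.
onto = ShallowOntology()
onto.add("cell phone", Relation.SYNONYM, "mobile phone")
onto.add("smartphone", Relation.TYPE_OF, "device")
onto.add("charge", Relation.ACTION_ON, "battery")
onto.add("battery life", Relation.FEATURE_OF, "smartphone")
```

Keeping the relation labels in an enumeration makes the four relation types explicit, and the recall helper mirrors the kind of term-level comparison reported against WordNet, Yago and BabelNet, although the paper's exact evaluation protocol is not specified here.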