Publication
Big Data 2018
Conference paper

A Machine Learning Based Natural Language Question and Answering System for Healthcare Data Search using Complex Queries

Abstract

A number of use cases in healthcare are well suited to Big Data applications. In healthcare, large volumes of data arrive and are stored either as unstructured big data or as structured data in relational databases. In either case, Big Data is coming to embrace SQL as a common querying tool. Developing a question answering tool for users who lack specialized skill sets and pose complex queries in natural language is a challenge: the tool needs to identify significant details, draw inferences, and evaluate hypotheses the way domain experts do. Although NLIDB (natural language interface to database) systems have been developed to translate natural language queries into a database language for non-technical end users, most of the questions these systems address are factoid questions, and answering complex queries remains an open research problem. The proposed auxiliary system is machine learning based and extends an existing NLIDB system to help it answer complex queries. The auxiliary system mimics the way human experts reach answers to complex queries. Instead of building a set of simple conditional statements as rules and invoking them as a sequence of chained actions, the proposed system decomposes a complex query into multiple simple factoid sub-queries, so that the existing NLIDB system can answer each sub-query from data explicitly stored in the database. The underlying NLIDB system takes the sub-queries as input in parallel and produces query results from the data stored in the relational database. The answers to the sub-queries and the desired output labels are used to train a model, and the resulting multiclass classifier is used to predict answers to valid input queries.
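The pipeline described in the abstract (decompose a complex query into factoid sub-queries, answer them in parallel, then classify the combined answers) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and does not come from the paper: the toy patient records, the three sub-queries, the risk labels, the `nlidb_answer` stand-in for a real NLIDB system, and the choice of a nearest-centroid classifier, since the abstract does not name the model used.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy "relational" data: patient id -> explicitly stored facts.
PATIENTS = {
    "p1": {"age": 34, "hba1c": 5.1, "systolic_bp": 115},
    "p2": {"age": 41, "hba1c": 5.4, "systolic_bp": 120},
    "p3": {"age": 55, "hba1c": 6.3, "systolic_bp": 135},
    "p4": {"age": 58, "hba1c": 6.6, "systolic_bp": 138},
    "p5": {"age": 72, "hba1c": 8.4, "systolic_bp": 155},
    "p6": {"age": 69, "hba1c": 8.9, "systolic_bp": 160},
    "p7": {"age": 70, "hba1c": 8.0, "systolic_bp": 150},  # unlabeled
}

# Hypothetical desired output labels for the complex query
# "What is this patient's risk level?" (training data only).
LABELS = {"p1": "low", "p2": "low", "p3": "medium",
          "p4": "medium", "p5": "high", "p6": "high"}

# Hypothetical decomposition of the complex query into factoid sub-queries.
SUB_QUERIES = ["age", "hba1c", "systolic_bp"]


def nlidb_answer(patient_id, sub_query):
    """Stand-in for the NLIDB system: look up one stored value."""
    return float(PATIENTS[patient_id][sub_query])


def answer_sub_queries(patient_id):
    """Send all sub-queries in parallel; results keep sub-query order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda q: nlidb_answer(patient_id, q),
                             SUB_QUERIES))


def train_nearest_centroid(features, labels):
    """Compute one mean feature vector (centroid) per class label."""
    counts = Counter(labels)
    sums = {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


def answer_complex_query(patient_id, centroids):
    """Answer the complex query by classifying the sub-query answers."""
    return predict(centroids, answer_sub_queries(patient_id))


# Train on sub-query answers paired with the desired output labels.
train_ids = sorted(LABELS)
CENTROIDS = train_nearest_centroid(
    [answer_sub_queries(pid) for pid in train_ids],
    [LABELS[pid] for pid in train_ids],
)

if __name__ == "__main__":
    print(answer_complex_query("p7", CENTROIDS))
```

The sketch leaves the features unscaled to stay short; a real system would likely normalize them and use a stronger classifier, and the decomposition step itself would be derived from the natural language query rather than hard-coded.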
