Question answering over linked data (QALD-4)
Abstract
With the increasing amount of semantic data available on the web, there is a strong need for systems that allow common web users to access this body of knowledge. Question answering systems in particular have received wide attention, as they allow users to express arbitrarily complex information needs in an easy and intuitive fashion (for an overview see [4]). The key challenge lies in translating the users’ information needs into a form that can be evaluated using standard Semantic Web query processing and inferencing techniques. Over the past years, a range of approaches have been developed to address this challenge, showing significant advances towards answering natural language questions with respect to large, heterogeneous sets of structured data. However, only a few systems so far address the fact that the structured data available nowadays is distributed among a large collection of interconnected datasets, and that answers to questions can often only be provided if information from several sources is combined. In addition, a lot of information is still available only in textual form, both on the web and in the form of labels and abstracts in linked data sources. Therefore, approaches are needed that can deal not only with the specific character of structured data but also with finding information in several sources, processing both structured and unstructured information, and combining the gathered information into one answer. The main objective of the open challenge on question answering over linked data [3] (QALD) is to provide up-to-date, demanding benchmarks that establish a standard against which question answering systems over structured data can be evaluated and compared. QALD-4 is the fourth instalment of the QALD open challenge, comprising three tasks: multilingual question answering, biomedical question answering over interlinked data, and hybrid question answering.