Challenge

In addition to contributing innovative research, participants are cordially invited to contribute tools. NLIWOD incorporates the 8th Question Answering over Linked Data (QALD) challenge, in which participants can demonstrate the capabilities of their systems on GERBIL QA, the online benchmarking platform provided and supported by the H2020 project HOBBIT.

Important Dates

- Integration testing data will be added to the platform by July 31st, 2017

- Training data available: September 2nd, 2017

- System submission due: October 1st, 2017

- Publication of test data: October 9th, 2017

- System integration with GERBIL QA done: October 10th, 2017

- Results on test data: October 15th, 2017

Registration

The registration is closed.

System integration into the platform

To integrate your system with our benchmarking platform, please follow our GERBIL QA instructions. In case of questions, please contact Ricardo Usbeck <ricardo.usbeck AT uni-paderborn.de>. We will use the macro F-measure as the comparison criterion on the test dataset. Note that we will test your system in all available languages.
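For orientation, the macro F-measure computes precision, recall, and F1 per question and then averages them over all questions. The authoritative computation is performed by GERBIL QA; the following Python snippet is only a minimal sketch of that scheme, with illustrative function and variable names.

```python
# Illustrative sketch of a macro F-measure in the QALD style; the
# authoritative evaluation is done by GERBIL QA.

def macro_f_measure(gold_answers, system_answers):
    """gold_answers, system_answers: one set of answers per question."""
    precisions, recalls, f_measures = [], [], []
    for gold, system in zip(gold_answers, system_answers):
        correct = len(gold & system)
        p = correct / len(system) if system else 0.0
        r = correct / len(gold) if gold else 0.0
        f = 2 * p * r / (p + r) if p + r > 0 else 0.0
        precisions.append(p)
        recalls.append(r)
        f_measures.append(f)
    n = len(f_measures)
    return sum(precisions) / n, sum(recalls) / n, sum(f_measures) / n

# Example: two questions; the system answers the first perfectly
# and misses one of the two gold answers of the second.
gold = [{"http://dbpedia.org/resource/Berlin"},
        {"http://dbpedia.org/resource/A", "http://dbpedia.org/resource/B"}]
system = [{"http://dbpedia.org/resource/Berlin"},
          {"http://dbpedia.org/resource/A"}]
print(macro_f_measure(gold, system))  # (1.0, 0.75, ~0.83)
```

Averaging per question (macro) rather than pooling all answers (micro) gives each question equal weight, regardless of the size of its answer set.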

FAQ

1. Do I need to submit a paper?

=> Challenge participants do not have to submit a paper, but they are free to submit a workshop paper to NLIWOD presenting their QALD Challenge entry. (Such papers must be submitted within the workshop deadlines.)

2. Do I need to be registered for ISWC?

=> No, but we cordially invite you to attend.

3. Where can I find more information about previous editions of the challenge?

=> Information about previous QALD challenges can be found on the HOBBIT website.

Datasets

Task 1 - Multilingual QA over DBpedia

Train dataset: qald-7-train-multilingual.json

Test dataset: TBA

Given the diversity of languages used on the web, there is a pressing need to facilitate multilingual access to semantic data. The core task of QALD is thus to retrieve answers from an RDF data repository given an information need expressed in a variety of natural languages.

The underlying RDF dataset will be DBpedia 2016-10. The training data will consist of more than 250 questions compiled and curated from previous challenges. Each question will be available in three to eight languages (English, Spanish, German, Italian, French, Dutch, Romanian, Hindi, and Farsi), possibly with the addition of two further languages (Korean and Brazilian Portuguese). The questions are general, open-domain factual questions, for example:

(en) Which book has the most pages?

(de) Welches Buch hat die meisten Seiten?

(es) ¿Qué libro tiene el mayor número de páginas?

(it) Quale libro ha il maggior numero di pagine?

(fr) Quel livre a le plus de pages?

(nl) Welk boek heeft de meeste pagina’s?

(ro) Ce carte are cele mai multe pagini?

The questions vary with respect to their complexity, including questions with counts (e.g., How many children does Eddie Murphy have?), superlatives (e.g., Which museum in New York has the most visitors?), comparatives (e.g., Is Lake Baikal bigger than the Great Bear Lake?), and temporal aggregators (e.g., How many companies were founded in the same year as Google?). Each question is annotated with a manually specified SPARQL query and answers.
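As a quick-start sketch, the snippet below loads the training file and runs one question's annotated SPARQL query against the public DBpedia endpoint. The field names follow the QALD JSON format as we read it from the training files and should be verified against the actual data; the endpoint URL and library choice (SPARQLWrapper) are our assumptions, not prescribed by the challenge.

```python
# Sketch: load the QALD training file and execute a question's annotated
# SPARQL query against the public DBpedia endpoint. Field names follow
# the QALD JSON format as we understand it; verify against the file.
import json
from SPARQLWrapper import SPARQLWrapper, JSON

with open("qald-7-train-multilingual.json") as f:
    data = json.load(f)

question = data["questions"][0]
english = next(q["string"] for q in question["question"] if q["language"] == "en")
print("Question:", english)

# For "Which book has the most pages?" the annotated query could look like:
#   SELECT DISTINCT ?uri WHERE { ?uri a dbo:Book ; dbo:numberOfPages ?n . }
#   ORDER BY DESC(?n) LIMIT 1
sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery(question["query"]["sparql"])
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print({var: value["value"] for var, value in binding.items()})
```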

Data creation: The test dataset will consist of 50 to 100 manually compiled questions of a similar kind. We plan to compile them from existing, real-world question and query logs, in order to provide unbiased questions expressing real-world information needs; these will then be manually curated to ensure a high quality standard. Existing methodology for selecting queries from query logs has been shown to retrieve prototypical queries. Over the course of the past QALD challenges, more than 30 systems have been submitted, covering most of the offered languages.

Task 2 - Hybrid Question Answering - WILL NOT HAPPEN (too few participants)

Train dataset: qald-7-train-hybrid.json

Test dataset: TBA

A lot of information is still available only in textual form, both on the web and in the form of labels and abstracts in Linked Data sources. Therefore, approaches are needed that not only deal with the specific character of structured data but also find information across several sources, process both structured and unstructured information, and combine the gathered information into one answer.

QALD therefore includes a task on hybrid question answering, asking systems to retrieve answers for questions that require the integration of data from both RDF and textual sources. In the previous instantiation of the challenge, this task gained significant momentum, attracting seven participating systems.

The task will build on DBpedia 2016-10 as the RDF knowledge base, together with the English Wikipedia as the textual data source. As training data, we will compile more than 100 English questions from past challenges. The questions are annotated with answers as well as a pseudo query that indicates which information can be obtained from RDF data and which from free text. The pseudo query is like a SPARQL query but can contain free text as the subject, property, or object of a triple.
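To illustrate the format, below is a constructed pseudo query in the style described (the exact syntax is defined by the training file, so treat this query as an assumption), together with a small Python sketch that separates triple patterns answerable from RDF from those needing free-text evidence.

```python
# Constructed example of a pseudo query: free-text parts are marked with
# text:"...", the rest is ordinary SPARQL over DBpedia. The exact syntax
# should be checked against qald-7-train-hybrid.json.
PSEUDO_QUERY = '''
SELECT DISTINCT ?uri WHERE {
    ?uri rdf:type dbo:Book .
    ?uri text:"written by" text:"William Goldman" .
}
'''

# Naive split: any pattern containing a text:"..." element must be
# resolved against the textual source; the rest is plain RDF lookup.
for line in PSEUDO_QUERY.splitlines():
    line = line.strip()
    if 'text:"' in line:
        print("needs free-text evidence:", line)
    elif line.startswith("?"):
        print("answerable from RDF:    ", line)
```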

Data creation: As test questions, we will provide 50 similar questions, all manually created and checked by at least two data experts. The main goal when devising these questions will not be to take into account the vast amount of data available and the problems arising from noisy, duplicate, and conflicting information, but rather to enable a controlled and fair evaluation, given that hybrid question answering is still a very young line of research.

Task 3 - English Question Answering over Wikidata - WILL NOT HAPPEN (too few participants)

Train dataset: qald-7-train-en-wikidata.json

Test dataset: TBA

Another new task introduced this year will use the public data source Wikidata (https://www.wikidata.org/) as the target repository. The training data will include 100 open-domain factual questions compiled from the previous iteration of Task 1. In this task, the questions originally formulated for DBpedia should be answered using Wikidata. Thus, your system will have to deal with a different data representation structure. This task will help to evaluate how generic your approach is and how easily it can be adapted to a new data source. Note that the results obtained from Wikidata might differ from the answers to the same queries found in DBpedia.
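To give a feel for the representational differences, here is a sketch of the Task 1 sample question ("Which book has the most pages?") expressed against the public Wikidata SPARQL endpoint. The item and property identifiers (wd:Q571 for book, wdt:P1104 for number of pages) and the endpoint details are stated to the best of our knowledge and should be verified.

```python
# Sketch: "Which book has the most pages?" against Wikidata instead of
# DBpedia. Classes and properties become opaque identifiers: wd:Q571 is
# "book" and wdt:P1104 is "number of pages" (to the best of our knowledge).
from SPARQLWrapper import SPARQLWrapper, JSON

QUERY = """
SELECT ?book ?pages WHERE {
    ?book wdt:P31 wd:Q571 ;   # instance of: book
          wdt:P1104 ?pages .  # number of pages
}
ORDER BY DESC(?pages)
LIMIT 1
"""

# The Wikidata query service asks clients to send a descriptive User-Agent.
sparql = SPARQLWrapper("https://query.wikidata.org/sparql",
                       agent="qald-example/0.1 (illustrative)")
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
for binding in sparql.query().convert()["results"]["bindings"]:
    print(binding["book"]["value"], binding["pages"]["value"])
```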

Data creation: This task was designed in the context of the DIESEL project (https://diesel-project.eu/). The training set contains 100 questions taken from Task 1 of the QALD-6 challenge. We formulated queries that answer these questions over Wikidata and used them to generate the gold-standard answers. For this task, we use the Wikidata dump from 09-01-2017 (https://dumps.wikimedia.org/wikidatawiki/entities/20170109/). The Wikidata dataset used to create this benchmark can be found on HOBBIT's FTP server, and the Docker image for running this data with Blazegraph can be found on metaphacts' Docker Hub.