Program (Online Workshop)
Online Workshop
This page describes the setup of our 2020 NLIWoD workshop, which will take place as an online event.
Requirements for authors
The authors need to prepare a video recording of their talk according to their submission type:
Full article submissions: max. 15 minutes + 5 minutes discussion
Short article submissions: max. 10 minutes + 5 minutes discussion
Notes of all Q&A sessions will be made public after the workshop on the website. The Zoom stream will not be recorded.
Time Table
Half-day Workshop at the 2nd 2020 13.30 - 17.00 CET (Berlin time zone)
Program and Notes
Websites:
NLIWOD - Keynote: Bhaskar Mitra - Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and Beyond
If you use proprietary datasets, no one can reproduce your results
Neural Models are now in 79% of SIGIR papers
We still lack public IR benchmarks with large scale training data
Even industrial teams now use BERT on a day-to-day basis
Challenges:
Single-Deadline/Single-Submission Challenges (such as TREC)
Leaderboard benchmarking leads to overfitting
Approaches have to work on more than one dataset
Bender Rule: English is not the only language
Cross-flow between communities needed!
Questions:
Are we hitting a glass ceiling with current ML models? A: More general purpose models
What advice would you give to researchers working on other languages, where building benchmarks is even harder? A: Start building benchmarks and gather a community around them
NLIWOD - Chatbot For Interacting with SDMX Databases - Guillaume Thiry, Ioana Manolescu and Leo Liberti
Ranking of queries/datasets can be supported by metadata
Usage of DataCubes still relevant
Real world use on the horizon - OECD
Questions:
What is important for your approach to generalize?
NLIWOD - Verbalizing the Evolution of Knowledge Graphs with Formal Concept Analysis - Martin Arispe, Mayesha Tasnim, Damien Graux, Fabrizio Orlandi and Diego Collarana
Formal Concept Analysis to find hierarchies in real-world KGs
Questions:
Which verbalisation functions did you use? A: We are currently in the phase of trying out different ones.
What is the performance? A: FCA can already deal with big data.
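As a rough illustration of how Formal Concept Analysis yields a hierarchy, the sketch below enumerates the formal concepts of a tiny binary context. It is not the authors' implementation: the function name formal_concepts and the toy entities and properties are hypothetical, and the brute-force enumeration only suits toy data.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) of a small binary context.

    `context` maps each object to the set of attributes it has.
    Brute force over all object subsets: suitable for toy data only.
    """
    objects = sorted(context)
    all_attrs = set().union(*context.values()) if context else set()

    def common_attrs(objs):
        # Attributes shared by every object in objs (all attributes if objs is empty).
        result = set(all_attrs)
        for o in objs:
            result &= context[o]
        return result

    def objects_having(attrs):
        # Objects that carry every attribute in attrs.
        return {o for o in objects if attrs <= context[o]}

    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attrs(subset)
            extent = objects_having(intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


# Hypothetical toy data: KG entities and the properties they use.
toy = {
    "Berlin": {"label", "population", "country"},
    "Paris": {"label", "population", "country"},
    "Einstein": {"label", "birthDate"},
}
concepts = formal_concepts(toy)
for extent, intent in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(extent), sorted(intent))
```

Ordering the concepts by extent inclusion gives the hierarchy: the concept with intent {label} covers all three entities, and the concept shared by Berlin and Paris sits below it.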
PROFILES - Keynote: Prof. Dr. Felix Naumann - Data Profiling in the Relational World
Commercial tools are still not there yet
How to efficiently find good dependencies? Algorithms!
Questions:
Have you considered how users can be involved to quickly reduce the search space? A: Show the results as they are created, so that users see them as early as possible.
Databases usually follow the closed-world assumption. What to consider for your proposed algorithms if that is not given?
What happens in the presence of NULLs? A: Algorithms can deal with them, but there are challenges!
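To make the dependency-discovery problem concrete, here is a minimal sketch of a naive search for functional dependencies. It is not one of the algorithms from the keynote: the names holds and minimal_fds and the sample rows are hypothetical, and the exhaustive search is exponential in the number of columns, which is exactly why smarter algorithms are needed. It also assumes that two NULLs (None) compare as equal, which is only one of several possible NULL semantics.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        # Assumption: None == None, i.e. two NULLs are treated as equal values.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def minimal_fds(rows, columns):
    """Brute-force all minimal FDs with a non-empty left side and one right-hand column."""
    found = []
    for rhs in columns:
        others = [c for c in columns if c != rhs]
        for size in range(1, len(others) + 1):
            for lhs in combinations(others, size):
                # Skip lhs if one of its subsets already determines rhs (not minimal).
                if any(set(l) <= set(lhs) for l, r in found if r == rhs):
                    continue
                if holds(rows, lhs, rhs):
                    found.append((lhs, rhs))
    return found


# Hypothetical sample relation.
rows = [
    {"zip": "10115", "city": "Berlin", "name": "Ann"},
    {"zip": "10115", "city": "Berlin", "name": "Bob"},
    {"zip": "14467", "city": "Potsdam", "name": "Ann"},
]
fds = minimal_fds(rows, ["zip", "city", "name"])
for lhs, rhs in fds:
    print(", ".join(lhs), "->", rhs)
```

On this sample, zip and city determine each other, while no combination of columns determines name.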
PROFILES - An Architecture for Cell-Centric Indexing of Datasets - Lixuan Qiu, Haiyan Jia, Brian Davison and Jeff Heflin
Table indexes are typically created on the table-level or column-level
Usage of a cell-centric index that involves metadata, cell values and other values (context) in the respective row
Question:
How flexible is your cell indexing approach towards enriching the set of indexed fields (title, context,…), in particular w.r.t. dataset profiles? A: ElasticSearch easily allows addition of further search fields.
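The cell-centric idea can be sketched as follows, assuming one index document per table cell. This is an illustration only, not the authors' architecture: the field names (title, column, value, context), the helper functions and the sample table are hypothetical, and a plain in-memory inverted index stands in for ElasticSearch.

```python
from collections import defaultdict


def cell_documents(dataset_title, header, rows):
    """Yield one document per cell, carrying metadata and the rest of its row."""
    for row in rows:
        for col, value in zip(header, row):
            yield {
                "title": dataset_title,  # dataset-level metadata
                "column": col,           # column name of this cell
                "value": value,          # the cell value itself
                # Context: the other values in the same row.
                "context": [v for c, v in zip(header, row) if c != col],
            }


def build_index(docs):
    """Map (field, token) pairs to the ids of the cell documents containing them."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for field in ("title", "column", "value"):
            for token in str(doc[field]).lower().split():
                index[(field, token)].add(doc_id)
        for other in doc["context"]:
            for token in str(other).lower().split():
                index[("context", token)].add(doc_id)
    return index


# Hypothetical sample table.
docs = list(cell_documents(
    "City populations",
    ["city", "country"],
    [["Berlin", "Germany"], ["Paris", "France"]],
))
index = build_index(docs)
print(index[("value", "berlin")], index[("context", "berlin")])
```

Because every field is indexed separately, a query can ask for cells whose value matches one term and whose row context matches another. Adding a further field amounts to adding another key to each cell document.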
PROFILES - A Template-Based Approach for Annotating Long-Tail Datasets - Daniel Garijo, Ke-Thia Yao, Amandeep Singh and Pedro Szekely
Table annotation typically requires expertise in semantic technologies
Users add metadata to the table to support the transformation of the table into a KG
Question:
Which Wikifier do you use? How do you understand columns? A: External Service based on Wikidata, but that is not the bottleneck. For example, property linking.
NLIWOD - Generating Knowledge Graphs from Unstructured Texts: Experiences in the eCommerce Field for Question Answering - Diogo Sant’Anna, Rodrigo Caus, Lucas Ramos, Victor Hochgreb and Julio Cesar Dos Reis
Question Answering in GoBots can increase sales by 120%
Entity and Intent based QA systems
Question:
What do you use for training? A: Proprietary data and the Rasa framework.
Your precision is really high, how about the recall? A: We did measure it, please see the paper.
NLIWOD - Generating Grammars from lemon lexica for Questions Answering over Linked Data: a Preliminary Analysis - Philipp Cimiano, Basil Ell, Viktoria Benz and Mohammad Fazleh Elahi
Question Grammar generation from a lemon lexicon based on LTAG grammars and LexInfo ideas
Advantage: portability between domains without training data and auto-completion
Question:
Have you thought of combining a word embedding model with the lemon lexicon? A: We will look into it, since it could also add synonyms in high-dimensional space.
Community Discussion
Feedback:
More people can participate due to lower entrance barrier
Split it into two events to ensure people from all time zones can participate more easily
How can we open the mic better so that more people ask more questions?
Live conference is preferred over pre-recorded videos in the workshops