Program (Online Workshop)

Online Workshop

This page describes the setup of our 2020 NLIWoD workshop, which will take place as an online event.

Requirements for authors

The authors need to prepare a video recording of their talk according to their submission type:

Full article submissions: max. 15 minutes + 5 minutes discussion
Short article submissions: max. 10 minutes + 5 minutes discussion
Notes of all Q&A sessions will be made public after the workshop on the website. The Zoom stream will not be recorded.

Time Table

Half-day Workshop at the 2nd 2020 13.30 - 17.00 CET (Berlin time zone)

Program and Notes

Websites:

NLIWOD - Keynote: Bhaskar Mitra - Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and Beyond
- If you use proprietary datasets, no one can reproduce your results
- Neural Models are now in 79% of SIGIR papers
- We still lack public IR benchmarks with large scale training data
- Even industrial teams now use BERT on a day-to-day basis
- Challenges:
  - Single-Deadline/Single-Submission Challenges (such as TREC)
  - Leaderboard benchmarking leads to overfitting
  - Approaches have to work on more than one dataset
  - Bender Rule: English is not the only language
  - Cross-flow between communities needed!
- Questions:
  - Are we hitting a glass ceiling with current ML models? A: More general purpose models
  - What advice would you give to researchers working on other languages, where building benchmarks is even harder? A: Start building benchmarks and gather a community around them
NLIWOD - Chatbot For Interacting with SDMX Databases - Guillaume Thiry, Ioana Manolescu and Leo Liberti
- Ranking of queries/datasets can be supported by metadata
- Usage of DataCubes still relevant
- Real world use on the horizon - OECD
- Questions:
  - What is important to your approach to generalize?
NLIWOD - Verbalizing the Evolution of Knowledge Graphs with Formal Concept Analysis - Martin Arispe, Mayesha Tasnim, Damien Graux, Fabrizio Orlandi and Diego Collarana
- Formal Concept Analysis to find hierarchies in real-world KGs
- Questions:
  - Which verbalisation functions did you use? A: We are currently in the phase of trying out different ones.
  - What is the performance? A: FCA can already deal with big data.

PROFILES - Keynote: Prof. Dr. Felix Naumann - Data Profiling in the Relational World
- Commercial tools are still not there yet
- How to efficiently find good dependencies? Algorithms!
- Questions:
  - Have you considered how users can be involved to quickly reduce the search space? A: Show the results to users as they are created, as early as possible.
  - Databases usually follow the closed-world assumption. What to consider for your proposed algorithms if that is not given?
  - What happens in the presence of NULLs? A: Algorithms can deal with them, but there are challenges!
PROFILES - An Architecture for Cell-Centric Indexing of Datasets - Lixuan Qiu, Haiyan Jia, Brian Davison and Jeff Heflin
- Table indexes are typically created on the table-level or column-level
- Usage of cell-centric index that involves metadata, cell values and other values (context) in the respective row
- Question:
  - How flexible is your cell indexing approach towards enriching the set of indexed fields (title, context,…), in particular w.r.t. dataset profiles? A: ElasticSearch easily allows addition of further search fields.
PROFILES - A Template-Based Approach for Annotating Long-Tail Datasets - Daniel Garijo, Ke-Thia Yao, Amandeep Singh and Pedro Szekely
- Table annotation typically requires expertise in semantic technologies
- Users add metadata to the table to support the transformation of the table into a KG
- Question:
  - Which Wikifier do you use? How do you understand columns? A: External Service based on Wikidata, but that is not the bottleneck. For example, property linking.
NLIWOD - Generating Knowledge Graphs from Unstructured Texts: Experiences in the eCommerce Field for Question Answering - Diogo Sant’Anna, Rodrigo Caus, Lucas Ramos, Victor Hochgreb and Julio Cesar Dos Reis
- Question Answering in GoBots can increase sales by 120%
- Entity and Intent based QA systems
- Questions:
  - What do you use for training? A: Proprietary data and the Rasa framework.
  - Your precision is really high, how about the recall? A: We did measure it; please see the paper.
NLIWOD - Generating Grammars from lemon lexica for Questions Answering over Linked Data: a Preliminary Analysis - Philipp Cimiano, Basil Ell, Viktoria Benz and Mohammad Fazleh Elahi
- Question Grammar generation from a lemon lexicon based on LTAG grammars and LexInfo ideas
- Advantage: portability between domains without training data and auto-completion
- Question:
  - Have you thought of combining a word embedding model with the lemon lexicon? A: We will look into it, since it could also add synonyms in high-dimensional space.

Community Discussion

Feedback:

More people can participate due to lower entrance barrier
Split it into two events to ensure people from all time zones can participate more easily
How can we open the mic better so that more people ask more questions?
Live conference is preferred over pre-recorded videos in the workshops
