Here is the second part of the review of knowledge graph related papers from EMNLP 2019. In this part, we’ll talk about Question Answering over Knowledge Graphs, NLG from KGs, Commonsense reasoning with KGs, and some old school Named Entity/Relation Recognition & Linking. Let’s start 🚀
- Augmented Language Models
- Dialogue Systems and Conversational AI
- Building Knowledge Graphs from Text (Open KGs)
- Knowledge Graph Embeddings
Part II (👈 you are here)
Question Answering over Knowledge Graphs
Question Answering (QA) enjoys growing traction in the NLP community. Machine Reading Comprehension (MRC) QA, where you need to process textual references and documents, recently received a bunch of large-scale and complex tasks like HotpotQA, Natural Questions, and ELI5. 🤯 On the other hand, Question Answering over Knowledge Graphs (KGQA), where you need to answer questions based on a background knowledge graph, is developing somewhat more slowly. A recent addition to the KGQA dataset family is LC-QuAD 2 (from our team, sorry for the shameless self-promotion 😊). So, did EMNLP 2019 bring any goodies? 🍰
CSQA is one of the most complex KGQA datasets: it includes questions that require comparative reasoning, quantitative reasoning, clarification questions, and more, all wrapped in a conversational scenario where you need to resolve coreference and pointers a-la “what about the second one you mentioned” (check out the full dataset description on the website). Sounds complex enough? Fully neural approaches like Key-Value Memory Nets show very low F1 scores (about 6%), so you need something more flexible. 👉 Shen et al 👈 propose MaSP, a model based on semantic parsing and logical forms that implements a four-step strategy.
First, the current question and the available dialogue history are word-piece tokenized and embedded. Second, they are passed through a Transformer encoder. Third, an Entity Linker finds appropriate entities and types mentioned in the question. Finally, the decoder selects actions from a pre-defined grammar and builds a logical form (program) that is eventually executed against a knowledge graph. MaSP yields a stunning 79.3% F1 score (compared to the 6% baseline) 👀. A general observation on SOTA 📈 models on CSQA: all of them employ semantic parsing, some high-level vocabulary of actions, and logical programs to deal with the complexity of the task, whereas end-to-end neural approaches are far behind 📉.
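To make the last step concrete, here is a minimal sketch of what executing a decoded logical form against a KG could look like. The tiny KG, the three-action grammar (`select` / `find` / `union`), and the example program are all illustrative, not MaSP's actual vocabulary:

```python
# Toy KG: (subject, relation) -> set of objects
kg = {
    ("Berlin", "capital_of"): {"Germany"},
    ("Paris", "capital_of"): {"France"},
    ("Germany", "member_of"): {"EU"},
    ("France", "member_of"): {"EU"},
}

def execute(program, kg):
    """Run a sequence of grammar actions with a small stack machine."""
    stack = []
    for action, *args in program:
        if action == "select":      # push a seed entity set onto the stack
            stack.append(set(args))
        elif action == "find":      # follow a relation from the top set
            entities = stack.pop()
            stack.append({o for e in entities
                            for o in kg.get((e, args[0]), set())})
        elif action == "union":     # merge the two top sets
            stack.append(stack.pop() | stack.pop())
    return stack.pop()

# Decoded program for a 2-hop question over two seed entities
program = [
    ("select", "Berlin"), ("find", "capital_of"),
    ("select", "Paris"), ("find", "capital_of"),
    ("union",),
    ("find", "member_of"),
]
print(execute(program, kg))  # {'EU'}
```

The real decoder emits such action sequences token by token; the point here is only that a small typed grammar plus an executor is enough to answer compositional questions that stump end-to-end models.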
On simpler LC-QuAD 1 and QALD tasks, Ding et al also employ a formal grammar with entity/relation linker EARL in order to construct and rank SPARQL query templates in their SubQG model. In a multi-step process, the authors identify relevant query substructures and eventually merge them with identified constants from a question (look at the Figure nearby).
In PullNet, proposed by Sun et al, questions are probed against two sources: a raw text corpus and a knowledge graph. PullNet builds a query graph and iteratively expands it, “pulling” new entities, textual references, or facts from the KG. It turns out that combining both sources pays off: performance on MetaQA (a QA dataset with text, triples, and audio samples), WebQuestionsSP, and ComplexWebQuestions improves, especially when you need to perform 2- or 3-hop reasoning.
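The iterative expansion can be sketched as a simple loop, assuming a toy triple store. The real PullNet learns *which* frontier nodes to expand with a retrieval model (and pulls from text as well as the KG); this sketch naively expands everything within a hop budget:

```python
from collections import defaultdict

# Made-up KG for a 2-hop question:
# "Where was the director of Inception born?"
triples = [
    ("Inception", "directed_by", "Nolan"),
    ("Nolan", "born_in", "London"),
    ("London", "located_in", "UK"),
]
neighbours = defaultdict(list)
for s, p, o in triples:
    neighbours[s].append((p, o))

def pull(seed_entities, hops):
    """Grow a question subgraph by pulling facts around the frontier."""
    graph, frontier = set(), set(seed_entities)
    for _ in range(hops):
        next_frontier = set()
        for ent in frontier:
            for rel, obj in neighbours[ent]:
                graph.add((ent, rel, obj))   # pulled fact joins the subgraph
                next_frontier.add(obj)
        frontier = next_frontier
    return graph

subgraph = pull({"Inception"}, hops=2)
print(("Nolan", "born_in", "London") in subgraph)  # True
```

With `hops=2` the subgraph contains exactly the facts needed for the answer ("London") and nothing beyond, which is why bounding the number of pull iterations to the question's hop count keeps the candidate graph small.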
🤔 [Discussion] 🤔 It looks like a general trend in both MRC QA and KGQA: when you need to deal with complex multi-hop questions, you need some formal support or representation of the sources (documents) at hand. KGs seem like an effective way to provide such support and allow much more accurate multi-hop QA. In some tasks those supporting KGs are given; in others you might want to build them with IE tools or align them with existing KGs like Wikidata.
When processing a big set of questions, you might want to use data augmentation, a technique common in CV; that is, you need the same questions asked differently while still using the same set of entities and relations. Liu et al study how to generate questions given (s, p, o) triples, adding a special answer-aware loss that tunes the generated questions so that they actually ask for the object of the triple.
In a similar vein of generating more precise questions, Xu et al define a model for asking clarification questions: given something like “Where is Berlin?”, it is natural that there might be several cities named Berlin in your KG, so you’ll need to clarify (give a hand in ranking named entities, if you want): “Do you mean Berlin in Germany or the one in New Hampshire, US?”. The authors also publish a new CLAQUA dataset for training such systems 👍
A quick glimpse at MRC Question Answering: approaches that integrate ConceptNet and use it as an additional knowledge reference perform well on OpenBookQA (GapQA by Khot et al) and MS MARCO (KEAG by Bi et al) 👏
Natural Language Generation from KGs
QA and dialogue systems often resort to KGs to retrieve a certain fact or a small subgraph of triples. Returning raw output in triple form won’t increase the usability of the system. The task of Natural Language Generation from KGs envisions generating coherent natural language sentences given input triples. On a more abstract level, data-to-text implies generating text from any structured resource like RDF graphs, web tables, or relational databases.
Ferreira et al challenge end-to-end and pipeline-based neural NLG models in order to find which yields the most coherent and fluent text 💪 (the reference dataset is WebNLG 1.4). By a pipeline-based approach, the authors mean discourse ordering, text structuring, lexicalization, referring expression generation, and, finally, text generation. It turns out that pipeline-based (though still neural) models produce more natural text. Moreover, they can adapt to unseen cases where E2E models yield junky stuff. However, a rule-based winner of WebNLG performs on par with the best neural configurations.
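To see what the pipeline stages actually do, here is a deliberately rule-based miniature (the triples, lexicon, and stage implementations are made up; in the paper each stage is a learned neural module):

```python
# WebNLG-style input: triples about one subject
triples = [
    ("Alan_Bean", "birth_place", "Wheeler"),
    ("Alan_Bean", "occupation", "astronaut"),
]
# Toy lexicalization templates, one per predicate
lexicon = {
    "birth_place": "{s} was born in {o}",
    "occupation": "{s} worked as an {o}",
}

def order(ts):                 # discourse ordering (a fixed sort key here)
    return sorted(ts, key=lambda t: t[1])

def lexicalize(s, p, o):       # map a triple to a sentence via a template
    return lexicon[p].format(s=s.replace("_", " "), o=o.replace("_", " "))

def refer(sentences, subject): # referring expressions: pronominalize repeats
    subject = subject.replace("_", " ")
    return [s if i == 0 else s.replace(subject, "He")
            for i, s in enumerate(sentences)]

def realise(sentences):        # surface realisation: join and punctuate
    return ". ".join(sentences) + "."

text = realise(refer([lexicalize(*t) for t in order(triples)], "Alan_Bean"))
print(text)  # Alan Bean was born in Wheeler. He worked as an astronaut.
```

The appeal of the pipeline is visible even in this toy: each stage has a narrow, inspectable contract, which is exactly what helps the neural versions generalize to unseen inputs where E2E models hallucinate.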
Commonsense Reasoning with KGs
Commonsense has always been hard to instill into machine intelligence. How do you explain, and make the machine remember, that crossing the road on red is bad? Or that having a work-life balance is advisable (not in academia, haha)? Commonsense seems to pertain to humans, but can we teach NLU systems to grasp basic concepts? Luckily, we have a couple of new interesting tasks!
Lin et al study CommonsenseQA, where you need to answer commonsense questions using the background KG ConceptNet. Their proposed model, KagNet, combines graph convolutional networks (GCNs) with hierarchical path-based attention to perform graph-based reasoning over given questions and ConceptNet. The model yields 57% accuracy on the test set. The basic idea and direction proved useful: just three months after the KagNet submission appeared on arXiv, another model was released that used XLNet as a language encoder, GCNs as well, plus a new Graph Attention Network (GAT) component. The new model reaches 75% accuracy, so here is a great argument that graph-based reasoning generally works 🤩
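The path-based part of such reasoning can be pictured with a tiny ConceptNet-like graph: enumerate the relational paths between a question concept and an answer candidate, which a real model would then score with path attention. The graph and concepts below are invented for illustration:

```python
# Tiny ConceptNet-style adjacency: concept -> [(relation, neighbour), ...]
edges = {
    "bird": [("capable_of", "fly")],
    "fly": [("requires", "wings")],
    "fish": [("capable_of", "swim")],
}

def paths(src, dst, max_len=3, prefix=()):
    """Enumerate all relational paths from src to dst up to max_len hops."""
    if src == dst:
        return [prefix]
    if len(prefix) >= max_len:
        return []
    found = []
    for rel, nxt in edges.get(src, []):
        found += paths(nxt, dst, max_len, prefix + ((src, rel, nxt),))
    return found

# Paths linking a question concept ("bird") to an answer candidate ("wings")
print(paths("bird", "wings"))
# [(('bird', 'capable_of', 'fly'), ('fly', 'requires', 'wings'))]
```

In KagNet these extracted paths are encoded and weighted by hierarchical attention, so the model can cite which chain of ConceptNet facts supports an answer instead of relying on the language encoder alone.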
🔥 Sinha et al 🔥 take the idea of commonsense graph-based reasoning to the inductive logic level and release CLUTRR, a benchmark suite for testing NLU and inductive reasoning capabilities. CLUTRR generates short stories in natural language, and the goal is to infer the implicit relationship between two entities. The stories are generated semi-synthetically, that is, you can vary the complexity of the task and other parameters. Inspired by Inductive Logic Programming (ILP), CLUTRR builds a symbolic space, a background kinship knowledge graph, that consists of logical rules and sampled instances. The textual stories are then generated from the KG, adding noise and irrelevant facts to check the robustness of the tested systems. The authors also come up with several baselines and show that Graph Attention Networks (GATs) used for graph reasoning are much more robust than Seq2seq and Transformer models operating only on the story text. The new resource is a blast, be sure to check it out! 😍
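The symbolic backbone of such a benchmark is easy to picture: kinship facts are sampled into a small graph, and the target relation must be *inferred* by composing logical rules along a path. The names and the two rules below are illustrative, not CLUTRR's actual generator:

```python
# Sampled kinship facts: (person_a, person_b) -> relation of a to b
facts = {
    ("Anna", "Bob"): "mother",
    ("Bob", "Carl"): "father",
}

# Composition rules: mother-of-father(x) is grandmother, etc.
compose = {
    ("mother", "father"): "grandmother",
    ("father", "father"): "grandfather",
}

def infer(path):
    """Compose relations along an entity path, e.g. Anna -> Bob -> Carl."""
    rel = facts[(path[0], path[1])]
    for a, b in zip(path[1:], path[2:]):
        rel = compose[(rel, facts[(a, b)])]
    return rel

print(infer(["Anna", "Bob", "Carl"]))  # grandmother
```

CLUTRR then verbalizes such chains into stories (with distractor facts mixed in), so a tested system has to recover exactly this compositional inference from free text, which is where graph-structured models hold up better than pure sequence models.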
Meanwhile, a prolific group working on commonsense reasoning at AllenAI released two more datasets 🎆: CosmosQA by Huang et al for MRC QA, and SocialIQa by Sap et al, where commonsense reasoning has to be applied to resolve social interaction cases 😊
NER & Relation Recognition & Linking
Joint named entity & relation recognition and linking is a general trend observable at EMNLP 2019. Indeed, considering them together helps to resolve context-dependent cases. Plus, we know that we can extract graphs made of entities and relations, so why treat them separately?
Wu et al employ joint entity and relation alignment for the entity linking task where the same entities of a knowledge graph are expressed in different languages (say, FR and EN). The authors employ Highway-GCNs to obtain entity embeddings, then calculate relation representations from head and tail embeddings, and eventually train a model using a joint entity/relation alignment loss function. On DBP15K (a multi-lingual subset of DBpedia) the model achieves very good results 👏
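The "relation representation from head and tail embeddings" idea can be shown in a few lines. The 2-d entity vectors and the mean-then-concatenate pooling are assumptions for illustration; in the paper the entity vectors come from the Highway-GCN:

```python
# Hypothetical entity embeddings (in the model, Highway-GCN outputs)
entity_emb = {
    "Berlin":  [1.0, 0.0],
    "Germany": [0.0, 1.0],
    "Paris":   [0.5, 0.5],
    "France":  [0.25, 0.75],
}
triples = [("Berlin", "capital_of", "Germany"),
           ("Paris", "capital_of", "France")]

def relation_repr(rel):
    """Represent a relation by its averaged head and tail entity vectors."""
    heads = [entity_emb[h] for h, r, _ in triples if r == rel]
    tails = [entity_emb[t] for _, r, t in triples if r == rel]
    mean = lambda vs: [sum(dim) / len(vs) for dim in zip(*vs)]
    # concatenate averaged head vector and averaged tail vector
    return mean(heads) + mean(tails)

print(relation_repr("capital_of"))  # [0.75, 0.25, 0.125, 0.875]
```

The payoff is that relations get embeddings without any relation-level training signal, so the alignment loss can be computed jointly over entities and relations across the two language-specific KGs.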
🏥 Similarly, but in the biomedical domain, Du et al leverage a joint framework to extract entities and relations from clinical conversations. They also develop a custom KG tailored for describing symptoms and their properties. Overall, the Relational Span-Attribute Tagging Model (R-SAT) drastically outperforms baselines, though it is still somewhat far from human performance.
That’s it for EMNLP 2019! Hope you didn’t get overwhelmed by all the math.
In the next post we’ll examine graph-related publications from the upcoming NeurIPS 2019, which is usually more hardcore-ML oriented. Stay tuned 😉