The Mushroom Effect or Why You Need Knowledge Graphs for Dialog Systems
In past projects aimed at building infotainment goal-oriented dialog systems, we observed a series of unexpectedly funny responses 🍄. Having enjoyed the pun, we investigated further and came to more serious conclusions. In this article, I'd like to share our findings and discuss some recently delivered goodies from IJCAI 2019, which took place August 10–16 in Macao.
(and if you are a mushroom lover, look no further, I have some!)
The Mushroom Effect
A bit of background: our goal was to build a conversational platform able to communicate with a user about selected points of interest in Berlin, e.g., Berlin Hauptbahnhof, the TV tower, public places, and more. Information sources at hand included Wikipedia articles in raw text and the Wikidata knowledge graph (about 50M articles and entities, respectively). Having trained traditional pipeline components (like those described in another article), we faced some unexpected responses. For instance, talking about Berlin Central Station (Q1097 in Wikidata), our agent revealed itself as a 🍄 aficionado:
- What is this building?
- This is Berlin Hbf
- What is its architectural style?
- Mushroom
- ¯\_(ツ)_/¯
The focus entity was identified correctly, and coreference resolution attributed "its" to Berlin Hbf. But mushrooms? 🤔
It turned out to be the output of one of the famous SOTA reading comprehension models. Let's look at what we have in Wikipedia for Berlin Hbf:
Well, you might say "it's close", "it's okayish", "somewhat true" based on the text, or "it's a data issue". I'd say:
Reading comprehension models (especially applied in transfer learning scenarios) need a semantic sanity check.
What about knowledge graphs then? Let's look at what's happening in Wikidata: there is no architectural style value for Berlin Hbf, but we can sketch some hierarchical relations between different architectural styles:
🍄 Bad news: "mushroom" is not listed as a possible architectural style
💡 Good news: using the inferencing powers of knowledge graphs, we can deduce an explainable answer: for instance, by checking the favorite styles of the architects who built Berlin Hbf, analyzing the neighbourhood where the building is located, and more. It is really a question of probabilistic graph analysis, which might be the topic of a future article.
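To make the sanity-check idea concrete, here is a minimal sketch of validating a reading-comprehension answer against a class hierarchy. The tiny graph and its labels are hypothetical stand-ins; a real system would follow Wikidata's "subclass of" (P279) chains instead.

```python
# Toy "subclass of" edges, a hypothetical stand-in for Wikidata P279 chains.
SUBCLASS_OF = {
    "neo-renaissance": "architectural style",
    "gothic revival": "revivalism",
    "revivalism": "architectural style",
    "mushroom": "fungus",
    "fungus": "organism",
}

def is_a(candidate: str, expected_class: str, max_depth: int = 10) -> bool:
    """Walk subclass-of edges upward and check class membership."""
    node = candidate
    for _ in range(max_depth):
        if node == expected_class:
            return True
        node = SUBCLASS_OF.get(node)
        if node is None:
            return False
    return False

def sanity_check(answer: str, expected_class: str) -> str:
    """Reject reading-comprehension answers of the wrong semantic type."""
    return answer if is_a(answer, expected_class) else "unknown"

print(sanity_check("neo-renaissance", "architectural style"))  # neo-renaissance
print(sanity_check("mushroom", "architectural style"))         # unknown
```

With such a filter in front of the reader model, "mushroom" would never have reached the user as an architectural style.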
In general, employing a KG in your dialog system (e.g., as KG embeddings) is beneficial along several dimensions:
- Implicit and explicit constraints on entities and relations, so that a question like "How many children does Berlin Hbf have?" can be recognized as nonsensical;
- Support for much more complex questions that might require aggregations, clarifications, comparisons, or logical entailment;
- Explainability and traceability of results;
- Dialog systems are not merely KG QA systems, but rather a superset of them, as graphs are also able to grasp dialog-specific details like state and history.
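The first point above can be sketched in a few lines: each relation declares which entity class its subject must belong to, so ill-posed questions are rejected before any answer is generated. The entity classes, relation names, and fit table below are illustrative, not real Wikidata identifiers.

```python
# Hypothetical entity typing and relation domain constraints.
ENTITY_CLASS = {
    "Berlin Hbf": "railway station",
    "Angela Merkel": "human",
}

RELATION_DOMAIN = {
    "child": "human",                      # only humans have children
    "architectural style": "building-like",
}

# Which entity classes satisfy which expected domains (toy table).
SUBJECT_FITS = {
    ("railway station", "building-like"): True,
    ("human", "human"): True,
}

def question_makes_sense(subject: str, relation: str) -> bool:
    """Check that the subject's class fits the relation's domain."""
    subj_class = ENTITY_CLASS.get(subject)
    expected = RELATION_DOMAIN.get(relation)
    if subj_class is None or expected is None:
        return False
    return SUBJECT_FITS.get((subj_class, expected), subj_class == expected)

print(question_makes_sense("Berlin Hbf", "child"))         # False
print(question_makes_sense("Angela Merkel", "child"))      # True
print(question_makes_sense("Berlin Hbf", "architectural style"))  # True
```

In production you would derive these constraints from the KG schema itself rather than hand-code them, but the filtering logic stays the same.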
A KG is not a silver bullet; there is still room for improvement and tasks to be solved within conversational AI systems. Some of them were tackled at the recent ACL 2019 (lots of new ideas that we are implementing), while some were presented at the Search-Oriented Conversational AI (SCAI) workshop organized by Jeff Dalton (University of Glasgow), Julia Kiseleva (Microsoft Research & AI), Aleksandr Chuklin (Google Research), and Mikhail Burtsev (MIPT & DeepPavlov), co-located with the 28th International Joint Conference on Artificial Intelligence (IJCAI).
SCAI Workshop @ IJCAI 2019
I didn't manage to attend the workshop in person, but I found the invited talks and accepted papers very relevant. Check out the website: everything is there.
First, Xiaoxue Zang introduced the Schema-Guided Dialogue (SGD) state tracking dataset (for the DSTC 8 challenge) with 16 domains, 16K dialogs, 214 possible slots, and 14K possible values for building task-oriented agents. While previous DST datasets considered just a handful of possible slots and values, so you could memorize all of them and keep their distributions in memory, SGD scales the task up to numbers that make straight memorization computationally inefficient.
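The schema-guided idea can be illustrated with a toy sketch: instead of memorizing a fixed slot vocabulary, the tracker scores slots by comparing the utterance against natural-language slot descriptions from the schema, so unseen services can be handled zero-shot. Real systems use learned encoders; plain word overlap stands in here, and the schema below is made up.

```python
# Hypothetical schema: slot identifiers with natural-language descriptions.
SCHEMA = {
    "restaurant.cuisine": "type of food served by the restaurant",
    "restaurant.city": "city where the restaurant is located",
    "flight.origin": "city the flight departs from",
}

def score(utterance: str, description: str) -> int:
    """Crude relevance signal: shared words between utterance and description."""
    return len(set(utterance.lower().split()) & set(description.split()))

def best_slot(utterance: str) -> str:
    """Pick the slot whose description best matches the utterance."""
    return max(SCHEMA, key=lambda slot: score(utterance, SCHEMA[slot]))

print(best_slot("I want Italian food"))       # restaurant.cuisine
print(best_slot("Book a flight from Berlin"))  # flight.origin
```

Because matching runs against descriptions rather than a memorized value list, adding a new service is just adding schema entries, which is exactly the scaling property SGD is designed to test.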
Second, Minlie Huang reported on challenges towards more intelligent conversational agents. Among the biggest and newest were incorporating knowledge- and data-driven grounding models into dialog systems, as well as empathetic and personality computing. Part of the talk was dedicated to teaching an agent to ask meaningful questions to sustain a conversation, rather than only answering user utterances. As for KGs, Dr. Huang described knowledge-aware encoder and decoder models able to employ KG reasoning in dialog settings.
Finally, two papers seemed very interesting to me, as they tackle the response verbalization problem. Verbalization happens when your agent has to produce its next utterance to a user based on dialog history, entities in a KG, user preferences, and many other criteria. Let me illustrate it with an example: given a context paragraph and the question "What is the busiest train station in Germany?", an agent has to produce a response, already knowing that the answer includes Hamburg Hbf.
There are two methods for producing responses:
- Extraction-based: the agent copies existing tokens, which might be words in a context, previous utterances, or entities in a KG. For instance, "Hamburg Hbf" and "The train station in Hamburg" are extracted from the context paragraph. Problem: extraction-based methods often produce incoherent responses.
- Generation-based: the agent actually generates a response, sampling tokens from some distribution according to its trained "sense of beauty", so the answer can (theoretically) contain new words. For instance, "The busiest station in Germany is Hamburg Hbf" or "The one in Hamburg, to the best of my knowledge". Problem: if only we saw such rich and informative sentences in practice 🙂. Generation-based systems tend to produce dull and boring short replies like "Yes" or "I see", and it takes great effort to bring diversity and "spiciness" into language models.
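The contrast between the two paradigms can be shown on the example above. The "extractive" agent copies a span from the context; the "generative" agent emits new tokens around the answer entity. Both are toy stand-ins for trained models, not anyone's actual method.

```python
CONTEXT = "Hamburg Hbf is the busiest train station in Germany ."

def extractive_response(answer_entity: str, context: str) -> str:
    """Copy the raw entity span as it appears in the context."""
    tokens = context.split()
    entity_tokens = answer_entity.split()
    start = tokens.index(entity_tokens[0])
    return " ".join(tokens[start:start + len(entity_tokens)])

def generative_response(answer_entity: str) -> str:
    """Emit new tokens around the answer; here a fixed template stands in
    for sampling from a learned distribution."""
    return f"The busiest station in Germany is {answer_entity}"

print(extractive_response("Hamburg Hbf", CONTEXT))  # Hamburg Hbf
print(generative_response("Hamburg Hbf"))
```

Note how the generative reply contains words ("The busiest station ... is") that a pure copier could only produce if they already appeared, in order, in its sources.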
Pei et al. presented an approach for dialog response selection, i.e., extraction-based, that considers previous user utterances and system replies as well as KG facts while selecting the best response. The authors build upon EntNet and memory networks: each of the three sources is stored in its own memory, the final states are weighted sums of memory cells, and an attention mechanism selects the most appropriate source. SEntNet shows improved scores on the bAbI and mDSTC2 datasets.
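A rough numpy sketch of that memory scheme: each source (user utterances, system replies, KG facts) keeps its own memory; a query attends over the cells of each memory, and a second attention weights the three per-source summaries. Shapes and scoring below are illustrative, not the paper's exact equations.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimension

# One memory per source; rows are memory cells (toy random data).
memories = {
    "user": rng.normal(size=(5, d)),
    "system": rng.normal(size=(4, d)),
    "kg": rng.normal(size=(6, d)),
}
query = rng.normal(size=d)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Per-source state: attention-weighted sum over that memory's cells.
states = {}
for name, M in memories.items():
    cell_weights = softmax(M @ query)   # (num_cells,)
    states[name] = cell_weights @ M     # (d,)

# Source-level attention: weight the three summaries by query relevance.
names = list(states)
source_weights = softmax(np.array([states[n] @ query for n in names]))
final_state = sum(w * states[n] for w, n in zip(source_weights, names))

print(final_state.shape)  # (8,)
```

The source-level softmax is what lets the model lean on the KG memory for factual turns and on dialog history for contextual ones.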
Zhang et al. presented the Context-aware Knowledge Pre-selection (CaKe) approach for response generation conditioned on background knowledge. Here the authors treat background knowledge as free-text utterances, not KGs. CaKe encodes the background knowledge and dialog context and feeds them to a decoder that either generates text or selects a named-entity token. The approach looks somewhat similar to the previous paper, but belongs to the generative paradigm.
Conclusions
- Knowledge Graphs can be a powerful component of Conv AI systems when applied properly.
- IJCAI 2019 was even bigger than ACL 2019, with 850 accepted papers (compared to 660 at ACL) out of 6032 submissions. There was certainly less NLP, but more core ML such as new KG embedding methods, entity alignment, and graph neural nets. I did select a couple of interesting papers, but hopefully there'll be reports from the attendees.
- No mushrooms were harmed in the making of this article 🍄