The Mushroom Effect or Why You Need Knowledge Graphs for Dialog Systems

Michael Galkin
6 min read · Aug 18, 2019


In past projects aimed at building infotainment goal-oriented dialog systems, we observed a series of unexpectedly funny responses 🍄. Having enjoyed the pun, we dug deeper and reached more serious conclusions. In this article, I’d like to share our findings and discuss some recently delivered goodies from IJCAI 2019, which took place August 10–16 in Macao.
(and if you are a mushroom lover — look no further, I have some!)

A typical case of the Mushroom Effect

The Mushroom Effect

A bit of background: our goal was to build a conversational platform able to communicate with a user about selected points of interest in Berlin, e.g., Berlin Hauptbahnhof, the TV tower, public places, and more. The information sources at hand included raw-text Wikipedia articles and the Wikidata knowledge graph (about 50M articles and entities, respectively). Having trained traditional pipeline components (like those described in another article), we faced some unexpected responses. For instance, talking about Berlin Central Station (Q1097 in Wikidata), our agent revealed itself as a 🍄 aficionado:

- What is this building?
- This is Berlin Hbf
- What is its architectural style?
- Mushroom
- ¯\_(ツ)_/¯

The focus entity was identified correctly, and coreference resolution attributed “its” to Berlin Hbf. But mushrooms? đŸ€”

It turned out to be the output of one of the famous SOTA reading comprehension models. Let’s look at what we have in Wikipedia for Berlin Hbf:

🍄 🍄 are definitely there

Well, you might say “it’s close”, “it’s okayish”, “somewhat true” based on the text, or “it’s a data issue”. I’d say:

Reading comprehension models (especially applied in transfer learning scenarios) need a semantic sanity check.

What about knowledge graphs, then? Let’s look at what’s happening in Wikidata: there is no architectural style value for Berlin Hbf, but we can sketch some hierarchical relations between different architectural styles:

Thanks to Metaphacts for providing a nice visualisation framework

đŸ˜„Bad news: “mushroom” is not listed as a possible architectural style

💡Good news: using the inference capabilities of knowledge graphs we can deduce an explainable answer. For instance, check out the favorite styles of the architects who built Berlin Hbf, analyze the neighbourhood where the building is located, and more. It is actually a question of probabilistic graph analysis, which might be a topic of future articles.
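To make the “ask the architects” idea concrete, here is a minimal sketch of a Wikidata SPARQL query that collects the architectural styles (property P149) used in other works by the architects (property P84) of a given building. The query is illustrative and unoptimized; Q1097 follows the article’s identifier for Berlin Central Station.

```python
# Sketch: a building lacks P149 (architectural style) in Wikidata, so we look
# at the styles its architects (P84) used in their other works and rank them
# by frequency. The query targets the public Wikidata SPARQL endpoint.

def style_hint_query(building_qid: str) -> str:
    """Build a SPARQL query suggesting likely styles for a building."""
    return f"""
    SELECT ?style ?styleLabel (COUNT(?otherWork) AS ?uses) WHERE {{
      wd:{building_qid} wdt:P84 ?architect .   # architects of the building
      ?otherWork wdt:P84 ?architect ;          # other works by them
                 wdt:P149 ?style .             # ... with a known style
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
    }}
    GROUP BY ?style ?styleLabel
    ORDER BY DESC(?uses)
    """

query = style_hint_query("Q1097")
print(query)
```

The most frequent style among the architects’ other works then becomes an explainable, traceable candidate answer instead of a hallucinated “Mushroom”.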

In general, employing a KG in your dialog system (e.g., as KG embeddings) is beneficial along several dimensions:

  • Implicit and explicit constraints on entities and relations, so that a question like “How many children does Berlin Hbf have?” can be flagged as nonsensical;
  • Support for much more complex questions that might require aggregations, clarifications, comparisons, or logical entailment;
  • Explainability and traceability of results;
  • Dialog systems != KG QA systems, but rather a superset of KG QA, since graphs are able to capture dialog-specific details like state and history.
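The first point, the “semantic sanity check”, can be sketched with a toy type system. The entities, relation signatures, and hierarchy below are invented for illustration; a real system would read them from a KG ontology like Wikidata’s class tree.

```python
# Toy sketch: relations carry domain constraints over entity types, so a
# nonsensical question (or a QA model's answer violating the constraint)
# can be rejected before it reaches the user.

ENTITY_TYPES = {
    "Berlin Hbf": "railway_station",
    "Angela Merkel": "human",
}

# relation -> required subject type
RELATION_DOMAINS = {
    "number_of_children": "human",
    "architectural_style": "building_or_station",
}

# tiny subclass chain: a railway station IS-A building_or_station
TYPE_HIERARCHY = {
    "railway_station": "building_or_station",
}

def is_a(entity_type, target):
    """Walk the subclass chain upward looking for the target type."""
    while entity_type is not None:
        if entity_type == target:
            return True
        entity_type = TYPE_HIERARCHY.get(entity_type)
    return False

def question_makes_sense(subject: str, relation: str) -> bool:
    subj_type = ENTITY_TYPES.get(subject)
    return subj_type is not None and is_a(subj_type, RELATION_DOMAINS[relation])

print(question_makes_sense("Berlin Hbf", "number_of_children"))     # False
print(question_makes_sense("Berlin Hbf", "architectural_style"))    # True
print(question_makes_sense("Angela Merkel", "number_of_children"))  # True
```

The same check, run on a reading comprehension model’s output (“Mushroom” is not an instance of architectural style), would have caught our agent’s fungal tendencies.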

A KG is not a silver bullet; there is still room for improvement and tasks to be solved within conversational AI systems. Some of them were tackled at the recent ACL 2019 (lots of new ideas that we are implementing), while some were presented at the Search-Oriented Conversational AI (SCAI) workshop organized by Jeff Dalton (University of Glasgow), Julia Kiseleva (Microsoft Research & AI), Aleksandr Chuklin (Google Research), and Mikhail Burtsev (MIPT & DeepPavlov), co-located with the 28th International Joint Conference on Artificial Intelligence (IJCAI).

SCAI Workshop @ IJCAI 2019

I didn’t manage to attend the workshop in person, but I found the invited talks and accepted papers very relevant. Check out the website; everything is there.

First, Xiaoxue Zang introduced the Schema-Guided Dialogue (SGD) State Tracking dataset (for the DSTC 8 challenge) with 16 domains, 16K dialogs, 214 possible slots, and 14K possible values for building task-oriented agents. While previous DST datasets considered just a handful of possible slots and values, so you could memorize all of them and keep their distributions in memory, SGD scales the task up to numbers that make straightforward memorization computationally inefficient.
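The core shift is that the tracker receives a service schema at run time instead of relying on a fixed slot vocabulary. A minimal sketch of that shape, with a schema invented for illustration (not taken from the actual dataset):

```python
# Sketch of the schema-guided idea: the tracker is handed a service schema
# describing available slots, and may only fill slots the schema declares.
# New services can then be supported without retraining a fixed vocabulary.

restaurant_schema = {
    "service": "Restaurants",
    "slots": {
        "cuisine": {"description": "Type of food served"},
        "city": {"description": "City of the restaurant"},
        "party_size": {"description": "Number of guests"},
    },
}

def update_state(state: dict, schema: dict, slot: str, value: str) -> dict:
    """Fill a slot only if the schema declares it; unknown slots are ignored."""
    if slot in schema["slots"]:
        state = {**state, slot: value}
    return state

state = {}
state = update_state(state, restaurant_schema, "cuisine", "italian")
state = update_state(state, restaurant_schema, "spaceship", "enterprise")  # rejected
print(state)  # {'cuisine': 'italian'}
```

In the real dataset the slot descriptions are natural-language strings that the model encodes, which is what lets a single tracker generalize to unseen services.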

SGD stats, taken from the slides

Second, Minlie Huang reported on challenges towards more intelligent conversational agents. Among the biggest and newest were incorporating knowledge- and data-driven grounding models into dialog systems, along with empathetic and personality computing. Part of the talk was dedicated to teaching an agent to ask meaningful questions to sustain a conversation, not just answer user utterances. As for KGs, Dr. Huang described knowledge-aware encoder and decoder models able to employ KG reasoning in dialog settings.

Knowledge-aware encoders-decoders for dialog systems, taken from the slides

Finally, two papers were particularly interesting to me, as they tackle the response verbalization problem. Verbalization happens when your agent has to produce its next utterance based on the dialog history, entities in a KG, user preferences, and many other criteria. Let me illustrate with an example: given a context paragraph and the question “What is the busiest train station in Germany?”, an agent has to produce a response, already knowing that the answer involves Hamburg Hbf.

Verbalization options

There are two methods for producing responses:

  • Extraction-based: the agent copies existing tokens, which might be words in a context paragraph, previous utterances, or entities in a KG. For instance, “Hamburg Hbf” and “The train station in Hamburg” are extracted from the context paragraph.
    Problem: extraction-based methods often produce incoherent responses.
  • Generation-based: the agent actually generates a response, sampling tokens from some distribution according to its trained “sense of beauty”, so the answer can (theoretically) contain new words. For instance, “The busiest station in Germany is Hamburg Hbf” or “The one in Hamburg, to the best of my knowledge”.
    Problem: ah, if only we saw such rich and informative sentences 😅. Generation-based systems tend to produce dull and boring short replies like “Yes” or “I see”, and it takes great effort to bring diversity and “spiciness” into language models.
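A toy sketch of the extraction-based option: copy candidate spans from the context and rank them by lexical overlap with the question plus the known answer entity. Real systems use learned scorers; this only illustrates the copy-and-rank shape of the approach, with the candidates made up for the example.

```python
# Toy extractive response selection: score each copied candidate by its
# token overlap with the question and the answer entity, normalized by
# candidate length, then pick the best-scoring one.

def score(candidate: str, question: str, answer_entity: str) -> float:
    cand = set(candidate.lower().split())
    cues = set(question.lower().split()) | set(answer_entity.lower().split())
    return len(cand & cues) / max(len(cand), 1)

question = "What is the busiest train station in Germany?"
answer_entity = "Hamburg Hbf"
candidates = [
    "Hamburg Hbf",
    "The main train station in Hamburg",
    "Mushroom",
]

best = max(candidates, key=lambda c: score(c, question, answer_entity))
print(best)  # Hamburg Hbf
```

The incoherence problem is visible even here: the top candidate is a correct but bare entity mention, not a well-formed sentence.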

Pei et al. presented an approach for dialog response selection (i.e., extraction-based) that considers previous user utterances and system replies, as well as KG facts, while selecting the best response. The authors build upon EntNet and memory networks: each of the three sources is stored in its own memory, the final states are weighted sums of memory cells, and an attention mechanism picks the most appropriate source. SEntNet shows improved scores on the bAbI and mDSTC2 datasets.
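The source-attention idea can be sketched in a few lines: each source keeps a (summarized) memory vector, and a softmax over query–memory scores decides which source to trust for the current turn. All vectors and numbers below are toy values, not anything from the paper.

```python
# Sketch of attention over per-source memories, SEntNet-style: user history,
# system history, and KG facts each contribute a memory vector; the current
# query attends over them and the result is a weighted blend.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

memories = {
    "user_history":   [0.9, 0.1, 0.0],
    "system_history": [0.2, 0.7, 0.1],
    "kg_facts":       [0.1, 0.1, 0.8],
}

query = [0.0, 0.2, 0.9]  # this turn asks for factual, KG-like content

names = list(memories)
weights = softmax([dot(query, memories[n]) for n in names])
blended = [sum(w * memories[n][i] for w, n in zip(weights, names))
           for i in range(3)]

top_source = names[weights.index(max(weights))]
print(top_source)  # kg_facts
```

In the full model the memories are learned cell states rather than fixed vectors, but the weighting mechanism has exactly this shape.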

Zhang et al. presented the Context-aware Knowledge Pre-selection (CaKe) approach for response generation conditioned on background knowledge. Here the authors treat background knowledge as free-text utterances, not KGs. CaKe encodes the background knowledge and the dialog context, then feeds them to a decoder that either generates text or selects a named-entity token. The approach looks somewhat similar to the previous paper, but belongs to the generative paradigm.

Conclusions

  • Knowledge Graphs can be a powerful component of Conv AI systems when applied properly.
  • IJCAI 2019 was even bigger than ACL 2019: 850 accepted papers (compared to 660 at ACL) out of 6,032 submissions 😹. Surely there was less NLP, but more core ML, such as new KG embedding methods, entity alignment, and graph neural networks. I did select a couple of interesting papers, but hopefully there will be reports from the attendees.
  • No mushrooms were harmed in the making of this article 😉
