A rough overview of State-of-the-Art Question Answering Systems

November 11, 2016

Question answering systems can provide a large value to users that can not use SQL. We present a brief overview of background knowledge and give an overview to the current state-of-the-art in question answering

The amount of data stored in knowledge bases and databases grows every moment. This creates a need for intuitive and efficient query-interfaces that give users that have no computer science background a possibility to extract information from that data.

As of today there are already several options to query databases without using formal query languages like SQL or SPARQL: One popular choice is to use keyword based search-engines like Google or Lucene, but often these systems can not fullfill the information need directly but only help the user to find the information they are searching for. In business intelligence query or chart builders like Chartio are very popular. These tools present graphical interfaces where user can construct graphs intuitively. However several studies like (Kaufmann & Bernstein, 2007) have shown that users prefer natural-language-interfaces if they work properly.

In this post we will give a brief overview of basic techniques to build natural language interfaces to knowledge bases and present several state-of-the-art implementations that try to solve that problem.


Before discussing anything else, let’s formulate first our goal:

Given a Knowledge Base, give a user a system that can answer questions expressed in natural language

Technical task

The description of the objective is already useful but does not tell us anything about what to do, when we want to solve the task from a technical point of view.

Technically we want to translate from natural language to some query language:

However, the direct translation from natural language to a query language is very hard. We try to simplify the process and translate first from natural language to some semantics. Later we translate from this semantics to the target query language. This way we can target different query languages without rewriting the whole system:

Translating from the semantics (for example first-order logic or description logic to a query language like SQL or SPARQL is relatively easy. We wont discuss it here in more detail but there will be another post on this topic later.

The more interesting part is the translation from natural language to semantics. Again it is not clear how to do this directly. One way to decompose this task is to first parse the input to get a syntactic structure and based on this syntactic analysis compose the semantics:


Now we have an objective to aim for. Let’s dive into some basics that may help us to achieve that.


Parsing is the fundament of mostly everything that has something to do with natural language processing (NLP). It takes a string (for example a sentence) and converts it to a thing (for example some syntactic representation) – or statet in poetic form it may sound this way:

A Parser for Things
is a function from Strings
to Lists of Pairs
of Things and Strings!

–– Graham Hutton

Syntactic parser

One of the basic parsing technique is the syntactic parsing. A syntactic parser is a function parse :: String → ParseTree that returns the syntactic structure for the given sentence. For example the parse tree for the sentence

How heavy is Jupiter?

may look the following way (This is the output of the Stanford parser (Socher, Bauer, Manning, & Ng, 2013) modified for better readability):

The parser detects that the sentence is a Wh-question consisting of an adjectival-phrase “How heavy”, a clause “is Jupiter” and the question mark.

There are several methods to build syntactic parsers: One way is to build it using some grammar like context-free grammars or lexicalized tree-adjoining grammars encoding all rules by hand or learn it from a large corpus like the Penn Treebank using for example probabilistic context-free grammars or more advanced techniques involving recurrent neural networks (see for example (Socher, Lin, Manning, & Ng, 2011)) or autoencoder.

The results of state-of-the-art syntactic parser are already very good and useful but sometimes the level of abstraction is not enough.

Depency parser

Dependency parsers (see Dependency grammar for more explanation) can be more useful in various applications. Instead of returning the linguistic sentence structure they detect relations between words. The result of a parse (parse :: String → [(Word, DependencyKind, Word)]) is a directed acyclic graph. For example the dependency graph (here it is even a tree) for the sentence

How heavy is Jupiter?

may look the following way: (generated by the Stanford dependency parser (Chen & Manning, 2014))

The root is the verb “is” having a subject “Jupiter” and some dependency to “heavy” (but in this case it can not deduce which kind of dependency it is). Finally “how” is detected as an adverbial modifier of “heavy”.

To build such a parser model, often the output of the syntactic parser is used and then a neural network is trained that learns to find the relations between the words.

Often this representation is much more useful for further analysis since it is not the plain syntactic representation but has more useful information in it and the dependencies are clearer.

Semantic parser

An approach that returns an even more abstract representation of the input are the semantic parsers. Looking from the outside they skip the syntax-part and return the semantics directly, for example in first order logic or discourse representation theory.

A widely known implementation of a semantic parser is Boxer (Bos, 2008) (The original website is not available anymore but you can find an API and the source code for it here).

For our input sentence

How heavy is Jupiter?

Boxer yields the following first order formula (modified for better readability):

Read it for example as: There is some that is a geographical name (jupiter) and there is a that is the how-heaviness and experiences

The issue with such semantic parsers is mainly, that they analyze the input without any information about the knowledge base we want to query and thus are too “global” and need a lot of post-processing diminishing its advantages over a less abstract parser.


Having discussed briefly the parsing let’s look at the next step in our pyramid:

There a many different ways how to approach semantics for natural language processing. One way are ad-hoc approaches – they often lead to graph-like structures and questions like “What have we created here?” and “What does that even mean, semantically?”. Formal approaches prevent these problems and give the ability to think mathematically about semantics. Thus we will focus here on formal defined compositional and distributional semantics.


The main idea of compositional semantics can be expressed the following way:

The meaning of a thing is composed from the meanings of sub-things

For example, the meaning of a sentence is composed from the meanings of the part of speeches. This is exactly what the output of a parser described above helps us to do. We can traverse the parse tree or graph top-down or bottom-up and build a semantic representation.

Due to the straight-forward approach there are many different models, for example thus based on lambda calculus like LambdaDCS (Liang, 2013), or models based in some way on discourse representation theory (DRT) (Kamp & Reyle, 1993) like underspecified DRT (UDRT) (Reyle, 1993) or DUDES (Cimiano, 2009), (Cimiano, Unger, & McCrae, 2014).


Later there will be another blog post about DUDES that will cover the topic more deeply with formal definitions and an example implementation

We will present here only the brief idea of DUDES using a small example.

Let’s consider the dependency graph for the sentence

Alice likes planets

we use a dictionary that maps from words to DUDES, for example the DUDES for “likes” looks the following way:

Definition DUDES: More formally a simplified DUDES (represented as a 4-tuple ) consists of four elements:

A variable (main referent)
the main variable, the DUDES is about. Here:
A set of variables (universe)
set of all variables occurring in a DUDES. Here:
A set of conditions
is the meaning of a DUDES. Here:
A set of pairs (arguments) of the form
is the grammatical relation in the dependency graph (e.g. or ) and is the main variable (e.g. or ) of the argument DUDES. Here:

We give the other DUDES for our example:


Now we can start with the composition of the given DUDES. It starts at the root node in the dependency graph and takes the DUDES for the entry in the dictionary:

then it traverses the graph top down matching the arguments of the current DUDES with the relations in the graph. The current DUDES has two arguments: and and “likes” has two children. “Alice” connected by a and “planets” connected by a -relation. We start with the and compose now the DUDES for “Alice” with our current DUDES.

The composition works as follows: Given the current DUDES and an argument DUDES such that there is an argument and both nodes are connected by in the graph the resulting DUDES is (the variable in and must be replaced by )

For our example the resulting DUDES is

Now the only edge left is the one from “likes” to “planets” labeled by . After composing the DUDES accordingly we get the result:

Since no arguments are left and we traversed the whole dependency graph we are done and got a semantic representation for the sentence “Alice likes planets”


A completely different approach to semantics are the distributional semantics. They are motivated by the distributional hypothesis (Harris, 1954):

Things occurring in a similar context mean a similar thing

E.g. if two words occur often in the same context they mean probably the same thing. To achieve this goal, words are mapped to vectors in a way such that similar vectors (i.e. they are close to each other) mean similar things. These models are popular in the NLP world for their usefulness in Neural-Network models that work better with numbers (and uncertainties) than with logical facts.

One of the most popular models in this field is the Word2Vec model (Mikolov, Chen, Corrado, & Dean, 2013). It is an unsupervised training method that takes a very large corpus of textual data and trains word vectors that achieve good results.

One issue with distributional semantics is that although there are approaches like Doc2Vec (Le & Mikolov, 2014) it is not very clear how to compose the representation of phrases out of the word vectors and how to translate from these vector representations to logic-founded query languages like SPARQL or SQL. While this is a very interesting field of research it is out of scope of this work and we will not cover it more deeply here.

Question Answering Systems

After presenting basic techniques we give now a brief overview of existing systems and group them based on their underlying motivation and architecture.

Triple based


Triple based systems have in common that they have a relatively simple architecture and do not try to process the input word for word to get a semantic meaning, but use different feature extraction models and heuristics to find ontological concepts (everything that is in an ontology like classes, entities, properties, relations) and use this to build a query:

This simplification is often an advantage since such systems are relatively fast to build and can produce good results. It is probably the category with most systems. For example there are

  • NLP Reduce (Kaufmann, Bernstein, & Fischer, 2007) is a system that avoids NLP because it is to hard to cope with and uses a “naive” approach to extract information from the input. In particular the model tries to match words from the input to the ontology by exact string matching, synonym detection and stemming. Then it uses these to build a query from the sets of ontological concepts.
  • AquaLog (Lopez, Uren, Motta, & Pasin, 2007) is based on similar ideas as NLP Reduce but uses more preprocessing steps and introduces a relation similarity service that is able to find relations across several knowledge bases. Additionally it implements a learning component to improve the quality of the system over time.
  • FREyA (Damljanovic, Agatonovic, & Cunningham, 2010) we discuss it in the next section


The basic idea of FREyA is to use the knowledge in the ontology to understand questions (opposed to use linguistic features like described in our pyramid in the beginning) and then to use a syntactic parser to improve the answer. The most interesting idea of FREyA is that it asks the user whenever the system fails and learns and remembers the choices to avoid similar mistakes in the future.

The architecture can be depicted as follows: (simplified)

Given a question FREyA first tries to detect ontology concepts by string matching – e.g. if there is “Jupiter” in the question and “jupiter” in the ontology this is matched. After exhaustive search of all words in the ontology the potential concept detection is applied returning the noun phrases and adjectives of the question that were not matched to the ontology or where ambiguities were found (for example Mississippi can mean the state or the river). In the next step all potential ontology concepts are disambiguated, either automatically (for example it is clear in the question “What rivers flow through Mississippi?” that the state is meant) or by asking the user what a certain word means. The user can select from a list of hopefully relevant selections, extracted from the ontology by applying several similarity measures. After the user selects a meaning the potential ontology concepts is converted to ontology concepts.

To formulate a query from the set of ontology concepts FREyA looks at the data types of the relations in the ontology and tries to find a representation that can be translated to a query.


The simplicity of the triple based approaches yield systems that have several advantages like:

  • No setup needed
  • Works out of the box
  • Performance can improve over time

But have also downsides:

  • Do not try to “really” understand query
  • Hard to use with very large knowledge bases with many ambiguities
  • Can only answer factoid questions, negation and quantification are not supported

Machine Learning based


Machine learning based approaches try to tackle several problems of the triple-based ones, in particular the problem, that they do not scale well to large knowledge bases. ML based approaches often have an architecture similar to:

Given the query, several transformation steps (learned and hand-crafted) are used. The transformations can be independent of each other, for example DeepQA (IBM Watson) (Ferrucci et al., 2010) uses several models in parallel to try to solve the given problem. Each transformation returns a ranked set of possible results and the system chooses which of these results are worth to be processed further. In the end it picks the best result and presents it to the user. Apart from IBM Watson there are also other systems:


OQA uses curated (Freebase) and extracted knowledge bases (Open Information Extraction) to be able to answer questions not bound to a specific knowledge base. Given the knowledge bases the system must be able to handle very noisy data. The approach learns paraphrasing rules from a large set of clustered questions from Answers.com.

Given a question, OQA tries to paraphrase it to get a Wh-question (Where, When, What, etc…). This question is further processed by a manual build parser that extracts the main information. Since the relations are often not present in the knowledge base the rewrite step rewrites the given triple to a form that matches the data and then the query is executed.


As discussed before, machine learned based approaches are often build to be able to answer open domain questions and due to their architecture and training they are also relatively robust to variability in natural language. Especially, they can handle grammatically incorrect questions without hassle.

On the downside they need a large training set – and often this is not available for specific knowledge bases. Also most of the systems are not able to handle complex queries with negation and quantification.



Compositional approaches follow mostly the architectural idea discussed in the very beginning of this post (the semantic pyramid):

The systems differ in the way they approach the parsing and semantic composition. Many have there origin in the different ORAKEL systems, e.g. beginning with (Cimiano, 2004). Here we will look at Pythia (Unger & Cimiano, 2011)


Pythia uses a hand-crafted lexicon that maps from LTAG-trees to DUDES and based on this generates a grammar that parses the input question and composes semantics to formulate the final query:

The lexicon itself consists of a ontology-independent part that can be used for every ontology and an ontology-specific part that is not shared across ontologies. The creation of it can be partially automated but in general requires a lot of work.


Due to the formal foundations of these approaches they have many advantages:

  • Do not need training data
  • Can answer complex queries (negation, quantification is possible)
  • “Understands” query (structure, meaning)
  • Easily extensible for improved user experience (It is relatively easy to add auto-completion of queries, generate example queries and suggest users more specific or similar queries)

But the biggest problems are, that building the lexicon requires a lot of manual work and that the presented systems can not understand questions with words not occurring in the lexicon.


After getting an overview of the current state of the art in question-answering systems our conclusion is that no system presented here is production ready (apart from DeepQA, which is used and promoted actively by IBM) and that non of the approaches seam to work on their own. It may be interesting to combine the ideas from different approaches and see which advantages this might give.

Apart from that there are many open questions left:

  • How to get systems that can understand questions outside of their vocabulary
  • with systems that can handle questions with negation, quantification and logical operators?
  • How to help users to formulate the first question?
  • How to help users to explore many possible questions?
  • How to avoid “I don’t understand”?
  • Given a query how to help users to find out more?
  • How can a user trust the system?
  • How can a system prove that the answer is correct?

There are still many questions open in this very interesting and active field of research and there is a lot to do before question answering systems will be perfect, but for certain use cases the state-of-the-art is good enough to provide benefit to the users.


  1. Bos, J. (2008). Wide-coverage semantic analysis with boxer. In Proceedings of the 2008 Conference on Semantics in Text Processing (pp. 277–286). Association for Computational Linguistics.
  2. Kamp, H., & Reyle, U. (1993). From Discourse to Logic; Introduction to the Modeltheoretic Semantics of natural language.
  3. Reyle, U. (1993). Dealing with ambiguities by underspecification: Construction, representation and deduction. Journal of Semantics, 10(2), 123–179.
  4. Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. ArXiv Preprint ArXiv:1301.3781.
  5. Socher, R., Bauer, J., Manning, C. D., & Ng, A. Y. (2013). Parsing with Compositional Vector Grammars. In ACL (1) (pp. 455–465).
  6. Chen, D., & Manning, C. D. (2014). A Fast and Accurate Dependency Parser using Neural Networks. In Empirical Methods in Natural Language Processing (EMNLP).
  7. Kaufmann, E., Bernstein, A., & Fischer, L. (2007). Nlp-reduce: A “naıve” but domain-independent natural language interface for querying ontologies. 4th ESWC, Innsbruck, A.
  8. Lopez, V., Uren, V., Motta, E., & Pasin, M. (2007). AquaLog: An ontology-driven question answering system for organizational semantic intranets. Web Semantics: Science, Services and Agents on the World Wide Web, 5(2), 72–105.
  9. Damljanovic, D., Agatonovic, M., & Cunningham, H. (2010). Natural language interfaces to ontologies: Combining syntactic analysis and ontology-based lookup through the user interaction. In Extended Semantic Web Conference (pp. 106–120). Springer.
  10. Vargas-Vera, M., & Motta, E. (2004). AQUA–ontology-based question answering system. In Mexican International Conference on Artificial Intelligence (pp. 468–477). Springer.
  11. Fader, A., Zettlemoyer, L. S., & Etzioni, O. (2013). Paraphrase-Driven Learning for Open Question Answering. In ACL (1) (pp. 1608–1618). Citeseer.
  12. Fader, A., Zettlemoyer, L., & Etzioni, O. (2014). Open question answering over curated and extracted knowledge bases. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 1156–1165). ACM.
  13. Cimiano, P. (2004). ORAKEL: a natural language interface to an F-Logic knowledge base. In International Conference on Application of Natural Language to Information Systems (pp. 401–406). Springer.
  14. Unger, C., & Cimiano, P. (2011). Pythia: Compositional meaning construction for ontology-based question answering on the semantic web. In International Conference on Application of Natural Language to Information Systems (pp. 153–160). Springer.
  15. Bordes, A., Chopra, S., & Weston, J. (2014). Question answering with subgraph embeddings. ArXiv Preprint ArXiv:1406.3676.
  16. Cimiano, P. (2009). Flexible semantic composition with DUDES. In Proceedings of the Eighth International Conference on Computational Semantics (pp. 272–276). Association for Computational Linguistics.
  17. Le, Q. V., & Mikolov, T. (2014). Distributed Representations of Sentences and Documents. In ICML (Vol. 14, pp. 1188–1196).
  18. Schabes, Y. (1990). Mathematical and computational aspects of lexicalized grammars.
  19. Liang, P. (2013). Lambda Dependency-Based Compositional Semantics. CoRR, abs/1309.4408. Retrieved from http://arxiv.org/abs/1309.4408
  20. Cimiano, P., Unger, C., & McCrae, J. (2014). Ontology-based interpretation of natural language . Synthesis lectures on human language technologies 1947-4059. San Rafael, California <1537 Fourth Street, San Rafael, CA 94901 USA> : : Morgan & Claypool,
  21. Ferrucci, D., Brown, E., Chu-Carroll, J., Fan, J., Gondek, D., Kalyanpur, A. A., … others. (2010). Building Watson: An overview of the DeepQA project. AI Magazine, 31(3), 59–79.
  22. Socher, R., Lin, C. C., Manning, C., & Ng, A. Y. (2011). Parsing natural scenes and natural language with recursive neural networks. In Proceedings of the 28th international conference on machine learning (ICML-11) (pp. 129–136).
  23. Harris, Z. (1954). Distributional Hypothesis. Word.
  24. Kaufmann, E., & Bernstein, A. (2007). How useful are natural language interfaces to the semantic web for casual end-users? In The Semantic Web (pp. 281–294). Springer.


comments powered by Disqus