Evaluating deep syntactic parse accuracy and its effect in why-question answering

Whereas the answer type of factoid questions can be predicted from the question word (e.g. who, what, where), this does not hold for why-questions. Our research shows that the answer type of why-questions can nevertheless be determined from the available syntactic and lexical information. In the why-question answering system currently under construction in Nijmegen, machine learning algorithms reached 77.5% accuracy in predicting the answer type of why-questions, provided that the features could be derived from various lexical resources and from contextually appropriate deep syntactic parses. When the parses were not consulted, accuracy dropped to 58.2%.
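The setup described above can be sketched in miniature: lexical features are always available, while syntactic features only become available when a parse is consulted, and their presence can change the predicted answer type. The feature names, answer-type labels, and the 1-nearest-neighbour classifier below are illustrative stand-ins, not the actual Nijmegen feature set or learning algorithms.

```python
# Hypothetical sketch of answer-type classification for why-questions.
# All feature names and labels are invented for illustration.

def extract_features(question, parse=None):
    """Lexical features always; add parse-derived features when available."""
    feats = {"qword": question.split()[0].lower()}
    if parse is not None:
        # e.g. semantic class of the subject and the main verb, read off
        # a (hypothetical) deep syntactic parse
        feats["subj_class"] = parse["subject_class"]
        feats["main_verb"] = parse["main_verb"]
    return feats

def overlap(a, b):
    """Number of feature-value pairs the two feature dicts share."""
    return sum(1 for k in a if k in b and a[k] == b[k])

def classify(feats, training):
    # 1-nearest-neighbour on feature overlap: a toy stand-in for the
    # machine learning algorithms used in the project
    best = max(training, key=lambda ex: overlap(feats, ex[0]))
    return best[1]

training = [
    ({"qword": "why", "subj_class": "agent", "main_verb": "decide"}, "motivation"),
    ({"qword": "why", "subj_class": "phenomenon", "main_verb": "occur"}, "cause"),
]

question = "Why do tides occur twice a day?"
parse = {"subject_class": "phenomenon", "main_verb": "occur"}

print(classify(extract_features(question, parse), training))  # -> cause
print(classify(extract_features(question), training))         # lexical features only
```

Without the parse, only the question word is available and the two training examples are indistinguishable, which mirrors why withholding the parses costs accuracy in the real system.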

These results not only encourage the use of deep syntactic parsers in question answering, but also call for an analysis of the consequences of imperfections in the parser output. The why-question answering project uses the TOSCA parser, a deep syntactic parser developed for an interactive analysis environment in which humans are expected to inspect the output of the automatic processes (including POS tagging and parsing) and make the appropriate selections. Fully automatically obtained parses therefore run the risk of containing deficiencies. In order to determine the effect of parse inaccuracies on answer type determination for why-questions, we analysed the errors that occurred. With the help of existing parser evaluation methods, the different error types were classified. In addition, the (erroneous) parses were used to determine the feature values for answer type classification. The paper reports on the evaluation procedures that we used and the results that were obtained.
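Existing parser evaluation methods of the kind referred to above are typically bracket-based. As a hedged illustration (the abstract does not specify which metric was used), the following sketch computes PARSEVAL-style labeled-bracket precision and recall against a gold parse; the constituent spans are invented for the example.

```python
# Minimal PARSEVAL-style sketch: score a parser's labeled constituents
# against a gold-standard parse. The brackets below are invented.

def labeled_prf(gold, system):
    """Labeled-bracket precision, recall, and F1 over constituent sets."""
    gold, system = set(gold), set(system)
    correct = len(gold & system)
    p = correct / len(system) if system else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Each bracket is (label, start, end) over the token sequence.
gold = {("S", 0, 6), ("NP", 0, 2), ("VP", 2, 6), ("PP", 3, 6)}
system = {("S", 0, 6), ("NP", 0, 2), ("VP", 2, 6), ("NP", 3, 6)}  # PP mislabeled

p, r, f = labeled_prf(gold, system)
print(p, r)  # -> 0.75 0.75
```

An error analysis of the kind described would then group the mismatching brackets (here, the mislabeled PP) into error types, rather than only reporting the aggregate score.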

Presented at: The 17th meeting of Computational Linguistics in the Netherlands (CLIN-17), 12 January 2007, Catholic University of Leuven, Leuven, Belgium.
Slides (pdf; 106kB)