Automatically classifying why-questions with the help of syntax

This paper describes how syntactic information is used in a system that attempts to answer why-questions automatically. In this Question Answering system, machine learning algorithms classify why-questions according to their answer type (CAUSE or MOTIVATION) on the basis of deeper linguistic information such as syntactic structure. The aim of this paper is to establish how errors in the syntactic representations of the questions influence the performance of the why-question classifier. I employed the parsing system TOSCA (Oostdijk 1996) to obtain syntactic structures without human intervention. The classification accuracy (86.8%) suggests that using the parser module in TOSCA to distinguish CAUSE from MOTIVATION is promising when the parser is provided with manually checked part-of-speech (POS, or word class) tags. Applying the whole TOSCA system fully automatically, including POS tagging, seems troublesome, however, because of the modularity of the system.

Reference: Daphne Theijssen (2008). Automatically classifying why-questions with the help of syntax. In Elizabeth Koier, Olivia Loonen and Marieke Meelen (eds.), Leiden Working Papers in Linguistics 5.1 (online), pp. 36-52.
