The goal of the current research project is to develop a question answering system for answering *why*-questions (*why*-QA). Our system is a pipeline consisting of an off-the-shelf retrieval module followed by an answer re-ranking module. In this paper, we aim to improve the ranking performance of our system by finding the optimal approach to learning to rank. More specifically, we try to find the optimal ranking function to apply to the set of candidate answers in the re-ranking module. We experiment with a number of machine learning algorithms (genetic algorithms, logistic regression and SVMs), with different optimization functions.
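To make the re-ranking step concrete, here is a minimal sketch of one learning-to-rank variant: pointwise logistic regression, in which each (question, candidate answer) pair is scored independently and the candidates are then sorted by predicted probability of being correct. This is an illustration under stated assumptions, not the configuration used in the paper. The two-dimensional feature vectors, the toy data, and the function names `train_logreg` and `rerank` are hypothetical, and the model is fitted with plain batch gradient descent on the log-loss.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.1,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Fit weights and a bias by batch gradient descent on the logistic log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss w.r.t. the linear score
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def rerank(
    candidates: Sequence[Tuple[str, Sequence[float]]],
    w: Sequence[float],
    b: float,
) -> List[str]:
    """Return candidate ids sorted by predicted probability of being a correct answer."""
    scored = [
        (sigmoid(sum(wj * xj for wj, xj in zip(w, feats)) + b), cid)
        for cid, feats in candidates
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [cid for _, cid in scored]


# Hypothetical toy data: two features per (question, candidate) pair,
# label 1 if the candidate is a correct answer, 0 otherwise.
X = [[1.0, 0.2], [0.0, 0.9], [1.0, 0.5], [0.0, 0.1]]
y = [1, 0, 1, 0]
w, b = train_logreg(X, y)

# Candidates returned by the (assumed) retrieval module for a new question.
candidates = [("a", [0.0, 0.8]), ("b", [1.0, 0.3]), ("c", [0.0, 0.2])]
ranking = rerank(candidates, w, b)
print(ranking)
```

A pointwise model is the simplest way to obtain a ranking function from a classifier; a genetic algorithm, by contrast, can search the weight space using a ranking measure such as MRR directly as its fitness function.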

We find that a learning-to-rank approach that optimizes MRR using either logistic regression or a genetic algorithm leads to a significant improvement over the TF-IDF baseline, reaching an MRR of 0.309 and a success@10 score of 56.14%. We also find that, unlike logistic regression and genetic algorithms, SVMs are not suitable for the current data representation: even after extensive experiments with SVMs, our scores remain below the baseline.
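The two evaluation measures reported above can be computed as in the following sketch. It assumes the standard definitions: the reciprocal rank of a question is 1/rank of the first correct answer in its ranked list (0 if none is correct), MRR is the mean over all questions, and success@k is the fraction of questions with at least one correct answer in the top k. The function names and the three toy rankings are hypothetical; some evaluations also truncate the list at a fixed depth before computing MRR, which this sketch does not do.

```python
from typing import Sequence


def reciprocal_rank(relevance: Sequence[bool]) -> float:
    """1/rank of the first correct answer in a ranked list, or 0.0 if none is correct."""
    for rank, is_correct in enumerate(relevance, start=1):
        if is_correct:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings: Sequence[Sequence[bool]]) -> float:
    """Average reciprocal rank over all questions."""
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)


def success_at_k(rankings: Sequence[Sequence[bool]], k: int = 10) -> float:
    """Fraction of questions with at least one correct answer in the top k."""
    hits = sum(1 for r in rankings if any(r[:k]))
    return hits / len(rankings)


# Hypothetical relevance judgements for three questions, in ranked order.
rankings = [
    [False, True, False],   # first correct answer at rank 2 -> RR = 0.5
    [True, False],          # first correct answer at rank 1 -> RR = 1.0
    [False, False, False],  # no correct answer retrieved    -> RR = 0.0
]
print(mean_reciprocal_rank(rankings))  # (0.5 + 1.0 + 0.0) / 3 = 0.5
print(success_at_k(rankings, k=10))    # 2 of 3 questions answered in the top 10
```

Because MRR rewards only the position of the first correct answer, it suits a setting where a user is expected to read the top-ranked answers until one satisfies the question.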

In future work, we will investigate the limitations of our re-ranking approach in more detail: which questions cannot be answered in the current system set-up, and why?

**Reference:** Suzan Verberne, Stephan Raaijmakers, Daphne Theijssen and Lou Boves (2009). Learning to Rank Answers to Why-Questions. *Proceedings of the 9th Dutch-Belgian Information Retrieval Workshop* (DIR'09), pp. 34-41.
