A regression model for the English benefactive alternation

In this paper, we use logistic regression modelling to predict the English benefactive alternation ("He baked me a cake" vs. "He baked a cake for me"). We developed a data set consisting of 107 instances in adult writing and 36 in the writing of 8-to-12-year-olds, and annotated them with 13 syntactic, semantic and discourse features. We show that a model trained and tested on the adult data reaches a prediction accuracy of 86.9%. Due to the small number of data instances, our model includes only 4 significant effects and shows considerable overfitting: accuracy drops to 79.6% in a ten-fold cross-validation setting. The regression coefficients are similar to those in the model for the to-dative alternation (Bresnan et al. 2007). When the adult model is applied to the instances in child writing, 80.6% are predicted correctly. We conclude that there are no indications of major differences either between the to-dative and benefactive alternations in adult language or between the benefactive alternation in adult language and that in child language.
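The gap between accuracy on the training data and accuracy under ten-fold cross-validation can be illustrated with a minimal sketch. It is not the authors' implementation: the data below are randomly generated (107 rows with 4 generic binary features, chosen only to mirror the size of the adult set), the coefficients used to generate labels are arbitrary, and plain gradient descent stands in for whatever fitting procedure the paper used. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    # Fit weights (intercept first) by batch gradient descent on the log-loss.
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict(w, xi):
    # Predict class 1 when the estimated probability is at least 0.5.
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1 if sigmoid(z) >= 0.5 else 0


def accuracy(w, X, y):
    return sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(y)


def cross_val_accuracy(X, y, k=10, seed=0):
    # Each instance is predicted by a model that never saw it during fitting.
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(w, X[i]) == y[i] for i in fold)
    return correct / len(y)


# Synthetic stand-in data (assumption): NOT the corpus described above.
rng = random.Random(42)
X = [[rng.choice([0, 1]) for _ in range(4)] for _ in range(107)]
gen_w = [-0.5, 1.5, -1.0, 0.8, -0.6]  # arbitrary generating coefficients
y = [
    1 if rng.random() < sigmoid(gen_w[0] + sum(g * v for g, v in zip(gen_w[1:], x))) else 0
    for x in X
]

w = fit_logistic(X, y)
train_acc = accuracy(w, X, y)      # model tested on its own training data
cv_acc = cross_val_accuracy(X, y)  # ten-fold cross-validated estimate
print(f"training accuracy: {train_acc:.3f}")
print(f"10-fold CV accuracy: {cv_acc:.3f}")
```

Accuracy measured on the training data tends to be optimistic for a small data set, which is why the cross-validated figure is usually read as the more realistic estimate of how well the model generalises.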

Reference: Daphne Theijssen, Hans van Halteren, Karin Fikkers, Frederike Groothoff, Lian van Hoof, Eva van de Sande, Jorieke Tiems, Véronique Verhagen and Patrick van der Zande (2009). A regression model for the English benefactive alternation. In Barbara Plank, Erik Tjong Kim Sang and Tim Van de Cruys (eds.), Computational Linguistics in the Netherlands 2009, pp. 115-130.

