In this article, we automatically create two large, richly annotated data sets for studying the English dative alternation. Using an intrinsic and an extrinsic evaluation, we ask whether data sets that are obtained and enriched automatically are suitable for linguistic research even if they contain errors. The extrinsic evaluation consists of building logistic regression models with these data sets. We conclude that automatically detecting instances of the dative alternation still requires human intervention, but that it is indeed possible to automatically annotate the instances with syntactic, semantic and discourse-related features. Only the automatic classification of the concreteness of nouns is problematic.
Reference: Daphne Theijssen, Lou Boves, Hans van Halteren and Nelleke Oostdijk (2011). Evaluating automatic annotation: Automatically detecting and enriching instances of the dative alternation. Language Resources and Evaluation. Springer.