Using the ICE-GB Corpus to model the English dative alternation

There are many situations in which language users can choose between two or more structural variants that are equally grammatical but may differ in their acceptability in a given context. An example is the dative alternation in English, in which speakers and writers can choose between a double object structure (e.g. She handed the student the book.) and a prepositional dative structure (e.g. She handed the book to the student.).

Using advanced statistical modeling, Bresnan et al. (2007) were able to explain 94% of the English dative alternation through a combination of (para-)linguistic factors. The range of text types in their data, however, is very narrow: the spoken data consists solely of spontaneous conversations on fixed topics (Switchboard Corpus), and the written data consists only of financial newspaper articles (Wall Street Journal texts in the Penn Treebank).

For this reason, we investigate whether, and if so how, increasing the range of text and discourse types affects the quality of the model. We employ the one-million-word, syntactically annotated ICE-GB Corpus of British English (Greenbaum 1996). The corpus contains spoken dialogues and monologues as well as printed and non-printed written texts, covering a variety of topics. We will build models similar to those in Bresnan et al. (2007) and compare the results. We also aim to extend the model by including more linguistic factors.

Presented at: Aston Postgraduate Conference on Corpus Linguistics, 22 May 2008, Aston University, Birmingham, U.K.
