Modelling the English dative alternation
Feel free to download the pdf version of my PhD thesis (1130kB).
Traditional linguistic theories have attempted to design deterministic rules that would account for all-and-only the sentences of a language that are deemed 'grammatical'. While acknowledging the fact that language use may be variable (graded), conventional theories assume that the underlying human grammar is categorical: a sentence is either grammatical, or it is not. The idea has now gained ground that 'grammaticality' is a graded concept itself, and that human language behaviour is essentially probabilistic in nature. A probabilistic theory of language can take various forms (e.g. the studies presented in Bod et al. 2003), and resembles memory-based and exemplar-based models of language (e.g. Daelemans and van den Bosch 2005, Gahl and Yu 2006).
It is not surprising that many linguists have now moved from studying the dichotomy ('grammatical' and 'ungrammatical') to studying variation in language. One obvious example of variation is syntactic alternation, in which there are different grammatical constructions that could be used to express the same core semantics. The alternative grammatical constructions are competing, and language users choose (subconsciously) among these options. For instance, speakers of English can choose between the s-genitive, as in John's dog, and the of-genitive, as in the dog of John (e.g. Rosenbach 2003).
One of the best-studied syntactic alternations is the dative alternation in English, in which speakers and writers can choose between a prepositional dative structure (example 1) and a double object structure (example 2):
1) The evil queen gives the poisonous apple to Snow White.
2) The evil queen gives Snow White the poisonous apple.
The dative alternation is also known by many other names, for instance the 'diathesis alternation' and the 'ditransitive construction'. In this thesis, we use the term 'dative alternation'. The two objects of the verb will be referred to as the recipient (Snow White in examples 1 and 2) and the theme (the poisonous apple in the examples).
There are two additional options in the alternation: the reversed prepositional dative construction (e.g. I gave to him a book) and the reversed double object construction (e.g. I gave it him). Also, the alternation can occur with prepositions other than to, e.g. with for (the benefactive alternation, cf. Theijssen et al. 2009). These constructions are fairly infrequent, especially compared to the two constructions in examples 1 and 2. In order to prevent data sparseness problems (such as those in Theijssen et al. 2009), we limit ourselves to the alternation between the two most frequent options, namely the double object construction and the prepositional dative construction with to, both in the default object ordering (examples 1 and 2). All mentions of 'prepositional dative' in this thesis thus refer to the variant with to only, unless explicitly indicated otherwise.
Parallels to the dative alternation occur in various languages other than English, for example in Dutch (e.g. Colleman 2006), Greek (e.g. Anagnostopoulou 2005), Spanish (e.g. Beavers and Nishida 2010) and Brazilian Portuguese (e.g. Gomes 2003). In this thesis, we take the dative alternation in English as a case study, focussing mostly on British English. The set of remaining dative constructions also contains instances with a clausal object (e.g. tell him how nice he is), with a phrasal verb (e.g. to hand over), in passive voice (e.g. He was given a book), in imperative clauses (e.g. Give him the book!), and in interrogative clauses (e.g. Did he give you the book?). These special cases may be influenced by syntactic variation other than the dative alternation, such as passive versus active voice, declarative versus interrogative mode and the placement of particles. One way to take this into account is to control for any other syntactic variation when carrying out the statistical analyses. However, the default syntactic structure is the most frequent by far, which would lead to serious imbalance in the data. For this reason, we follow Bresnan et al. 2007 and exclude all instances with the aforementioned characteristics.
The dative alternation has been the object of study in several subdisciplines of linguistics, including corpus linguistics (e.g. Bresnan et al. 2007), psycholinguistics (e.g. Bresnan and Ford 2010), first language acquisition (e.g. de Marneffe et al. 2010), second language acquisition (e.g. Babanoglu 2007), sociolinguistics (e.g. Szmrecsanyi 2010) and historical linguistics (e.g. Wolk et al. 2011). Previous research has already found sets of predictive syntactic, semantic, and discourse-related features that appear to influence the likelihood of the two dative constructions. In general, speakers and writers show a tendency to place animate nouns before inanimate nouns, shorter constituents before longer ones, discourse-given information before discourse-new information, pronouns before nonpronouns and definite noun phrases before indefinite ones.
Over the last decade, many researchers in the various subdisciplines have started using multivariate models to study the role of the features suggested in the literature (e.g. Arnold et al. 2000 and Bresnan et al. 2007). Such models, usually (logistic) regression models, allow linguists to study the relevance of the features in one integrated model, instead of studying the influence of each feature in isolation. Although the use of these advanced statistical techniques has led to interesting insights in research on syntactic alternations, it has also led to some complications. Across the subdisciplines, linguists studying syntactic alternation have to make choices: which features to include in the study (variable selection), how to define and annotate the features used (feature definition), how to obtain an annotated data set that is sufficiently large ((automatic) data collection), how to study the alternation across different speaker groups (comparison of speaker groups) and how to interpret the models found with various techniques (model interpretation). In this thesis, we address the various methodological choices that linguists can make when studying the dative alternation. The research is interdisciplinary: it involves corpus linguistics, psycholinguistics and sociolinguistics.
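As a minimal sketch of how a logistic regression model of the kind mentioned above combines several features into one prediction, consider the Python fragment below. It is an illustration only: the feature names, the binary coding, the weights and the intercept are all invented for this example and are not estimates from any corpus or from this thesis. The signs of the weights merely mirror the general tendencies reported in the literature (animate, pronominal, given, definite and short constituents tend to come first, which for the recipient means the double object construction).

```python
import math

# HYPOTHETICAL weights, invented for illustration only. They are NOT
# estimates from any corpus or from the thesis. Positive values push
# the prediction towards the double object construction.
WEIGHTS = {
    "recipient_animate": 1.0,
    "recipient_pronoun": 1.5,
    "recipient_given": 1.0,
    "recipient_definite": 0.5,
    "theme_longer_than_recipient": 1.0,
}
INTERCEPT = -2.0  # also hypothetical


def prob_double_object(features):
    """Return the modelled probability of the double object construction.

    `features` maps feature names to 0 or 1; missing features count as 0.
    """
    # All features contribute to a single log-odds score at once,
    # rather than being evaluated in isolation.
    log_odds = INTERCEPT + sum(
        weight * features.get(name, 0) for name, weight in WEIGHTS.items()
    )
    # The logistic function maps the log-odds to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


# "gives Snow White the poisonous apple" under the assumed coding:
# animate, definite, non-pronominal recipient; theme longer than recipient.
snow_white = {
    "recipient_animate": 1,
    "recipient_definite": 1,
    "theme_longer_than_recipient": 1,
}
p_double_object = prob_double_object(snow_white)
p_prepositional = 1.0 - p_double_object
```

In an actual study the weights would be estimated from annotated corpus data rather than set by hand, and the choice and definition of the features is exactly the kind of methodological decision discussed above.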
Promotor: Prof. Dr. Lou Boves
Supervisors: Dr. Nelleke Oostdijk and Dr. Hans van Halteren