Although discourse analysis is considered useful for many applications in the field of language technology, automatic discourse parsing is still problematic. A widely accepted model for discourse analysis is Rhetorical Structure Theory (RST), developed by Mann and Thompson (1988). Soricut and Marcu (2003) have developed the discourse parser SPADE, which detects RST relations between Elementary Discourse Units (EDUs) within a sentence. An automatic discourse parser that is able to find rhetorical relations at higher levels in the text is not yet available.
Our research focusses on the rhetorical relations between (Multi-) Sentential Discourse Units ((M-) SDUs) - text spans consisting of one or more sentences - within the same paragraph. The goal of our research is to establish what information is useful in detecting these relations. We therefore simplified the task of discourse parsing to a decision problem in which we decide whether an (M-) SDU is rhetorically related to either a preceding or a following (M-) SDU. Employing the RST Corpus (Carlson et al. 2003), we offer this choice to machine learning algorithms together with syntactic, lexical, referential and surface features in order to determine which features are most useful.
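To make the shape of this decision problem concrete, the following is a minimal sketch in Python and not the method used in the research. It turns a unit and its two neighbours into one feature vector and applies a hand-written baseline rule in place of a trained learner. Every name in it (CUE_WORDS, make_instance, baseline_decision) and every feature is a hypothetical illustration: the abstract does not specify which syntactic, lexical, referential or surface features were used, and the sketch covers only a word-overlap feature, a cue-word feature and a length feature.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical cue words; the abstract does not list the lexical features used.
CUE_WORDS = {"however", "but", "therefore", "because", "also"}
PUNCTUATION = ".,;:!?()\"'"


def tokens(text: str) -> List[str]:
    """Lowercase, split on whitespace and strip surrounding punctuation."""
    result = []
    for word in text.split():
        word = word.strip(PUNCTUATION).lower()
        if word:
            result.append(word)
    return result


def overlap(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two text spans."""
    set_a, set_b = set(tokens(a)), set(tokens(b))
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


@dataclass
class Instance:
    features: Dict[str, float]
    # "preceding" or "following"; would come from an annotated corpus.
    label: Optional[str] = None


def make_instance(prev: str, cur: str, nxt: str,
                  label: Optional[str] = None) -> Instance:
    """Build one decision instance for the unit `cur` and its two neighbours."""
    cur_tokens = tokens(cur)
    features = {
        "overlap_prev": overlap(prev, cur),   # lexical feature (assumed)
        "overlap_next": overlap(cur, nxt),    # lexical feature (assumed)
        "starts_with_cue": 1.0 if cur_tokens and cur_tokens[0] in CUE_WORDS else 0.0,
        "length": float(len(cur_tokens)),     # surface feature (assumed)
    }
    return Instance(features, label)


def baseline_decision(instance: Instance) -> str:
    """Hand-written stand-in for a learner: attach to the more similar neighbour."""
    f = instance.features
    if f["starts_with_cue"]:
        return "preceding"
    return "preceding" if f["overlap_prev"] >= f["overlap_next"] else "following"


example = make_instance(
    "The company reported a loss.",
    "However, the loss was smaller than expected.",
    "Analysts will meet on Tuesday.",
)
print(baseline_decision(example))
```

In an actual experiment the labelled instances would be passed to a machine learning algorithm instead of the fixed rule, so that the contribution of each feature group can be compared.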
The presentation will illustrate our method and the features applied, and will present our conclusions on the benefit of these features for automatic discourse parsing of paragraphs.
Presented at: The 18th meeting of Computational Linguistics in the Netherlands (CLIN-18), 7 December 2007, Radboud University Nijmegen, Nijmegen, the Netherlands.