Encoding Thematic Roles via Syntactic Categories in a German Treebank

George Smith
Universität Potsdam

Abstract

One of the major purposes of annotated corpora is to serve as databases for linguistic research. An important design criterion for corpora specifically intended for this use is the need to encode several types of information, some of which are clearly interrelated. This need can conflict with the need to produce large corpora quickly: encoding redundant information, or encoding types of information which require difficult decisions on the part of human annotators, can incur undesirable costs.

An example of such a conflict arises from the desirability of encoding syntactic categories, syntactic functions and thematic roles in the same corpus, so that individual researchers can formulate queries in an intuitive way. In an inflecting language such as German, there is considerable overlap if all three types of information are encoded. For example, the property of being an agent is often claimed to be typical of subjects, which are in the nominative case. On the other hand, the relationship is not one-to-one: there are subjects which are clearly not agents, and dative objects which have properties typically associated with agents.

The purpose of this paper is to investigate to what extent the thematic roles agent, recipient and patient can be indirectly encoded via syntactic categories. The conception of thematic roles used is taken primarily from Primus (1999). The syntactic categories used are derived from both case and the ability or inability of an argument to undergo object conversion in one of the two main German passive constructions, the werden-passive and the bekommen-passive. It will be argued that prototypical patients are those accusatives which can undergo object conversion in the werden-passive, that prototypical recipients are those datives which can undergo object conversion in the bekommen-passive, and that prototypical agents can be identified with the help of the objects with which they co-occur. The question of which thematic role a particular argument embodies is often a difficult one for annotators. Judging the grammaticality of a related passive construction is considerably easier. It will be argued that encoding thematic roles indirectly in this way is, for a language such as German, an effective means of giving researchers access to a large number of linguistically interesting phenomena.
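The mapping argued for above can be illustrated with a minimal sketch. It is not the annotation scheme of the treebank itself: the class `Argument`, its field names and the functions `object_role` and `infer_roles` are hypothetical, and the passive judgments in the examples are illustrative only. In particular, the abstract does not spell out how agents are identified from co-occurring objects; the sketch assumes, as a simplification, that a nominative counts as a prototypical agent whenever it co-occurs with a prototypical patient or recipient.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Argument:
    """One argument of a verb, as a hypothetical annotation record."""
    form: str
    case: str  # "nom", "acc" or "dat"
    # Annotator judgments: can this argument become the subject of the
    # corresponding passive construction (object conversion)?
    werden_passive_ok: bool = False
    bekommen_passive_ok: bool = False


def object_role(arg: Argument) -> Optional[str]:
    """Derive a prototypical object role from case plus passive behaviour."""
    if arg.case == "acc" and arg.werden_passive_ok:
        return "patient"
    if arg.case == "dat" and arg.bekommen_passive_ok:
        return "recipient"
    return None  # no prototypical role can be read off the categories


def infer_roles(arguments: List[Argument]) -> Dict[str, Optional[str]]:
    """Assign roles to all arguments of one clause.

    ASSUMPTION (not stated in the abstract): a nominative is treated as a
    prototypical agent if at least one co-occurring object is a
    prototypical patient or recipient.
    """
    roles = {arg.form: object_role(arg) for arg in arguments}
    has_prototypical_object = any(role is not None for role in roles.values())
    for arg in arguments:
        if arg.case == "nom" and has_prototypical_object:
            roles[arg.form] = "agent"
    return roles


# "Der Lehrer gibt dem Schüler das Buch." (illustrative judgments)
geben = [
    Argument("der Lehrer", "nom"),
    Argument("dem Schüler", "dat", bekommen_passive_ok=True),
    Argument("das Buch", "acc", werden_passive_ok=True),
]

# "Der Film gefällt dem Kind." (dative without bekommen-passive)
gefallen = [
    Argument("der Film", "nom"),
    Argument("dem Kind", "dat"),
]

print(infer_roles(geben))
print(infer_roles(gefallen))
```

In the first clause the sketch labels the nominative as agent, the dative as recipient and the accusative as patient. In the second clause neither argument receives a prototypical role, which reflects the point that the relationship between case and thematic role is not one-to-one.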

Literature

Primus, Beatrice (1999). Cases and Thematic Roles: Ergative, Accusative and Active. Tübingen.

