Integrating Multiple Levels of Linguistic Annotation

Tylman Ule and Frank Müller
University of Tübingen
ule,fhm@sfs.nphil.uni-tuebingen.de

Abstract

This paper describes a corpus annotation framework that integrates multiple tools for annotation at several levels of linguistic analysis. The tools are wrapped into separate modules that interact via a common XML data format. Each module is built upon freely available state-of-the-art tools customised for annotating unrestricted text. The system will be used to produce a very large German corpus in which chunks and named entities are marked up and tokens are analysed morphologically.
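
The modular design summarised above can be illustrated with a minimal sketch. It assumes that every module reads and writes the same XML document, so modules can be chained in any order. The element name `t`, the attributes `cap` and `len`, and all function names are hypothetical; they do not reflect the schema or the tools actually used in the system, and the toy annotators merely stand in for real taggers and chunkers.

```python
import xml.etree.ElementTree as ET

# All element, attribute, and function names below are illustrative
# assumptions, not the schema or tools of the described system.

def tokenize(text: str) -> str:
    """Wrap whitespace-separated tokens in <t> elements under a <text> root."""
    root = ET.Element("text")
    for word in text.split():
        ET.SubElement(root, "t").text = word
    return ET.tostring(root, encoding="unicode")

def tag_capitalised(xml_in: str) -> str:
    """Toy stand-in for an annotation tool: mark capitalised tokens."""
    root = ET.fromstring(xml_in)
    for tok in root.iter("t"):
        tok.set("cap", "yes" if tok.text[:1].isupper() else "no")
    return ET.tostring(root, encoding="unicode")

def add_length(xml_in: str) -> str:
    """Second toy module: record each token's character length."""
    root = ET.fromstring(xml_in)
    for tok in root.iter("t"):
        tok.set("len", str(len(tok.text)))
    return ET.tostring(root, encoding="unicode")

def run_pipeline(text: str, modules) -> str:
    """Tokenise the text, then pass the XML document through each module."""
    xml_doc = tokenize(text)
    for module in modules:
        xml_doc = module(xml_doc)
    return xml_doc

if __name__ == "__main__":
    print(run_pipeline("Der Hund bellt", [tag_capitalised, add_length]))
```

Because each module only adds attributes to the shared document and leaves existing ones untouched, a module can be replaced or reordered without changing the others, which is the property the common data format is meant to provide.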

