Sense-Annotated Corpora

Sense-Annotated TüBa-D/Z Treebank

The TüBa-D/Z treebank is a syntactically annotated German newspaper corpus based on data taken from the daily issues of 'die tageszeitung' (taz). The treebank has been manually annotated with senses from GermaNet with the goal of providing a gold standard for word sense disambiguation. The sense annotations are freely available as part of release 9.1 of the treebank.


More information about the sense annotations is available at:

http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/sense-annotated-tueba-dz.html

To obtain the treebank data (including the sense annotations), please follow the steps described at:

http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-dz.html


Sense-Annotated WebCAGe

WebCAGe (short for: Web-Harvested Corpus Annotated with GermaNet Senses) is a domain-independent, web-harvested corpus that has been semi-automatically annotated with senses from GermaNet. To ensure good quality, all automatic annotations have been manually verified.


You can download WebCAGe from the following website:

http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/webcage.html