The Treebank TüBa-J/S
The TüBa-J/S treebank was annotated within the Verbmobil project. Verbmobil was a long-term machine translation project for spontaneous speech, funded by the German Federal Ministry for Education, Science, Research, and Technology (BMBF).
The Tübingen Treebank of Spoken Japanese, TüBa-J/S, is a syntactically annotated corpus based on spontaneous dialogues, which were manually transliterated. The treebank comprises approximately 18,000 sentences (ca. 160,000 words). The syntactic annotation was performed manually.
The syntactic annotation is HPSG-oriented. The annotation scheme distinguishes three levels of syntactic constituency: the lexical level, the phrasal level, and the clausal level. In addition to constituent structure, the annotated trees carry labels on the edges between nodes. These edge labels encode grammatical functions (as relations between phrases) and the distinction between heads and non-heads (as phrase-internal relations).
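The two kinds of edge label can be pictured as a tree in which each edge carries either a grammatical function or a head/non-head mark. The Python sketch below is illustrative only: the category labels (`S`, `NP`, `VP`, `PN`, `PART`, `V`), the edge labels `HD`, `-`, and `SUBJ`, and the analysis of the toy sentence are hypothetical placeholders, not the label inventory or analyses defined in the stylebook.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical edge labels for illustration; the actual TüBa-J/S
# labels are defined in the stylebook.
HEAD = "HD"      # assumed mark for the head daughter
NON_HEAD = "-"   # assumed mark for a non-head daughter


@dataclass
class Node:
    category: str                # lexical tag, phrase label, or clause label
    edge: str                    # label on the edge to the mother node
    word: Optional[str] = None   # set only on lexical-level nodes
    children: List["Node"] = field(default_factory=list)

    def is_lexical(self) -> bool:
        return self.word is not None

    def head(self) -> Optional["Node"]:
        """Return the daughter attached by a head edge, if any."""
        return next((c for c in self.children if c.edge == HEAD), None)

    def lexical_head(self) -> Optional["Node"]:
        """Follow head edges downward until a lexical node is reached."""
        node: Optional[Node] = self
        while node is not None and not node.is_lexical():
            node = node.head()
        return node

    def functions(self) -> Dict[str, str]:
        """Map grammatical-function edge labels to the daughters' categories."""
        return {c.edge: c.category for c in self.children
                if c.edge not in (HEAD, NON_HEAD)}


# Toy analysis of "watashi wa ikimasu" ('I go'); all labels are hypothetical.
clause = Node("S", edge="--", children=[
    Node("NP", edge="SUBJ", children=[
        Node("PN", edge=HEAD, word="watashi"),
        Node("PART", edge=NON_HEAD, word="wa"),
    ]),
    Node("VP", edge=HEAD, children=[
        Node("V", edge=HEAD, word="ikimasu"),
    ]),
])

print(clause.lexical_head().word)   # ikimasu
print(clause.functions())           # {'SUBJ': 'NP'}
```

Following head edges from the clausal level down to the lexical level recovers the lexical head of each constituent, which is one common step when such constituency trees are converted into dependency structures.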
An extensive description of the complete annotation scheme can be found in the stylebook, which is available as PostScript (.ps, ca. 2.5 MB), PDF (.pdf, ca. 0.8 MB), bzip2-compressed PostScript (.ps.bz2, ca. 0.2 MB), and bzip2-compressed PDF (.pdf.bz2, ca. 0.5 MB).
The annotations were used as training data for the CoNLL-X Shared Task: Multi-lingual Dependency Parsing (2006), and this data is covered by the standard treebank license.
The treebank is available in two different formats:
The NEGRA export format can be used with the annotation tool Annotate, which was developed in the NEGRA project at the Department of Computational Linguistics at Saarland University, or with the TIGERSearch tool developed in the TIGER project.
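For users who want to process the export files with their own scripts rather than with Annotate or TIGERSearch, the Python sketch below reads sentence blocks delimited by `#BOS` and `#EOS`. It assumes a five-column layout (word or `#`-prefixed node number, tag or category, morphology, edge label, parent node number, with 0 as the virtual root) and nonterminal node numbers starting at 500; secondary-edge columns are ignored, and a format variant with an extra lemma column would need shifted indices. The sample sentence and all of its labels are hypothetical, not taken from the corpus.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class ExportNode:
    form: str      # word form for terminals, "" for nonterminal nodes
    category: str  # POS tag (terminals) or phrase/clause label (nonterminals)
    edge: str      # edge label on the link to the parent
    parent: int    # number of the parent node; 0 is the virtual root


@dataclass
class Sentence:
    sent_id: str
    terminals: List[ExportNode] = field(default_factory=list)
    nonterminals: Dict[int, ExportNode] = field(default_factory=dict)


def read_export(lines: Iterable[str]) -> List[Sentence]:
    """Parse #BOS ... #EOS blocks (assumed 5-column layout; extra columns ignored)."""
    sentences: List[Sentence] = []
    current: Optional[Sentence] = None
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("%%"):
            continue  # blank line or comment
        if fields[0] == "#BOS":
            current = Sentence(fields[1])
        elif fields[0] == "#EOS":
            if current is not None:
                sentences.append(current)
            current = None
        elif current is None:
            continue  # file header or tables outside a sentence block
        elif fields[0][0] == "#" and fields[0][1:].isdigit():
            current.nonterminals[int(fields[0][1:])] = ExportNode(
                "", fields[1], fields[3], int(fields[4]))
        else:
            current.terminals.append(
                ExportNode(fields[0], fields[1], fields[3], int(fields[4])))
    return sentences


def words_under(sent: Sentence, node_id: int) -> List[str]:
    """Return the words dominated by node_id (0 = the whole sentence)."""
    ids = {node_id}
    changed = True
    while changed:  # transitive closure over parent links
        changed = False
        for num, node in sent.nonterminals.items():
            if node.parent in ids and num not in ids:
                ids.add(num)
                changed = True
    return [t.form for t in sent.terminals if t.parent in ids]


# Hypothetical sample in the assumed layout (tab-separated).
SAMPLE = """\
#BOS 1
watashi\tPN\t--\tHD\t500
wa\tPART\t--\t-\t500
ikimasu\tV\t--\tHD\t501
#500\tNP\t--\tSUBJ\t502
#501\tVP\t--\tHD\t502
#502\tS\t--\t--\t0
#EOS 1
"""

sents = read_export(SAMPLE.splitlines())
print(words_under(sents[0], 500))   # ['watashi', 'wa']
```

Keeping terminals in a list preserves word order, while nonterminals are keyed by node number so that parent references can be resolved directly.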
How to Obtain a License for TüBa-J/S:
For academic research, the license is provided free of charge.
For all other uses, please contact
for further details.
For an academic research license, follow these steps:
- Print the License Agreement:
- Fill out the TüBa-J/S license agreement and send it by letter mail, by fax, or as a scanned copy to
.
- Once we have processed your license agreement, we will send you a password for the download webpage.
- Download the TüBa-J/S.
Contact:
Eberhard Karls Universität Tübingen
Department of Computational Linguistics
Wilhelmstr. 19
D-72074 Tübingen
Germany
Tel: +49 - 7071 - 29 73970
Fax: +49 - 7071 - 29 5214