What is the idea behind GermaNet?
GermaNet is being developed as an electronic lexicographic reference
database for German word senses. It is primarily intended as a resource
for word sense disambiguation, which is crucial for natural language
applications such as information retrieval, the construction of various
language technology tools, and the annotation of corpora.
Resources
To meet these goals, the current work makes heavy use of resources, tools,
and experience gained at Princeton University during the design of the
WordNet® database. GermaNet is, however, not merely a German version or
a translation of WordNet®: it is built entirely from scratch, and the ten
years of experience in building the American model database are only one
of the resources employed to support the lexicographers' intuitions in
building the net.
- Thesauri, which are the closest printed media equivalents to GermaNet,
do not have a long tradition in German lexicography.
- An attempt to reuse one of the few German thesauri, Deutscher
Wortschatz by Wehrle-Eggers, was not successful because of its outdated
vocabulary and its vagueness in stating semantic relations within and
between its entries. It is still used as one source for finding
semantically related words.
- Other print dictionaries used are Duden 8: Die sinn- und sachverwandten
Wörter and Duden 3: Bildwörterbuch.
- In addition, various general monolingual dictionaries are used to
confirm word senses and definitions.
- The main resource, however, remains the lexicographic knowledge of the
project members in combination with large annotated text corpora, which
serve as a reference in all cases of doubt.
Similarities and Differences: GermaNet and WordNet®
As already mentioned, GermaNet and WordNet® share the same technology:
the database format and the software used to build and access the database
are the same. Some minor modifications had to be made to the software
to reflect peculiarities of GermaNet and the German language. Unfortunately,
the original WordNet® source code does not cleanly separate code from
language-specific data, so a number of incompatible changes had to be made.
The result is two slightly different versions of the same software: one for
German and GermaNet and one for English and WordNet®. We intend to build
a version of the software that can handle GermaNet and WordNet® at the
same time, but this requires major revisions of the original WordNet®
source code.
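One way to picture the separation that the original code lacks is to keep everything language-specific in a data object and hand it to language-neutral code. The Python sketch below is purely illustrative: `LanguageProfile`, `lexfile_name`, and the `category.field` name format are hypothetical choices made for this example and are not taken from the WordNet® or GermaNet software. The field names are a small excerpt from Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageProfile:
    """Language-specific data, kept apart from the code (hypothetical layout)."""
    name: str
    categories: tuple[str, ...]          # lexical categories in use
    fields: dict[str, tuple[str, ...]]   # semantic fields per category


# Illustrative excerpts only; the field names come from Table 1.
WORDNET = LanguageProfile(
    name="WordNet",
    categories=("noun", "adj", "verb", "adv"),
    fields={"noun": ("artifact", "possession"), "verb": ("possession", "motion")},
)
GERMANET = LanguageProfile(
    name="GermaNet",
    categories=("noun", "adj", "verb"),  # adverbs are not planned for the current phase
    fields={"noun": ("Artefakt", "Besitz"), "verb": ("Besitz", "Kontakt")},
)


def lexfile_name(profile, category, field):
    """Language-neutral code: validate against the profile, then build a name."""
    if category not in profile.categories:
        raise ValueError(f"{profile.name} has no category {category!r}")
    if field not in profile.fields.get(category, ()):
        raise ValueError(f"{profile.name} has no {category} field {field!r}")
    return f"{category}.{field}"
```

With this layout, the same function serves both resources: `lexfile_name(GERMANET, "noun", "Artefakt")` and `lexfile_name(WORDNET, "noun", "artifact")` differ only in the profile passed in.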
Division into Lexical Categories
GermaNet shares with WordNet® the basic division of the database into the
four lexical categories noun, adjective, verb, and adverb, although adverbs
are not planned for the current work phase.
Connectivity between word classes will be improved in GermaNet by:
- Using existing cross-class relations ('pertains to', 'has/is attribute')
more frequently
- Allowing certain relations to cross word classes (verbs are allowed to
'cause to' adjectives)
- Adding new cross-class relations (for selectional restrictions)
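The idea of relations that may or may not cross word classes can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not GermaNet's actual data model: the `Synset` class, the `ALLOWED` table, the relation identifiers, and the word pair are all hypothetical and chosen only to mirror the examples named in the list above.

```python
from dataclasses import dataclass, field

# Which (source category, target category) pairs each relation may connect.
# Hypothetical encoding of the examples given in the text.
ALLOWED = {
    "pertains_to": {("adj", "noun")},
    "causes": {("verb", "verb"), ("verb", "adj")},  # verbs may 'cause to' adjectives
}


@dataclass
class Synset:
    category: str                                   # "noun", "adj", or "verb"
    words: tuple[str, ...]
    relations: list = field(default_factory=list)   # (relation name, target synset)


def relate(source, relation, target):
    """Add a relation only if it may link the two word classes involved."""
    pair = (source.category, target.category)
    if pair not in ALLOWED.get(relation, set()):
        raise ValueError(f"{relation!r} may not link {pair[0]} to {pair[1]}")
    source.relations.append((relation, target))


# Illustrative pair: a verb whose result state is expressed by an adjective.
oeffnen = Synset("verb", ("öffnen",))   # 'to open'
offen = Synset("adj", ("offen",))       # 'open'
relate(oeffnen, "causes", offen)
```

Keeping the permitted class pairs in a table rather than in the code means that a new cross-class relation is added by extending `ALLOWED`, without touching `relate`.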
Division into Semantic Fields (Tops)
For each of the word categories, the semantic space is divided into a number
of semantic fields. Naturally, the semantic fields are closely related to
major nodes in the semantic network, but they do not have to agree with the
net's taxonomy, since a lexicographer can always include relations across
these fields.
See the table of semantic fields in GermaNet and WordNet® (Table 1) for details.
Table 1: Semantic Fields (Tops) in GermaNet and WordNet®
GermaNet Nouns | GermaNet Adjectives | GermaNet Verbs  | WordNet® Nouns | WordNet® Adjectives | WordNet® Verbs | WordNet® Adverbs
---------------|---------------------|-----------------|----------------|---------------------|----------------|-----------------
Tops           | Allgemein           | Allgemein       | Tops           | all                 |                | all
Artefakt       |                     |                 | artifact       |                     |                |
Attribut       |                     |                 | attribute      |                     |                |
Besitz         |                     | Besitz          | possession     |                     | possession     |
               | Bewegung            |                 |                |                     | motion         |
Relation       | Relation            |                 | relation       |                     |                |
Geschehen      |                     |                 | event          |                     |                |
Form           |                     |                 | shape          |                     |                |
Gefuehl        | Gefuehl             | Gefuehl         | feeling        |                     | emotion        |
               | Gesellschaft        | Gesellschaft    |                |                     | social         |
Gruppe         |                     |                 | group          |                     |                |
Koerper        | Koerper             | Koerperfunktion | body           |                     | body           |
Kognition      | Geist               | Kognition       | cognition      |                     | cognition      |
Kommunikation  |                     | Kommunikation   | communication  |                     | communication  |
               |                     | Konkurrenz      |                |                     | competition    |
               |                     | Kontakt         |                |                     | contact        |
Menge          | Menge               |                 | quantity       |                     |                |
Mensch         |                     |                 | person         |                     |                |
Motiv          |                     |                 | motive         |                     |                |
Nahrung        |                     |                 | food           |                     |                |
natGegenstand  |                     |                 | object         |                     |                |
natPhaenomen   | natPhaenomen        | natPhaenomen    | phenomenon     |                     | weather        |
Ort            | Ort                 | Lokation        | location       |                     |                |
               | Pertonym            |                 |                | pert                |                |
Pflanze        |                     |                 | plant          |                     |                |
               |                     | Schoepfung      |                |                     | creation       |
Substanz       | Substanz            |                 | substance      |                     |                |
Tier           |                     |                 | animal         |                     |                |
               |                     | Veraenderung    |                |                     | change         |
               |                     | Verbrauch       |                |                     | consumption    |
               | Verhalten           |                 |                |                     |                |
               | Perzeption          | Perzeption      |                |                     | perception     |
Zeit           | Zeit                |                 | time           |                     |                |
               |                     |                 | act            |                     |                |
               |                     |                 | state          |                     | stative        |
               |                     |                 | process        |                     |                |
               |                     |                 |                | ppl                 |                |
               | privativ            |                 |                |                     |                |
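The row correspondences in Table 1 can also be read as a lookup structure. The sketch below transcribes only the verb columns; the names `VERB_ROWS`, `WORDNET_ONLY_VERBS`, and `wordnet_verb_field` are hypothetical. The pairing is by table row only: the source does not state that fields sharing a row are exact equivalents, so the mapping should be taken as a rough alignment.

```python
# Verb columns of Table 1: GermaNet field -> WordNet field listed in the same row.
# None marks rows whose WordNet verb cell is empty.
VERB_ROWS = {
    "Allgemein": None,
    "Besitz": "possession",
    "Gefuehl": "emotion",
    "Gesellschaft": "social",
    "Koerperfunktion": "body",
    "Kognition": "cognition",
    "Kommunikation": "communication",
    "Konkurrenz": "competition",
    "Kontakt": "contact",
    "natPhaenomen": "weather",
    "Lokation": None,
    "Schoepfung": "creation",
    "Veraenderung": "change",
    "Verbrauch": "consumption",
    "Perzeption": "perception",
}

# WordNet verb fields that appear in rows without a GermaNet verb field.
WORDNET_ONLY_VERBS = ("motion", "stative")


def wordnet_verb_field(germanet_field):
    """Return the WordNet field sharing a row with a GermaNet verb field."""
    if germanet_field not in VERB_ROWS:
        raise KeyError(f"not a GermaNet verb field in Table 1: {germanet_field!r}")
    return VERB_ROWS[germanet_field]


unpaired = sorted(f for f, wn in VERB_ROWS.items() if wn is None)
print(wordnet_verb_field("Koerperfunktion"))  # body
print(unpaired)                               # ['Allgemein', 'Lokation']
```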