
GermaNet Structure

What is the idea behind GermaNet?

GermaNet is being developed to serve as an electronic lexicographic reference database for German word senses. It is primarily intended as a resource for word sense disambiguation, which is crucial for natural language applications such as information retrieval, the construction of various language technology tools, and the annotation of corpora.

 

Resources

In order to meet these limitations, the current work makes heavy use of the resources, tools, and experience gained at Princeton University during the design of the WordNet® database. GermaNet is, however, not merely a German version or translation of WordNet®: it is built entirely from scratch, and the ten years of experience in building the American model database is only one of the resources employed to support the lexicographers' intuitions in building the net.


  • Thesauri, which are the closest printed media equivalents to GermaNet, do not have a long tradition in German lexicography.

  • An attempt to reuse one of the few German thesauri, Deutscher Wortschatz by Wehrle-Eggers, did not prove successful because of its outdated vocabulary and its vagueness in stating semantic relations within and between its entries. It is still used as one source for finding semantically related words.

  • Other print dictionaries in use are Duden 8: Die sinn- und sachverwandten Wörter and Duden 3: Bildwörterbuch.

  • In addition, various general monolingual dictionaries are used to confirm word senses and definitions.

  • But the main resource remains the lexicographic knowledge of the project members in combination with large annotated text corpora which serve as a reference in all cases of doubt.



Similarities and Differences: GermaNet and WordNet®

As already mentioned, GermaNet and WordNet® share the same technology: the database format and the software used to build and access the database are the same. Some minor modifications had to be applied to the software to reflect peculiarities of GermaNet and the German language. Unfortunately, the original WordNet® source code does not cleanly separate code from language-specific data, so a number of incompatible changes had to be made. These now result in two slightly different versions of the same software: one for German and GermaNet, and one for English and WordNet®. We intend to build a version of the software that can handle GermaNet and WordNet® at the same time, but this requires major revisions of the original WordNet® source code.



Division into Lexical Categories

GermaNet shares with WordNet® the basic division of the database into the four lexical categories noun, adjective, verb, and adverb, although there are no plans to implement adverbs in the current work phase.


Connectivity between word classes will be improved in GermaNet by


  • Using existing cross-class relations ('pertains to', 'has/is attribute') more frequently

  • Allowing certain relations to cross word classes (for example, a verb may stand in a 'cause to' relation to an adjective)

  • Adding new cross-class relations (for selectional restrictions)
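
The cross-class relations listed above can be pictured as typed links whose endpoints are constrained by word class. The following is a minimal Python sketch under stated assumptions, not GermaNet's actual data format or software: the names `WordClass`, `Synset`, `ALLOWED_PAIRS`, and `relate` are hypothetical, as are the relation identifiers and the example pair oeffnen/offen. Only the verb-to-adjective 'cause to' link is taken from the text; the other class pairs are assumed for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class WordClass(Enum):
    NOUN = "noun"
    ADJECTIVE = "adjective"
    VERB = "verb"
    ADVERB = "adverb"


# Assumed (source class, target class) pairs for each relation.
# Only the verb -> adjective pair for "causes" comes from the text;
# the remaining pairs are illustrative guesses.
ALLOWED_PAIRS = {
    "pertains_to": {(WordClass.ADJECTIVE, WordClass.NOUN)},
    "has_attribute": {(WordClass.NOUN, WordClass.ADJECTIVE)},
    "causes": {
        (WordClass.VERB, WordClass.VERB),
        (WordClass.VERB, WordClass.ADJECTIVE),
    },
    # Hypothetical name for a selectional-restriction relation.
    "selects": {(WordClass.VERB, WordClass.NOUN)},
}


@dataclass(eq=False)  # compare synsets by identity, so cycles are harmless
class Synset:
    words: tuple
    word_class: WordClass
    relations: list = field(default_factory=list)  # (relation, target) pairs

    def relate(self, relation, target):
        """Add a link to target if the word classes permit this relation."""
        pair = (self.word_class, target.word_class)
        if pair not in ALLOWED_PAIRS.get(relation, set()):
            raise ValueError(
                f"'{relation}' may not link {pair[0].value} to {pair[1].value}"
            )
        self.relations.append((relation, target))

    def cross_class_relations(self):
        """Return only the links that leave this synset's word class."""
        return [
            (rel, tgt)
            for rel, tgt in self.relations
            if tgt.word_class is not self.word_class
        ]


# Illustrative example: the verb "oeffnen" (to open) causes the state "offen" (open).
oeffnen = Synset(("oeffnen",), WordClass.VERB)
offen = Synset(("offen",), WordClass.ADJECTIVE)
oeffnen.relate("causes", offen)
print([(rel, tgt.words) for rel, tgt in oeffnen.cross_class_relations()])
```

Keeping the permitted class pairs in one table means a new cross-class relation can be introduced by adding a single entry, without touching the synset code.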


Division into Semantic Fields (Tops)

For each of the word categories, the semantic space is divided into a number of semantic fields. Naturally, the semantic fields are closely related to major nodes in the semantic network, but they do not have to agree with the net's taxonomy, since a lexicographer can always include relations across these fields.


See the table of semantic fields in GermaNet and WordNet® for details.
 


Table 1: Semantic Fields (Tops) in GermaNet and WordNet®


A dash marks an empty cell.

GermaNet                                        WordNet®
Nouns           Adjectives     Verbs            Nouns          Adjectives  Verbs          Adverbs
Tops            Allgemein      Allgemein        Tops           all         -              all
Artefakt        -              -                artifact       -           -              -
Attribut        -              -                attribute      -           -              -
Besitz          -              Besitz           possession     -           possession     -
-               Bewegung       -                -              -           motion         -
Relation        Relation       -                relation       -           -              -
Geschehen       -              -                event          -           -              -
Form            -              -                shape          -           -              -
Gefuehl         Gefuehl        Gefuehl          feeling        -           emotion        -
-               Gesellschaft   Gesellschaft     -              -           social         -
Gruppe          -              -                group          -           -              -
Koerper         Koerper        Koerperfunktion  body           -           body           -
Kognition       Geist          Kognition        cognition      -           cognition      -
Kommunikation   -              Kommunikation    communication  -           communication  -
-               -              Konkurrenz       -              -           competition    -
-               -              Kontakt          -              -           contact        -
Menge           Menge          -                quantity       -           -              -
Mensch          -              -                person         -           -              -
Motiv           -              -                motive         -           -              -
Nahrung         -              -                food           -           -              -
natGegenstand   -              -                object         -           -              -
natPhaenomen    natPhaenomen   natPhaenomen     phenomenon     -           weather        -
Ort             Ort            Lokation         location       -           -              -
-               Pertonym       -                -              pert        -              -
Pflanze         -              -                plant          -           -              -
-               -              Schoepfung       -              -           creation       -
Substanz        Substanz       -                substance      -           -              -
Tier            -              -                animal         -           -              -
-               -              Veraenderung     -              -           change         -
-               -              Verbrauch        -              -           consumption    -
-               Verhalten      -                -              -           -              -
-               Perzeption     Perzeption       -              -           perception     -
Zeit            Zeit           -                time           -           -              -
-               -              -                act            -           -              -
-               -              -                state          -           stative        -
-               -              -                process        -           -              -
-               -              -                -              ppl         -              -
-               privativ       -                -              -           -              -
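
The noun columns of Table 1 can be turned into a small lookup, which may help when comparing field labels across the two resources. The sketch below is illustrative only: it assumes that a GermaNet field and a WordNet® field sharing a table row are meant as counterparts, and the names `NOUN_FIELD_ROWS`, `wordnet_noun_field`, and `wordnet_only_noun_fields` are hypothetical, not part of any GermaNet or WordNet® software.

```python
# Noun rows of Table 1 as (GermaNet field, WordNet field) pairs.
# None marks an empty GermaNet cell. Treating same-row entries as
# counterparts is an assumption made for this example.
NOUN_FIELD_ROWS = [
    ("Tops", "Tops"),
    ("Artefakt", "artifact"),
    ("Attribut", "attribute"),
    ("Besitz", "possession"),
    ("Relation", "relation"),
    ("Geschehen", "event"),
    ("Form", "shape"),
    ("Gefuehl", "feeling"),
    ("Gruppe", "group"),
    ("Koerper", "body"),
    ("Kognition", "cognition"),
    ("Kommunikation", "communication"),
    ("Menge", "quantity"),
    ("Mensch", "person"),
    ("Motiv", "motive"),
    ("Nahrung", "food"),
    ("natGegenstand", "object"),
    ("natPhaenomen", "phenomenon"),
    ("Ort", "location"),
    ("Pflanze", "plant"),
    ("Substanz", "substance"),
    ("Tier", "animal"),
    ("Zeit", "time"),
    (None, "act"),
    (None, "state"),
    (None, "process"),
]

# Index the rows that have a GermaNet entry for direct lookup.
_GERMANET_TO_WORDNET = {
    gn: wn for gn, wn in NOUN_FIELD_ROWS if gn is not None
}


def wordnet_noun_field(germanet_field):
    """Return the WordNet noun field in the same row, or None if absent."""
    return _GERMANET_TO_WORDNET.get(germanet_field)


def wordnet_only_noun_fields():
    """Return WordNet noun fields whose GermaNet noun cell is empty."""
    return [wn for gn, wn in NOUN_FIELD_ROWS if gn is None]


print(wordnet_noun_field("Tier"))
print(wordnet_only_noun_fields())
```

The same pattern would extend to the adjective and verb columns by adding one list of pairs per word class.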