GermaNet Introduction

Coverage

In its initial version, GermaNet is intended to cover the German base vocabulary. The base vocabulary will be determined by statistical analysis of the reference corpus. Because lexicographic work proceeds by semantic field rather than by decreasing lemma frequency, it is not always easy (or appropriate) to stop encoding words whose frequency falls below a certain threshold. For this reason, we expect GermaNet to cover more than just the base vocabulary.


In the absence of algorithms for a sound semantic classification of derivatives and semantically transparent compounds, these types of lexical items will be treated as ordinary headwords. Frequency counts from text corpora will serve as a guideline for deciding whether to include particular forms.


The amount of polysemy is kept to a minimum. Additional senses are introduced only if the sense conflicts with the coordinates of other senses of the word in the network. When in doubt, GermaNet follows the degree of polysemy given in standard monolingual print dictionaries.



Notation Conventions:

  • GermaNet lists lemmas (base forms) only. It is assumed that inflected forms are mapped to base forms by an external morphological analyzer (which might be integrated into an interface to GermaNet).

  • Nouns: Ordinary nouns are cited by their nominative singular form. Pluralia tantum, for which no singular form exists, are cited by their nominative plural form, e.g.: Kosten.

    For nouns derived from adjectives or verbs the masculine indefinite nominative singular form is generally used, e.g.: Angestellter, Grüner.

  • Verbs are cited by their infinitive form.

  • Adjectives are cited without endings for gender.

  • Abbreviations are covered if they form part of everyday language and are used in speech instead of the equivalent full form (e.g.: AIDS, SPD, EDV, LSD, etc.).

  • Multi-word expressions are covered if they are commonly used and if they function as lexical units due to the strong collocational relation between their parts (e.g.: Hab und Gut, Erste Hilfe, instand setzen).

  • Concepts that refer to human beings, and thus indicate natural gender (sexus), are treated as:

    1. Two separate entries (synsets) if the difference in sexus is lexicalized (Mann/Frau).
    2. One synset with two entries, listing the masculine and the feminine form (Lehrer, Lehrerin), otherwise.
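
The conventions above can be illustrated with a minimal Python sketch. It is not GermaNet's actual data format or interface: the `Synset` class, the `lookup` function and the `TOY_ANALYZER` table are hypothetical names introduced here, and the table merely stands in for the external morphological analyzer that the first convention assumes.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of lemmas (citation forms) that share one concept."""
    lemmas: list[str] = field(default_factory=list)


# Hypothetical toy lexicon that follows the citation conventions above.
SYNSETS = [
    Synset(["Mann"]),                # difference in sexus is lexicalized:
    Synset(["Frau"]),                # two separate synsets
    Synset(["Lehrer", "Lehrerin"]),  # otherwise: one synset, two entries
    Synset(["Kosten"]),              # plurale tantum, nominative plural
    Synset(["abbauen"]),             # verb, cited by its infinitive
]

# Stand-in for an external morphological analyzer (assumption): a real
# system would call a proper analyzer instead of a lookup table.
TOY_ANALYZER = {
    "Lehrerinnen": "Lehrerin",
    "Männer": "Mann",
    "abgebaut": "abbauen",
}


def lemmatize(word_form: str) -> str:
    """Map an inflected form to its base form; unknown forms pass through."""
    return TOY_ANALYZER.get(word_form, word_form)


def lookup(word_form: str) -> list[Synset]:
    """Return all synsets that contain the lemma of the given word form."""
    lemma = lemmatize(word_form)
    return [s for s in SYNSETS if lemma in s.lemmas]
```

With this toy data, `lookup("Lehrerinnen")` finds the shared Lehrer/Lehrerin synset, while `lookup("Männer")` finds only the synset for Mann, because Mann and Frau are kept apart.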


Orthography

The new German orthography will be used. Additional citation forms may be listed as variants.


Orth Form: Fantasie, Orth Var: Phantasie
Orth Form: Myrrhe, Orth Var: Myrre
Orth Form: cirka, Orth Var: zirka


The old spelling forms and variants are listed as well:


Old Orth Form: Schiffahrt
Old Orth Form: Fluß
Old Orth Form: Mikrophon, Old Orth Var: Mikrofon
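
One way to picture these four kinds of spelling is a record with one slot per kind plus an index from every spelling to its entry. The sketch below is only an illustration under assumptions: the class `OrthEntry`, its field names and `build_index` are hypothetical, and the new-orthography forms Schifffahrt and Fluss are supplied here because the listing above gives only the old forms. Mikrophon/Mikrofon is left out, since its new citation form is not stated above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrthEntry:
    """Spellings of one lexical unit (field names are hypothetical)."""
    orth_form: str                       # citation form, new orthography
    orth_var: Optional[str] = None       # variant, new orthography
    old_orth_form: Optional[str] = None  # citation form, old orthography
    old_orth_var: Optional[str] = None   # variant, old orthography

    def spellings(self) -> list[str]:
        """Return every spelling that is actually recorded."""
        candidates = (self.orth_form, self.orth_var,
                      self.old_orth_form, self.old_orth_var)
        return [s for s in candidates if s is not None]


ENTRIES = [
    OrthEntry("Fantasie", orth_var="Phantasie"),
    OrthEntry("Myrrhe", orth_var="Myrre"),
    OrthEntry("cirka", orth_var="zirka"),
    # New forms below are supplied for this sketch, not taken from the listing.
    OrthEntry("Schifffahrt", old_orth_form="Schiffahrt"),
    OrthEntry("Fluss", old_orth_form="Fluß"),
]


def build_index(entries: list[OrthEntry]) -> dict[str, OrthEntry]:
    """Map each recorded spelling to its entry."""
    index: dict[str, OrthEntry] = {}
    for entry in entries:
        for spelling in entry.spellings():
            index[spelling] = entry
    return index


INDEX = build_index(ENTRIES)
```

Looking up an old or variant spelling such as `INDEX["Fluß"]` then leads back to the entry whose citation form follows the new orthography.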



Style Marking

Stylistic variants are marked by a special feature:

schnipsen, stylistic variant: schnippen
Po, stylistic variant: Arsch
arbeiten, stylistic variant: schaffen



Definitions (Paraphrases)

Due to limited resources we will only be able to provide a relatively small number of textual definitions for senses in GermaNet. Lexicographers will add definitions when they feel that a particular sense is not adequately defined by its synonyms and/or its immediate neighbor nodes in the network. The definitions are non-formalized textual descriptions of the concepts.


Horizont: Linie, an der sich Himmel und Erde bzw. Meer scheinbar berühren


Example sentences may be given instead of or in addition to free text descriptions.


abbauen(1): Sie haben das Gerüst schon wieder abgebaut.
abbauen(2): Hier wird Kohle abgebaut.
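
As a rough sketch, a sense could carry an optional paraphrase and any number of example sentences. The `Sense` class and its `gloss` method are hypothetical, and preferring the paraphrase over an example sentence is an assumption of this sketch, not a rule stated above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sense:
    """One sense of a lemma; paraphrase and examples are both optional."""
    lemma: str
    sense_no: int = 1
    paraphrase: Optional[str] = None
    examples: list[str] = field(default_factory=list)

    def gloss(self) -> str:
        """Prefer the paraphrase; fall back to the first example sentence."""
        if self.paraphrase:
            return self.paraphrase
        if self.examples:
            return self.examples[0]
        # No text given: the sense is defined only by its synonyms
        # and neighbor nodes in the network.
        return ""


SENSES = [
    Sense("Horizont",
          paraphrase="Linie, an der sich Himmel und Erde bzw. Meer "
                     "scheinbar berühren"),
    Sense("abbauen", 1,
          examples=["Sie haben das Gerüst schon wieder abgebaut."]),
    Sense("abbauen", 2,
          examples=["Hier wird Kohle abgebaut."]),
]
```

A sense with neither a paraphrase nor an example returns an empty gloss, which mirrors the case where the synonyms and neighboring nodes are considered sufficient.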



Lexical Gaps/Artificial Concepts

Concepts that are not lexicalized in German but are required in order to build a proper hierarchy are marked as artificial. We refer to such concepts as Lexical Gaps. Example: natürliches_Phänomen.


Note that attributive adjectives are cited in lower case unless they are lexicalized (and the expression is therefore no longer a Lexical Gap), as in Erste Hilfe.
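
The role of the artificial marker can be sketched as a flag on a node, which lets an application keep such nodes for structure but hide them from users. Everything in the sketch is hypothetical except the concept name natürliches_Phänomen: the `Node` class, the `hypernym_path` function, and in particular the placement of Regen and Phänomen around the artificial node are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A concept in the hierarchy; `artificial` marks a lexical gap."""
    name: str
    artificial: bool = False
    hypernym: Optional["Node"] = None


def hypernym_path(node: Node, include_artificial: bool = True) -> list[str]:
    """Collect the names of all hypernyms, from nearest to most general."""
    path = []
    current = node.hypernym
    while current is not None:
        if include_artificial or not current.artificial:
            path.append(current.name)
        current = current.hypernym
    return path


# Hypothetical toy hierarchy; only the artificial concept's name
# comes from the text above.
phaenomen = Node("Phänomen")
natuerliches_phaenomen = Node("natürliches_Phänomen", artificial=True,
                              hypernym=phaenomen)
regen = Node("Regen", hypernym=natuerliches_phaenomen)
```

Calling `hypernym_path(regen)` walks through the artificial node, whereas `hypernym_path(regen, include_artificial=False)` skips it and reports only lexicalized concepts.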



Named Entities

Proper names are covered only if they refer to a single non-linguistic item in the real world. Therefore, geographical names, organizations, etc. (for example, Deutschland, Bündnis für Arbeit) are marked as named entities, whereas nationalities are not.