GermaNet is intended to cover the German base vocabulary in its initial version. The base
vocabulary will be determined by statistical analysis of the reference corpus. Since lexicographic
work is done by semantic fields, not by decreasing frequencies of lemmas, it is not always easy
(nor appropriate) to stop encoding words with frequencies below a certain threshold. For this
reason we expect GermaNet to cover more than just the base vocabulary.
In the absence of algorithms for a sound semantic classification of derivatives and
semantically transparent compounds, these types of lexical items will be treated as ordinary
headwords. Frequency counts from text corpora will serve as a guideline for deciding whether
to include particular forms.
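The role of frequency counts as an inclusion guideline can be illustrated with a minimal sketch. The toy corpus, the function name `frequency_candidates` and the threshold value are assumptions made for illustration only; they are not part of GermaNet or its tooling.

```python
from collections import Counter


def frequency_candidates(lemmas, min_count):
    """Return (lemma, count) pairs whose corpus frequency reaches min_count,
    ordered from most to least frequent."""
    counts = Counter(lemmas)
    return [(lemma, n) for lemma, n in counts.most_common() if n >= min_count]


# Toy lemmatised corpus (invented for illustration, not GermaNet data).
corpus = ["Haus", "Haustür", "Haus", "gehen", "Haus", "gehen", "Haustür", "Hausboot"]

candidates = frequency_candidates(corpus, min_count=2)
for lemma, n in candidates:
    print(lemma, n)
```

In this sketch a transparent compound such as Haustür is counted like any other headword, which mirrors the treatment described above. The threshold only suggests candidates, and the final decision stays with the lexicographer.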
The amount of polysemy is kept to a minimum: additional senses are introduced only if
the sense conflicts with the coordinates of other senses of the word in the network. When in
doubt, GermaNet follows the degree of polysemy given in standard monolingual print dictionaries.
- GermaNet lists lemmas (base forms) only. It is assumed that inflected forms are
mapped to base forms by an external morphological analyzer (which might be integrated into an
interface to GermaNet).
- Nouns: Ordinary nouns are cited by their nominative singular form. Pluralia tantum,
for which no singular form exists, are cited by their nominative plural form, e.g. Kosten.
For nouns derived from adjectives or verbs, the masculine indefinite nominative singular form
is generally used, e.g. Angestellter, Grüner.
- Verbs are cited by their infinitive form.
- Adjectives are cited without endings for gender.
- Abbreviations are covered if they form part of everyday language and are used in
speech instead of the equivalent full form (e.g. AIDS, SPD, EDV).
- Multi-word expressions are covered if they are commonly used and if they function
as lexical units due to the strong collocational relation between their parts (e.g. Hab und Gut,
Erste Hilfe, instand setzen).
- Concepts referring to human beings and thus indicating natural sexus are treated as follows:
  - Two separate entries (synsets) if the difference in sexus is lexicalized (Mann/Frau).
  - Otherwise, one synset with two entries, listing the masculine and the feminine form (Lehrer, Lehrerin).
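The citation-form conventions above imply a lookup in two steps: an external component maps an inflected form to its base form, and the base form is then looked up in the lexicon. The following is a minimal sketch of that idea. The table `TOY_ANALYSER`, the dictionary `TOY_LEXICON`, the synset labels and the function `lookup` are all hypothetical stand-ins, not GermaNet's actual interface or data.

```python
# Hypothetical stand-in for an external morphological analyzer:
# a hand-written table from inflected forms to citation forms.
TOY_ANALYSER = {
    "Häuser": "Haus",
    "ging": "gehen",
    "schönes": "schön",
    "Lehrerinnen": "Lehrerin",
}

# Toy lexicon keyed by citation form; the synset labels are invented.
TOY_LEXICON = {
    "Haus": ["haus_1"],
    "gehen": ["gehen_1"],
    "schön": ["schoen_1"],
    "Lehrer": ["lehrer_1"],
    "Lehrerin": ["lehrer_1"],  # masculine and feminine form share one synset
    "Kosten": ["kosten_1"],    # cited in the nominative plural
    "Mann": ["mann_1"],        # lexicalized difference: separate synsets
    "Frau": ["frau_1"],
}


def lookup(word_form):
    """Map a word form to its citation form, then return (lemma, synsets).

    Forms unknown to the toy analyzer are assumed to be citation forms already.
    """
    lemma = TOY_ANALYSER.get(word_form, word_form)
    return lemma, TOY_LEXICON.get(lemma, [])


print(lookup("Häuser"))
print(lookup("Lehrerinnen"))
```

Keeping the analyzer separate from the lexicon reflects the assumption stated above that morphological mapping is done outside GermaNet itself.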
The new German orthography will be used. Additional citation forms may be listed as variants.
Orth Form: Fantasie, Orth Var: Phantasie
Orth Form: Myrrhe, Orth Var: Myrre
Orth Form: cirka, Orth Var: zirka
The old spelling forms and their variants are listed as well:
Old Orth Form: Schiffahrt
Old Orth Form: Fluß
Old Orth Form: Mikrophon, Old Orth Var: Mikrofon
Stylistic variants are marked by a special feature:
schnipsen, stylistic variant: schnippen
Po, stylistic variant: Arsch
arbeiten, stylistic variant: schaffen
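One way to picture the orthographic and stylistic information listed above is as a record with one field per kind of form. This is a sketch only: the class `LexicalEntry`, its field names and the index are assumptions, not GermaNet's storage format. The pairing of the old form Schiffahrt with the new form Schifffahrt is supplied here from the spelling reform and does not appear in the examples above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LexicalEntry:
    orth_form: str                       # citation form in the new orthography
    orth_var: Optional[str] = None       # permitted variant in the new orthography
    old_orth_form: Optional[str] = None  # citation form in the old orthography
    old_orth_var: Optional[str] = None   # variant in the old orthography
    stylistic_variants: List[str] = field(default_factory=list)

    def spelling_forms(self):
        """All spellings of this entry, new and old, without stylistic variants."""
        forms = [self.orth_form, self.orth_var, self.old_orth_form, self.old_orth_var]
        return [f for f in forms if f is not None]


entries = [
    LexicalEntry("Fantasie", orth_var="Phantasie"),
    LexicalEntry("Schifffahrt", old_orth_form="Schiffahrt"),
    LexicalEntry("schnipsen", stylistic_variants=["schnippen"]),
]

# Index every spelling so that text in either orthography finds the same entry.
index = {form: entry for entry in entries for form in entry.spelling_forms()}

print(index["Phantasie"].orth_form)
print(index["Schiffahrt"].orth_form)
```

Stylistic variants are kept in a separate field in this sketch because they are different words marked by a feature, not alternative spellings of the same word.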
Due to limited resources we will only be able to provide a relatively small number of textual definitions for senses in GermaNet. Lexicographers will add definitions when they feel that a particular sense is not adequately defined by its synonyms and/or its immediate neighbor nodes in the network. The definitions are non-formalized textual descriptions of the concepts.
Horizont: Linie, an der sich Himmel und Erde bzw. Meer scheinbar berühren
Example sentences may be given instead of or in addition to free text descriptions.
abbauen(1): Sie haben das Gerüst schon wieder abgebaut.
abbauen(2): Hier wird Kohle abgebaut.
Lexical Gaps/Artificial Concepts
Concepts that are not lexicalized in German but are required in order to build a proper hierarchy
are marked as artificial. We refer to such concepts as lexical gaps. Example: natürliches_Phänomen.
Note that attributive adjectives are cited in lower case unless the expression is lexicalized
(and thus no longer a lexical gap), as in Erste Hilfe.
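The effect of the artificial marker can be sketched with a small hierarchy in which artificial nodes keep the structure intact but are skipped when only lexicalized concepts are wanted. The class `Concept`, the function `lexicalised_hypernyms` and the placement of Gewitter and Phänomen around natürliches_Phänomen are assumptions made for illustration, not actual GermaNet data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Concept:
    name: str
    parent: Optional["Concept"] = None
    artificial: bool = False  # True for lexical gaps


def lexicalised_hypernyms(concept):
    """Walk up the hierarchy and return the names of all non-artificial ancestors."""
    result = []
    node = concept.parent
    while node is not None:
        if not node.artificial:
            result.append(node.name)
        node = node.parent
    return result


# Invented toy hierarchy: Gewitter -> natürliches_Phänomen (artificial) -> Phänomen
phaenomen = Concept("Phänomen")
natuerliches_phaenomen = Concept("natürliches_Phänomen", parent=phaenomen, artificial=True)
gewitter = Concept("Gewitter", parent=natuerliches_phaenomen)

print(lexicalised_hypernyms(gewitter))
```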
Proper names are covered only if they refer to a single non-linguistic item in the real world.
Therefore, geographical names, organizations, etc. (for example Deutschland, Bündnis für Arbeit)
are marked as named entities, whereas nationalities are not.