Content
Overview
Composition is a very productive word formation process in German. For many
applications, it is helpful to have information about the parts of the compound,
as usually the semantic interpretation is based on the meaning of its parts.
In GermaNet, nominal compounds are therefore split into their constituent
parts, i.e., modifier and head. This splitting identifies the immediate constituents
at each level of analysis and thus reflects the recursive nature of
compounds that have more than two constituent parts
such as Autobahnanschlussstelle (‘motorway junction’). The immediate
constituents of this compound are Autobahn and Anschlussstelle,
with the first
constituent then splitting further into Auto and Bahn and the
second constituent further split into Anschluss and Stelle
(see Figure 1).
Figure 1: Split compound
What makes compound splitting for German a challenging task is the fact that
compounding is not always simple string concatenation, but often involves the
presence of intervening linking elements or the elision of word-final characters
in the modifier constituent of a compound
(Henrich & Hinrichs, 2011).
In GermaNet, all modifiers are lemmatized and if a modifier is ambiguous
with respect to its word class (due to conversion), both possibilities
are specified:
- Laufschuhe: lauf- (en) [verb] and (der) Lauf [noun]
- Baustelle: bau- (en) [verb] and (der) Bau [noun]
Compound splitting in GermaNet is supported by an automatic algorithm,
which combines several individual compound splitters. Please see the
referenced paper below for more information on the
automatic splitting. All automatically split compounds are manually post-corrected
and enriched with relevant properties before they are inserted into GermaNet.
Properties
The following properties are specified for modifiers and/or heads:
Abbreviation
If one part of the compound is an abbreviation, it is labelled as Abkürzung.
Examples:
Compound |
Modifier |
Head |
SIM-Karte |
SIM (abbreviation) |
Karte |
ISO-Norm |
ISO (abbreviation) |
Norm |
Bonus-CD |
Bonus |
CD (abbreviation) |
Affixoid
Affixoids are morphemes with a special status between bound and free morphemes. As they have a clearly assigned meaning, it makes sense to split the
respective words. The bound morpheme is labelled as Affixoid.
Examples:
Compound |
Modifier |
Head |
Grundfrage |
grund (affixoid) |
Frage |
Grundrecht |
grund (affixoid) |
Recht |
Hauptsaison |
haupt (affixoid) |
Saison |
Hauptstadt |
haupt (affixoid) |
Stadt |
Foreign Word
If one part (or more) of the compound is not a German word, it is labelled as Fremdwort. Note that those constituents which are borrowed words but
are nowadays used as loanwords defined in a standard German dictionary (such as Duden) are not considered as foreign words in GermaNet (e.g. Drink
and Song in the examples below).
Examples:
Compound |
Modifier |
Head |
Energydrink |
Energy (foreign word) |
Drink |
Coversong |
Cover (foreign word) |
Song |
Logdatei |
Log (foreign word) |
Datei |
Konfix
The label Konfix refers to a word which is borrowed from a foreign language, in many cases from Latin or Greek, and whose
meaning stems from that particular language. Konfixes are bound morphemes, but in opposition to all other affixes two Konfixes
can be combined to form a so-called Konfixkompositum. Those Konfixkomposita are not split in GermaNet, whereas compounds existing
of a Konfix and a native word are split.
Examples:
Compound |
Modifier |
Head |
Milligramm |
milli (Konfix) |
Gramm |
Zentimeter |
zenti (Konfix) |
Meter |
Monokultur |
mono (Konfix) |
Kultur |
Opaque Morpheme
Modifiers whose meaning is not transparent any more without considering the etymology of the word are labelled with the property opaques Morphem.
Examples:
Compound |
Modifier |
Head |
Himbeere |
Him (opaque morpheme) |
Beere |
Karfreitag |
Kar (opaque morpheme) |
Freitag |
Sintflut |
Sint (opaque morpheme) |
Flut |
Lebkuchen |
Leb (opaque morpheme) |
Kuchen |
Schwiegermutter |
Schwieger (opaque morpheme) |
Mutter |
Proper Name
If the whole compound is a named entity, it is not split in GermaNet. If only the modifier is a proper name, the compound is split
and the label Eigenname is added to the modifier.
Examples:
Compound |
Modifier |
Head |
Hubbleteleskop |
Hubble (proper name) |
Teleskop |
Wertherstimmung |
Werther (proper name) |
Stimmung |
Hiobsbotschaft |
Hiob (proper name) |
Botschaft |
Virtual Word Form
Virtual word forms, labelled as Virtuelle Bildung, are regularly built according to existing word formation rules. However, they do not exist
in isolation, but only as part of a compound.
Examples:
Compound |
Modifier |
Head |
Einflussnahme |
Einfluss |
Nahme (virtual word form) |
Fragesteller |
Frage |
Steller (virtual word form) |
Farbgebung |
Farbe |
Gebung (virtual word form) |
Word Group
Modifiers consisting of a phrase are marked as Wortgruppe and the parts of the phrase are annotated as the modifier.
Examples:
Compound |
Modifier |
Head |
Dreiwege-Katalysator |
drei Wege (word group) |
Katalysator |
Nacht-und-Nebel-Aktion |
Nacht und Nebel (word group) |
Aktion |
Pro-Kopf-Einkommen |
pro Kopf (word group) |
Einkommen |
The following table gives an overview of the constituent parts of a compound (i.e. modifier and head) and the corresponding properties that are annotated for each constituent in GermaNet:
Property |
Modifier |
Head |
Abbreviation |
x |
x |
Affixoid |
x |
x |
Foreign Word |
x |
x |
Konfix |
x |
x |
Opaque Morpheme |
x |
x |
Proper Name |
x |
|
Virtual Word Form |
|
x |
Word Group |
x |
|
Download
In addition to the information described above that is included in GermaNet
(since release 8.0), a list of split compounds with their modifier(s) and head
is freely available for download here:
The list of compound data is free for academic research as defined in
GermaNet's academic research licence agreement.
For any other intended purposes, please contact us.
The format of these split compounds is one compound per line:
first the compound itself, then a <tab> space, then the modifier (in case
of two modifiers, these are separated by the pipe (|) symbol), then a <tab>
space again, and finally the head. For example:
Apfelbaum Apfel Baum
Himbeere Him Beere
Goldmünze Gold Münze
Badetuch baden|Bad Tuch
Laufband laufen|Lauf Band
Reference
The following paper describes the automatic compound splitting that is
performed before the manual post-correction. If you want to use the split compounds
in the context of scientific or research work, please refer to the paper:
Verena Henrich and Erhard Hinrichs:
Determining Immediate Constituents of Compounds in GermaNet.
In Proceedings of Recent Advances in Natural Language Processing (RANLP 2011),
Hissar, Bulgaria, September 2011, pp. 420-426.
[Download paper:
http://www.aclweb.org/anthology/R11-1058]