germanet
Class GermaNet

java.lang.Object
  extended by germanet.GermaNet

public class GermaNet
extends java.lang.Object

Provides high-level look-up access to GermaNet data. Intended as a read-only resource - no public methods are provided for changing or adding data.

GermaNet is a collection of German lexical units (LexUnits) organized into sets of synonyms (Synsets).
A Synset has a WordCategory (adj, nomen, verben) and consists of a paraphrase and a List of LexUnits. The List of LexUnits is never empty.
A LexUnit consists of an orthForm (represented as a String), an orthVar (can be empty), an oldOrthForm (can be empty), and an oldOrthVar (can be empty). Examples and Frames can belong to a LexUnit, as can the following attributes: styleMarking (boolean), sense (int), artificial (boolean), namedEntity (boolean), and source (String).
A Frame is simply a container for frame data (String).
An Example consists of text (String) and zero or one Frame(s).

To construct a GermaNet object, provide the location of the GermaNet data and (optionally) a flag indicating whether searches should be done ignoring case. This data location can be set with a String representing the path to the directory containing the data, or with a File object. If no flag is used, then case-sensitive searching will be performed:

// Use case-sensitive searching
GermaNet gnet = new GermaNet("/home/myName/germanet/GN_V60");
or
// Ignore case when searching
File gnetDir = new File("/home/myName/germanet/GN_V60");
GermaNet gnet = new GermaNet(gnetDir, true);
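Because the constructors declare java.io.FileNotFoundException and javax.xml.stream.XMLStreamException, it can help to validate the data directory before loading. The following is a minimal sketch of such a check; the class DataDirCheck and its method requireDirectory are hypothetical helpers, not part of the germanet package, and the GermaNet constructors may well perform a similar check themselves.

```java
import java.io.File;
import java.io.FileNotFoundException;

// Hypothetical helper; not part of the germanet package.
class DataDirCheck {

    /**
     * Returns a File for dirName if it names an existing directory,
     * otherwise throws FileNotFoundException with the absolute path.
     */
    static File requireDirectory(String dirName) throws FileNotFoundException {
        File dir = new File(dirName);
        if (!dir.isDirectory()) {
            throw new FileNotFoundException("Not a directory: " + dir.getAbsolutePath());
        }
        return dir;
    }

    public static void main(String[] args) {
        try {
            File dir = requireDirectory(args.length > 0 ? args[0] : ".");
            System.out.println("Data directory: " + dir.getAbsolutePath());
        } catch (FileNotFoundException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

The returned File could then be passed to new GermaNet(dir, true) inside a try block that also handles XMLStreamException.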

The GermaNet class has methods that return Lists of Synsets or LexUnits, given an orthForm or a WordCategory. For example,

List<LexUnit> lexList = gnet.getLexUnits("Bank");
List<LexUnit> verbenLU = gnet.getLexUnits(WordCategory.verben);
List<Synset> synList = gnet.getSynsets("gehen");
List<Synset> adjSynsets = gnet.getSynsets(WordCategory.adj);

Unless otherwise stated, methods will return an empty List rather than null to indicate that no objects exist for the given request.
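The empty-List convention means callers can iterate over a result without a null check. The sketch below illustrates the convention with a stand-in index; the class EmptyListConvention, its methods, and the sample data are assumptions made for illustration and say nothing about how GermaNet stores its data internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in; not the germanet package's implementation.
class EmptyListConvention {
    // Maps an orthForm to illustrative sense labels.
    private final Map<String, List<String>> index = new HashMap<String, List<String>>();

    void add(String orthForm, String... senses) {
        index.put(orthForm, new ArrayList<String>(Arrays.asList(senses)));
    }

    /** Never returns null: a miss yields an empty List. */
    List<String> lookup(String orthForm) {
        List<String> hits = index.get(orthForm);
        return hits == null ? Collections.<String>emptyList() : hits;
    }

    public static void main(String[] args) {
        EmptyListConvention demo = new EmptyListConvention();
        demo.add("Bank", "financial institution", "bench");
        // No null check is needed before iterating, even for an unknown word.
        for (String sense : demo.lookup("Bank")) {
            System.out.println(sense);
        }
        for (String sense : demo.lookup("Xyzzy")) {
            System.out.println(sense); // loop body is never entered
        }
    }
}
```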

Important Note:
Loading GermaNet requires more memory than the JVM allocates by default. Any application that loads GermaNet will most likely need to be run with JVM options that increase the memory allocated, like this:

java -Xms128m -Xmx128m MyApplication

Depending on the memory needs of the application itself, the value 128 may need to be raised to 256 or higher in both options.
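To confirm that the JVM options took effect, an application can inspect its maximum heap before loading the data. This is a minimal sketch using the standard Runtime API; the class name HeapCheck and the 128 MB threshold are illustrative assumptions, not part of the GermaNet API. Note that the reported value may be somewhat lower than the -Xmx setting, depending on the garbage collector in use.

```java
// Hypothetical helper; not part of the germanet package.
class HeapCheck {

    /** Returns the maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Returns true if the maximum heap is at least requiredMb megabytes. */
    static boolean hasAtLeast(long requiredMb) {
        return maxHeapMb() >= requiredMb;
    }

    public static void main(String[] args) {
        long requiredMb = 128; // assumed threshold, mirroring the -Xmx128m example
        if (!hasAtLeast(requiredMb)) {
            System.err.println("Max heap is only " + maxHeapMb()
                    + " MB; consider a larger -Xmx value.");
        }
    }
}
```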


Field Summary
static java.lang.String NO
           
static java.lang.String XML_ARTIFICIAL
           
static java.lang.String XML_CON_REL
           
static java.lang.String XML_EXAMPLE
           
static java.lang.String XML_EXFRAME
           
static java.lang.String XML_FRAME
           
static java.lang.String XML_ID
           
static java.lang.String XML_LEX_REL
           
static java.lang.String XML_LEX_UNIT
           
static java.lang.String XML_NAMED_ENTITY
           
static java.lang.String XML_OLD_ORTH_FORM
           
static java.lang.String XML_OLD_ORTH_VAR
           
static java.lang.String XML_ORTH_FORM
           
static java.lang.String XML_ORTH_VAR
           
static java.lang.String XML_PARAPHRASE
           
static java.lang.String XML_RELATION
           
static java.lang.String XML_RELATION_DIR
           
static java.lang.String XML_RELATION_FROM
           
static java.lang.String XML_RELATION_INV
           
static java.lang.String XML_RELATION_NAME
           
static java.lang.String XML_RELATION_TO
           
static java.lang.String XML_RELATIONS
           
static java.lang.String XML_SENSE
           
static java.lang.String XML_SOURCE
           
static java.lang.String XML_STYLE_MARKING
           
static java.lang.String XML_SYNSET
           
static java.lang.String XML_SYNSETS
           
static java.lang.String XML_TEXT
           
static java.lang.String XML_WORD_CATEGORY
           
static java.lang.String YES
           
 
Constructor Summary
GermaNet(java.io.File dir)
          Constructs a new GermaNet object by loading the data files in the specified directory File - searches are case sensitive.
GermaNet(java.io.File dir, boolean ignoreCase)
          Constructs a new GermaNet object by loading the data files in the specified directory File.
GermaNet(java.lang.String dirName)
          Constructs a new GermaNet object by loading the data files in the specified directory path name - searches are case sensitive.
GermaNet(java.lang.String dirName, boolean ignoreCase)
          Constructs a new GermaNet object by loading the data files in the specified directory path name.
 
Method Summary
protected  void addSynset(Synset synset)
          Adds a Synset to this GermaNet's Synset list.
 java.lang.String getDir()
          Gets the absolute path name of the directory where the GermaNet data files are stored.
 LexUnit getLexUnitByID(int id)
          Returns the LexUnit with id, or null if it is not found.
 java.util.List<LexUnit> getLexUnits()
          Returns a List of all LexUnits.
 java.util.List<LexUnit> getLexUnits(java.lang.String orthForm)
          Returns a List of all LexUnits in which orthForm occurs as main orthographical form, as orthographical variant, as old orthographical form, or as old orthographic variant.
 java.util.List<LexUnit> getLexUnits(java.lang.String orthForm, boolean considerMainOrthFormOnly)
          Returns a List of all LexUnits in which orthForm occurs as main orthographical form, if considerMainOrthFormOnly is true.
 java.util.List<LexUnit> getLexUnits(java.lang.String orthForm, WordCategory wordCategory)
          Returns a List of all LexUnits with the specified WordCategory in which orthForm occurs as main orthographical form, as orthographical variant, as old orthographical form, or as old orthographic variant.
 java.util.List<LexUnit> getLexUnits(java.lang.String orthForm, WordCategory wordCategory, boolean considerMainOrthFormOnly)
          Returns a List of all LexUnits with the specified WordCategory in which orthForm occurs as main orthographical form, if considerMainOrthFormOnly is true.
 java.util.List<LexUnit> getLexUnits(WordCategory wordCategory)
          Returns a List of all LexUnits in the specified wordCategory.
 Synset getSynsetByID(int id)
          Returns the Synset with id, or null if it is not found.
 java.util.List<Synset> getSynsets()
          Returns a List of all Synsets.
 java.util.List<Synset> getSynsets(java.lang.String orthForm)
          Returns a List of all Synsets in which orthForm occurs as main orthographical form, as orthographical variant, as old orthographical form, or as old orthographic variant in one of its LexUnits, using the ignoreCase flag as set in the constructor.
 java.util.List<Synset> getSynsets(java.lang.String orthForm, boolean considerMainOrthFormOnly)
          Returns a List of all Synsets in which orthForm occurs as main orthographical form in one of its LexUnits, if considerMainOrthFormOnly is true.
 java.util.List<Synset> getSynsets(java.lang.String orthForm, WordCategory wordCategory)
          Returns a List of all Synsets with the specified WordCategory in which orthForm occurs as main orthographical form, as orthographical variant, as old orthographical form, or as old orthographic variant in one of its LexUnits.
 java.util.List<Synset> getSynsets(java.lang.String orthForm, WordCategory wordCategory, boolean considerMainOrthFormOnly)
          Returns a List of all Synsets with the specified WordCategory in which orthForm occurs as main orthographical form in one of its LexUnits, if considerMainOrthFormOnly is true.
 java.util.List<Synset> getSynsets(WordCategory wordCategory)
          Returns a List of all Synsets in the specified wordCategory.
 int numLexUnits()
          Returns the number of LexUnits contained in GermaNet.
 int numSynsets()
          Returns the number of Synsets contained in GermaNet.
protected  void trimAll()
          Trims all Lists (takes ~0.3 seconds and frees up 2 MB).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

XML_SYNSETS

public static final java.lang.String XML_SYNSETS
See Also:
Constant Field Values

XML_SYNSET

public static final java.lang.String XML_SYNSET
See Also:
Constant Field Values

XML_ID

public static final java.lang.String XML_ID
See Also:
Constant Field Values

XML_PARAPHRASE

public static final java.lang.String XML_PARAPHRASE
See Also:
Constant Field Values

XML_WORD_CATEGORY

public static final java.lang.String XML_WORD_CATEGORY
See Also:
Constant Field Values

XML_LEX_UNIT

public static final java.lang.String XML_LEX_UNIT
See Also:
Constant Field Values

XML_ORTH_FORM

public static final java.lang.String XML_ORTH_FORM
See Also:
Constant Field Values

XML_ORTH_VAR

public static final java.lang.String XML_ORTH_VAR
See Also:
Constant Field Values

XML_OLD_ORTH_FORM

public static final java.lang.String XML_OLD_ORTH_FORM
See Also:
Constant Field Values

XML_OLD_ORTH_VAR

public static final java.lang.String XML_OLD_ORTH_VAR
See Also:
Constant Field Values

XML_SOURCE

public static final java.lang.String XML_SOURCE
See Also:
Constant Field Values

XML_SENSE

public static final java.lang.String XML_SENSE
See Also:
Constant Field Values

XML_STYLE_MARKING

public static final java.lang.String XML_STYLE_MARKING
See Also:
Constant Field Values

XML_NAMED_ENTITY

public static final java.lang.String XML_NAMED_ENTITY
See Also:
Constant Field Values

XML_ARTIFICIAL

public static final java.lang.String XML_ARTIFICIAL
See Also:
Constant Field Values

XML_EXAMPLE

public static final java.lang.String XML_EXAMPLE
See Also:
Constant Field Values

XML_TEXT

public static final java.lang.String XML_TEXT
See Also:
Constant Field Values

XML_EXFRAME

public static final java.lang.String XML_EXFRAME
See Also:
Constant Field Values

XML_FRAME

public static final java.lang.String XML_FRAME
See Also:
Constant Field Values

XML_RELATIONS

public static final java.lang.String XML_RELATIONS
See Also:
Constant Field Values

XML_RELATION

public static final java.lang.String XML_RELATION
See Also:
Constant Field Values

XML_CON_REL

public static final java.lang.String XML_CON_REL
See Also:
Constant Field Values

XML_LEX_REL

public static final java.lang.String XML_LEX_REL
See Also:
Constant Field Values

XML_RELATION_NAME

public static final java.lang.String XML_RELATION_NAME
See Also:
Constant Field Values

XML_RELATION_DIR

public static final java.lang.String XML_RELATION_DIR
See Also:
Constant Field Values

XML_RELATION_INV

public static final java.lang.String XML_RELATION_INV
See Also:
Constant Field Values

XML_RELATION_TO

public static final java.lang.String XML_RELATION_TO
See Also:
Constant Field Values

XML_RELATION_FROM

public static final java.lang.String XML_RELATION_FROM
See Also:
Constant Field Values

YES

public static final java.lang.String YES
See Also:
Constant Field Values

NO

public static final java.lang.String NO
See Also:
Constant Field Values
Constructor Detail

GermaNet

public GermaNet(java.lang.String dirName)
         throws java.io.FileNotFoundException,
                javax.xml.stream.XMLStreamException
Constructs a new GermaNet object by loading the data files in the specified directory path name - searches are case sensitive.

Parameters:
dirName - the directory where the GermaNet data files are located
Throws:
java.io.FileNotFoundException
javax.xml.stream.XMLStreamException

GermaNet

public GermaNet(java.lang.String dirName,
                boolean ignoreCase)
         throws java.io.FileNotFoundException,
                javax.xml.stream.XMLStreamException
Constructs a new GermaNet object by loading the data files in the specified directory path name.

Parameters:
dirName - the directory where the GermaNet data files are located
ignoreCase - if true, ignore case on look-ups; otherwise perform case-sensitive searches
Throws:
java.io.FileNotFoundException
javax.xml.stream.XMLStreamException

GermaNet

public GermaNet(java.io.File dir)
         throws java.io.FileNotFoundException,
                javax.xml.stream.XMLStreamException
Constructs a new GermaNet object by loading the data files in the specified directory File - searches are case sensitive.

Parameters:
dir - location of the GermaNet data files
Throws:
java.io.FileNotFoundException
javax.xml.stream.XMLStreamException

GermaNet

public GermaNet(java.io.File dir,
                boolean ignoreCase)
         throws java.io.FileNotFoundException,
                javax.xml.stream.XMLStreamException
Constructs a new GermaNet object by loading the data files in the specified directory File.

Parameters:
dir - location of the GermaNet data files
ignoreCase - if true, ignore case on look-ups; otherwise perform case-sensitive searches
Throws:
java.io.FileNotFoundException
javax.xml.stream.XMLStreamException
Method Detail

getDir

public java.lang.String getDir()
Gets the absolute path name of the directory where the GermaNet data files are stored.

Returns:
the absolute pathname of the location of the GermaNet data files

addSynset

protected void addSynset(Synset synset)
Adds a Synset to this GermaNet's Synset list.

Parameters:
synset - the Synset to add

getSynsets

public java.util.List<Synset> getSynsets()
Returns a List of all Synsets.

Returns:
a list of all Synsets

getSynsets

public java.util.List<Synset> getSynsets(java.lang.String orthForm)
Returns a List of all Synsets in which orthForm occurs as main orthographical form, as orthographical variant, as old orthographical form, or as old orthographic variant in one of its LexUnits, using the ignoreCase flag as set in the constructor. Equivalent to calling getSynsets(orthForm, false), that is, with considerMainOrthFormOnly set to false.

Parameters:
orthForm - the orthForm to search for
Returns:
a List of all Synsets containing orthForm. If no Synsets were found, this is a List containing no Synsets

getSynsets

public java.util.List<Synset> getSynsets(java.lang.String orthForm,
                                         boolean considerMainOrthFormOnly)
Returns a List of all Synsets in which orthForm occurs as main orthographical form in one of its LexUnits, if considerMainOrthFormOnly is true. Otherwise, returns a List of all Synsets in which orthForm occurs as main orthographical form, as orthographical variant, as old orthographical form, or as old orthographic variant in one of its LexUnits. It uses the ignoreCase flag as set in the constructor.

Parameters:
orthForm - the orthForm to search for
considerMainOrthFormOnly - considering main orthographical form only (true) or all variants (false)
Returns:
a List of all Synsets containing orthForm. If no Synsets were found, this is a List containing no Synsets
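The interaction between considerMainOrthFormOnly and the ignoreCase flag can be sketched as a single matching predicate. This is only an illustration of the documented behaviour: the classes OrthFormMatchSketch and Forms, the use of null for an unset field, and the sample spellings are all assumptions, and the actual look-up in the germanet package may be implemented quite differently.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in types; not the germanet package's implementation.
class OrthFormMatchSketch {

    /** Holds the four orthographic fields of one lexical unit; null means "not set". */
    static class Forms {
        final String orthForm;
        final String orthVar;
        final String oldOrthForm;
        final String oldOrthVar;

        Forms(String orthForm, String orthVar, String oldOrthForm, String oldOrthVar) {
            this.orthForm = orthForm;
            this.orthVar = orthVar;
            this.oldOrthForm = oldOrthForm;
            this.oldOrthVar = oldOrthVar;
        }
    }

    /** Returns true if query matches one of the fields selected by the flags. */
    static boolean matches(Forms forms, String query,
                           boolean considerMainOrthFormOnly, boolean ignoreCase) {
        List<String> candidates = considerMainOrthFormOnly
                ? Collections.singletonList(forms.orthForm)
                : Arrays.asList(forms.orthForm, forms.orthVar,
                                forms.oldOrthForm, forms.oldOrthVar);
        for (String candidate : candidates) {
            if (candidate == null) {
                continue; // field not set
            }
            boolean hit = ignoreCase ? candidate.equalsIgnoreCase(query)
                                     : candidate.equals(query);
            if (hit) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Illustrative data: a current spelling plus a pre-reform spelling.
        Forms unit = new Forms("Schifffahrt", null, "Schiffahrt", null);
        System.out.println(matches(unit, "Schiffahrt", false, false)); // all forms searched
        System.out.println(matches(unit, "Schiffahrt", true, false));  // main form only
    }
}
```

With considerMainOrthFormOnly set to false the old spelling is found; with it set to true only the main orthographical form is compared.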

getSynsets

public java.util.List<Synset> getSynsets(java.lang.String orthForm,
                                         WordCategory wordCategory)
Returns a List of all Synsets with the specified WordCategory in which orthForm occurs as main orthographical form, as orthographical variant, as old orthographical form, or as old orthographic variant in one of its LexUnits. It uses the ignoreCase flag as set in the constructor. Equivalent to calling getSynsets(orthForm, wordCategory, false), that is, with considerMainOrthFormOnly set to false.

Parameters:
orthForm - the orthForm to be found
wordCategory - the WordCategory of the Synsets to be found (e.g. WordCategory.adj)
Returns:
a List of Synsets with the specified orthForm and wordCategory.

getSynsets

public java.util.List<Synset> getSynsets(java.lang.String orthForm,
                                         WordCategory wordCategory,
                                         boolean considerMainOrthFormOnly)
Returns a List of all Synsets with the specified WordCategory in which orthForm occurs as main orthographical form in one of its LexUnits, if considerMainOrthFormOnly is true. Otherwise, returns a List of all Synsets with the specified WordCategory in which orthForm occurs as main orthographical form, as orthographical variant, as old orthographical form, or as old orthographic variant in one of its LexUnits. It uses the ignoreCase flag as set in the constructor.

Parameters:
orthForm - the orthForm to be found
wordCategory - the WordCategory of the Synsets to be found (e.g. WordCategory.adj)
considerMainOrthFormOnly - considering main orthographical form only (true) or all variants (false)
Returns:
a List of Synsets with the specified orthForm and wordCategory.

getSynsets

public java.util.List<Synset> getSynsets(WordCategory wordCategory)
Returns a List of all Synsets in the specified wordCategory.

Parameters:
wordCategory - the WordCategory, for example WordCategory.nomen
Returns:
a List of all Synsets in the specified wordCategory. If no Synsets were found, this is a List containing no Synsets.
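Selecting entries by word category amounts to a simple filter over an enum value. The sketch below shows the idea with stand-in types: the enum Category mirrors the documented WordCategory values (adj, nomen, verben), while the classes CategoryFilterSketch and Entry and the sample words are hypothetical and not part of the germanet package.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types; not the germanet package's implementation.
class CategoryFilterSketch {

    /** Stand-in mirroring the documented WordCategory values. */
    enum Category { adj, nomen, verben }

    /** A minimal entry: one orthographic form plus its category. */
    static class Entry {
        final String orthForm;
        final Category category;

        Entry(String orthForm, Category category) {
            this.orthForm = orthForm;
            this.category = category;
        }
    }

    /** Returns all entries in the wanted category; an empty List if there are none. */
    static List<Entry> inCategory(List<Entry> all, Category wanted) {
        List<Entry> result = new ArrayList<Entry>();
        for (Entry entry : all) {
            if (entry.category == wanted) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Entry> all = new ArrayList<Entry>();
        all.add(new Entry("gehen", Category.verben));
        all.add(new Entry("Bank", Category.nomen));
        all.add(new Entry("schnell", Category.adj));
        for (Entry entry : inCategory(all, Category.nomen)) {
            System.out.println(entry.orthForm);
        }
    }
}
```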

getSynsetByID

public Synset getSynsetByID(int id)
Returns the Synset with id, or null if it is not found.

Parameters:
id - the ID of the Synset to be found.
Returns:
the Synset with id, or null if it is not found.

getLexUnitByID

public LexUnit getLexUnitByID(int id)
Returns the LexUnit with id, or null if it is not found.

Parameters:
id - the ID of the LexUnit to be found
Returns:
the LexUnit with id, or null if it is not found.
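Unlike the List-returning methods, the two by-ID look-ups above return null for an unknown id, so callers need an explicit check. One way to make that hard to forget is to wrap the result in java.util.Optional. The sketch below uses a stand-in map of Strings; the class ByIdLookupSketch and its methods are hypothetical and not part of the germanet package.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in; not the germanet package's implementation.
class ByIdLookupSketch {
    private final Map<Integer, String> byId = new HashMap<Integer, String>();

    void put(int id, String value) {
        byId.put(id, value);
    }

    /** Mirrors the documented contract: returns null when the id is unknown. */
    String getById(int id) {
        return byId.get(id);
    }

    /** Wraps the nullable result so that callers must handle the missing case. */
    Optional<String> findById(int id) {
        return Optional.ofNullable(getById(id));
    }

    public static void main(String[] args) {
        ByIdLookupSketch index = new ByIdLookupSketch();
        index.put(1, "gehen");
        System.out.println(index.findById(1).orElse("(not found)"));
        System.out.println(index.findById(2).orElse("(not found)"));
    }
}
```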

numSynsets

public int numSynsets()
Returns the number of Synsets contained in GermaNet.

Returns:
the number of Synsets contained in GermaNet

numLexUnits

public int numLexUnits()
Returns the number of LexUnits contained in GermaNet.

Returns:
the number of LexUnits contained in GermaNet

getLexUnits

public java.util.List<LexUnit> getLexUnits(java.lang.String orthForm)
Returns a List of all LexUnits in which orthForm occurs as main orthographical form, as orthographical variant, as old orthographical form, or as old orthographic variant. It uses the ignoreCase flag as set in the constructor. Equivalent to calling getLexUnits(orthForm, false), that is, with considerMainOrthFormOnly set to false.

Parameters:
orthForm - the orthForm to search for
Returns:
a List of all LexUnits containing orthForm. If no LexUnits were found, this is a List containing no LexUnits.

getLexUnits

public java.util.List<LexUnit> getLexUnits(java.lang.String orthForm,
                                           boolean considerMainOrthFormOnly)
Returns a List of all LexUnits in which orthForm occurs as main orthographical form, if considerMainOrthFormOnly is true. Otherwise, returns a List of all LexUnits in which orthForm occurs as main orthographical form, as orthographical variant, as old orthographical form, or as old orthographic variant. It uses the ignoreCase flag as set in the constructor.

Parameters:
orthForm - the orthForm to search for
considerMainOrthFormOnly - considering main orthographical form only (true) or all variants (false)
Returns:
a List of all LexUnits containing orthForm. If no LexUnits were found, this is a List containing no LexUnits.

getLexUnits

public java.util.List<LexUnit> getLexUnits(java.lang.String orthForm,
                                           WordCategory wordCategory)
Returns a List of all LexUnits with the specified WordCategory in which orthForm occurs as main orthographical form, as orthographical variant, as old orthographical form, or as old orthographic variant. It uses the ignoreCase flag as set in the constructor. Equivalent to calling getLexUnits(orthForm, wordCategory, false), that is, with considerMainOrthFormOnly set to false.

Parameters:
orthForm - the orthForm to be found
wordCategory - the WordCategory of the LexUnits to be found (e.g. WordCategory.nomen)
Returns:
a List of LexUnits with the specified orthForm and wordCategory.

getLexUnits

public java.util.List<LexUnit> getLexUnits(java.lang.String orthForm,
                                           WordCategory wordCategory,
                                           boolean considerMainOrthFormOnly)
Returns a List of all LexUnits with the specified WordCategory in which orthForm occurs as main orthographical form, if considerMainOrthFormOnly is true. Otherwise, returns a List of all LexUnits with the specified WordCategory in which orthForm occurs as main orthographical form, as orthographical variant, as old orthographical form, or as old orthographic variant. It uses the ignoreCase flag as set in the constructor.

Parameters:
orthForm - the orthForm to be found
wordCategory - the WordCategory of the LexUnits to be found (e.g. WordCategory.nomen)
considerMainOrthFormOnly - considering main orthographical form only (true) or all variants (false)
Returns:
a List of LexUnits with the specified orthForm and wordCategory.

getLexUnits

public java.util.List<LexUnit> getLexUnits(WordCategory wordCategory)
Returns a List of all LexUnits in the specified wordCategory.

Parameters:
wordCategory - the WordCategory (e.g. WordCategory.verben)
Returns:
a List of all LexUnits in the specified wordCategory. If no LexUnits were found, this is a List containing no LexUnits.

getLexUnits

public java.util.List<LexUnit> getLexUnits()
Returns a List of all LexUnits.

Returns:
a List of all LexUnits

trimAll

protected void trimAll()
Trims all Lists (takes ~0.3 seconds and frees up 2 MB).
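The documentation does not say how the trimming is done. One plausible reading, offered here purely as an assumption, is that each backing list is an ArrayList whose unused capacity is released with trimToSize(). The sketch below shows that technique on stand-in data; the class TrimSketch and its trimAll method are hypothetical and unrelated to the protected method documented above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of ArrayList.trimToSize(); not the germanet implementation.
class TrimSketch {

    /** Releases unused backing-array capacity of every list; contents are unchanged. */
    static void trimAll(List<ArrayList<String>> lists) {
        for (ArrayList<String> list : lists) {
            list.trimToSize();
        }
    }

    public static void main(String[] args) {
        // Over-allocated on purpose: capacity 1000, but only two elements.
        ArrayList<String> orthForms = new ArrayList<String>(1000);
        orthForms.add("gehen");
        orthForms.add("laufen");

        List<ArrayList<String>> all = new ArrayList<ArrayList<String>>();
        all.add(orthForms);
        trimAll(all);

        System.out.println(orthForms.size() + " elements kept after trimming");
    }
}
```

Trimming like this only pays off once loading has finished, since any later add would force the list to grow again.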