public class GermaNet
extends java.lang.Object
GermaNet consists of lexical units (LexUnits) organized into sets of synonyms (Synsets).
A Synset has a WordCategory (adj, nomen, verben) and consists of a paraphrase and a List of LexUnits. The List of LexUnits is never empty.
A LexUnit consists of an orthForm (represented as a String), an orthVar (can be empty), an oldOrthForm (can be empty), and an oldOrthVar (can be empty). Examples, Frames, IliRecords, and WiktionaryParaphrases can belong to a LexUnit, as well as the following attributes: styleMarking (boolean), sense (int), artificial (boolean), namedEntity (boolean), and source (String).
A Frame is simply a container for frame data (String).
An Example consists of text (String) and zero or one Frame.
To construct a GermaNet object, provide the location of the GermaNet data and (optionally) a flag indicating whether searches should ignore case. The data location can be given either as a String representing the path to the directory containing the data or as a File object. If no flag is given, searches are case sensitive:
// Use case-sensitive searching
GermaNet gnet = new GermaNet("/home/myName/germanet/GN_V60");
or
// Ignore case when searching
File gnetDir = new File("/home/myName/germanet/GN_V60");
GermaNet gnet = new GermaNet(gnetDir, true);
The GermaNet class has methods that return Lists of Synsets or LexUnits, given an orthForm or a WordCategory. For example:
List<LexUnit> lexList = gnet.getLexUnits("Bank");
List<LexUnit> verbenLU = gnet.getLexUnits(WordCategory.verben);
List<Synset> synList = gnet.getSynsets("gehen");
List<Synset> adjSynsets = gnet.getSynsets(WordCategory.adj);
Unless otherwise stated, methods return an empty List rather than null to indicate that no objects exist for the given request.
An application that loads GermaNet can be launched with explicit JVM heap settings, for example:
java -Xms128m -Xmx128m MyApplication
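The heap flags in the command above can be checked from inside the running application. The following is a minimal sketch using only the standard `Runtime` API; the class name `HeapCheckSketch` and the idea of comparing against a required size are illustrative assumptions, not part of the GermaNet API. The value reported by the JVM can be somewhat below the `-Xmx` setting, depending on the garbage collector.

```java
// Hypothetical helper, not part of GermaNet: reports the maximum heap the JVM will use.
final class HeapCheckSketch {

    // Maximum heap size in MiB as reported by the JVM (influenced by -Xmx).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // True if the reported maximum heap is at least requiredMiB.
    static boolean meets(long requiredMiB) {
        return maxHeapMiB() >= requiredMiB;
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        // 128 mirrors the value in the example command; adjust as needed.
        System.out.println("At least 128 MiB: " + meets(128));
    }
}
```

Running it once with and once without `-Xmx128m` shows how the launch options change the reported limit.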
Modifier and Type | Field and Description |
---|---|
static java.lang.String | NO |
static java.lang.String | XML_ARTIFICIAL |
static java.lang.String | XML_CATEGORY |
static java.lang.String | XML_COMPOUND |
static java.lang.String | XML_COMPOUND_HEAD |
static java.lang.String | XML_COMPOUND_MODIFIER |
static java.lang.String | XML_CON_REL |
static java.lang.String | XML_EWN_RELATION |
static java.lang.String | XML_EXAMPLE |
static java.lang.String | XML_EXFRAME |
static java.lang.String | XML_FRAME |
static java.lang.String | XML_ID |
static java.lang.String | XML_ILI_RECORD |
static java.lang.String | XML_LEX_REL |
static java.lang.String | XML_LEX_UNIT |
static java.lang.String | XML_LEX_UNIT_ID |
static java.lang.String | XML_NAMED_ENTITY |
static java.lang.String | XML_OLD_ORTH_FORM |
static java.lang.String | XML_OLD_ORTH_VAR |
static java.lang.String | XML_ORTH_FORM |
static java.lang.String | XML_ORTH_VAR |
static java.lang.String | XML_PARAPHRASE |
static java.lang.String | XML_PROPERTY |
static java.lang.String | XML_PWN_WORD |
static java.lang.String | XML_PWN20_ID |
static java.lang.String | XML_PWN20_PARAPHRASE |
static java.lang.String | XML_PWN20_SENSE |
static java.lang.String | XML_PWN20_SYNONYM |
static java.lang.String | XML_PWN20_SYNONYMS |
static java.lang.String | XML_PWN30_ID |
static java.lang.String | XML_RELATION |
static java.lang.String | XML_RELATION_DIR |
static java.lang.String | XML_RELATION_FROM |
static java.lang.String | XML_RELATION_INV |
static java.lang.String | XML_RELATION_NAME |
static java.lang.String | XML_RELATION_TO |
static java.lang.String | XML_RELATIONS |
static java.lang.String | XML_SENSE |
static java.lang.String | XML_SOURCE |
static java.lang.String | XML_STYLE_MARKING |
static java.lang.String | XML_SYNSET |
static java.lang.String | XML_SYNSETS |
static java.lang.String | XML_TEXT |
static java.lang.String | XML_WIKTIONARY_EDITED |
static java.lang.String | XML_WIKTIONARY_ID |
static java.lang.String | XML_WIKTIONARY_PARAPHRASE |
static java.lang.String | XML_WIKTIONARY_POS |
static java.lang.String | XML_WIKTIONARY_SENSE |
static java.lang.String | XML_WIKTIONARY_SENSE_ID |
static java.lang.String | XML_WORD_CATEGORY |
static java.lang.String | XML_WORD_CLASS |
static java.lang.String | YES |
Constructor and Description |
---|
GermaNet(java.io.File dir) Constructs a new GermaNet object by loading the data files in the specified directory/archive File. Searches are case sensitive. |
GermaNet(java.io.File dir, boolean ignoreCase) Constructs a new GermaNet object by loading the data files in the specified directory/archive File. |
GermaNet(java.lang.String dirName) Constructs a new GermaNet object by loading the data files in the specified directory/archive path name. Searches are case sensitive. |
GermaNet(java.lang.String dirName, boolean ignoreCase) Constructs a new GermaNet object by loading the data files in the specified directory/archive path name. |
Modifier and Type | Method and Description |
---|---|
protected void | addIliRecord(IliRecord ili) Adds IliRecords to this GermaNet object when IliLoader is called. |
protected void | addSynset(Synset synset) Adds a Synset to this GermaNet's Synset list. |
protected void | addWiktionaryParaphrase(WiktionaryParaphrase wiki) Adds WiktionaryParaphrases to this GermaNet object when WiktionaryLoader is called. |
java.lang.String | getDir() Gets the absolute path name of the directory where the GermaNet data files are stored. |
java.util.List<IliRecord> | getIliRecords() Returns a List of all IliRecords. |
LexUnit | getLexUnitByID(int id) Returns the LexUnit with id, or null if it is not found. |
java.util.List<LexUnit> | getLexUnits() Returns a List of all LexUnits. |
java.util.List<LexUnit> | getLexUnits(java.lang.String orthForm) Returns a List of all LexUnits in which orthForm occurs as main orthographical form, as orthographical variant, as old orthographical form, or as old orthographic variant. |
java.util.List<LexUnit> | getLexUnits(java.lang.String orthForm, boolean considerMainOrthFormOnly) Returns a List of all LexUnits in which orthForm occurs as main orthographical form, if considerMainOrthFormOnly is true. |
java.util.List<LexUnit> | getLexUnits(java.lang.String orthForm, WordCategory wordCategory) Returns a List of all LexUnits with the specified WordCategory in which orthForm occurs as main orthographical form, as orthographical variant, as old orthographical form, or as old orthographic variant. |
java.util.List<LexUnit> | getLexUnits(java.lang.String orthForm, WordCategory wordCategory, boolean considerMainOrthFormOnly) Returns a List of all LexUnits with the specified WordCategory in which orthForm occurs as main orthographical form, if considerMainOrthFormOnly is true. |
java.util.List<LexUnit> | getLexUnits(WordCategory wordCategory) Returns a List of all LexUnits in the specified wordCategory. |
java.util.HashMap<LexUnit,CompoundInfo> | getLexUnitsWithCompoundInfo() |
Synset | getSynsetByID(int id) Returns the Synset with id, or null if it is not found. |
java.util.List<Synset> | getSynsets() Returns a List of all Synsets. |
java.util.List<Synset> | getSynsets(java.lang.String orthForm) Returns a List of all Synsets in which orthForm occurs as main orthographical form, as orthographical variant, as old orthographical form, or as old orthographic variant in one of its LexUnits, using the ignoreCase flag as set in the constructor. |
java.util.List<Synset> | getSynsets(java.lang.String orthForm, boolean considerMainOrthFormOnly) Returns a List of all Synsets in which orthForm occurs as main orthographical form in one of its LexUnits, if considerMainOrthFormOnly is true. |
java.util.List<Synset> | getSynsets(java.lang.String orthForm, WordCategory wordCategory) Returns a List of all Synsets with the specified WordCategory in which orthForm occurs as main orthographical form, as orthographical variant, as old orthographical form, or as old orthographic variant in one of its LexUnits. |
java.util.List<Synset> | getSynsets(java.lang.String orthForm, WordCategory wordCategory, boolean considerMainOrthFormOnly) Returns a List of all Synsets with the specified WordCategory in which orthForm occurs as main orthographical form in one of its LexUnits, if considerMainOrthFormOnly is true. |
java.util.List<Synset> | getSynsets(WordCategory wordCategory) Returns a List of all Synsets in the specified wordCategory. |
java.util.List<Synset> | getSynsets(WordClass wordClass) Returns a List of all Synsets in the specified wordClass. |
java.util.List<WiktionaryParaphrase> | getWiktionaryParaphrases() Returns a List of all WiktionaryParaphrases. |
protected static boolean | isZipFile(java.io.File file) Checks whether the File is a ZipFile. |
int | numLexUnits() Returns the number of LexUnits contained in GermaNet. |
int | numSynsets() Returns the number of Synsets contained in GermaNet. |
protected void | trimAll() Trims all Lists (takes ~0.3 seconds and frees up 2 MB). |
protected void | updateLexUnitsWithIli() Adds the information about corresponding IliRecords to LexUnits. |
protected void | updateLexUnitsWithWiktionary() Adds the information about corresponding WiktionaryParaphrases to LexUnits. |
public static final java.lang.String XML_SYNSETS
public static final java.lang.String XML_SYNSET
public static final java.lang.String XML_ID
public static final java.lang.String XML_PARAPHRASE
public static final java.lang.String XML_WORD_CATEGORY
public static final java.lang.String XML_WORD_CLASS
public static final java.lang.String XML_LEX_UNIT
public static final java.lang.String XML_ORTH_FORM
public static final java.lang.String XML_ORTH_VAR
public static final java.lang.String XML_OLD_ORTH_FORM
public static final java.lang.String XML_OLD_ORTH_VAR
public static final java.lang.String XML_SOURCE
public static final java.lang.String XML_SENSE
public static final java.lang.String XML_STYLE_MARKING
public static final java.lang.String XML_NAMED_ENTITY
public static final java.lang.String XML_ARTIFICIAL
public static final java.lang.String XML_EXAMPLE
public static final java.lang.String XML_TEXT
public static final java.lang.String XML_EXFRAME
public static final java.lang.String XML_FRAME
public static final java.lang.String XML_RELATIONS
public static final java.lang.String XML_RELATION
public static final java.lang.String XML_CON_REL
public static final java.lang.String XML_LEX_REL
public static final java.lang.String XML_RELATION_NAME
public static final java.lang.String XML_RELATION_DIR
public static final java.lang.String XML_RELATION_INV
public static final java.lang.String XML_RELATION_TO
public static final java.lang.String XML_RELATION_FROM
public static final java.lang.String XML_ILI_RECORD
public static final java.lang.String XML_LEX_UNIT_ID
public static final java.lang.String XML_EWN_RELATION
public static final java.lang.String XML_PWN_WORD
public static final java.lang.String XML_PWN20_SENSE
public static final java.lang.String XML_PWN20_ID
public static final java.lang.String XML_PWN30_ID
public static final java.lang.String XML_PWN20_PARAPHRASE
public static final java.lang.String XML_PWN20_SYNONYMS
public static final java.lang.String XML_PWN20_SYNONYM
public static final java.lang.String YES
public static final java.lang.String NO
public static final java.lang.String XML_WIKTIONARY_PARAPHRASE
public static final java.lang.String XML_WIKTIONARY_ID
public static final java.lang.String XML_WIKTIONARY_SENSE_ID
public static final java.lang.String XML_WIKTIONARY_SENSE
public static final java.lang.String XML_WIKTIONARY_EDITED
public static final java.lang.String XML_WIKTIONARY_POS
public static final java.lang.String XML_COMPOUND
public static final java.lang.String XML_PROPERTY
public static final java.lang.String XML_CATEGORY
public static final java.lang.String XML_COMPOUND_MODIFIER
public static final java.lang.String XML_COMPOUND_HEAD
public GermaNet(java.lang.String dirName) throws java.io.FileNotFoundException, javax.xml.stream.XMLStreamException, java.io.IOException
Constructs a new GermaNet object by loading the data files in the specified directory/archive path name. Searches are case sensitive.
Parameters:
dirName - the directory where the GermaNet data files are located
Throws:
java.io.FileNotFoundException
javax.xml.stream.XMLStreamException
java.io.IOException
public GermaNet(java.lang.String dirName, boolean ignoreCase) throws java.io.FileNotFoundException, javax.xml.stream.XMLStreamException, java.io.IOException
Constructs a new GermaNet object by loading the data files in the specified directory/archive path name.
Parameters:
dirName - the directory where the GermaNet data files are located
ignoreCase - if true, ignore case on lookups; otherwise do case-sensitive searches
Throws:
java.io.FileNotFoundException
javax.xml.stream.XMLStreamException
java.io.IOException
public GermaNet(java.io.File dir) throws java.io.FileNotFoundException, javax.xml.stream.XMLStreamException, java.io.IOException
Constructs a new GermaNet object by loading the data files in the specified directory/archive File. Searches are case sensitive.
Parameters:
dir - location of the GermaNet data files
Throws:
java.io.FileNotFoundException
javax.xml.stream.XMLStreamException
java.io.IOException
public GermaNet(java.io.File dir, boolean ignoreCase) throws java.io.FileNotFoundException, javax.xml.stream.XMLStreamException, java.io.IOException
Constructs a new GermaNet object by loading the data files in the specified directory/archive File.
Parameters:
dir - location of the GermaNet data files
ignoreCase - if true, ignore case on lookups; otherwise do case-sensitive searches
Throws:
java.io.FileNotFoundException
javax.xml.stream.XMLStreamException
java.io.IOException
public java.lang.String getDir()
protected void addSynset(Synset synset)
Adds a Synset to this GermaNet's Synset list.
Parameters:
synset - the Synset to add
public java.util.List<Synset> getSynsets()
Returns a List of all Synsets.
Returns:
a List of all Synsets
public java.util.List<Synset> getSynsets(java.lang.String orthForm)
Returns a List of all Synsets in which orthForm occurs as main orthographical form, as orthographical variant, as old orthographical form, or as old orthographic variant in one of its LexUnits, using the ignoreCase flag as set in the constructor. Same as calling getSynsets(orthForm, false), that is, with considerMainOrthFormOnly=false.
Parameters:
orthForm - the orthForm to search for
Returns:
a List of all Synsets containing orthForm. If no Synsets were found, this is a List containing no Synsets.
public java.util.List<Synset> getSynsets(java.lang.String orthForm, boolean considerMainOrthFormOnly)
If considerMainOrthFormOnly is true, returns a List of all Synsets in which orthForm occurs as main orthographical form in one of its LexUnits. If considerMainOrthFormOnly is false, returns a List of all Synsets in which orthForm occurs as main orthographical form, as orthographical variant, as old orthographical form, or as old orthographic variant in one of its LexUnits. It uses the ignoreCase flag as set in the constructor.
Parameters:
orthForm - the orthForm to search for
considerMainOrthFormOnly - whether to consider the main orthographical form only (true) or all variants (false)
Returns:
a List of all Synsets containing orthForm. If no Synsets were found, this is a List containing no Synsets.
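The interaction between considerMainOrthFormOnly and the constructor's ignoreCase flag can be illustrated with a small stand-alone sketch. The classes `OrthEntry` and `OrthLookupSketch`, the `find` method, and the sample words are hypothetical stand-ins written for this illustration; they are not GermaNet classes and do not claim to reproduce the library's implementation, only the matching rule described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a lexical unit's four orthographic fields (NOT GermaNet's LexUnit).
final class OrthEntry {
    final String orthForm;     // main orthographical form
    final String orthVar;      // may be null
    final String oldOrthForm;  // may be null
    final String oldOrthVar;   // may be null

    OrthEntry(String orthForm, String orthVar, String oldOrthForm, String oldOrthVar) {
        this.orthForm = orthForm;
        this.orthVar = orthVar;
        this.oldOrthForm = oldOrthForm;
        this.oldOrthVar = oldOrthVar;
    }
}

final class OrthLookupSketch {

    // Compares one field against the query, honoring the ignoreCase flag.
    static boolean same(String candidate, String query, boolean ignoreCase) {
        if (candidate == null) {
            return false;
        }
        return ignoreCase ? candidate.equalsIgnoreCase(query) : candidate.equals(query);
    }

    // Returns matching entries; never returns null, mirroring the empty-List convention.
    static List<OrthEntry> find(List<OrthEntry> entries, String query,
                                boolean considerMainOrthFormOnly, boolean ignoreCase) {
        List<OrthEntry> hits = new ArrayList<>();
        for (OrthEntry e : entries) {
            boolean hit = same(e.orthForm, query, ignoreCase);
            if (!hit && !considerMainOrthFormOnly) {
                // Fall back to the variant and old forms only when all forms are considered.
                hit = same(e.orthVar, query, ignoreCase)
                        || same(e.oldOrthForm, query, ignoreCase)
                        || same(e.oldOrthVar, query, ignoreCase);
            }
            if (hit) {
                hits.add(e);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        List<OrthEntry> entries = Arrays.asList(
                new OrthEntry("Delfin", "Delphin", null, null),
                new OrthEntry("Schifffahrt", null, "Schiffahrt", null));

        System.out.println(find(entries, "Delphin", true, false).size());  // 0: variant ignored
        System.out.println(find(entries, "Delphin", false, false).size()); // 1: variant matches
        System.out.println(find(entries, "delfin", true, true).size());    // 1: case ignored
    }
}
```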
public java.util.List<Synset> getSynsets(java.lang.String orthForm, WordCategory wordCategory)
Returns a List of all Synsets with the specified WordCategory in which orthForm occurs as main orthographical form, as orthographical variant, as old orthographical form, or as old orthographic variant in one of its LexUnits. It uses the ignoreCase flag as set in the constructor. Same as calling getSynsets(orthForm, wordCategory, false), that is, with considerMainOrthFormOnly=false.
Parameters:
orthForm - the orthForm to be found
wordCategory - the WordCategory of the Synsets to be found (e.g. WordCategory.adj)
Returns:
a List of Synsets with the specified orthForm and wordCategory.
public java.util.List<Synset> getSynsets(java.lang.String orthForm, WordCategory wordCategory, boolean considerMainOrthFormOnly)
If considerMainOrthFormOnly is true, returns a List of all Synsets with the specified WordCategory in which orthForm occurs as main orthographical form in one of its LexUnits. If considerMainOrthFormOnly is false, returns a List of all Synsets in which orthForm occurs as main orthographical form, as orthographical variant, as old orthographical form, or as old orthographic variant in one of its LexUnits. It uses the ignoreCase flag as set in the constructor.
Parameters:
orthForm - the orthForm to be found
wordCategory - the WordCategory of the Synsets to be found (e.g. WordCategory.adj)
considerMainOrthFormOnly - whether to consider the main orthographical form only (true) or all variants (false)
Returns:
a List of Synsets with the specified orthForm and wordCategory.
public java.util.List<Synset> getSynsets(WordCategory wordCategory)
Returns a List of all Synsets in the specified wordCategory.
Parameters:
wordCategory - the WordCategory, for example WordCategory.nomen
Returns:
a List of all Synsets in the specified wordCategory. If no Synsets were found, this is a List containing no Synsets.
public java.util.List<Synset> getSynsets(WordClass wordClass)
Returns a List of all Synsets in the specified wordClass.
Parameters:
wordClass - the WordClass, for example WordClass.Menge
Returns:
a List of all Synsets in the specified wordClass. If no Synsets were found, this is a List containing no Synsets.
public Synset getSynsetByID(int id)
Returns the Synset with id, or null if it is not found.
Parameters:
id - the ID of the Synset to be found
Returns:
the Synset with id, or null if it is not found.
public LexUnit getLexUnitByID(int id)
Returns the LexUnit with id, or null if it is not found.
Parameters:
id - the ID of the LexUnit to be found
Returns:
the LexUnit with id, or null if it is not found.
public int numSynsets()
Returns the number of Synsets contained in GermaNet.
Returns:
the number of Synsets contained in GermaNet
public int numLexUnits()
Returns the number of LexUnits contained in GermaNet.
Returns:
the number of LexUnits contained in GermaNet
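Unlike the List-returning methods, the by-ID lookups (getSynsetByID and getLexUnitByID) are documented as returning null when nothing is found, so callers need a null check. One way to make that explicit is to wrap the result in an `Optional`. The sketch below uses a hypothetical `Map`-backed index with String values as a stand-in; `ByIdLookupSketch`, `put`, `getById`, and `findById` are illustrative names, not GermaNet methods.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for an ID index; values are plain Strings instead of Synsets or LexUnits.
final class ByIdLookupSketch {
    private final Map<Integer, String> byId = new HashMap<>();

    void put(int id, String label) {
        byId.put(id, label);
    }

    // Mirrors the documented contract: returns null when the ID is unknown.
    String getById(int id) {
        return byId.get(id);
    }

    // Wraps the nullable result so that callers must handle the "not found" case.
    Optional<String> findById(int id) {
        return Optional.ofNullable(getById(id));
    }

    public static void main(String[] args) {
        ByIdLookupSketch index = new ByIdLookupSketch();
        index.put(1, "sample entry"); // made-up data for illustration
        System.out.println(index.findById(1).orElse("<not found>"));
        System.out.println(index.findById(2).orElse("<not found>"));
    }
}
```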
public java.util.List<LexUnit> getLexUnits(java.lang.String orthForm)
Returns a List of all LexUnits in which orthForm occurs as main orthographical form, as orthographical variant, as old orthographical form, or as old orthographic variant. It uses the ignoreCase flag as set in the constructor. Same as calling getLexUnits(orthForm, false), that is, with considerMainOrthFormOnly=false.
Parameters:
orthForm - the orthForm to search for
Returns:
a List of all LexUnits containing orthForm. If no LexUnits were found, this is a List containing no LexUnits.
public java.util.List<LexUnit> getLexUnits(java.lang.String orthForm, boolean considerMainOrthFormOnly)
If considerMainOrthFormOnly is true, returns a List of all LexUnits in which orthForm occurs as main orthographical form. If considerMainOrthFormOnly is false, returns a List of all LexUnits in which orthForm occurs as main orthographical form, as orthographical variant, as old orthographical form, or as old orthographic variant. It uses the ignoreCase flag as set in the constructor.
Parameters:
orthForm - the orthForm to search for
considerMainOrthFormOnly - whether to consider the main orthographical form only (true) or all variants (false)
Returns:
a List of all LexUnits containing orthForm. If no LexUnits were found, this is a List containing no LexUnits.
public java.util.List<LexUnit> getLexUnits(java.lang.String orthForm, WordCategory wordCategory)
Returns a List of all LexUnits with the specified WordCategory in which orthForm occurs as main orthographical form, as orthographical variant, as old orthographical form, or as old orthographic variant. It uses the ignoreCase flag as set in the constructor. Same as calling getLexUnits(orthForm, wordCategory, false), that is, with considerMainOrthFormOnly=false.
Parameters:
orthForm - the orthForm to be found
wordCategory - the WordCategory of the LexUnits to be found (e.g. WordCategory.nomen)
Returns:
a List of LexUnits with the specified orthForm and wordCategory.
public java.util.List<LexUnit> getLexUnits(java.lang.String orthForm, WordCategory wordCategory, boolean considerMainOrthFormOnly)
If considerMainOrthFormOnly is true, returns a List of all LexUnits with the specified WordCategory in which orthForm occurs as main orthographical form. If considerMainOrthFormOnly is false, returns a List of all LexUnits in which orthForm occurs as main orthographical form, as orthographical variant, as old orthographical form, or as old orthographic variant. It uses the ignoreCase flag as set in the constructor.
Parameters:
orthForm - the orthForm to be found
wordCategory - the WordCategory of the LexUnits to be found (e.g. WordCategory.nomen)
considerMainOrthFormOnly - whether to consider the main orthographical form only (true) or all variants (false)
Returns:
a List of LexUnits with the specified orthForm and wordCategory.
public java.util.List<LexUnit> getLexUnits(WordCategory wordCategory)
Returns a List of all LexUnits in the specified wordCategory.
Parameters:
wordCategory - the WordCategory (e.g. WordCategory.verben)
Returns:
a List of all LexUnits in the specified wordCategory. If no LexUnits were found, this is a List containing no LexUnits.
public java.util.List<LexUnit> getLexUnits()
Returns a List of all LexUnits.
Returns:
a List of all LexUnits
protected void trimAll()
Trims all Lists (takes ~0.3 seconds and frees up 2 MB).
protected void addIliRecord(IliRecord ili)
Adds IliRecords to this GermaNet object when IliLoader is called.
Parameters:
ili - the IliRecord to be added
public java.util.List<IliRecord> getIliRecords()
Returns a List of all IliRecords.
Returns:
a List of all IliRecords
protected void updateLexUnitsWithIli()
Adds the information about corresponding IliRecords to LexUnits.
protected void addWiktionaryParaphrase(WiktionaryParaphrase wiki)
Adds WiktionaryParaphrases to this GermaNet object when WiktionaryLoader is called.
Parameters:
wiki - the WiktionaryParaphrase to be added
public java.util.List<WiktionaryParaphrase> getWiktionaryParaphrases()
Returns a List of all WiktionaryParaphrases.
Returns:
a List of all WiktionaryParaphrases
public java.util.HashMap<LexUnit,CompoundInfo> getLexUnitsWithCompoundInfo()
protected void updateLexUnitsWithWiktionary()
Adds the information about corresponding WiktionaryParaphrases to LexUnits.
protected static boolean isZipFile(java.io.File file) throws java.io.IOException
Checks whether the File is a ZipFile.
Parameters:
file - the File to check
Returns:
true if the File is a ZipFile
Throws:
java.io.IOException
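How isZipFile performs its check is not documented here. As one possible approach, the sketch below tries to open the file with the standard `java.util.zip.ZipFile` class and treats a `ZipException` as "not a zip archive". The class `ZipCheckSketch` and the method `looksLikeZip` are hypothetical names for this illustration and should not be read as the library's implementation.

```java
import java.io.File;
import java.io.IOException;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of GermaNet: tests whether a file can be opened as a zip archive.
final class ZipCheckSketch {

    // Returns true if the file exists, is a regular file, and opens as a zip archive.
    // Other I/O problems (for example, an unreadable file) propagate as IOException.
    static boolean looksLikeZip(File file) throws IOException {
        if (!file.isFile()) {
            return false; // directories and missing paths are not archives
        }
        try (ZipFile ignored = new ZipFile(file)) {
            return true;
        } catch (ZipException e) {
            return false; // readable, but not in zip format
        }
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            System.out.println(path + " -> " + looksLikeZip(new File(path)));
        }
    }
}
```

A check like this lets a caller accept either a data directory or an archive, which matches the "directory/archive" wording of the constructors.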