Grammar Formalisms

The XTAG-System (1)

11 June 2007

Laura Kallmeyer, Timm Lichte, Wolfgang Maier

Setting up the system on your account

The XTAG tools are installed in the AFS. To get them to work on your account, you need to set the relevant paths:


The components of the XTAG System

XTAG

The Morphological database

Morphological Analyzer and Morph Database: Consists of approximately 317,000 inflected items derived from over 90,000 stems. For each inflected form it returns the root form, the POS, and inflectional information.

We will take a look at the maintenance interface of the morphological database. It lets you add, edit, or view entries in the database. Please type

$ cp /afs/sfs/lehre/wmaier/xtag/tools/morph-1.5/data/morph_english.db ~/
$ cd ~/
$ xmdbm -db morph_english.db
and look up some interesting words.

You might want to delete the morphology database after using it:

$ rm ~/morph_english.db

POS Tagger/POS Blender

The basis for tree selection of the parser is the presence of a POS tag on every word of the input. For those lexical items that cannot be assigned a POS tag via the Morphological Analyzer, the POS tagger is used. The POS Blender makes the final decision about the POS tag a word receives. It uses the output of the POS tagger as a filter on the output of the morphological analyzer. Any words that are not found in the morphological database are assigned the POS given by the tagger.
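The filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the XTAG implementation: the function name `blend_pos`, the toy database, and the tag names are hypothetical. The source does not say what happens when the tagger's tag is not among the analyzer's tags; the sketch assumes that the analyzer's tags are then kept unfiltered.

```python
def blend_pos(word, morph_db, tagger_tag):
    """Return the set of POS tags kept for `word` (hypothetical helper)."""
    morph_tags = morph_db.get(word)
    if not morph_tags:
        # Word is not in the morphological database: take the tagger's POS.
        return {tagger_tag}
    # Use the tagger output as a filter on the analyzer output.
    filtered = morph_tags & {tagger_tag}
    # Assumption (not stated in the source): if the filter removes everything,
    # fall back to the unfiltered analyzer tags.
    return filtered or set(morph_tags)


# Toy stand-in for the morphological database (illustrative entries only).
MORPH_DB = {"walks": {"N", "V"}, "the": {"D"}}

print(blend_pos("walks", MORPH_DB, "V"))  # ambiguous word, filtered by the tagger
print(blend_pos("blorf", MORPH_DB, "N"))  # unknown word, tagger decides
```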


Trees DB and Syn DB

Tree Database: 1004 trees, divided into 53 tree families and 221 individual trees. The tree families represent subcategorization frames; in a movement-based approach, the trees within a tree family would be related to each other transformationally.

Syntactic Database: More than 30,000 entries. Each entry consists of the uninflected form of the word, its POS, a list of the trees or tree families associated with the word, and a list of feature equations that capture lexical idiosyncrasies.
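To make the shape of such an entry concrete, here is a small sketch of one as a record. The class name `SynEntry`, its field names, and the example values are assumptions made for illustration; this is not the layout of the actual database file.

```python
from dataclasses import dataclass, field


@dataclass
class SynEntry:
    """One syntactic database entry (hypothetical in-memory layout)."""
    word: str                                              # uninflected form
    pos: str                                               # part of speech
    trees_or_families: list = field(default_factory=list)  # tree or family names
    feature_equations: list = field(default_factory=list)  # lexical idiosyncrasies


def entries_for(db, word):
    """Return all entries whose uninflected form equals `word`."""
    return [entry for entry in db if entry.word == word]


# Illustrative entries only; the real database may list different trees.
SYN_DB = [
    SynEntry("buy", "V", ["Tnx0Vnx1", "Tnx0Vnx2nx1"]),
    SynEntry("aim", "V", ["Tnx0V"]),
]

for entry in entries_for(SYN_DB, "buy"):
    print(entry.word, entry.pos, entry.trees_or_families)
```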

We will take a look at the maintenance interface of the syntactic database. Please run

$ synedit
and load the following file:
/afs/sfs/lehre/wmaier/tag-seminar/xtag/syntax.flat
The interface can now be used to search the database or to view or edit entries.

Various tools for accessing the two databases are available.

  1. $ xtag.show english [regexp]
    xtag.show can be used to view individual trees from the grammar. Any regular expression can be used to match the tree names, e.g.
    $ xtag.show english '^betaN[0-9]*'
    will display all the relative clause trees.

  2. $ xtag.show.fam english [regexp1] [regexp2]
    xtag.show.fam can be used to view trees in families from the grammar. For every tree family that matches [regexp1], each tree in that family that matches [regexp2] is displayed, e.g.
    $ xtag.show.fam english '^Tnx0Vnx1$' '^alphaW[0-1]*'
    will display all wh-extraction trees from the transitive tree family.
  3. $ xtag.show.word english [word] [regexp]
    xtag.show.word can be used to view all trees lexicalized by [word]. In addition, the list of all such trees can be filtered by using the optional [regexp] parameter. Use ".*" if you want to see all selected trees, e.g.
    $ xtag.show.word english aim "for"
    will show only the trees that are anchored by the word "aim" and co-anchored by the preposition "for". To see all trees for "aim", run the command
    $ xtag.show.word english aim ".*"
    To see a transitive and a ditransitive version of an elementary tree for "bought", run
    $ xtag.show.word english bought "alphanx0Vnx1\[bought\]"
    $ xtag.show.word english bought "alphanx0Vnx2nx1\[bought\]"
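How the regular expressions in these commands select names can be tried out without the tools. The sketch below assumes that matching behaves like Python's `re.search`, which may differ from what the XTAG scripts actually do, and the family and tree names are an illustrative sample rather than the real contents of the grammar.

```python
import re


def match_names(pattern, names):
    """Keep the names in which `pattern` matches somewhere."""
    return [name for name in names if re.search(pattern, name)]


# Illustrative sample: family name -> names of trees in that family.
FAMILIES = {
    "Tnx0Vnx1": ["alphanx0Vnx1", "alphaW0nx0Vnx1", "alphaW1nx0Vnx1", "betaN0nx0Vnx1"],
    "Tnx0V": ["alphanx0V", "alphaW0nx0V"],
}


def show_fam(family_pattern, tree_pattern, families):
    """Two-stage filter: first on family names, then on tree names."""
    selected = []
    for family in match_names(family_pattern, families):
        selected.extend(match_names(tree_pattern, families[family]))
    return selected


print(show_fam(r"^Tnx0Vnx1$", r"^alphaW[0-1]*", FAMILIES))
print(match_names(r"^betaN[0-9]*", FAMILIES["Tnx0Vnx1"]))
```

Under these assumptions, the anchors `^` and `$` restrict the first pattern to the family named exactly Tnx0Vnx1. Because `[0-1]*` and `[0-9]*` also match zero digits, the tree patterns effectively select every name that begins with `alphaW` or `betaN`.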
    
Using the tree display window:
right mouse button: next tree
left mouse button: previous tree
key 'f': show features
key 'q': exit

The parser

The parser works in two steps. During tree selection, tree templates are chosen from the Tree database for each word, and the anchor position of each template is filled with the word. POS tagging is decisive during this step: the set of POS tags corresponds exactly to the set of anchor nodes, which makes it possible to establish a correspondence between words and elementary trees. During tree grafting, the actual parsing takes place. The output is a parse forest from which derived trees and derivation trees can be extracted.
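The tree selection step can be pictured with a small sketch. It is a simplification under stated assumptions: templates are matched on the POS of their anchor node only, whereas the real system presumably also uses the syntactic database to decide which trees a word selects. The template list and the function `select_trees` are hypothetical.

```python
# Illustrative tree templates: a name plus the POS of the anchor node.
TEMPLATES = [
    {"name": "alphaNXN", "anchor_pos": "N"},
    {"name": "betaDnx", "anchor_pos": "D"},
    {"name": "alphanx0Vnx1", "anchor_pos": "V"},
]


def select_trees(tagged_words, templates):
    """For each (word, POS) pair, anchor every template with a matching anchor POS."""
    selected = []
    for word, pos in tagged_words:
        for template in templates:
            if template["anchor_pos"] == pos:
                # Filling the anchor position is represented as name[word].
                selected.append(f'{template["name"]}[{word}]')
    return selected


sentence = [("John", "N"), ("bought", "V"), ("the", "D"), ("book", "N")]
for tree in select_trees(sentence, TEMPLATES):
    print(tree)
```

The resulting list of anchored trees is what the second step, tree grafting, would then combine into a parse forest.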

Wolfgang Maier
Last modified: Tue Jun 5 14:45:38 CEST 2007