Multi-level annotation for spoken-language corpora

Philippe BLACHE, Daniel HIRST
LPL, Université de Provence

June 16, 2000

The analysis of the interactions between different levels of linguistic analysis (prosody, syntax, semantics, pragmatics, etc.) requires the use of corpora. There are, however, very few existing corpora with this type of multi-level annotation. There are a number of serious obstacles to the development of such resources, including the choice of what information to represent, the form that information should take, and the way in which the information can be exploited once it exists. In this paper we propose an approach based on the use of annotation graphs adapted to the treatment of spoken language data.

What information to represent?
Classical annotation systems are not well adapted to this type of corpus, since they are not capable of taking several simultaneous levels of representation into account. In particular, even when it is possible to introduce the notion of "point of view", it is not possible to represent the embedding of different levels. The fundamental difficulty lies in the fact that the basic building-blocks of prosody and syntax are not superposable: syllables and accent groups, for example, may well involve a different parsing than morphemes and words. It may also be useful to annotate information such as the tonal representation of intonation patterns, which is not necessarily linked to a specific phoneme or even syllable, but which is more usefully treated as an autonomous level. The syntactic analysis of spontaneous speech phenomena such as repetitions, false starts, hesitations and corrections requires a non-hierarchical form of representation. Finally, in both prosody and syntax (and no doubt elsewhere), it would be useful to be able to maintain several simultaneous ambiguous representations rather than imposing a single interpretation.
What type of annotation?
Annotation graphs augmented with a system of typing provide an interesting solution to these problems. In this approach, the acoustic signal (when available) provides the fundamental baseline for reference. A task-specific set of pointers to instants in the signal forms the set of nodes (with no specific limits or linguistic constraints). The different annotation labels are borne by different types of arcs, each type corresponding to a given level of linguistic analysis. The annotation of a sequence would thus be formed from the definition of a set of nodes followed by a set of arcs which (unlike the nodes) carry the linguistic information. The following examples illustrate a few possibilities.
<arc type="pros">                            <!-- prosodic arc -->
    <begin id="node1"/>                      <!-- origin node -->
    <label type="phon" name="i"/>            <!-- phoneme /i/ -->
    <end id="node2"/>                        <!-- target node -->
</arc>


<arc type="pros">
    <begin id="node1"/>
    <label type="syl" name="its"/>           <!-- syllable arc /its/ -->
    <end id="node4"/>
</arc>


<arc type="synt">                            <!-- syntactic arc -->
    <begin id="node1"/>
    <label type="word" name="it" cat="pro"/> <!-- label (pronoun "it") -->
    <end id="node3"/>
</arc>
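
The arcs above refer to nodes only by identifier. As a complement, a minimal sketch of the corresponding node definitions might look as follows, assuming each node simply records a time offset (in seconds) into the acoustic signal; the element name, the time attribute and the values given here are illustrative assumptions, not part of the proposal above.

<node id="node1" time="0.000"/>              <!-- shared start of /i/, "it" and /its/ -->
<node id="node2" time="0.080"/>              <!-- end of phoneme /i/ -->
<node id="node3" time="0.150"/>              <!-- end of word "it" -->
<node id="node4" time="0.230"/>              <!-- end of syllable /its/ -->

Note how the word arc (node1 to node3) and the syllable arc (node1 to node4) end at different nodes: this is precisely the non-superposable parsing of syntactic and prosodic units discussed above.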
Using multi-level annotations
This technique makes it possible to use a language like XML for the representation of several different levels of information applied to the same basic data. This makes it possible to formulate queries referring simultaneously to different levels of annotation. Rather than develop a specific query language for this task, we propose adapting a generic query language, SgmlQL, to the type of multi-level representation proposed here.
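
As an illustration, a query retrieving all syntactic arcs that start at the same node as some prosodic arc could be sketched in the select/from/where style of SgmlQL roughly as follows; the exact SgmlQL syntax is not shown in this paper, and the type() and begin() accessors used below are hypothetical helpers over arcs, not documented SgmlQL operators.

select $synt
from   $pros in every arc within file("corpus.xml"),
       $synt in every arc within file("corpus.xml")
where  type($pros) = "pros" and type($synt) = "synt"
  and  begin($pros) = begin($synt)

Applied to the examples above, such a query would pair the word arc for "it" with both the phoneme arc /i/ and the syllable arc /its/, since all three start at node1.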
