The following six courses will make up the instructional part of the school:
Computational Tools for Corpus Linguistics
The availability of electronic text corpora has led to the development of various tools in computational linguistics for the construction, annotation and search of huge amounts of data.
The course will provide an overview of the most important aspects of processing text corpora:
In the course we will, as far as possible, use tools and resources developed at the Special Research Program (Sonderforschungsbereich) 441. This will give the participants direct access to the relevant tools so that they may gain practical experience with the development and use of these tools.
Lecturers: Erhard Hinrichs, Tübingen; Sandra Kübler, Tübingen.
Corpus-Based Investigation of Issues in Pragmatics
Initially, electronic corpora were used for lexicographic purposes. In recent years they have been increasingly employed for the investigation of grammatical questions. Pragmatic issues, however, have usually been ignored because of the traditional preference for using oral texts, which are practically unavailable as electronic corpora. This course will pursue the question of how well text corpora can nevertheless be used for pragmatic studies if the methodology is adjusted accordingly. More precisely, the following topics will be addressed:
The issues sketched here will be treated with examples from deixis and the theory of speech acts (with emphasis on linguistic expression and politeness). The Slavic corpora of the Tübingen Special Research Program 441 will be used as materials.
Lecturer: Tilman Berger, Tübingen.
Head-Driven Phrase Structure Grammar for Slavic
The aim of this course is to introduce Head-driven Phrase Structure Grammar (HPSG), a constraint-based linguistic formalism. The empirical material used in the course will be drawn from Slavic languages, and the theoretical phenomena dealt with will include:
Lecturer: Adam Przepiórkowski, Warsaw.
XML-based Corpus Linguistics
This course will be based on the CLaRK system developed in the CLaRK Programme and actively used at the SfS and LML for the construction, management and exploration of annotated corpora of German and Bulgarian sentences. The course will cover the following topics:
The style in which corpora are encoded in the CLaRK system will be compared with other styles of encoding corpora, especially with the referential annotation developed in the GATE system. The CLaRK system will be made available to all participants.
Lecturer: Kiril Simov, Sofia.
Morphological and Syntactic Tagging of Slavonic Languages
The objective of the workshop is to discuss various issues concerning morphological, syntactic and other tagging of corpora of Slavic languages. This language family is characterized by specific morphological and syntactic features, which can be studied thanks to the existence of various corpora of these languages. The workshop will enable researchers specializing in the study of Slavic languages to inform one another about the latest results in the tagging of Slavic corpora. One of the main topics will be the methodology used for morphological tagging: a comparison of stochastic and rule-based tagging of Slavic languages, and an assessment of the specific differences in tagging different Slavic languages and other languages for which plenty of tagging experience is already available (English, German). Another key topic will be treebanks, i.e. syntactically annotated corpora: various approaches and methodologies used for syntactic annotation will be presented and compared. One of the main goals of the workshop is to evaluate the current state of the art of Slavic corpora and their processing. A comparison of the tagged and annotated corpora of Slavic languages with existing annotated corpora of Germanic and Romance languages could thus help to reveal new typological differences between language families that could not be discovered without corpora.
Lecturers: Vladimír Petkevic, Prague; Karel Oliva, Saarbrücken.
Applications of Text Corpora to Lexicography
At the heart of the course will be a comparison of traditional and computer-based approaches to the construction of dictionaries, with special emphasis on corpus-based lexicography. The question of a suitable form of text corpora for lexicographic purposes will be discussed at an introductory level. This will lead to a discussion of corpus annotation, the representativeness of corpora, economy, and an optimized structuring of the data.
In the second part of the course, practical issues of text corpora in specific applications will be discussed on the basis of concrete corpus-oriented projects. The following projects could be used as starting points:
Lecturer: Anatolij N. Baranov, Moscow.