Computational approaches to the study of dialectal and typological variation
August 6-10 2012 (ESSLLI first week),
Opole, Poland

Monday, August 6
14.00 - 15.30 Erhard Hinrichs and Gerhard Jäger
Tuesday, August 7
14.00 - 15.30 Martijn Wieling (Groningen/Tübingen, invited speaker)
"A sociolinguistic analysis of linguistically sensitive dialectal word pronunciation distances"

Wednesday, August 8
14.00 - 14.45 Taraka Rama
"N-gram approaches to the historical dynamics of basic vocabulary"
14.45 - 15.30 Job Schepens
"Regressing morphological differences on an empirical measure of linguistic distance"
Thursday, August 9
14.00 - 14.45 Natalia Levshina
"'Let in translation': A typological study of the concept of LETTING in a parallel corpus of film subtitles"
14.45 - 15.30 Kapila Ponnamperuma, Chris Mellish and Peter Edwards
"Using Distributional Similarity for Identifying Vocabulary Differences between Individuals"
Friday, August 10
14.00 - 14.45 Johann-Mattis List
"Improving Phonetic Alignment by Handling Secondary Sequence Structures"
14.45 - 15.30 Erhard Hinrichs and Gerhard Jäger
Erhard Hinrichs
Gerhard Jäger

Workshop Purpose:

Computational dialectometry is an innovative method for investigating language variation. This still rather young approach employs techniques from statistical NLP - such as pattern recognition, sequence alignment, clustering, and dimension reduction - to study synchronic dialectal variation. It uses easy-to-operationalize data (such as phonetic transcriptions of a small core vocabulary) collected from a large number of speakers within a certain geographic area. Methods from unsupervised machine learning are then used to measure dialect distances and to model dialect continua. Combined with advances in the digital collection of population and geographic data, these methods now make it possible to study the correlation of linguistic variation with social and geographic factors.
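
As an illustration of how a dialect distance might be computed from such transcriptions, the following is a minimal sketch in Python, not a method prescribed by the workshop. It assumes that each site's transcriptions are plain strings keyed by concept, uses unweighted Levenshtein (edit) distance normalized by the length of the longer word, and averages over the concepts two sites share. The function names and the toy data are invented for illustration only.

```python
from itertools import combinations


def levenshtein(a, b):
    """Unweighted edit distance between two sequences of segments."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def site_distance(words_a, words_b):
    """Mean length-normalized edit distance over the concepts both sites share."""
    shared = sorted(set(words_a) & set(words_b))
    if not shared:
        raise ValueError("the two sites share no concepts")
    total = 0.0
    for concept in shared:
        a, b = words_a[concept], words_b[concept]
        total += levenshtein(a, b) / max(len(a), len(b), 1)
    return total / len(shared)


def distance_matrix(sites):
    """Pairwise site distances, keyed by (site, site) in sorted order."""
    return {(p, q): site_distance(sites[p], sites[q])
            for p, q in combinations(sorted(sites), 2)}


# Hypothetical toy data: invented spellings, not real dialect transcriptions.
SITES = {
    "site_A": {"house": "haus", "milk": "milk", "water": "water"},
    "site_B": {"house": "hus", "milk": "melk", "water": "water"},
    "site_C": {"house": "hus", "milk": "melk", "water": "vatn"},
}

if __name__ == "__main__":
    for pair, dist in distance_matrix(SITES).items():
        print(pair, round(dist, 3))
```

A matrix of this kind could then serve as input to clustering or multidimensional scaling. Work in the field often refines the segment costs, for example by making substitutions between phonetically similar sounds cheaper.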

Recent years have seen remarkable efforts in typology to set up electronic data inventories that contain significant data sets from large, typologically diverse and representative samples of languages. The data types thus collected in computational typology are remarkably similar - from an operational point of view - to the kind of resources that are being used in computational dialectometry. It is therefore a natural move to bring these two communities into contact and to discuss the mutual usability of algorithms and perhaps common standards for data encoding and exchange.

The goals of this workshop are twofold:
- to expose the ESSLLI community in general, and researchers at the interface of language and computation in particular, to the application of data-driven NLP methods to a rather new domain, and
- to provide a forum for practitioners and students of computational dialectometry, of quantitative typology, and of historical linguistics to learn about each other's research concerns and accompanying methods, and to receive feedback as well as inspiration for possible collaboration across sub-disciplines.

