Prof. Detmar Meurers: ISCL Hauptseminar (Wintersemester 2008/2009)



Exploring the Automatic Analysis of Learner Language




Abstract: The availability of linguistically annotated corpora is supporting important empirical insights into how language works, and it has become essential for training and testing human language technology. The language produced by second language learners differs in significant ways from that produced by native speakers. A systematic, corpus-based analysis of the over- and underuse of specific constructions or the occurrence of errors that are typical for particular learner populations can help answer questions on how languages are acquired and support the development of human language technology that can analyze learner language, for example to provide feedback in Intelligent Computer-Aided Language Learning (ICALL) applications.

In this seminar we will discuss research on the development of annotation schemes for learner language, with a particular focus on what distinctions can be reliably annotated and how natural language processing tools can be adapted or created to automate such analysis for efficient corpus annotation and ICALL applications.


Instructor: Detmar Meurers


Course meets:


Online materials: We will be using the new department Moodle site for the course, which is accessible at http://courses.sfs.uni-tuebingen.de. Use it to find the updated syllabus, slides, and pointers to reading material, and to post questions and participate in the discussion.

The first time you visit the department Moodle, you will need to create an account. To do so, select "Create new account", enter your department user id as id, pick a new password (do not use the same password for Moodle as for your department account), and enter your department email address (i.e., YOUR-ID@sfs.uni-tuebingen.de) as your email address. If you do not yet have a department account, please contact me as soon as possible.

For questions concerning the department accounts and computer system, you can contact the system administrator Jochen Saile. His office hours are: Thursdays 9-11 in room 2.25, Blochbau (Wilhelmstr. 19), email: saile@sfs.uni-tuebingen.de, phone: 29-78487.

We will also at times send you email about the class. Please be sure to read email sent to your department account at least once a day. You can ask Jochen Saile to forward your department email to another account that you read regularly.


Nature of course and my expectations: This is a research-oriented seminar, i.e., each participant is expected to take an active role in exploring the topic. More concretely, each participant is expected to

  1. regularly and actively participate in the class and lab discussion, read the papers assigned by me or the presenters, and post a question on each reading to the "Reading Discussion Forum" on Moodle, at the latest the day before the reading is discussed in class (30% of grade)

    Note: Following the rules of the Neuphilologische Fakultät, missing more than two meetings unexcused automatically results in failing the class.

  2. explore and present a topic (30% of grade)

  3. in groups of two, design, apply, and document a learner language annotation scheme and submit this as your "Hausarbeit" before the beginning of the next semester (40% of grade)

    If you are in the fifth semester of the BA in ISCL and you want to do an oral or a written exam, let me know before the Christmas break.


Academic conduct and misconduct: Research is driven by discussion and the free exchange of ideas, motivations, and perspectives, so you are encouraged to work in groups, discuss, and exchange ideas. At the same time, the foundation of the free exchange of ideas is that everyone is open about where they obtained which information. Concretely, this means you are expected to always make explicit when you have worked on something as a team, and keep in mind that being part of a team always means sharing the work! For text you write, you always have to provide explicit references for any ideas or passages you reuse from somewhere else. Note that this includes text "found" on the web, where you should cite the URL of the web site if no more official publication is available.


Class etiquette: Please do not read or work on materials for other classes in class. When in the computer lab, only use the computers when you are asked to do a specific activity; do not read email or browse the web. Please come to class on time and do not pack up early. All portable electronic devices such as cell phones should be switched off for the entire length of the flight, oops, class. If for some reason you must leave early, you have an important call coming in, or you have to miss class for an important reason, please let me know before class.


Topics:

Bibliography

Artstein, R. & M. Poesio (2009).

Survey Article: Inter-Coder Agreement for Computational Linguistics.
Computational Linguistics pp. 1-42.
URL http://www.mitpressjournals.org/doi/abs/10.1162/coli.07-034-R2.

Brants, T. & W. Skut (1998).

Automation of Treebank Annotation.
In Proceedings of New Methods in Language Processing. Sydney, Australia.

Dagneaux, E., S. Denness & S. Granger (1998).

Computer-aided error analysis.
System 26(2), 163-174.
URL http://www.sciencedirect.com/science/article/B6VCH-3TN9MNX-1/2/2e434546d3bbd466fad8adb01a42f66c.

Díaz-Negrillo, A. (2007).

A Fine-Grained Error Tagger for Learner Corpora.
Ph.D. thesis, University of Jaén, Spain.

Díaz-Negrillo, A. & J. Fernández-Domínguez (2006).

Error Tagging Systems for Learner Corpora.
Revista Española de Lingüística Aplicada (RESLA) 19, 83-102.
URL http://dialnet.unirioja.es/servlet/fichero_articulo?codigo=2198610&orden=72810.

Eeg-Olofsson, J. & O. Knutsson (2003).

Automatic Grammar Checking for Second Language Learners - the Use of Prepositions.
In Proceedings of Nodalida'03. Reykjavik, Iceland.
URL http://www.nada.kth.se/~knutsson/eegolofsson_knutsson.pdf.

Ellis, R. (1994).

The Study of Second Language Acquisition.
Oxford: Oxford University Press.

ETS (2008).

Annotating Preposition Errors. Annotation Manual Version 2.0-1.
Internal Annotation Manual used at the Educational Testing Service (ETS).

Gamon, M., J. Gao, C. Brockett, A. Klementiev, W. Dolan, D. Belenko & L. Vanderwende (2008).

Using Contextual Speller Techniques and Language Modeling for ESL Error Correction.
In Proceedings of IJCNLP. Hyderabad, India.
URL http://www.mt-archive.info/IJCNLP-2008-Gamon.pdf.

Garnier, S., Y. Tall, S. Fissaha & J. Haller (2003).

Learner Corpora: Design, Development and Applications. Development of NLP tools for CALL based on learner corpora (German as a foreign language).
In D. Archer, P. Rayson, A. Wilson & T. McEnery (eds.), Proceedings of the Corpus Linguistics 2003 Conference (CL 2003). Technical Papers 16. Lancaster University: University Centre for Computer Corpus Research on Language. pp. 246-252.
URL http://ucrel.lancs.ac.uk/publications/CL2003/papers/garnier.pdf.

Gass, S. M. & L. Selinker (2001).

Second Language Acquisition: An Introductory Course.
Mahwah, NJ: Lawrence Erlbaum Associates, second ed.

Granger, S. (1998).

Chapter 1. The computerized learner corpus: a versatile new source of data for SLA research.
In S. Granger (ed.), Learner English on computer, London; New York: Longman.

Granger, S. (2003).

Error-tagged learner corpora and CALL: A promising synergy.
CALICO Journal 20(3), 465-480.

Granger, S. (2004).

Computer learner corpus research: current status and future prospects.
In U. Connor & T. Upton (eds.), Applied Corpus Linguistics: A Multidimensional Perspective, Amsterdam & Atlanta: Rodopi, pp. 123-145.
URL http://cecl.fltr.ucl.ac.be/Downloads/Indianapolis&

Han, N.-R., M. Chodorow & C. Leacock (2006).

Detecting Errors in English Article Usage by Non-Native Speakers.
Natural Language Engineering 12(2), 115-129.

Izumi, E. (2006).

The NICT Japanese Learner English (JLE) Corpus Project.
Tech. rep., National Institute of Information and Communications Technology (NICT).
URL http://www2.nict.go.jp/x/x161/en/member/izumi_emi/project.html.

Izumi, E., K. Uchimoto & H. Isahara (2004).

SST speech corpus of Japanese learners' English and automatic detection of learners' errors.
ICAME Journal 28, 31-48.
URL http://icame.uib.no/ij28/Izumi.pdf.

Izumi, E., K. Uchimoto & H. Isahara (2005).

Error Annotation for Corpus of Japanese Learner English.
In Proceedings of the Second International Joint Conference on Natural Language Processing.

Izumi, E., K. Uchimoto, T. Saiga, T. Supnithi & H. Isahara (2003).

Automatic Error Detection in the Japanese Learners' English Spoken Data.
In The Companion Volume to the Proceedings of 41st Annual Meeting of the Association for Computational Linguistics. Sapporo, Japan: Association for Computational Linguistics, pp. 145-148.
URL http://www.aclweb.org/anthology/P03-2024.

Lee, J. & S. Seneff (2006).

Automatic Grammar Correction for Second-Language Learners.
In INTERSPEECH 2006 - ICSLP.
URL http://groups.csail.mit.edu/sls/publications/2006/IS061299.pdf.

Leech, G. (2004).

Chapter 2. Adding Linguistic Annotation.
In M. Wynne (ed.), Developing Linguistic Corpora: a Guide to Good Practice, Oxford: Oxbow Books.
URL http://ahds.ac.uk/creating/guides/linguistic-corpora/chapter2.htm.

Milton, J. C. P. & N. Chowdhury (1994).

Tagging the interlanguage of Chinese learners of English.
In Proceedings joint seminar on corpus linguistics and lexicology, Guangzhou and Hong Kong, 19-22 June, 1993, Language Centre, HKUST, Hong Kong. pp. 127-143.
URL http://hdl.handle.net/1783.1/1087.

Nagata, R., A. Kawai, K. Morihiro & N. Isu (2006).

A Feedback-Augmented Method for Detecting Errors in the Writing of Learners of English.
In Proceedings of ACL-COLING-06. Sydney, Australia, pp. 241-248.
URL http://www.aclweb.org/anthology/P06-1031.

Nesselhauf, N. (2004).

Learner corpora and their potential for language teaching.
In J. M. Sinclair (ed.), How to Use Corpora in Language Teaching, John Benjamins, pp. 125-152.

Nicholls, D. (2003).

The Cambridge Learner Corpus - error coding and analysis for lexicography and ELT.
In D. Archer, P. Rayson, A. Wilson & T. McEnery (eds.), Proceedings of the Corpus Linguistics 2003 Conference (CL 2003). Technical Papers 16. Lancaster University: University Centre for Computer Corpus Research on Language. pp. 572-581.
URL http://ucrel.lancs.ac.uk/publications/CL2003/papers/nicholls.pdf.

Passonneau, R. J. (1997).

Applying Reliability Metrics to Co-Reference Annotation.
CoRR.
URL http://arxiv.org/abs/cmp-lg/9706011.

Tetreault, J. & M. Chodorow (2008).

The Ups and Downs of Preposition Error Detection in ESL Writing.
In Proceedings of COLING-08. Manchester.
URL http://www.ets.org/Media/Research/pdf/r3.pdf.

Tono, Y. (2000).

A corpus-based analysis of interlanguage development: analysing POS tag sequences of EFL learner corpora.
In PALC'99: Practical Applications in Language Corpora. pp. 323-340.
URL http://leo.meikai.ac.jp/~tono/paper/palc99.pdf.

Tono, Y. (2003).

Learner corpora: design, development and applications.
In D. Archer, P. Rayson, A. Wilson & T. McEnery (eds.), Proceedings of the Corpus Linguistics 2003 Conference (CL 2003). Technical Papers 16. Lancaster University: University Centre for Computer Corpus Research on Language. pp. 800-809.
URL http://ucrel.lancs.ac.uk/publications/CL2003/papers/tono.pdf.


Detmar Meurers 2009-01-22