ISCL Hauptseminar (Summer semester 2023)

Analyzing Language Development


Going beyond tests designed to assess language abilities, how can we characterize the language proficiency of a first or second language learner and their development? What can the computational linguistic analysis of their language production reveal? In addition to fostering our understanding of language development and the practical goal of ecologically valid proficiency assessment, such analyses are also of immediate relevance for any approach designed to adaptively foster learning.

In this seminar we’ll consider a range of approaches originating in different fields and using different methods. On the one hand, there is research on first language acquisition computing quantitative metrics such as the Mean Length of Utterance (MLU). Other approaches, such as the (Revised) Developmental Level (D-Level; Lu 2009, Voss 2005), Developmental Sentence Scoring (DSS), or the Index of Productive Syntax (IPSYN; Sagae et al. 2005, Lubetich & Sagae 2014), identify the use and frequency of particular linguistic structures. On the other hand, second language acquisition (SLA) is systematically characterized in terms of Complexity, Accuracy and Fluency, with a broad range of complexity measures at all levels of linguistic modeling being identifiable by computational linguistic methods. Other SLA approaches define specific developmental sequences and rely on those to support the interpretation of relatively few observations about spontaneous language production in terms of a “Rapid Profile” of proficiency. In a related but more descriptive approach, the English Grammar Profile identifies a broad range of criterial features capturing the emergence of language forms and usage at different levels of proficiency. Depending on the interest of participants, we will also consider characteristics of spoken language in addition to the analyses based on written language.
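
To make the first kind of metric concrete, here is a minimal sketch of a word-based Mean Length of Utterance. It rests on several assumptions that are not part of the seminar materials: utterances arrive as plain strings, tokens are separated by whitespace, and length is counted in words rather than morphemes (the classic measure counts morphemes, which would require a morphological analyzer). The function name is purely illustrative.

```python
def mean_length_of_utterance(utterances):
    """Return the mean number of whitespace-separated tokens per utterance.

    Assumption: this is the word-based variant; empty or whitespace-only
    utterances are skipped rather than counted as length zero.
    """
    lengths = [len(u.split()) for u in utterances if u.strip()]
    if not lengths:
        raise ValueError("need at least one non-empty utterance")
    return sum(lengths) / len(lengths)


# Hypothetical child utterances, invented for illustration only.
sample = ["more juice", "doggie go out", "no"]
mlu = mean_length_of_utterance(sample)
```

A single number like this is easy to compute but says nothing about which structures a learner uses, which is what the structure-based indices mentioned above try to capture.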

Instructor: Detmar Meurers

Course meets: 4 SWS

Credit Points:

Syllabus (this file):

Moodle page:

Please enroll in this course by logging into this Moodle course.

Nature of course and our expectations: This is a research-oriented Hauptseminar, in which we jointly explore the topic. Everyone is expected to

  1. participate regularly and actively in class, read the assigned papers, and post a meaningful question on each reading to the “Discussion Forum” on Moodle no later than the day before the topic is discussed in class,

  2. explore and present a topic (individually or as part of a group),

  3. write a project term paper, if you pursue the 9 CP option.

Academic conduct and misconduct: Research is driven by discussion and free exchange of ideas, motivations, and perspectives. So you are encouraged to work in groups, discuss, and exchange ideas. At the same time, the foundation of the free exchange of ideas is that everyone is open about where they obtained which information. Concretely, this means you are expected to always make explicit when you’ve worked on something as a team – and keep in mind that being part of a team always means sharing the work.

For any text you write, you always have to provide explicit references for ideas or passages you reuse from somewhere else. Note that this includes text “found” on the web, where you should cite the URL of the website in case no more official publication is available.

Class etiquette: Please do not read or work on materials for other classes in our seminar. All portable electronic devices such as cell phones and laptops should be switched off for the entire length of the flight, oops, class.


Topics (first sketch: this will develop as the semester proceeds)


   Ai, H. & X. Lu (2013). A corpus-based comparison of syntactic complexity in NNS and NS university students’ writing. In A. Díaz-Negrillo, N. Ballier & P. Thompson (eds.), Automatic Treatment and Analysis of Learner Corpus Data, John Benjamins, pp. 249–264.

   Alexopoulou, T., M. Michel, A. Murakami & D. Meurers (2017). Task Effects on Linguistic Complexity and Accuracy: A Large-Scale Learner Corpus Analysis Employing Natural Language Processing Techniques. Language Learning 67, 181–209.

   Alshahrani, A. (2008). Rapid Profile as an Alternative ESL Placement Test. Annual Review of Education, Communication & Language Sciences 5.

   Arhiliuc, C., J. Mitrović & M. Granitzer (2020). Language proficiency scoring. In Proceedings of The 12th Language Resources and Evaluation Conference. pp. 5624–5630.

   Brezina, V. & G. Pallotti (2019). Morphological complexity in written L2 texts. Second Language Research 35(1), 99–119.

   Carroll, P. E. & A. L. Bailey (2016). Do decision rules matter? A descriptive study of English language proficiency assessment classifications for English-language learners and native English speakers in fifth grade. Language Testing 33(1), 23–52.

   Chen, X. & D. Meurers (2019). Linking text readability and learner proficiency using linguistic complexity feature vector distance. Computer-Assisted Language Learning 32(4), 418–447.

   Chen, X., D. Meurers & P. Rebuschat (2022). ICALL offering individually adaptive input: Effects of complex input on L2 development. Language Learning & Technology 26(1).

   Covington, M. A., C. He, C. Brown, L. Naçi & J. Brown (2006). How complex is that sentence? A proposed revision of the Rosenberg and Abbeduto D-Level Scale. Computer Analysis of Speech for Psychological Research (CASPR) Research Report 2006-01, The University of Georgia, Artificial Intelligence Center, Athens, GA.

   Crossley, S. A. & D. S. McNamara (2014). Does writing development equal writing quality? A computational investigation of syntactic complexity in L2 learners. Journal of Second Language Writing 26, 66–79.

   Díaz Negrillo, A. (2007). A Fine-Grained Error Tagger for Learner Corpora. Ph.D. thesis, University of Jaén, Spain.

   Ehl, B., M. Paul, G. Bruns, E. Fleischhauer, M. Vock, A. Gronostaj & M. Grosche (2018). Testgütekriterien der “Profilanalyse nach Grießhaber”. Evaluation eines Verfahrens zur Erfassung grammatischer Fähigkeiten von ein-und mehrsprachigen Grundschulkindern. Zeitschrift für Erziehungswissenschaft 21(6), 1261–1281.

   Farhady, H. (1982). Measures of language proficiency from the learner’s perspective. TESOL Quarterly 16(1), 43–59.

   Glaboniat, M., M. Müller, P. Rusch, H. Schmitz & L. Wertenschlag (2002). Profile deutsch, vol. 21. Langenscheidt Berlin.

   Gotsoulia, V. & B. Dendrinos (2011). Towards a corpus-based approach to modelling language production of foreign language learners in communicative contexts. In Proceedings of the International Conference Recent Advances in Natural Language Processing 2011. pp. 557–561.

   Grießhaber, W. (2013). Die Profilanalyse für Deutsch als Diagnoseinstrument zur Sprachförderung. Überblick. Kompetenzzentrum ProDaZ (online: https://www.uni-due.de/imperia/md/content/prodaz/griesshaber_profilanalyse_deutsch.pdf, accessed 06.03.2017).

   Grießhaber, W. (2019). 22. Profilanalysen. In Sprachdiagnostik Deutsch als Zweitsprache, De Gruyter Mouton, pp. 547–568.

   Harrison, J. (2015). The English grammar profile. English Profile in Practice, English Profile Studies 5, 28–48.

   Hartmann, S., N. Koch & A. E. Quick (2021). The traceback method in child language acquisition research: identifying patterns in early speech. Language and Cognition 13(2), 227–253.

   Hawkins, J. & L. Filipovic (2012). Criterial Features in L2 English. Cambridge: Cambridge University Press.

   Hawkins, J. A. & P. Buttery (2010). Criterial Features in Learner Corpora: Theory and Illustrations. English Profile Journal.

   Horbach, A., J. Poitz & A. Palmer (2015). Using shallow syntactic features to measure influences of L1 and proficiency level in EFL writings. In Proceedings of the fourth workshop on NLP for computer-assisted language learning. pp. 21–34.

   Housen, A. & F. Kuiken (2009). Complexity, Accuracy and Fluency in Second Language Acquisition. Applied Linguistics 30(4), 461–473.

   Kerz, E., Y. Qiao, D. Wiechmann & M. Ströbel (2020). Becoming linguistically mature: Modeling English and German children’s writing development across school grades. In Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications. pp. 65–74.

   Kerz, E., D. Wiechmann, Y. Qiao, E. Tseng & M. Ströbel (2021). Automated classification of written proficiency levels on the CEFR-scale through complexity contours and RNNs. In Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications. pp. 199–209.

   Keßler, J.-U. (2007). Assessing EFL-development online: A feasibility study of Rapid Profile. Second language acquisition research: Theory-construction and testing pp. 119–144.

   Kol, S., B. Nir & S. Wintner (2014). Computational evaluation of the Traceback Method. Journal of Child Language 41(1), 176–199.

   Kyle, K. (2016). Measuring Syntactic Development in L2 Writing: Fine Grained Indices of Syntactic Complexity and Usage-Based Indices of Syntactic Sophistication. Ph.D. thesis, Georgia State University.

   Kyle, K. & S. A. Crossley (2015). Automatically Assessing Lexical Sophistication: Indices, Tools, Findings, and Application. TESOL Quarterly 49(4), 757–786.

   Laufer, B. & P. Nation (1995). Vocabulary Size and Use: Lexical Richness in L2 Written Production. Applied Linguistics 16(3), 307–322.

   Lu, X. (2009). Automatic measurement of syntactic complexity in child language acquisition. International Journal of Corpus Linguistics 14(1), 3–28.

   Lu, X. (2010). Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics 15(4), 474–496.

   Lu, X. (2011). A Corpus-Based Evaluation of Syntactic Complexity Measures as Indices of College-Level ESL Writers’ Language Development. TESOL Quarterly 45(1), 36–62.

   Lu, X. (2012). The Relationship of Lexical Richness to the Quality of ESL Learners’ Oral Narratives. The Modern Language Journal 96(2), 190–208.

   Lu, X. & H. Ai (2015). Syntactic complexity in college-level English writing: Differences among writers with diverse L1 backgrounds. Journal of Second Language Writing 29, 16–27.

   Lubetich, S. & K. Sagae (2014). Data-driven Measurement of Child Language Development with Simple Syntactic Templates. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers. Dublin, Ireland: Dublin City University and Association for Computational Linguistics, pp. 2151–2160.

   Mackey, A., M. Pienemann & I. Thornton (1991). Rapid Profile: A second language screening procedure. Working Papers of the National Languages Institute of Australia 1(1), 61–82.

   Malvern, D. D., B. J. Richards, N. Chipere & P. Durán (2004). Lexical diversity and language development: Quantification and assessment. Palgrave Macmillan.

   McCarthy, P. M. & S. Jarvis (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods 42(2), 381–392.

   Michaud, L. N. & K. F. McCoy (1999). Modeling User Language Proficiency in a Writing Tutor for Deaf Learners of English. In Proceedings of the Symposium on Computer-Mediated Language Assessment and Evaluation in Natural Language Processing, an ACL-IALL Workshop. University of Maryland, College Park, Maryland, pp. 47–54.

   O’Donnell, M. B. & U. Römer (2009). Proficiency development and the phraseology of learner language. Paper Presented at the 30th ICAME Conference 2009. Lancaster, UK. 27–31 May 2009.

   Ortega, L. (2003). Syntactic complexity measures and their relationship to L2 proficiency: A research synthesis of college-level L2 writing. Applied Linguistics 24(4), 492–518.

   Paquot, M. (2019). The phraseological dimension in interlanguage complexity research. Second Language Research 35(1), 121–145.

   Pilán, I., E. Volodina & T. Zesch (2016). Predicting proficiency levels in learner writings by transferring a linguistic complexity model from expert-written coursebooks. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. pp. 2101–2111.

   Powers, S., D. M. Johnson, H. B. Slaughter, C. Crowder & P. B. Jones (1985). Reliability and validity of the language proficiency measure. Educational and Psychological Measurement 45(4), 959–963.

   Read, J. & P. Nation (2004). Measurement of formulaic sequences. Formulaic sequences: Acquisition, processing and use pp. 23–35.

   Sagae, K., A. Lavie & B. MacWhinney (2005). Automatic measurement of syntactic development in child language. In Proceedings of the 43rd Meeting of the Association for Computational Linguistics (ACL-05). Ann Arbor, MI.

   Skehan, P. (1989). Individual Differences in Second Language Learning. Edward Arnold.

   Sladoljev-Agejev, T. & J. Šnajder (2017). Using analytic scoring rubrics in the automatic assessment of college-level summary writing tasks in L2. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers). pp. 181–186.

   Tono, Y. (2013). Criterial feature extraction using parallel learner corpora and machine learning. In A. Díaz-Negrillo, N. Ballier & P. Thompson (eds.), Automatic Treatment and Analysis of Learner Corpus Data, John Benjamins, pp. 169–204.

   Vajjala, S. & K. Lõo (2013). Role of Morpho-syntactic features in Estonian Proficiency Classification. In Proceedings of the 8th Workshop on Innovative Use of NLP for Building Educational Applications (BEA8), Association for Computational Linguistics.

   Vajjala, S. & K. Lõo (2014). Automatic CEFR level prediction for Estonian learner text. In Proceedings of the third workshop on NLP for computer-assisted language learning. pp. 113–127.

   Volodina, E., I. Pilán, L. Llozhi, B. Degryse & T. François (2016). SweLLex: second language learners’ productive vocabulary. In Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition. pp. 76–84.

   Voss, M. J. (2005). Determining Syntactic Complexity Using Very Shallow Parsing. Research Report 2005-01, Computer Analysis of Speech for Psychological Research (CASPR), Institute for Artificial Intelligence, The University of Georgia. Published version of MSc thesis.

   Vyatkina, N. (2012). The Development of Second Language Writing Complexity in Groups and Individuals: A Longitudinal Learner Corpus Study. The Modern Language Journal 96(4), 576–598.

   Vyatkina, N., H. Hirschmann & F. Golcher (2015). Syntactic modification at early stages of L2 German writing development: A longitudinal learner corpus study. Journal of Second Language Writing.

   Weiss, Z. & D. Meurers (2019). Analyzing Linguistic Complexity and Accuracy in Academic Language Development of German across Elementary and Secondary School. In Proceedings of the 14th Workshop on Innovative Use of NLP for Building Educational Applications (BEA). Florence, Italy: Association for Computational Linguistics.

   Wisniewski, K. (2017). Empirical Learner Language and the Levels of the Common European Framework of Reference. Language Learning 67(S1), 232–253.

   Wolf, M. K., T. Farnsworth & J. Herman (2008). Validity issues in assessing English language learners’ language proficiency. Educational Assessment 13(2-3), 80–107.

   Wolfe-Quintero, K., S. Inagaki & H.-Y. Kim (1998). Second Language Development in Writing: Measures of Fluency, Accuracy & Complexity. Honolulu: Second Language Teaching & Curriculum Center, University of Hawaii at Manoa.

   Yoon, S.-Y., S. Bhat & K. Zechner (2012). Vocabulary profile as a measure of vocabulary sophistication. In Proceedings of the seventh workshop on building educational applications using NLP. pp. 180–189.

   Zhang, B. (2010). Assessing the accuracy and consistency of language proficiency classification under competing measurement models. Language Testing 27(1), 119–140.

Example learner corpus: NOCE (Díaz Negrillo 2007)