ISCL Hauptseminar (Winter semester 2019)

Computational Linguistic Analysis of Linguistic Complexity for Readability and Proficiency Assessment


Notions of complexity surface in a number of different contexts. In theoretical linguistics, syntactic structures are analyzed in terms of their complexity, and constraints such as the complex-NP constraint are formulated on this basis. In cognitive psychology, research on human sentence processing studies the complexity involved in cognitively processing language input. In second language acquisition research, the analysis of complexity (together with accuracy and fluency) is used to gain insights into the process and product of acquisition. In language testing and learner corpus research, the linguistic complexity of learner language is related to proficiency levels. In readability research, linguistic complexity is used to determine for whom a given text is readable.

In this seminar, we will discuss the empirical and conceptual nature of these notions of complexity and explore where the formalization and automatic analysis offered by computational linguistics can lead to applications such as automatic readability measures, proficiency classification, and search engines supporting the filtering of results by complexity for particular target audiences.


Nature of course and our expectations: This is a research-oriented, hands-on Hauptseminar, in which we jointly explore the topic and gain practical experience in implementing analyses. Substantial programming experience (at least at the level of the second Data Structures and Algorithms course) is required; permission may be granted for teams of two people combining complementary expertise. Everyone is expected to

  1. successfully complete the regular exercises and small projects assigned during the semester and present the results to the seminar,
  2. regularly and actively participate in class, read the assigned papers, and post a meaningful question on each reading to the “Discussion Forum” on Moodle no later than the day before the topic is discussed in class,
  3. explore and present a topic (individually or as part of a group),
  4. if you pursue the 9 CP option, work out a project term paper.

Academic conduct and misconduct: Research is driven by discussion and free exchange of ideas, motivations, and perspectives. So you are encouraged to work in groups, discuss, and exchange ideas. At the same time, the foundation of the free exchange of ideas is that everyone is open about where they obtained which information. Concretely, this means you are expected to always make explicit when you’ve worked on something as a team – and keep in mind that being part of a team always means sharing the work.

For text you write, you always have to provide explicit references for any ideas or passages you reuse from somewhere else. Note that this includes text “found” on the web, where you should cite the URL of the website if no more official publication is available.

Class etiquette: Please do not read or work on materials for other classes in our seminar. All portable electronic devices such as cell phones and laptops should be switched off for the entire length of the flight, oops, class.

Sketch of assignments

  1. Traditional readability: Flesch-Kincaid formula
  2. Lexical complexity: lexical richness, frequency
  3. Syntactic complexity
  4. Psycholinguistic perspective (Ted Gibson’s DLT)
  5. Discourse: aspects of cohesion (connectives, overlap, coreference, …)
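
As a rough illustration of the first two assignment topics, the sketch below computes the Flesch-Kincaid grade level and a simple type-token ratio as one lexical richness measure. It is a minimal sketch and not the seminar's reference solution: the function names, the regex-based tokenization and sentence splitting, and above all the vowel-group syllable counter are assumptions made for illustration. A serious implementation would use a proper tokenizer and a pronunciation dictionary.

```python
import re

WORD_RE = re.compile(r"[A-Za-z']+")


def count_syllables(word):
    """Crude heuristic (an assumption): one syllable per run of vowel letters."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    # Naive sentence splitting on ., ! and ? (an assumption, not a real segmenter).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = WORD_RE.findall(text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def type_token_ratio(text):
    """Lexical richness as distinct word forms (types) divided by tokens."""
    tokens = [w.lower() for w in WORD_RE.findall(text)]
    if not tokens:
        raise ValueError("text must contain at least one word")
    return len(set(tokens)) / len(tokens)


if __name__ == "__main__":
    sample = "The cat sat on the mat. The dog ran."
    print(flesch_kincaid_grade(sample))
    print(type_token_ratio(sample))
```

Note that the plain type-token ratio depends on text length, so comparing it across texts of different sizes needs care.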

Data sources to be used include:


Topics (first sketch: this will develop as the semester proceeds)

  1. 23./25.10. Detmar: Introduction
  2. 30.10. Zarah: Readability Formulas (no class on 1.11., a holiday)
  3. 6./8.11. Detmar: Introduction (cont.)
  4. 13.11. Xiaobin: Aggregating lexical-level complexity information to predict the text-level complexity (Chen & Meurers 2017)
  5. 15.11. Xiaobin: Linking text readability and learner proficiency using linguistic complexity feature vector distance (Chen & Meurers 2019)
  6. 20.11. Tanja Heck: Lexical complexity (Laufer & Nation 1995)
  7. 22.11. Eva Huber: Morphological complexity (Paquot 2019)
  8. 27.11. Xiaobin Chen, Zarah Weiss: Longitudinal Development of Complexity and Accuracy
  9. 29.11. Haemanth Santhi-Ponnusamy: Syntactic complexity in college-level English writing and L1 differences (Lu & Ai 2015)
  10. 4.12. Elizabeth Bear: The Development of Second Language Writing Complexity in Groups and Individuals: A Longitudinal Learner Corpus Study (Vyatkina 2012)
  11. 6.12. Zarah Weiss: Analyzing linguistic complexity and accuracy in academic language development (Weiss & Meurers 2019a)
  12. 11.12. Hebah Ahmed: Automatic Measurement of Syntactic Complexity Using the Revised Developmental Level Scale (Lu 2009; Voss 2005)
  13. 13.12. Daniela Rossman: IPSyn (Sagae et al. 2005; Lubetich & Sagae 2014)
  14. 18.12. Jana Murasová: Comparing child L2 development with adult L2 development (Unsworth 2008)
  15. 20.12. Sarah Neuhaus: Readability assessment for aphasia (Aleligay et al. 2008; Abou-Diab et al. 2019), and discussion of the Christmas break project
  16. 8.1. Zarah Weiss: Dependency Locality Theory (Gibson 2000; Shain et al. 2016)
  17. 10.1. Mareile Winkler: Propositional Idea Density (Brown et al. 2008)
  18. 15.1. Masoumeh Moradipour-tari: Discourse/Cohesion (Graesser et al. 2004)
  19. 17.1. Xiaobin Chen: CTAP (?) and its UIMA architecture
  20. 22.1. Nelly Sagirov: Testing target text fluency
  21. 24.1. Denise Loefflad: Readability analysis for French as a foreign language (François & Fairon 2012)
  22. 29.1. Mohamed Ouji: Evaluation (Huenerfauth et al. 2009; van Oosten et al. 2010; van Oosten et al. 2011)


