About me

in five sentences

I am a computational linguist and data scientist. Currently, I am doing my PhD in the Department of Linguistics at the University of Tübingen, where I work as a lecturer and researcher. I am interested in learning analytics and in linguistic and statistical aspects of language and data modeling, in particular with regard to language complexity, L2 and L1 proficiency and development in writing and speech, task effects, and text readability.

Aside from being involved in a series of smaller projects on linguistic complexity assessment for German, I work in the COLD project, where we assess the competencies of teachers in linguistically diverse German-as-a-second-language classrooms. I also work in the KANSAS project, where we develop, test, and evaluate a language-sensitive, competence- and user-oriented search engine for adult literacy training, with a secondary focus on L1 and L2 learning contexts.


I started my PhD in Computational Linguistics in the ICALL research group of the Department of Linguistics at the University of Tübingen in October 2017.

I acquired my Master’s degree in computational linguistics in August 2017 at the University of Tübingen, where I studied in the Master’s program from winter term 2015/16 to summer term 2017.

Before that, I completed my Bachelor's degree in computational linguistics, which I also studied at the University of Tübingen from winter term 2012/13 to summer term 2015, with general linguistics as a minor. In addition, I graduated with a Bachelor's degree in German studies, focusing on German linguistics and medieval German. I studied German studies from summer term 2012 to summer term 2015, first in Frankfurt (Main), then as a double major with computational linguistics at the University of Tübingen.

Work & Research Experience

As of winter term 2020/21, I teach as a lecturer for computational linguistics at the department of general linguistics in Tübingen and serve as the department's advisor for the Erasmus program.

Since mid-April 2019, I have been a researcher in the COLD project, where I investigate speech interaction between teachers and students in German classrooms.

Since October 2017, I have been a researcher in the KANSAS project, where I focus primarily on the assessment of reading material for illiteracy classes and on the development of the language-sensitive, competence- and user-oriented search engine.

From October 2015 to September 2018, I was a researcher in the LangBank project, where I primarily developed a pipeline for the automatic analysis of language and complexity measures for Early New High German.

I gained earlier research experience as a student assistant: first, from July 2012 to December 2013, in the GermaNet project, where I helped extend the German wordnet database; then, from October 2014 to July 2015, in the A4: Comparing Meaning in Context project, where I was responsible for focus annotations and some programming.

In addition, from July to September 2014, I completed an internship at IBM ExtremeBlue, where I did beta testing of a new scripting language for workflow design.

I gained some teaching experience in the summer term 2015, when I was a tutor for Grammar Formalisms, a seminar held by Detmar Meurers.

Peer-Reviewed Articles

  • Nadezda Okinina, Jennifer-Carmen Frey, and Zarah Weiss (2020): CTAP for Italian: Integrating Components for the Analysis of Italian into a Multilingual Linguistic Complexity Analysis Tool. In: Proceedings of The 12th Language Resources and Evaluation Conference. Marseille, France, May, 2020, pp. 7123-7131. [Article]
  • Zarah Weiss and Detmar Meurers (2019): Broad Linguistic Modeling is Beneficial for German L2 Proficiency Assessment. In Widening the Scope of Learner Corpus Research. Selected Papers from the 4th Learner Corpus Research Conference 2017, Andrea Abel, Aivars Glaznieks, Verena Lyding, Lionel Nicolas (eds.), Louvain: Presses Universitaires de Louvain, pp. 419-435. [Article]
  • Sabrina Dittrich, Zarah Weiss, Hannes Schröter, and Detmar Meurers (2019): Integrating large-scale web data and curated corpus data in a search engine supporting German literacy education. In: Proceedings of the 8th Workshop on NLP for Computer Assisted Language Learning. Turku, Finland, September, 2019, pp. 41-56. [Article] [Slides]
  • Zarah Weiss, Anja Riemenschneider, Pauline Schröter, and Detmar Meurers (2019): Computationally Modeling the Impact of Task-Appropriate Language Complexity and Accuracy on Human Grading of German Essays. In: Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications. Florence, Italy, August, 2019, pp. 30–45. [Article] [Slides]
  • Zarah Weiss and Detmar Meurers (2019): Analyzing Linguistic Complexity and Accuracy in Academic Language Development of German across Elementary and Secondary School. In: Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications. Florence, Italy, August, 2019, pp. 380–393. [Article] [Poster]
  • Christoph Kühberger, Christoph Bramann, Zarah Weiss, and Detmar Meurers (2019): Task Complexity in History Textbooks: A Multidisciplinary Case Study on Triangulation in History Education Research. In: History Education Research Journal 16.1, pp. 139-157. [Article]
  • Zarah Weiss, Sabrina Dittrich, and Detmar Meurers (2018): A Linguistically-Informed Search Engine to Identify Reading Material for Functional Illiteracy Classes. In: Proceedings of the 7th Workshop on NLP for Computer Assisted Language Learning at SLTC 2018 (NLP4CALL 2018). Stockholm, Sweden, November 7, 2018, pp. 79-90. [Article] [Poster]
  • Zarah Weiss and Detmar Meurers (2018): Modeling the Readability of German Targeting Adults and Children: An empirically broad analysis and its cross-corpus validation. In: Proceedings of the 27th International Conference on Computational Linguistics. Santa Fe, New Mexico, USA, August 20-26, 2018, pp. 303-317. [Article] [Poster]
  • Heiko Holz, Zarah Weiss, Oliver Brehm, and Detmar Meurers (2018): COAST - Customizable Online Syllable Enhancement in Texts: A flexible framework for automatically enhancing reading materials. In: Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications. New Orleans, Louisiana, USA, June 5, 2018, pp. 89-100. [Article] [Poster]
  • Doreen Bryant, Karin Berendes, Detmar Meurers, and Zarah Weiss (2017): Schulbuchtexte der Sekundarstufe auf dem linguistischen Prüfstand. Analyse der bildungs- und fachsprachlichen Komplexität in Abhängigkeit von Schultyp und Jahrgangsstufe. In: Linguistische Komplexität – ein Phantom?, Mathilde Hennig (ed.), Tübingen: Stauffenburg Verlag, pp. 281-309.

Invited Talks and Presentations

(talks at events with proceedings are not listed here, cf. the list of peer-reviewed papers instead)

  • Zarah Weiss (2019): Broad Linguistic Modeling of German L2 Complexity using Measures of Linguistic Complexity, Language Use, and Human Processing. In: Colloquium on Broadening the Scope of L2 Complexity Research. Workshop Talk. Brussels, Belgium, November, 2019. [Slides]
  • Zarah Weiss (2019): A Sentence is a Sentence is a Sentence? Parallels and Differences between the Segmentation of Oral and Historical Language Data. In: Unit segmentation in Spoken Language. Invited Workshop Talk. Orléans, France, June, 2019. [Slides]
  • Zarah Weiss and Detmar Meurers: Broad Linguistic Modeling is Beneficial for German L2 Proficiency Assessment. In: 4th Learner Corpus Research Conference. Conference Talk. Bozen, Italy, October 2017. [Slides]
  • Zarah Weiss: Using Measures of Linguistic Complexity to Assess German L2 Proficiency in Learner Corpora under Consideration of Task-Effects. In: Kolloquium Korpuslinguistik und Phonetik, Humboldt Universität zu Berlin. Invited Talk. Berlin, Germany, June 2017. [Slides]
  • Zarah Weiss, Detmar Meurers, Anke Lüdeling, Uwe Springmann, Brian MacWhinney and John Kowalski: A Digital Infrastructure to Study Latin and Historical German. In: 39. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft. Poster. Saarbrücken, Germany, March 2017.
  • Zarah Weiss and Gohar Schnelle: Annotation of an Early New High German Corpus: The LangBank Pipeline. In: 39. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft, AG 4: Encoding language and linguistic information in historical corpora. Workshop Talk. Saarbrücken, Germany, March 2017. [Abstract] [Slides]
  • Detmar Meurers, Josef Schrader, Hannes Schröter and Zarah Weiss: Sprachaffine Suchmaschine zur Unterstützung von Lehrkräften. In: Workshop "Spracherwerb und Sprachförderung über die Lebensspanne II". Workshop Talk. Köln, Germany, November 2016.
  • Christiane Bertram and Zarah Weiss: Computer-Based Evaluation of Student Texts. In: On the Way to an Internationally Shared Assessment of Historical Thinking. Workshop Talk. Hamburg, Germany, July 2016.

Unpublished Resources & Theses

Unpublished resources
  • Zarah Weiss: Feature documentation of the Tübingen German Complexity Code. Version 1.0. [EN]
  • Zarah Weiss and Theresa Geppert (2018): Textlesbarkeit für Alpha-Levels. Annotationsrichtlinien für Lesetexte. Version 1.1. [DE]
  • Zarah Weiss and Gohar Schnelle (2016): Early New High German Sentence Segmentation Annotation Guidelines. Version 4.0. [DE] [EN]
  • Zarah Weiss (2017): Using Measures of Linguistic Complexity to Assess German L2 Proficiency in Learner Corpora under Consideration of Task-Effects. MA thesis in computational linguistics. Eberhard Karls Universität Tübingen. [Thesis] [Supplementary Material]
  • Zarah Weiss (2015): More Linguistically Motivated Features of Language Complexity in Readability Classification of German Textbooks: Implementation and Evaluation. BA thesis in computational linguistics. Eberhard Karls Universität Tübingen. [Thesis]
  • Zarah Weiss (2015): Untersuchung partieller Subjekt-Verb-Kongruenz bei disjungierten Singularen im Deutschen. BA thesis in German studies. Eberhard Karls Universität Tübingen. [Thesis]

Teaching & Workshops

Winter 2020/21
  • HS Linguistic Corpus Annotation.
Winter 2019/20


My office is in the department of linguistics at

Seminar für Sprachwissenschaft
Wilhelmstraße 19, room 1.31
72074 Tübingen

Virtual office hours: Friday 09:00 to 10:15 (during the semester) or by appointment.
Please contact me in advance via email to reserve a time slot.