PhD research

These were the research questions of my PhD:

  •   How can the advances in Natural Language Processing and Information Retrieval support an Internet search for so-called linguistically fortified texts (i.e., texts containing a sufficient number of certain linguistic forms)? Do language teachers actually need and appreciate this kind of automatic input enrichment when searching for reading material online?

  •   Can one foster language acquisition and practice by drawing learners’ attention to targeted linguistic forms in text – via automatic visual (e.g., highlighting the form) or functionally-driven (e.g., generating and asking questions about the form) input enhancement, or both? To what extent can research in Automatic Question Generation support this task?

  •   How well can we automatically detect linguistic forms and disambiguate their senses by leveraging rule-based, crowdsourcing, and machine learning approaches?
Learn more about the LEAD Graduate School and my research within it.

Tense Sense Disambiguation

The task is similar to that of word sense disambiguation but focuses on grammatical tenses instead of words.

  •   Michèle and Meike are waiting for Maria. (ongoing action)
  •   Simón and Xiaobin are leaving next week. (future arrangement)

While the predicates of both sentences are instances of the present progressive (continuous) tense, they express different grammatical meanings, or senses.
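To make the distinction concrete, here is a minimal sketch of a single hand-written rule. It is an illustration only, not the system described on this page: the cue list, the regular expressions, and the function name `progressive_sense` are all assumptions made for this example.

```python
import re

# Assumption: a crude pattern for the present progressive (am/is/are + V-ing).
PROGRESSIVE = re.compile(r"\b(?:am|is|are)\s+\w+ing\b", re.IGNORECASE)

# Assumption: a tiny, hypothetical list of future time expressions.
FUTURE_CUES = re.compile(r"\b(?:tomorrow|tonight|soon|next\s+\w+)\b", re.IGNORECASE)


def progressive_sense(sentence):
    """Guess the sense of a present progressive predicate.

    Returns None when no present progressive form is found.
    """
    if not PROGRESSIVE.search(sentence):
        return None
    if FUTURE_CUES.search(sentence):
        return "future arrangement"
    return "ongoing action"


for example in (
    "Michèle and Meike are waiting for Maria.",
    "Simón and Xiaobin are leaving next week.",
):
    print(example, "->", progressive_sense(example))
```

A rule like this covers only the clearest cases, which is one reason to combine rules with a supervised model.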

I approach the task of automatically differentiating between tense senses with a combination of rule-based and supervised machine learning algorithms. The performance of the statistical models varies across tenses; the best-performing model for the past simple tense achieves an F1 score of 91% and significantly outperforms a strong most-frequent-sense (majority) baseline (p = .01).
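For readers unfamiliar with the baseline, a most-frequent-sense classifier and a per-sense F1 score can be sketched as follows. The sense labels and the toy data are hypothetical, and the page does not say which variant of F1 was reported, so the per-sense F1 here is only one possible reading.

```python
from collections import Counter


def most_frequent_sense(train_labels):
    """Return the sense that occurs most often in the training labels."""
    return Counter(train_labels).most_common(1)[0][0]


def f1_for_sense(gold, predicted, sense):
    """F1 score for one sense, computed from gold and predicted labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == sense and p == sense)
    fp = sum(1 for g, p in pairs if g != sense and p == sense)
    fn = sum(1 for g, p in pairs if g == sense and p != sense)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sense labels for the past simple tense (toy data only).
train = ["completed action", "completed action", "past habit", "completed action"]
gold = ["completed action", "completed action", "past habit", "completed action"]

majority = most_frequent_sense(train)
baseline_predictions = [majority] * len(gold)
print(majority, f1_for_sense(gold, baseline_predictions, majority))
```

The baseline never predicts a minority sense, so its F1 for every other sense is zero; a useful model has to beat it on exactly those cases.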

Automatic Question Generation :: Language Learning Context

Chinese retailers have cut staff.

No, this is not an introduction to a post about economics or politics. This is an example sentence from an English textbook (and originally, from a news article). Using this sentence alone, an English teacher could come up with a dozen ways to facilitate or test their students' knowledge of English. When it comes to teaching, practicing and revising grammar, this sentence can be used to produce the following questions:

  •   Is it important when exactly Chinese retailers cut staff or the fact that cutting staff took place at all?
  •   Chinese retailers _________ staff. (cut)
  •   What is the grammatical tense of the verb 'cut' in the sentence?

Each of the questions above serves a specific goal:

  •   checking understanding of certain grammatical categories in the text
  •   drawing attention to certain grammatical constructions in the text
  •   testing explicit knowledge of grammar
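The fill-in-the-gap question from the list above is the easiest of the three to sketch in code. This is a minimal illustration, not our system: it assumes the verb phrase and its lemma have already been identified by some earlier analysis step, and the function name `make_gap_item` is made up for this example.

```python
GAP = "_" * 9  # the blank shown to the learner


def make_gap_item(sentence, verb_phrase, lemma):
    """Replace the first occurrence of the verb phrase with a gap.

    Returns None instead of a low-quality item when the verb phrase
    cannot be found in the sentence.
    """
    if verb_phrase not in sentence:
        return None
    return f"{sentence.replace(verb_phrase, GAP, 1)} ({lemma})"


print(make_gap_item("Chinese retailers have cut staff.", "have cut", "cut"))
```

Returning nothing for sentences that do not match is a deliberate choice in this sketch: a missing question is less harmful to a learner than a wrong one.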

A good teacher surely wants to make the most of every text their students read. As well as check their homework. And prepare fun communicative activities for the next class. And come up with the topics for the next classroom debate... and much more. Would it not be nice to delegate at least some of this work to a knowledgeable, reliable assistant?

Fortunately, Natural Language Processing techniques and tools make it possible! In our paper titled Question Generation for Language Learning, Detmar Meurers and I discuss the different functions that questions play in language teaching and learning and give examples of how automatic question generation can support those uses.

You can try out our system here. As we only generate questions about grammatical tenses and favor precision over recall, the system does not generate questions for every sentence. Typing in news article texts is probably the best idea since these were our development data. Have fun!

By the way, a crowdsourcing study we conducted (also reported in the paper) showed that it is not that easy for proficient English speakers to tell a question written by an English teacher from a computer-generated one: 67% of automatically generated questions were thought to have been written by a teacher. Of course, there is always room for improvement. We are currently working on the multiple-choice answer format, which requires generation of distractors (plausible but incorrect answer options), and exploring the NLP task of Tense Sense Disambiguation in order to improve our algorithm for generating grammar concept questions about grammatical tenses.

I would like to thank my colleagues: Simón Ruiz for putting together and providing a dataset of manually written questions; and Michael Grosz and Johann Jacoby for taking the time to guide me through the dangerous but exciting world of statistical analysis!

FLAIR :: Web Search for Language Learning

FLAIR stands for Form-focused Language-Aware Information Retrieval. It is a web application for language teachers and learners that helps them find authentic web texts containing a sufficient number of the linguistic constructions studied as part of the English language curriculum, such as the passive voice, grammatical tenses, wh- questions, etc.
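The general idea of form-focused ranking can be sketched as follows. This is not FLAIR's actual ranking function: the scoring formula, the field names (`n_tokens`, `counts`), and the construction labels are assumptions for illustration, and the counts are presumed to come from an earlier linguistic analysis step.

```python
def construction_density(doc, weights):
    """Weighted number of target constructions per token of the document."""
    if doc["n_tokens"] == 0:
        return 0.0
    weighted = sum(
        weights.get(name, 0.0) * count for name, count in doc["counts"].items()
    )
    return weighted / doc["n_tokens"]


def rank_by_constructions(documents, weights):
    """Order documents so that the richest in target constructions come first."""
    return sorted(
        documents,
        key=lambda doc: construction_density(doc, weights),
        reverse=True,
    )


# Hypothetical search results with precomputed construction counts.
results = [
    {"title": "B", "n_tokens": 200, "counts": {"passive": 2, "wh_question": 1}},
    {"title": "A", "n_tokens": 100, "counts": {"passive": 5, "wh_question": 0}},
]

# A teacher preparing a lesson on the passive voice weights only that form.
for doc in rank_by_constructions(results, {"passive": 1.0}):
    print(doc["title"])
```

Dividing by document length keeps long texts from winning simply because they contain more of everything.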

Try out the FLAIR tool and see for yourself (no registration needed). There you will also find the list of the implemented linguistic constructions as well as the utilized third-party tools and libraries.

An online experiment we conducted with English teachers showed that, when selecting reading materials for their students, they preferred FLAIR to a standard search engine 71% of the time. Importantly, we found no trade-off between the relevance of content and the richness of linguistic representation in the top FLAIR results.

I would like to thank Professor Detmar Meurers for his supervision and for promoting FLAIR in all corners of the world; Madeesh Kannan for optimizing FLAIR; and Ankita Oswal for her work on the online experiment!

SLASH/A :: Ngram Tendency Viewer

Slash/A is a new visualization by me (Maria Chinkina) and Velislava Todorova that allows you to search for and visualize sequences of words, lemmas, parts of speech, or any combination of the three.
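Searching for a mixed sequence of words, lemmas, and parts of speech can be sketched like this. It is a simplified illustration rather than Slash/A's implementation: the token representation, the function name `find_sequences`, and the hand-assigned Penn Treebank tags are assumptions made for this example.

```python
def find_sequences(tokens, pattern):
    """Return the word sequences whose tokens satisfy the pattern.

    tokens  -- list of dicts with the keys "word", "lemma" and "pos"
    pattern -- list of dicts; each dict constrains one token position,
               e.g. {"lemma": "have"} or {"word": "cut", "pos": "VBN"}
    """
    if not pattern:
        return []
    matches = []
    size = len(pattern)
    for start in range(len(tokens) - size + 1):
        window = tokens[start:start + size]
        if all(
            all(token.get(key) == value for key, value in constraint.items())
            for token, constraint in zip(window, pattern)
        ):
            matches.append([token["word"] for token in window])
    return matches


# Hand-annotated toy sentence (tags assigned manually for this example).
sentence = [
    {"word": "Chinese", "lemma": "Chinese", "pos": "JJ"},
    {"word": "retailers", "lemma": "retailer", "pos": "NNS"},
    {"word": "have", "lemma": "have", "pos": "VBP"},
    {"word": "cut", "lemma": "cut", "pos": "VBN"},
    {"word": "staff", "lemma": "staff", "pos": "NN"},
]

# The lemma "have" followed by any past participle.
print(find_sequences(sentence, [{"lemma": "have"}, {"pos": "VBN"}]))
```

Because every position is constrained independently, the same function handles pure word n-grams, pure part-of-speech n-grams, and any mixture of the two.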

Download Slash/A (10.6 MB) including the Barrett-Browning corpus

Watch the demos: Part 1 and Part 2

I would like to thank Velislava Todorova for her endless enthusiasm and Dr. Chris Culy for his support and for hosting Slashenka!


To see the most current list of publications, please visit my Google Scholar profile or ORCID profile.


Education and employment (timeline chart, 2005 to 2017)

  •   Doctoral Candidate, LEAD Graduate School (current)
  •   Research Assistant, Universität Tübingen
  •   High-school English teacher, State School 444 in Moscow
  •   Czech language courses, Moscow and Prague
  •   Online courses in programming and statistics
  •   English tutor, Moscow and Prague
  •   MA in Computational Linguistics, Universität Tübingen
  •   BA in Linguistics and Teaching, Moscow City Pedagogical University

Professional interests (chart, rated from beginner to proficient): Natural Language Processing, Second Language Acquisition, Computer-Assisted Language Learning, Java, Python, JavaScript, jQuery, HTML, CSS, Bootstrap, Photoshop, Web-based experiments, Linguistics & Comp Ling, Programming, Building Apps for Education, Front-end Development, Crowdsourcing

Language proficiency map (legend: Native, Full Professional, Professional, Limited, Elementary). Map by: Al MacDonald / Twitter account @F1LT3R