Projects | University of Tübingen

The Quantitative Linguistics group is currently working on several projects. Short descriptions of the projects associated with our group follow. For a more detailed view of the research ideas we are investigating, see Harald Baayen's website. Projects that have finished or whose funding has run out are listed under Previous Projects.

DFG-Cwic

Complex words in context

Principal Investigator: R. Harald Baayen (Professor of Quantitative Linguistics)

Website

Details on DFG-Cwic

Project Cwic: Complex words in context

Recent years have seen impressive advances in the fields of natural language processing (NLP) and artificial intelligence (AI). State-of-the-art language technologies have been made possible by advances in machine learning that utilise many-layered 'deep' artificial neural networks. However, what deep learning networks detect in language use, and what probabilistic information they exploit to generate predictions for computational language tasks, often remains unclear (but see Linzen & Baroni, 2021, for recent advances). For engineering purposes, this is not a problem, but for understanding language and the cognition of language processing, this state of affairs is highly unsatisfactory. The discriminative lexicon model (DLM) (Baayen et al., 2019; Chuang & Baayen, 2021) is an attempt to combine the strengths of the mathematics of error-driven learning with the new possibilities offered by word embeddings for the computational modeling of the mental lexicon and lexical processing. Word embeddings, which we will also refer to as 'semantic vectors', represent word meanings as points in a high-dimensional space calculated from word usage in large text corpora.
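To give a concrete feel for how error-driven learning and semantic vectors can be combined, the following is a minimal sketch and not the DLM implementation itself. It uses the delta rule (Widrow-Hoff) to learn a linear mapping from letter cues to semantic vectors. The two-word lexicon, the two-dimensional vectors, the learning rate, and all function names are assumptions made purely for illustration; real semantic vectors have hundreds of dimensions and are derived from corpora.

```python
from collections import defaultdict

N_DIMS = 2  # toy dimensionality, chosen only for illustration

# Hypothetical toy lexicon: letter cues and invented 2-d "semantic vectors".
LEXICON = {
    "cat": (["c", "a", "t"], [1.0, 0.0]),
    "cab": (["c", "a", "b"], [0.0, 1.0]),
}


def predict(weights, cues):
    """Predicted semantic vector: the sum of the weight rows of the active cues."""
    return [sum(weights[cue][d] for cue in cues) for d in range(N_DIMS)]


def delta_rule_update(weights, cues, target, rate=0.1):
    """One error-driven update: shift each active cue's weights by rate * error."""
    predicted = predict(weights, cues)
    for d in range(N_DIMS):
        error = target[d] - predicted[d]
        for cue in cues:
            weights[cue][d] += rate * error


def train(lexicon, epochs=200, rate=0.1):
    """Cycle through the lexicon repeatedly, updating weights after every word."""
    weights = defaultdict(lambda: [0.0] * N_DIMS)  # all weights start at zero
    for _ in range(epochs):
        for cues, target in lexicon.values():
            delta_rule_update(weights, cues, target, rate)
    return weights


weights = train(LEXICON)
for word, (cues, target) in LEXICON.items():
    print(word, [round(x, 3) for x in predict(weights, cues)], "target:", target)
```

In this toy setting the shared cues "c" and "a" cannot distinguish the two words, so the learning rule ends up relying on the discriminating cues "t" and "b" to separate the two meanings.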

Members

R. Harald Baayen (Principal Investigator)
Konstantin Sering (Postdoctoral researcher)

ERC-SUBLIMINAL

Subliminal learning in the Mandarin lexicon

Principal Investigator: R. Harald Baayen (Professor of Quantitative Linguistics)

Website

Details on ERC-SUBLIMINAL

Project aims

Central to this research project is the observation that there are regularities and systematicities in the spoken language that escape our awareness, that are shielded from us by linguistic traditions and cultural conventions embodied in writing systems, but that nevertheless are detected by our brains, albeit subliminally, and used to optimize lexical processing.

Philosophers such as Immanuel Kant, Edmund Husserl, and Maurice Merleau-Ponty, and more recently the cognitive scientist Hoffman, have called attention to how our perception of reality is shaped by and filtered through our minds and bodies. According to Hoffman, mathematically, fitness beats truth: our perceptions of the world are tuned to our survival. Writing systems are culturally evolved technologies that also hide from our eyes and ears the truth about what we really hear and say. Obviously, in order to work, writing systems must abstract away from the full richness of the spoken word. However, many features of our speech that are masked by writing systems are nevertheless exploited by our cognitive system when we listen or speak. For native speakers, mismatches between speech and writing are relatively unproblematic. For second language acquisition, however, mismatches can render learning unnecessarily difficult.

The research programme addresses this issue for Mandarin Chinese. Two kinds of mismatches will be investigated, using state-of-the-art methods in computational modeling, distributional semantics, and statistical analysis: subliminal mismatches between what written words are supposed to sound like, and how they are actually spoken, and subliminal mismatches between how the writing system is supposed to work, and how it actually functions and, as a semiotic system of its own, influences thought. These investigations will inform the applied goal of this project: developing ways to enhance vocabulary learning of Mandarin Chinese as a second language.

Presentations

Baayen, R. H., Modeling Mandarin tones on two-word compounds, Colloquium English Language and Linguistics, Düsseldorf, Germany, January 19, 2024.

Baayen, R. H., Frequency-Informed Learning, colloquium Out of Our Minds, Birmingham, United Kingdom, October 11, 2023.

Baayen, R. H., Computational modeling of lexical processing, 2nd Joint Workshop on Chinese Lexical Semantic Change, Tübingen, Germany, September 7, 2023.

Yang, Y., Measure words in Mandarin, 2nd Joint Workshop on Chinese Lexical Semantic Change, Tübingen, Germany, September 6, 2023.

Jin, X., Retroflex realization in the ShangHai dialect, 2nd Joint Workshop on Chinese Lexical Semantic Change, Tübingen, Germany, September 6, 2023.

Tseng, Y.-H., Lian, D.-C., and Watty, D., Modeling diachronic semantic change of (Pre-Modern) Mandarin Chinese with contextualized embeddings & Word2Vec, 2nd Joint Workshop on Chinese Lexical Semantic Change, Tübingen, Germany, September 6, 2023.

Chuang, Y.-Y., Baayen, R. H., and Bell, M., Do words sing their own tunes? Word-specific pitch realizations in Mandarin and English, 20th International Congress of Phonetic Sciences (ICPhS), Prague, Czech Republic, August 7, 2023 (poster presentation).

Baayen, R. H., Chuang, Y.-Y., and Heitmeier, M., Discriminative learning and the lexicon: NDL and LDL, STEP2023 – CCP Spring Training in Experimental Psycholinguistics, Edmonton, Canada, June 14, 16, 2023 (virtual).

Members

R. Harald Baayen (Professor, Principal Investigator)
Yu-Ying Chuang (Postdoctoral researcher)
Xiaoyun Jun (Doctoral researcher)
Yuxin Lu (Doctoral researcher)
Kun Sun (Postdoctoral researcher)
Yu Hsiang Tseng (Postdoctoral researcher)
Weiting Wang (Research assistant)
Yi Yang (Postdoctoral researcher)
Runzhi Zhang (Research assistant)

DFG-EML

Machine Learning for Science

Cluster of Excellence - Machine Learning for Science (Cluster speakers: Philipp Berens and Ulrike von Luxburg)

Website

Details on DFG-EML

Innovation Fund Project 1 in research area A - Beyond Prediction, Towards Understanding

In research area A, we will design algorithms that reveal complex structure and causal relationships from data in order to integrate machine learning into the scientific discovery process. Project 1 investigates "Enhancing Machine Learning of Lexical Semantics with Image Mining".

Members

Hendrik Lensch (Principal investigator)
R. Harald Baayen (Principal investigator)
Zohreh Ghaderi (PhD student)
Hassan Shahmohammadi (PhD student)