Johannes Dellert

Home

Welcome to my website! I am a computational linguist who currently works as a postdoctoral researcher in the CrossLingference project at the Department of Linguistics in Tübingen. My current research focuses on the exploration of existing and new methods for the automated reconstruction of proto-forms from phonetic data, and the development of new interactive tools for machine-assisted historical linguistics. I also like to work on improving the data landscape for this type of research, chiefly in my role as the main contributor and coordinator for the NorthEuraLex database, and am always interested in collaborating with historical linguists on problems where computational or statistical support could prove beneficial.

Since I completed my dissertation in 2017, I have held various positions as a lecturer and researcher at the department. I taught courses in computational linguistics while working on a novel approach to the interpretation of non-standard language as part of Prof. Detmar Meurers' ICALL research group, expanded on the results of my PhD work in Prof. Gerhard Jäger's group, and spent ten months working on interdisciplinary collaborations as a fellow at the DFG Center for Advanced Studies "Words, Bones, Genes, Tools".

Before that, I was a doctoral student in the EVOLAEMP project, under the supervision of Prof. Gerhard Jäger. For my PhD project, I worked on the application of causal inference methods to problems of historical linguistics.

Before the PhD, I completed M.A. and B.A. degrees in the International Studies in Computational Linguistics program at the University of Tübingen. In parallel, I acquired a Diplom (M.S.) degree in computer science at the Department of Computer Science in Tübingen, with a minor in mathematics.

Current Research Interests:

Automated Reconstruction of Proto-Languages: My current main project is to carefully evaluate and compare existing approaches to proto-word reconstruction from phonetic data. The results will be used for the development of new hierarchical Bayesian models, which will allow a more principled treatment of uncertainty in the output of reconstruction methods, and will make the inclusion of additional domain knowledge (such as known sound laws) much more straightforward.

Machine-Assisted Historical Linguistics: Together with a team of student programmers, I have been working on EtInEn (Etymological Inference Engine), a comprehensive software suite for machine-assisted historical linguistics. It provides a fully integrated environment for importing and preprocessing lexical data, morphological analysis, grouping morphemes into cognate sets, establishing sound correspondences and sound laws, detecting loanwords, and performing phonetic reconstruction. It includes state-of-the-art methods and resources for automating many of these tasks, and we are working towards a system which can suggest e.g. additional potential cognates or sound laws that will be consistent with the already established parts of a theory, allowing the user to efficiently explore ideas and flesh out theories in dialogue with a system that automates many routine tasks.

Statistical Relational Learning: Many of the reasoning components in EtInEn are developed in Probabilistic Soft Logic (PSL), a framework which combines first-order logic with probabilistic graphical models into a very flexible modeling language for relational domains. The challenges in the development of our reasoning components have led to the creation of new methods and tools for a more flexible instantiation of PSL problems, running them as background tasks on worker threads, and making the reasoning transparent by means of an interactive browser for inspecting the rule-atom graph which underlies a grounded PSL problem.
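
As a rough illustration of how PSL turns logical rules into a continuous model, the sketch below computes the "distance to satisfaction" of a single grounded rule under the Łukasiewicz relaxation, where atoms take truth values in [0, 1]. This is plain Python and not the PSL library's API or EtInEn code; the function names, the example predicate `Cognate`, the truth values, and the rule weight are all hypothetical.

```python
def conjunction(*truth_values):
    """Lukasiewicz t-norm for a conjunction of soft truth values in [0, 1]."""
    return max(0.0, sum(truth_values) - (len(truth_values) - 1))


def distance_to_satisfaction(body, head):
    """How far the grounded rule (body_1 & ... & body_n) -> head is from being satisfied.

    The rule counts as fully satisfied (distance 0) when the head is at least
    as true as the conjunction of the body atoms.
    """
    return max(0.0, conjunction(*body) - head)


def weighted_penalty(weight, body, head, squared=False):
    """Contribution of one grounded rule to the overall objective (a hinge loss)."""
    distance = distance_to_satisfaction(body, head)
    return weight * (distance ** 2 if squared else distance)


# Hypothetical grounding of a transitivity rule:
#   Cognate(a, b) & Cognate(b, c) -> Cognate(a, c)
atoms = {
    "Cognate(a,b)": 0.9,
    "Cognate(b,c)": 0.8,
    "Cognate(a,c)": 0.4,
}
body = [atoms["Cognate(a,b)"], atoms["Cognate(b,c)"]]
head = atoms["Cognate(a,c)"]

penalty = weighted_penalty(2.0, body, head)
print(f"distance: {distance_to_satisfaction(body, head):.2f}, penalty: {penalty:.2f}")
```

Inference then amounts to choosing truth values for the unobserved atoms so that the sum of such penalties over all grounded rules is minimal. The rules and atoms linked in this way form the kind of rule-atom graph mentioned above.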

Older Research Interests:

Causal Inference: Recent mathematical models make it possible to infer causal knowledge from non-experimental data on the basis of conditional independencies between observed variables. I have explored the use of such methods for detecting causal patterns in linguistic data. Within the EVOLAEMP project, the primary focus of this work was on using causal inference to infer what I call lexical flow networks from automatically inferred cognate sets.

Interpreting Non-Standard Language: I am interested in exploring new approaches to natural language understanding in situations where higher levels of linguistic analysis (like semantics) are needed to guide the interpretation on lower levels of analysis (like morphology), which is the case whenever non-standard usage is involved. My main motivating example of such a situation has been the automated interpretation and analysis of German learner answers.

Computational Semantics: This was previously my main area of interest within computational linguistics. I have been involved in developing specialized reasoning tools which better meet the demands of linguistic applications. This especially concerns the area of model generation, where I have done some work on specialized heuristics for more rapid construction of linguistically adequate models.

Automated Reasoning: Beyond my interest in reasoning tools for computational semantics, I have also been doing work on the extraction of Minimal Unsatisfiable Subsets from unsatisfiable SAT problems. This has applications in many areas where minimal explanations for observations are needed.
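
To make the notion of a Minimal Unsatisfiable Subset concrete, here is a minimal sketch of the textbook deletion-based extraction scheme. It is not necessarily the method used in my own work: the brute-force satisfiability check stands in for a real SAT solver, and the function names and the example clause set are hypothetical.

```python
from itertools import product


def is_satisfiable(clauses):
    """Brute-force SAT check over all assignments; only suitable for tiny inputs.

    A clause is a list of non-zero integers, where -n denotes the negation of variable n.
    """
    variables = sorted({abs(literal) for clause in clauses for literal in clause})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(
            any(assignment[abs(literal)] == (literal > 0) for literal in clause)
            for clause in clauses
        ):
            return True
    return False


def extract_mus(clauses):
    """Deletion-based extraction of one minimal unsatisfiable subset.

    Each clause is tentatively removed; if the remainder is still unsatisfiable,
    the clause is not needed and is dropped for good, otherwise it is kept.
    """
    if is_satisfiable(clauses):
        raise ValueError("the input clause set must be unsatisfiable")
    core = list(clauses)
    index = 0
    while index < len(core):
        candidate = core[:index] + core[index + 1:]
        if not is_satisfiable(candidate):
            core = candidate  # clause was redundant for the conflict
        else:
            index += 1  # clause is necessary, keep it and move on
    return core


# (x1) and (not x1) clash; the other two clauses are irrelevant to the conflict.
example = [[1], [-1], [2, 3], [-2]]
print(extract_mus(example))  # prints [[1], [-1]]
```

The returned subset is minimal in the sense that removing any one of its clauses makes it satisfiable, which is what makes it usable as a minimal explanation. An unsatisfiable problem can have several such subsets, and this scheme returns only one of them, depending on clause order.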

Grammar Engineering: Systems for implementing symbolic grammars in complex grammar formalisms are very helpful tools for validating linguistic theories. I have been involved in developing and extending environments for the HPSG and TAG formalisms with the goal of making their behavior more transparent.