principal investigator, DFG Emmy Noether research group "Modal systems in the historical Slavic languages"
associate member of the DFG Center for Advanced Study "Words, Bones, Genes and Tools"
University of Tübingen
Seminar für Sprachwissenschaft
72074 Tübingen, Germany
igor.yanovich at uni-tuebingen dot de
Curriculum Vitae (pdf)
I am a computational historical linguist, with interests in evolutionary approaches to language change, linguistic phylogenetics, semantics and pragmatics, and interdisciplinary collaborations aimed at uncovering the multifaceted past of human communities. I apply population-genetic methods to the modeling of language change, carry out computational phylogenetic analyses and study their reliability (I have been an associate of the EVOLAEMP project), use evolutionary game theory, and am exploring ways to apply model-based historical inference methods to dialectal data, which, to my knowledge, has not been done before.
I am currently leading an Emmy Noether research group "Modal systems in the historical Slavic languages: formal semantics, micro-variation, language change and language contact", funded by the German Research Foundation and hosted at the University of Tübingen.
I want computational methods to become better integrated into basic linguistic research. We are experiencing a shift towards greater amounts of available data across all subfields of linguistics, and it is important to keep up with that trend by importing existing techniques and developing new ones for data analysis, computer simulation, and computational modeling. In my own research, formal and computational methods have proved helpful many times. In my teaching, I try to help students acquire a broad toolkit of mathematical methods for their own work.
In formal semantics and pragmatics, I am currently concerned with how ambiguity functions on the grammatical side of language. I am investigating two problem areas. First, some patterns in the historical development (including changes under language contact) of ambiguous morphemes and grammatical words such as modals strongly suggest that the standard logical analysis of ambiguity is insufficient. Under the standard model, we account for ambiguity by simply postulating several alternative meanings for the same item. However, the data suggest that in speakers' minds, there are also non-trivial second-order connections between some of the meanings, which go beyond their truth-conditional contribution. The second problem area is how ambiguous items function in a community of speakers, with a particular focus on how the range of their meanings may change over time. To make progress in both areas, more fine-grained data on the actual use of ambiguous items are needed. I am collecting such data from historical texts and corpora on historical Slavic modals and future markers, on the development of the progressive in English, and on English subject to. On the theoretical side, my work grounded in mathematical population genetics and evolutionary game theory helps to model language change at the community level.
My previous work in semantics includes a series of studies on modality, which propose a new contextualist analysis of epistemic modality and argue for recognizing new types of modals: symbouletic modals of suggestion, and a "collapse variable-force" modal in Old English. I have also worked on the expressive power of backwards-looking operators like "now", de re attitudes, gender presuppositions of anaphoric pronouns, and indefinites.
I also occasionally work on phonology. This has led to new results in mathematical Optimality Theory, as well as to the finding, together with Donca Steriade and based on Ukrainian and Russian data, that Base Priority effects also operate within inflectional paradigms.
Selected papers are described below, grouped by topic. The CV contains the full list.
The talk reports the results of this ongoing project, discussing data from Modern Russian and Old Ukrainian.
[Sicoli and Holton, 2014] (PLOS ONE 9:3, e91722) use computational phylogenetics to argue that linguistic data from the putative, but likely, Dene-Yeniseian macro-family are more compatible with a homeland in Beringia (i.e. northeastern Siberia plus northwestern Alaska) than with one in central Siberia or deeper in Asia. I show that a more careful examination invalidates that conclusion: in fact, the linguistic data do not support Beringia as the homeland. In the course of showing this, I discuss, without assuming a deep mathematical background, a number of methodological issues concerning computational phylogenetic analyses of linguistic data and the inferences drawn from them. I conclude with a brief overview of the current evidence bearing on the Dene-Yeniseian homeland from linguistics, archaeology, folklore studies and genetics, and suggest current best practices for linguistic phylogenetics which would have helped to avoid some of the problems in Sicoli and Holton’s Dene-Yeniseian study.
Sapir's drift, in the sense of parallel independent development, and apparent exceptions to unidirectionality in generally orderly change processes such as grammaticalization, are two patterns conceptually problematic enough that there have been attempts in the literature to deny their existence. I set up a reasonably realistic stochastic framework for analyzing time-courses of language change, and show that both Sapir's drift and unidirectionality with exceptions are in fact expected to arise. The uttered tokens of the old and new linguistic variants are analyzed analogously to alleles at the same locus in population genetics: they are copied subject to possible reanalysis and innovation (i.e. mutation) and to differential proneness to be repeated by speakers (i.e. fitness). In multi-speaker networks, the finite size of speaker memory creates genetic drift, which in turn causes Sapir's drift. Linguistic mutations usually allow exact back-mutations, which predicts exceptions to predominant unidirectionality. Finally, some crucial evolutionary parameters of the model depend little on the structure of the speaker network, opening the road to practical inference from empirical data on language change.
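The token-as-allele idea can be illustrated with a minimal Wright-Fisher sketch. It assumes a single pool of remembered tokens (no speaker network), a multiplicative fitness for the new variant, and separate innovation and back-mutation rates; the function and parameter names are hypothetical, and this is a toy illustration, not the model or code of the paper.

```python
import random


# Hypothetical single-pool sketch of variant competition; not the paper's model.
def wright_fisher(n_tokens, p0, fitness_new, mu_fwd, mu_back, generations, rng):
    """Simulate the share of a new variant in a finite pool of remembered tokens.

    n_tokens    -- size of the token memory (finite size => genetic drift)
    p0          -- initial share of the new variant
    fitness_new -- relative proneness of the new variant to be repeated (> 0)
    mu_fwd      -- per-token rate of innovation/reanalysis old -> new
    mu_back     -- per-token rate of back-mutation new -> old
    rng         -- a random.Random instance, so runs are reproducible
    """
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Selection: weight the new variant by its relative fitness.
        weighted_new = p * fitness_new
        p_sel = weighted_new / (weighted_new + (1 - p))
        # Mutation: innovations create new tokens, back-mutations restore old ones.
        p_mut = p_sel * (1 - mu_back) + (1 - p_sel) * mu_fwd
        # Drift: the next memory is a random sample of n_tokens tokens.
        count_new = sum(rng.random() < p_mut for _ in range(n_tokens))
        p = count_new / n_tokens
        trajectory.append(p)
    return trajectory
```

Because `rng` is passed in, independent communities can be simulated by calling the function with differently seeded `random.Random` instances and comparing their trajectories; with `mu_back > 0`, a variant that has spread can in principle be lost again.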
Lexical datasets used for computational phylogenetic inference suffer from a distinctive type of data error. Some words actually present in a language may be absent from the dataset through no fault of its curators: especially for lesser-studied languages, a word may be missing from all available sources such as dictionaries. It is thus important to be able to (i) check how robust one’s inferences are to such dictionary omission errors, and (ii) incorporate into one’s inference the knowledge that such errors may be present. I introduce two simple techniques that work towards these goals, and study the possible effects of dictionary omission errors in two real-life case studies, on the Lezgian and Uralic datasets from Kassian (2015) and Syrjänen et al. (2013), respectively. The effects of dictionary omission turn out to be moderate (Lezgian) to negligible (Uralic), and certainly far less significant than the possible effects of modeling choices, including priors, on the inferred phylogeny, as demonstrated in the Uralic case study. Assessing the possible effects of dictionary omissions is advisable, but severe problems are unlikely. Collecting significantly larger lexical datasets, in order to overcome sensitivity to priors, is likely more important than expending resources on checking data for dictionary omissions.
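One generic way to probe point (i) is a perturbation check: randomly turn some "present" cells of a binary cognate matrix into apparent absences and see whether a downstream result changes. The sketch below is a hypothetical stand-in, not either of the two techniques introduced in the paper: it assumes a dict mapping language names to 0/1 cognate-class vectors, uses a Jaccard distance, and tracks only each language's nearest neighbour rather than a full inferred phylogeny.

```python
import random


# Hypothetical omission-robustness check; not the paper's technique.
def simulate_omissions(matrix, rate, rng):
    """Return a copy of `matrix` in which each present cell (1) is turned into
    an apparent absence (0) with probability `rate`, mimicking a word that
    exists in the language but is missing from every available source."""
    return {lang: [0 if (x == 1 and rng.random() < rate) else x for x in row]
            for lang, row in matrix.items()}


def jaccard_distance(a, b):
    """1 minus the share of cognate classes present in both languages among
    those present in at least one of them."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return 1.0 - both / either if either else 0.0


def nearest_neighbours(matrix):
    """Map each language to its closest other language (ties broken by name)."""
    langs = sorted(matrix)
    return {
        lang: min((other for other in langs if other != lang),
                  key=lambda other: (jaccard_distance(matrix[lang], matrix[other]), other))
        for lang in langs
    }


def robustness(matrix, rate, replicates, seed=0):
    """Share of perturbed replicates in which every language keeps the
    nearest neighbour it has in the unperturbed data."""
    rng = random.Random(seed)
    baseline = nearest_neighbours(matrix)
    unchanged = sum(
        nearest_neighbours(simulate_omissions(matrix, rate, rng)) == baseline
        for _ in range(replicates)
    )
    return unchanged / replicates
```

With a real dataset, one would replace the nearest-neighbour summary by the actual inference pipeline (for example, rerunning the phylogenetic analysis on perturbed matrices) and compare the resulting trees.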
Deo (2015) is the first study applying mathematically explicit evolutionary analysis to a specific semantic-change phenomenon, namely the progressive-imperfective diachronic cycle. However, Deo’s actual results do not fully match the empirical observations about that cycle. Linguistic communities passing through the cycle often share, at any given synchronic stage, a single common type of progressive-imperfective grammar. In Deo’s modeling results, however, two of the grammars never come to be shared by nearly the whole population, including the grammar with obligatory progressive marking in semantically progressive contexts, as in Present-Day English. This paper corrects that wrong prediction. The crucial modeling decision enabling the improvement is switching from the assumption of an infinite speaker population to the more realistic, but harder to analyze, finite-population setting. The finite-population version of Deo’s model derives trajectories in which, at many time points, all or almost all speakers share the same grammar. Interestingly, two different, a priori reasonable types of trajectories with that feature emerge, depending on the parameter settings. These two trajectory types constitute novel empirical predictions regarding the shape of the cycle generated by (the proposed extension of) Deo’s model.
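The qualitative contrast between infinite and finite populations can be illustrated with a much simpler model than Deo's. The sketch below is a generic illustration under stated assumptions, not Deo's model or its extension: it tracks just two neutral grammars with a symmetric per-generation switching ("mutation") rate `u`. Without drift, the shares settle at a stable mixture; with binomial resampling of `n` speakers, the population tends to spend much of its time near consensus when `n * u` is small. All names are hypothetical.

```python
import random


# Hypothetical two-grammar illustration; not Deo's model.
def deterministic(p0, u, generations):
    """Infinite-population limit: no drift, only symmetric switching at rate u."""
    p, traj = p0, [p0]
    for _ in range(generations):
        p = p * (1 - u) + (1 - p) * u
        traj.append(p)
    return traj


def finite(p0, u, n, generations, seed=0):
    """Same switching step, followed by binomial resampling of n speakers."""
    rng = random.Random(seed)
    p, traj = p0, [p0]
    for _ in range(generations):
        p_mut = p * (1 - u) + (1 - p) * u
        p = sum(rng.random() < p_mut for _ in range(n)) / n
        traj.append(p)
    return traj


def share_near_consensus(traj, eps=0.05):
    """Fraction of time points at which at most eps of the population deviates."""
    return sum(1 for p in traj if p <= eps or p >= 1 - eps) / len(traj)
```

For example, `share_near_consensus(deterministic(0.5, 0.0005, 5000))` is 0, since the deterministic share never leaves 0.5, whereas `share_near_consensus(finite(0.5, 0.0005, 50, 5000))` is typically well above that.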
Here one can find a much longer companion technical report, together with reusable R scripts, aimed at those who would like to engage in such modeling themselves or to simply replicate the findings.
The use of the progressive in English rose sharply during the 19th century, but how did this happen? We examine that process using the Hansard archive of the British Parliament, a corpus an order of magnitude larger than those employed in previous corpus studies of the issue. It turns out that individual verbs had very different histories: some showed a sustained rise in the share of progressive forms, while others hardly budged, or even showed a decrease in progressive uses. No obvious semantic features explain the differences consistently. It thus appears that individual words can follow significantly different histories within this change.
The may construction, as in I hope we may be on such terms twenty years hence, is quite peculiar: the modal in it lacks its usual existential quantificational force (see Portner (1997)). Despite the construction's archaic feel, a corpus investigation has shown that it is quite a recent innovation, and it was never particularly frequent. I hypothesize that the special properties of may under hope may stem from its having arisen as a replacement for the subjunctive in a particular class of semi-formulaic hopes regarding good health. This would explain both the timing of the construction's rise and its elevated, formal flavor. However, there are currently not enough data to test this hypothesis rigorously, due to the lack of suitable evidence on the very first stages of the construction's development in Early Modern English.
In both 16th-century and late 19th-century Ukrainian, maty 'have' allows three types of modal/temporal uses: necessity, futurate, and possibility. The modal is genuinely ambiguous between these three, unlike the true variable-force modals of the Pacific Northwest, and has no parallels described in the semantic literature. This triple combination is unusual and apparently not matched in other Slavic languages: many of them also show cognate 'have' developing obligation and/or futurate readings, but not the possibility reading, with the possible exception of early Polish. A simple modeling analysis with iterative grammar learning shows that this is not an unexpected situation: a system with such unusual ambiguity as in Old Ukrainian is predicted to arise rarely, but to be maintained easily.
In English, there appears to be evidence for distinguishing "strong necessity" modals must and have to from "weak necessity" modals ought and should: the two groups of modals behave differently in several types of tests. I show that, despite this, there is actually no such category as strength of necessity, and I explain the observations by appealing to basic properties of modals such as their modal flavor. My argument is based on three types of data. (i) In US court decisions, performative should carries the same strong force as must. (ii) In Russian and Ukrainian, the "weak necessity tests" that neatly divide English modals into two groups fail to produce complementary distributions. (iii) In a pilot investigation of a parallel Bible corpus with English, German, French, Ukrainian, Russian, Czech and Bulgarian texts, the English modals must vs. should and ought fail to correspond to categories of the other languages.
I describe a previously unnoticed phenomenon of Russian syntax: the ability of certain predicative adjectives with infinitival complements to appear in a pre-copula position. The resulting structure resembles the so-called "long head movement" structures of West and South Slavic. A crucial feature of the Russian pattern is that different predicative adjectives appear in the Adj-Aux-Inf order with different frequencies: "more grammatical" adjectives almost always appear in the inverted Adj-Aux order, while other adjectives vary. For example, dolzhna 'must.ADJ' appears as Adj-Aux more than 1000 times in the 230M-word written portion of the Russian National Corpus, and only once as Aux-Adj; the contrast is of a similar order in the spoken subcorpus of 11M words. By contrast, "less grammatical" gotova 'ready' appears as Adj-Aux in 9.8% of written and 57.9% of spoken instances. This distribution thus reflects the existence of a proto-category in Russian: adjectives like dolzhna do not currently form a syntactic category as such, as they still permit the dispreferred order in judgements, but there is apparently potential for the development of a new true syntactic category. I am now looking into the diachrony of this construction.
The standard account of O(ld) E(nglish) *motan and M(iddle) E(nglish) *moten, the ancestors of Modern English must, is that these modal words were ambiguous between possibility and necessity readings until the possibility uses disappeared in the Early Modern English period. I argue instead that in Alfredian OE prose, *motan was an unambiguous, but "variable-force", modal, somewhat similar to the recently discovered variable-force modals of the American Pacific Northwest. Specifically, I argue that in Alfredian OE, motan(p) presupposed that if p gets a chance to actualize, it will. For ME *moten, however, I show that in the Early Middle English ‘AB’ dialect, *moten was genuinely ambiguous between possibility and necessity, so the standard analysis is essentially correct for it. Thus a new trajectory of semantic change is discovered: a variable-force modal may develop into a modal genuinely ambiguous between possibility and necessity, and then turn into a regular necessity modal.
In this paper, I argue for distinguishing a novel modal flavor, symbouletic modality (< συμβουλεύω ‘advise’): the modality of advice and suggestion. As symbouletic modality is often expressed by modals that also have deontic or teleological readings, it is not trivial to establish its special character. Fortunately, Russian has the modal stoit, which is specialized for advice/suggestion uses. Using stoit as a convenient test case, I propose a formal analysis of symbouletics within the framework for performative verbs and imperatives by Condoravdi and Lauer (2012), analyzing symbouletics as a kind of performative.
In this short reply to Braun (2013), I provide an argument against his invariantist theory of might. The invariantism proposed by Braun aims to maintain full identity of semantic content across all uses of might. I invoke well-known facts regarding diachronic change in the meanings of modals to argue that the invariantist view commits us to an implausible duplication of familiar processes of lexical semantic change at the level of “lexical pragmatics”, with no obvious payoff.
This is an investigation into the expressive power over models that backwards-looking operators such as "now", "then" and "actually" add to the basic modal language. Contrary to popular belief in linguistics and philosophy, such operators do not push expressive power up to the level of explicit quantification over times and worlds.
(2011) How much expressive power is needed for natural language temporal indexicality? in Proceedings of WoLLIC 2011, LNAI 6642, Springer, pp. 293-309. doi:10.1007/978-3-642-20920-8_27.
Superseded by "Expressive power of “now” and “then” operators".
(2010) Evaluation tree languages, abstract published in The Bulletin of Symbolic Logic, 17:2 (2011), p. 323. [presentation]
A stab at an analysis of context-sensitive modal operators, grown out of earlier work on indexical presuppositions. Certain themes have been taken up, with much greater technical precision, in later work on backwards-looking operators.
Talk at the 2010 Logic Colloquium in Paris.
(2011) The problem of counterfactual de re attitudes in Proceedings of SALT 21, pp. 56–75.
I argue that counterfactual de re presents no analytical challenge (contra Ninan 2008) once it is analyzed as parasitic on beliefs. As a side result, a simple compositional way to derive de re readings is defined (see Sec. 2.3) that does not use covert concept generators, movement of the res, or other syntactic entities not motivated by syntactic phenomena.
(2012) Indexical presuppositions of pronominal gender features [current version (July 2012)]
The paper shows that gender features on anaphoric pronouns trigger not regular presuppositions, but a peculiar set of restrictions on the context. Moreover, those restrictions only arise when pronouns refer to humans (which is harder to show in English, but is easy to observe in, e.g., Russian). I argue that the "presuppositions" of gender features are thus not a part of the compositional semantics, but instead the effects of a pragmatic rule of use for pronouns referring to humans.
(2012) What can Russian gender tell about the semantics of phi-features? (presented at FASL 21, Indiana University, May 2012) [presentation]
A conference-format presentation of the ideas here, with focus on data from Russian.
(2009) On the nature and formal analysis of indexical presuppositions, in Proceedings of LENLS 2009, LNAI 6284, Springer, pp. 272-291.
Superseded by this. The paper describes the peculiar projection pattern associated with gender features on English anaphoric pronouns, discusses the initial observations by Cooper (1983), which capture part of that pattern but were forgotten in the later literature, and explains why conventional LF-semantics devices, such as copying the antecedent's features or introducing special binding-theoretic rules for pronominal world variables, cannot derive the observed facts.
The paper discusses the constraints on intermediate-scope readings of "some" and "a certain" indefinites, and proposes a presuppositional analysis of those determiners.
The paper was presented at Sinn und Bedeutung 2007 in Oslo.
A short note discussing the semantics of Russian indefinites based on the adjectival root kakoj 'which'. Such indefinites are argued to be ambiguous between two readings, so that, e.g., kakoj-to is sometimes roughly synonymous with English 'some', and other times, with English 'of some kind'.
The paper proposes a choice-functional analysis of Russian indefinite series formed with -to and -nibud, and argues for a Hamblin analysis of bare indefinite roots in Russian.
Supersedes the previous two works, developing and sharpening the idea that the cycle has access to a pool of forms generated earlier rather than a single base form. In the new theoretical setup, the directionality of the cycle is preserved, but its reliance on unique representations at each stage is rejected.
Standard cyclic effects in derivational morphology involve inheritance of properties of the base by the derived form. Using data from Ukrainian and Russian, we argue that the base word provides not just a single form, but rather all the forms in its paradigm for the derived item to choose from.
Stress patterns within nominal inflectional paradigms in East Slavic are notoriously complex, and in the case of Ukrainian and Belarusian, still understudied in the theoretical literature. We analyze stress in Ukrainian nominal paradigms, and argue that everything falls into place once we accept that the singular subparadigm serves as a collective base for the plural subparadigm. Thus, effects of Base Priority may be observed not only in derivational morphology, but also within inflectional paradigms.
Many OT tableaux convey the same information about rankings, and operations preserving that information have been studied in Hayes (1997), Brasoveanu and Prince (2005), Prince (2006), a.o. The paper defines a functionally complete set of elementary information-preserving transformations, and provides a computable test for OT tableau equivalence via normal form techniques. One corollary of the results is that Brasoveanu and Prince's Skeletal Basis tableaux are unique not just for a single tableau, but for whole equivalence classes of comparative tableaux as well.
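Equivalence in this sense can be checked by brute force for small constraint sets: two comparative tableaux are equivalent when they are compatible with exactly the same total rankings. The sketch below assumes each row is a string over W, L and e (one symbol per constraint) and treats a row as satisfied when the highest-ranked constraint with a non-e entry has W. It enumerates all n! rankings, so it is only a toy check under those assumptions, not the paper's normal-form technique; the function names are hypothetical.

```python
from itertools import permutations


# Hypothetical brute-force check; not the paper's normal-form method.
def satisfies(ranking, row):
    """ranking: tuple of constraint indices, highest-ranked first.
    row: string over {'W', 'L', 'e'}, one symbol per constraint.
    The winner beats the loser iff the highest-ranked constraint that
    distinguishes them (a non-'e' entry) prefers the winner ('W')."""
    for c in ranking:
        if row[c] != 'e':
            return row[c] == 'W'
    return False  # no constraint distinguishes the pair


def compatible_rankings(tableau):
    """All total rankings of the constraints that satisfy every row."""
    n = len(tableau[0])
    return {r for r in permutations(range(n))
            if all(satisfies(r, row) for row in tableau)}


def equivalent(t1, t2):
    """Comparative tableaux over the same constraints are equivalent
    iff they are compatible with exactly the same total rankings."""
    return compatible_rankings(t1) == compatible_rankings(t2)
```

For instance, adding the transitively entailed row `"WeL"` to the tableau `["WLe", "eWL"]` leaves the set of compatible rankings, and hence the equivalence class, unchanged.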
The paper develops the theory of sets of rankings compatible with a particular tableau, and provides methods for computing such sets for arbitrary tableaux, allowing for lossless and monotonic incremental OT learning.
Superseded by the two papers above.
What is good in the dissertation would not have existed without my MIT teachers and advisors, who helped me become the researcher I am today. The direct influence of my dissertation advisors Kai von Fintel, Irene Heim and Sabine Iatridou on the text will be evident to the reader. But beyond the dissertation advising, without their help and support throughout my five years at MIT, I would not have grown into the linguist I am now. It is very much thanks to Kai, Irene and Sabine that I learned to be guided by empirical data, to be ready to see beyond my current theoretical convictions, and to be cautious when drawing theoretical inferences from observed data. I am also very grateful to my other MIT teachers, especially Adam Albright, who served as my registration advisor throughout my time at MIT; Donca Steriade, who infected me with excitement about phonology; and Martin Hackl, who taught me a lot about what a teacher needs to know.