\[ \definecolor{firebrick}{RGB}{178,34,34} \newcommand{\red}[1]{{\color{firebrick}{#1}}} \] \[ \definecolor{green}{RGB}{107,142,35} \newcommand{\green}[1]{{\color{green}{#1}}} \] \[ \definecolor{blue}{RGB}{0,0,205} \newcommand{\blue}[1]{{\color{blue}{#1}}} \] \[ \newcommand{\den}[1]{[\![#1]\!]} \] \[ \newcommand{\set}[1]{\{#1\}} \] \[ \newcommand{\tuple}[1]{\langle#1\rangle} \]

\[\newcommand{\States}{{T}}\] \[\newcommand{\state}{{t}}\] \[\newcommand{\Messgs}{{M}}\] \[\newcommand{\messg}{{m}}\]

processing?

incremental & predictive processing

processing

  • comprehension of serially presented written or oral language input in context

 

incrementality

  • build syntactic & semantic representations as the sentence comes in

    • what's the increment size? fixed or variable?

    • how to deal with ambiguity? singular guess or parallel hypotheses?

 

predictive

  • minimal sense: processing behavior is a function of current state

  • strong(est) sense: comprehender entertains hypotheses about the future

(Kuperberg & Jaeger 2016)

heuristics in syntactic processing

  • assign a partial parse \(p\) to word sequence \(w_1, \dots, w_i\)

  • as \(w_{i+1}\) comes in update \(p\) to \(p'\) by parsing heuristics

    • "The | florist | sent | …"

reduced relatives
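The "The florist sent …" case can be made concrete. Below is a toy illustration of the single-guess vs. parallel-hypotheses question from above; the license sets are invented for this example, and this is not an actual parser:

```python
# Toy illustration of parallel vs. serial ambiguity resolution for
# "The florist sent the flowers was pleased." (reduced relative).
# The license sets are invented; this is not an actual parser.

def step(hypotheses, word, licenses):
    """Keep every structural analysis that can integrate the next word."""
    return {h for h in hypotheses if word in licenses[h]}

# After "The florist sent the flowers", two analyses of "sent" remain:
licenses = {
    "main-verb":        {"to", "."},   # "sent" as finite main verb
    "reduced-relative": {"was"},       # "sent" as passive participle
}

parallel = step({"main-verb", "reduced-relative"}, "was", licenses)
# only the reduced-relative analysis survives: no garden path

serial = step({"main-verb"}, "was", licenses)
# a single-guess parser is left with no analysis and must reanalyze
```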

enlightened incrementality (1)

  • partial interpretation function \([\![ \cdot ]\!] \colon T \mapsto S\), where:
    • \(T\) is a (partial) tree and
    • \(S\) a (set of) \(\lambda\)-expressions (meanings)
      • think: possible partial compositions of \(T_i\) in a bag
  • incremental processor as function \(\tuple{T_i, S_i} \mapsto \tuple{T_{i+1}, S_{i+1}}\), where
    • transition \(T_i\) to \(T_{i+1}\) given by syntactic parser (modular!)
    • \(S_{i+1} \subseteq \den{T_{i+1}}\) based on heuristics, such as:
      • subject-verb heuristic (compose immediately: "The soup greeted …")
      • temporal adverb + tense heuristic (compose immediately: "Morgen gewann …", lit. 'tomorrow won …')

(Beck & Tiemann 2017, Towards a model of incremental composition, SuB)
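One step of the \(\tuple{T_i, S_i} \mapsto \tuple{T_{i+1}, S_{i+1}}\) processor might be sketched as follows; the tree, meaning, and heuristic representations here are hypothetical stand-ins, not Beck & Tiemann's actual formalism:

```python
# Schematic sketch of one processor step; trees, meanings, and the
# heuristic inventory are hypothetical stand-ins, not the paper's model.

def parse(tree, word_cat):
    """T_i -> T_{i+1}: syntax only (modular)."""
    return tree + (word_cat,)

def interpret(tree):
    """[[T]]: the bag of possible partial compositions, schematically."""
    comps = {"uncomposed"}
    if "SUBJ" in tree and "V" in tree:
        comps.add("subj+verb composed")
    return comps

def subject_verb_heuristic(tree, meaning):
    """Compose subject and verb immediately ("The soup greeted ...")."""
    return meaning == "subj+verb composed"

def step(tree, word_cat, heuristics):
    """<T_i, S_i> -> <T_{i+1}, S_{i+1}>: S_{i+1} is a heuristically
    selected subset of [[T_{i+1}]]; a wait-and-see heuristic would
    instead retain the uncomposed meaning."""
    new_tree = parse(tree, word_cat)
    s = {m for m in interpret(new_tree)
         if any(h(new_tree, m) for h in heuristics)}
    return new_tree, s

tree, s = step(("SUBJ",), "V", [subject_verb_heuristic])
# s == {"subj+verb composed"}: composition triggered immediately
```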

enlightened incrementality (2)

  • delayed composition possible:
    • "For two hours the boxer won\(_*\) … "
      • processing difficulty on * in Russian (aspect marking)
      • no processing difficulty on * in German (no aspect marking)
    • wait-and-see heuristic

 

  • enlightened incrementality: units in the same LF-domain (DP, VP, TP, AspP) are composed immediately

(Beck & Tiemann 2017, Towards a model of incremental composition, SuB)

worries

  • too much unexplained:
    • precise inventory of the "adaptive toolbox"?
    • when to apply conflicting heuristics?
    • why these heuristics & not others?
  • theory follows data

  • inherent modularity, possibly seriality
    • syntax >> semantics >> pragmatics
  • no place yet for contextual information
    • likely speaker-intended discourse contribution
    • adaptation to speaker idiosyncrasies
    • what's relevant for the listener (think: QUD or task demands)

expectation-based processing accounts

  • (maximally) predictive interpreter has lexico-syntactic expectations \(P(w_1, \dots, w_n \mid c)\)
    • operationalized by corpus frequencies of relevant structures
    • possible beam-search approximation
    • derived expectation about continuation \(P(w_{i+1}, \dots, w_n \mid w_1, \dots, w_i, c)\)

 

  • processing difficulty linked to distance between \(P(\cdot \mid w_1, \dots, w_\red{i}, c)\) and \(P(\cdot \mid w_1, \dots, w_\red{i+1}, c)\)
    • self-paced reading times (Smith & Levy 2013)
    • various ERP components, notably N400 amplitude (Frank et al. 2015)

(e.g., Jurafsky 1996, Hale 2006, Levy 2008)
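The "distance" linking hypothesis can be illustrated with a toy example: when the distance between \(P(\cdot \mid w_1, \dots, w_i, c)\) and \(P(\cdot \mid w_1, \dots, w_{i+1}, c)\) is measured as KL divergence over full continuations, it reduces to the surprisal of the incoming word (Levy 2008). The three-sentence mini-corpus below is invented for illustration:

```python
import math

# Toy demonstration with an invented mini-corpus: KL divergence between
# the continuation distributions before and after w_{i+1} equals the
# surprisal -log P(w_{i+1} | w_1..i)  (Levy 2008).

corpus = [("the", "florist", "sent", "flowers"),
          ("the", "florist", "sent", "flowers"),
          ("the", "florist", "was", "pleased")]

def p(sentence):
    """Empirical sentence probability."""
    return corpus.count(sentence) / len(corpus)

def p_given_prefix(sentence, prefix):
    """P(sentence | its first len(prefix) words are `prefix`)."""
    if sentence[:len(prefix)] != prefix:
        return 0.0
    mass = sum(p(s) for s in set(corpus) if s[:len(prefix)] == prefix)
    return p(sentence) / mass

def kl(prefix, longer_prefix):
    """KL( P(. | longer_prefix) || P(. | prefix) ) over sentences."""
    return sum(q * math.log(q / p_given_prefix(s, prefix))
               for s in set(corpus)
               if (q := p_given_prefix(s, longer_prefix)) > 0)

def surprisal(word, prefix):
    """-log P(word | prefix)."""
    before = sum(p(s) for s in set(corpus) if s[:len(prefix)] == prefix)
    after = sum(p(s) for s in set(corpus)
                if s[:len(prefix) + 1] == prefix + (word,))
    return -math.log(after / before)

pre = ("the", "florist")
assert abs(kl(pre, pre + ("sent",)) - surprisal("sent", pre)) < 1e-9
```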

RSA goes processing

rational speech act model

 

literal listener picks literal interpretation (uniformly at random):

\[ P_{LL}(t \mid m) \propto P(t \mid [\![m]\!]) \]

 

Gricean speaker approximates informativity-maximization:

\[ P_{S}(m \mid t) \propto \exp( \lambda P_{LL}(t \mid m)) \]

 

pragmatic listener uses Bayes' rule to infer likely world states:

\[ P_L(t \mid m ) \propto P(t) \cdot P_S(m \mid t) \]

 

interpretation as holistic: full & complete utterance
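The three layers above can be run end to end on a minimal example. The sketch below uses a hypothetical some/all scalar scenario with a uniform prior and \(\lambda = 1\); the speaker is restricted to literally true messages, which the formulas leave implicit:

```python
import math

# Minimal executable sketch of the three RSA layers on a hypothetical
# some/all scalar scenario (uniform prior, lambda = 1); the speaker is
# restricted to literally true messages.

states = ["none", "some-not-all", "all"]
messages = ["some", "all"]
truth = {("some", "none"): 0, ("some", "some-not-all"): 1, ("some", "all"): 1,
         ("all", "none"): 0, ("all", "some-not-all"): 0, ("all", "all"): 1}
prior = {t: 1 / 3 for t in states}

def normalize(d):
    z = sum(d.values())
    return {k: v / z for k, v in d.items()}

def literal_listener(m):
    """P_LL(t | m): prior conditioned on the literal meaning [[m]]."""
    return normalize({t: prior[t] * truth[(m, t)] for t in states})

def speaker(t, lam=1.0):
    """P_S(m | t) proportional to exp(lam * P_LL(t | m)), true messages."""
    return normalize({m: math.exp(lam * literal_listener(m)[t])
                      for m in messages if truth[(m, t)]})

def listener(m):
    """P_L(t | m) proportional to P(t) * P_S(m | t)."""
    return normalize({t: prior[t] * speaker(t).get(m, 0.0)
                      for t in states if truth[(m, t)]})

post = listener("some")
# the pragmatic listener strengthens "some" toward "some but not all"
```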

incremental & predictive interpretation

  • messages are word sequences: \(\messg = w_1, \dots, w_n\)

  • initial subsequence of \(\messg\): \(\messg_{\rightarrow i} = w_1, \dots, w_i\)

  • all messages sharing initial subsequence: \(\Messgs(\messg_{\rightarrow i}) = \set{\messg' \in \Messgs \mid \messg'_{\rightarrow i} = \messg_{\rightarrow i}}\)

  • next-word expectation:

\[P_L(w_{i+1} \mid \messg_{\rightarrow i}) \propto \sum_{\state} P(\state) \ \sum_{\messg' \in \Messgs(\messg_{\rightarrow i}, w_{i+1})} P_S(\messg' \mid \state)\]

  • interpretation evidence:

\[P_L(\state \mid \messg_{\rightarrow i}) \propto P(\state) \ \sum_{\messg' \in \Messgs(\messg_{\rightarrow i})} P_S(\messg' \mid \state)\]
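Both quantities can be computed directly from any speaker model. The sketch below uses an invented microcosm of two states and three messages; the speaker probabilities are made up for illustration:

```python
# Sketch of the two incremental quantities over an invented microcosm:
# two states, three messages, made-up speaker probabilities.

states = ["A", "B"]
prior = {"A": 0.5, "B": 0.5}
messages = [("some", "blue"), ("some", "red"), ("all", "blue")]

# Hypothetical P_S(m | t): in state B the speaker only ever says "all ...".
p_speaker = {"A": {("some", "blue"): 0.6, ("some", "red"): 0.4},
             "B": {("all", "blue"): 1.0}}

def prefix_set(prefix):
    """M(m_->i): all messages sharing the initial subsequence."""
    return [m for m in messages if m[:len(prefix)] == prefix]

def mass(prefix):
    """sum_t P(t) * sum_{m' in M(prefix)} P_S(m' | t)."""
    return sum(prior[t] * sum(p_speaker[t].get(m, 0.0)
                              for m in prefix_set(prefix))
               for t in states)

def next_word(word, prefix):
    """P_L(w_{i+1} | m_->i), normalized over continuations."""
    return mass(prefix + (word,)) / mass(prefix)

def interpretation(t, prefix):
    """P_L(t | m_->i): accumulated interpretation evidence."""
    scores = {s: prior[s] * sum(p_speaker[s].get(m, 0.0)
                                for m in prefix_set(prefix))
              for s in states}
    return scores[t] / sum(scores.values())

# hearing "some" already rules out state B for this toy speaker
evidence_B = interpretation("B", ("some",))
```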

empirical measures

next-word

  • self-paced reading

  • eye-tracked reading

  • ERPs

  • …?

interpretation

  • visual worlds

  • mouse-tracking

  • …?

ERP traces of scalar implicature

some EEG studies on some

Noveck & Posada (2003)

  • ERPs on last word during reading: e.g. "Some people have fins/lungs/pets."
  • N400 amplitude: "pets" > "lungs"

 

Nieuwland et al. (2010)

  • similar to Noveck & Posada
  • two types of responders based on Autism Spectrum Quotient
    • low ASQ: "lungs" > "pets" ; high ASQ: "lungs" \(\approx\) "pets"

 

Hunt et al. (2013)

  • sentences with pictorial contexts: controls for lexical associations
  • truth-value judgement after each sentence
  • pragmatic responders' N400: false > underinformative > true & felicitous

experiment 1

participants & procedure

  • EEG recording of 25 native German speakers
  • picture (1500ms) -> sentence (500ms per word) -> truth-value judgement

sentence material

  • "Alle/Einige\(_1\) Punkte sind blau\(_2\), die im Kreis/Quadrat\(_3\) sind"
  • "All/some of the dots in the circle/square are blue/red"

visual stimuli

stimuli

work by Petra Augurzky

SI computation: algorithmic-level hypotheses

role of context?

SI computed per default vs. when contextually supported

when is SI computed online?

on the word, as soon as NP (possibly) complete, as soon as S (possibly) complete …

when is SI content checked against context?

as soon as possible vs. end of sentence

what incurs processing costs?

SI computation, SI cancellation, SI violation, …

hypotheses about SI mechanics (1)

stimuli

Alle/Einige\(_1\) Punkte sind blau\(_2\), die im Kreis/Quadrat\(_3\) sind

 

  • traces of processing effort for "some":
    • SI calculation costly,
    • SI computed per default, and
    • SI calculated immediately on trigger
  • traces of SI cancellation in contexts \(B\) and \(C\) for "some" if:
    • SI calculated (see above)
    • SI checked immediately against context

hypotheses about SI mechanics (2)

stimuli

Alle/Einige\(_1\) Punkte sind blau\(_2\), die im Kreis/Quadrat\(_3\) sind

 

  • truth violation in context \(C\) & in contexts \(A\) and \(D\) after "all" (if meaning is composed immediately)
  • traces of SI calculation in all contexts if:
    • SI computed per default, and
    • SI calculated immediately on first predicate completion
  • traces of SI violation in context \(B\) (if SI checked against context immediately)

hypotheses about SI mechanics (3)

stimuli

Alle/Einige\(_1\) Punkte sind blau\(_2\), die im Kreis/Quadrat\(_3\) sind

 

  • after "some" in context \(A\): traces of SI cancellation on "square" (if computed before)
  • after "some" in context \(D\):

RSA's expectation-based processing predictions

general assumptions

  • listener expects speaker to make a pragmatically felicitous utterance
  • listener does not give up on speaker rationality on the way (charity, forward induction, …)

specific assumptions

  • possible meanings \(\States\): pairs of contexts (\(A\) - \(D\)) and speaker-intended shape
  • possible messages \(\Messgs\): "All/some dots are blue/red that are in the square/circle."
  • listener knows context, but not shape
  • speaker chooses description for context and shape

next-word expectations vs. N400

  • incremental RSA predicts \(P_L(w_{i+1} \mid w_1, \dots, w_i, c)\)
  • correlating predicted next-word expectations with grand-average early N400 amplitude (300-400ms):
    • \(r= 0.44\), \(p < 0.01\) in total
    • \(r = 0.81\), \(p < 0.001\) after exclusion of unexpected continuations
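Schematically, the linking analysis correlates per-condition model expectations with mean N400 amplitudes. The values below are invented placeholders, not the experiment's data:

```python
import math

# Illustration of the linking analysis only: the expectation and N400
# values are invented placeholders, not the experiment's data.

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-condition values; higher predicted expectation should
# go with a smaller (less negative) N400 amplitude.
expectations = [0.9, 0.7, 0.4, 0.1]
n400_amplitude = [-1.0, -1.8, -3.0, -4.5]

r = pearson_r(expectations, n400_amplitude)  # strongly positive here
```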

closer look: quantifier position

stimuli

stimuli

closer look: adjective position

stimuli

stimuli

closer look: shape noun position

stimuli

stimuli

further issues

experimental microcosmos

main issue

how to fix reasonable \(\States\) and \(\Messgs\)?

 

experimental microcosmos assumption

all (and only?) meanings and forms that occur in the experiment

 

prediction

massive influence of filler material

experiment 2

participants & procedure

  • EEG recording of 24 native German speakers
  • picture (1500ms) -> sentence (500ms per word) -> truth-value judgement

sentence material

  • "Einige\(_1\) Punkte sind blau\(_2\), die im Kreis/Quadrat\(_3\) sind"
  • "Einige\(_1\) Punkte sind blau\(_2\)"

visual stimuli

stimuli

work by Petra Augurzky

results

behavioral data

only one participant consistently gave pragmatic judgements

ERP responses

no trace of pragmatic infelicity / expectations

conclusions

conclusions

  • incremental RSA seems feasible:
    • next-word expectations & accumulated interpretation evidence
    • genuine pragmatic expectations beyond / on top of lexico-syntactic expectations

 

  • main challenges:
    • link functions for interesting experimental measures
    • how to fix \(\States\) and \(\Messgs\) in general?
    • how to scale up to more open-ended applications? tradeoff lexico-syntactic vs. pragmatic expectations?