Bayesian data analysis & cognitive modeling

overview

theoretical & experimental pragmatics
- natural language quantifiers
- typicality of quantifier \(\textit{some}\)
generalized linear model
- types of dependent variables
- predictors & link functions
probabilistic model
- gradient salience of alternative expressions
- one predictor feeds two link functions

natural language quantifiers


natural language quantifiers

some examples:

\(\textit{no}\), \(\textit{some}\), \(\textit{all}\)
\(\textit{most}\), \(\textit{many}\), \(\textit{few}\)
\(\textit{three}\), \(\textit{at least 4}\), \(\textit{between 8 and 12}\)
\(\textit{less than Jake drank}\), \(\textit{2 more than Jake could possibly drink}\)

logical semantics

"No \(A\) is \(B\)" is true iff there is no \(A\) that is also a \(B\).
"Some \(A\) is \(B\)" is true iff there is at least one \(A\) that is also a \(B\).
"All \(A\) are \(B\)" is true iff there is no \(A\) that is not also a \(B\).
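The three truth conditions can be written as relations between two finite sets. The sketch below assumes A and B are plain Python sets, and the function names are illustrative. It shows that, on this purely logical semantics, "some" comes out true even when every circle is black.

```python
def no(A: set, B: set) -> bool:
    # true iff no element of A is also an element of B
    return len(A & B) == 0

def some(A: set, B: set) -> bool:
    # true iff at least one element of A is also an element of B
    return len(A & B) > 0

def all_(A: set, B: set) -> bool:
    # true iff no element of A lies outside B (vacuously true for empty A)
    return len(A - B) == 0

circles = set(range(10))   # ten circles, identified by index
black = {0, 1, 2, 3}       # four of them are black
print(no(circles, black), some(circles, black), all_(circles, black))  # False True False
print(some(circles, circles), all_(circles, circles))                  # True True
```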

test your intuitions

"None/some/all of the circles are black."

[figure: example displays with different numbers of black circles]

experimental data (preview)

truth-value judgements for "Some of the circles are black."

pragmatics

Herbert Paul Grice

life & work

March 13, 1913 - August 28, 1988
Oxford & Berkeley
ordinary language philosophy
- non-natural meaning
- implicature


implicature

utterance meaning = semantic meaning + rational language use

Gricean pragmatics

Cooperative Principle

Make your conversational contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged.


Maxim of Quantity

Make your contribution as informative as is required for the current purposes of the exchange.
Do not make your contribution more informative than is required.


Maxim of Relevance

Be relevant!

Grice 1975 "Logic and Conversation"

Gricean language use

When would a cooperative speaker say: "Some of the 10 circles are black"?

no. of black circles	probability of using "some"	salient alternative
0	very, very low	"none"
1	very low	"one"
2	low	"two"
3	meh	"three"
4-6	high	???
7-9	lower	"most"
10	low	"all"

upshot

The pragmatic felicity of a description \(m\) for a situation \(c\) is a measure of how adequate \(m\) is for a given purpose of talk relative to alternative descriptions.

upshot

pragmatic felicity depends on:

purpose of conversation
salience of alternatives


pragmatic felicity is an elusive notion

ideally, we'd like to have a formal, even quantitative notion

enter computational pragmatics

experiments

overview

replication/extension of previous work
- van Tiel (2014), Degen & Tanenhaus (2015)
4 experimental variants:
- binary truth-value judgements vs. 7-point rating scale
- include filler sentences with \(\textit{many}\) and \(\textit{most}\) or not
participants recruited via Amazon's Mechanical Turk

[table: overview of the experimental variants]

truth-value judgement task

binary

rating scale task

results

methodological puzzles

do binary and ordinal tasks measure the same thing?
- one is about truth, the other about "goodness"
is what either task measures influenced by presence/absence of alternatives?
how would we answer these questions with standard statistical techniques?

generalized linear model

recap: simple regression

data

head(cars)  # R's built-in cars data: speed (mph) and stopping distance (ft)

##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10


model

\[\beta_0, \beta_1 \sim \text{Norm}(0, 1000)\] \[\sigma^2_{\epsilon} \sim \text{Unif}(0, 1000)\]

\[\mu_i = \beta_0 + \beta_1 x_i\] \[y_i \sim \text{Norm}(\mu_i, \sigma^2_{\epsilon})\]
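To make the data-generating reading of this model concrete, here is a minimal sketch of its unnormalized log posterior and of simulating fake data from it. It assumes, as the notation \(\sigma^2_{\epsilon}\) suggests, that the second argument of Norm is a variance (JAGS itself parameterizes `dnorm` by precision). The function names and the parameter values in the usage lines are hypothetical; the data are the six rows of `cars` printed above.

```python
import math
import random

def log_prior(beta0, beta1, sigma2):
    # beta0, beta1 ~ Norm(0, 1000); sigma2 ~ Unif(0, 1000)
    # ASSUMPTION: the second argument of Norm is read as a variance
    if not 0.0 < sigma2 < 1000.0:
        return -math.inf
    lp = -math.log(1000.0)  # log density of Unif(0, 1000)
    for b in (beta0, beta1):
        lp += -0.5 * math.log(2 * math.pi * 1000.0) - b ** 2 / (2 * 1000.0)
    return lp

def log_likelihood(beta0, beta1, sigma2, xs, ys):
    # sum_i log Norm(y_i | mu_i, sigma2) with mu_i = beta0 + beta1 * x_i
    ll = 0.0
    for x, y in zip(xs, ys):
        mu = beta0 + beta1 * x
        ll += -0.5 * math.log(2 * math.pi * sigma2) - (y - mu) ** 2 / (2 * sigma2)
    return ll

def log_posterior(beta0, beta1, sigma2, xs, ys):
    # unnormalized log posterior = log prior + log likelihood
    lp = log_prior(beta0, beta1, sigma2)
    if lp == -math.inf:
        return lp
    return lp + log_likelihood(beta0, beta1, sigma2, xs, ys)

def simulate(beta0, beta1, sigma2, xs, seed=0):
    # draw fake data y_i ~ Norm(mu_i, sigma2) from the generative model
    rng = random.Random(seed)
    return [rng.gauss(beta0 + beta1 * x, math.sqrt(sigma2)) for x in xs]

# first six rows of cars (speed, dist); parameter values are hypothetical
speed = [4, 4, 7, 7, 8, 9]
dist = [2, 10, 4, 22, 16, 10]
print(log_posterior(-5.0, 2.0, 50.0, speed, dist))
print(simulate(-5.0, 2.0, 50.0, speed))
```

An MCMC sampler such as JAGS explores exactly this kind of unnormalized posterior; the sketch only evaluates it at single points.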

generalized linear model

[figure: schema of the generalized linear model]

common link & likelihood functions

type of \(y\)	(inverse) link function	likelihood function
metric	\(\mu = \eta\)	\(y \sim \text{Normal}(\mu, \sigma)\)
binary	\(\mu = \text{logistic}(\eta, \theta, \gamma) = (1 + \exp(-\gamma (\eta - \theta)))^{-1}\)	\(y \sim \text{Bernoulli}(\mu)\)
nominal	\(\mu_k = \text{soft-max}(\eta_k, \lambda) \propto \exp(\lambda \eta_k)\)	\(y \sim \text{Multinomial}(\vec{\mu})\)
ordinal	\(\mu_k = \text{threshold-Phi}(\eta_k, \sigma, \vec{\delta})\)	\(y \sim \text{Multinomial}(\vec{\mu})\)
count	\(\mu = \exp(\eta)\)	\(y \sim \text{Poisson}(\mu)\)


see Kruschke (2015), Chapter 15
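Two of the inverse link functions in the table, soft-max for nominal and \(\exp\) for count data, can be sketched in a few lines. This is an illustrative sketch, not Kruschke's code; the parameter name `lam` stands for \(\lambda\), and subtracting the maximum before exponentiating is a standard numerical-stability trick that leaves the result unchanged.

```python
import math

def softmax(etas, lam=1.0):
    # mu_k proportional to exp(lam * eta_k); shift by the max for numerical stability
    m = max(lam * e for e in etas)
    ws = [math.exp(lam * e - m) for e in etas]
    z = sum(ws)
    return [w / z for w in ws]

def poisson_mean(eta):
    # inverse link for counts: mu = exp(eta) is always positive
    return math.exp(eta)

def poisson_pmf(y, mu):
    # likelihood of observing count y under Poisson(mu)
    return math.exp(-mu) * mu ** y / math.factorial(y)

print(softmax([0.0, 1.0, 2.0], lam=1.0))
print(poisson_pmf(2, poisson_mean(0.5)))
```

Larger values of \(\lambda\) concentrate the soft-max on the category with the largest \(\eta_k\).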

logistic function

\[\text{logistic}(\eta, \theta, \gamma) = \frac{1}{(1 + \exp(-\gamma (\eta - \theta)))}\]


threshold \(\theta\)

gain \(\gamma\)
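A direct transcription of the logistic function with threshold \(\theta\) and gain \(\gamma\) is sketched below. The function returns exactly 0.5 when \(\eta = \theta\), whatever the gain, and a larger \(\gamma\) makes the curve steeper around the threshold. The two-branch form is only there to avoid overflow in `math.exp` for extreme inputs; it computes the same function.

```python
import math

def logistic(eta, theta=0.0, gamma=1.0):
    # (1 + exp(-gamma * (eta - theta)))^(-1), computed without overflow
    z = gamma * (eta - theta)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

print(logistic(0.3, theta=0.3, gamma=5.0))  # 0.5
print(logistic(1.0, gamma=1.0), logistic(1.0, gamma=10.0))
```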

threshold-Phi model

[figure: threshold-Phi model]


Kruschke (2015), Chapter 23
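A minimal sketch of the threshold-Phi link, following the thresholded cumulative normal of Kruschke's ordinal model: with ordered thresholds \(\delta_1 < \dots < \delta_{K-1}\), \(\delta_0 = -\infty\) and \(\delta_K = +\infty\), the probability of category \(k\) is \(\Phi((\delta_k - \eta)/\sigma) - \Phi((\delta_{k-1} - \eta)/\sigma)\). The sketch assumes a single \(\eta\) shared by all categories; the threshold values in the usage line are hypothetical.

```python
import math

def phi(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def threshold_phi(eta, sigma, deltas):
    # deltas: K-1 strictly increasing thresholds for K ordered categories
    if any(a >= b for a, b in zip(deltas, deltas[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cuts = [-math.inf] + list(deltas) + [math.inf]
    return [phi((cuts[k + 1] - eta) / sigma) - phi((cuts[k] - eta) / sigma)
            for k in range(len(cuts) - 1)]

# 7-point rating scale with hypothetical, evenly spaced thresholds
probs = threshold_phi(4.0, 1.0, [1.5, 2.5, 3.5, 4.5, 5.5, 6.5])
print([round(p, 3) for p in probs])
```

Because the differences telescope, the category probabilities always sum to one.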

pragmatic felicity model

idea

quantitative notion of pragmatic felicity \(F\) replaces predictor \(\eta\)
- \(F\) is (function of) relative expected utility:
  - goodness of the description with \(\textit{some}\) compared to alternative descriptions
- data-driven approach to infer gradient salience of alternatives
\(F\) feeds into two link functions:
- logistic model for binary truth-value judgements
- threshold-Phi model for rating scale judgements

full model

[figure: graph of the full model]

set up

conditions \(c \in \{0, \dots, 10\}\): number of black balls
rating-scale degrees \(d \in \{1, \dots, 7\}\)
messages \(m \in M = \{\textit{none}, \textit{one}, \textit{two}, \textit{three}, \textit{many}, \textit{most}, \textit{all}, \textit{some}\}\)
semantics:

##       c=0 c=1 c=2 c=3 c=4 c=5 c=6 c=7 c=8 c=9 c=10
## none    1   0   0   0   0   0   0   0   0   0    0
## one     0   1   0   0   0   0   0   0   0   0    0
## two     0   0   1   0   0   0   0   0   0   0    0
## three   0   0   0   1   0   0   0   0   0   0    0
## many    0   0   0   0   0   1   1   1   1   1    1
## most    0   0   0   0   0   0   1   1   1   1    1
## all     0   0   0   0   0   0   0   0   0   0    1
## some    0   1   1   1   1   1   1   1   1   1    1
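The truth table above can be reproduced from one predicate per message. In this sketch the predicates are read off the table itself (for instance, \(\textit{many}\) is true from 5 black circles on and \(\textit{most}\) from 6 on); the names `CONDITIONS`, `SEMANTICS` and `table` are illustrative.

```python
# conditions c = 0..10: number of black circles out of 10
CONDITIONS = range(11)

# literal semantics of each message, read off the truth table above
SEMANTICS = {
    "none":  lambda c: c == 0,
    "one":   lambda c: c == 1,
    "two":   lambda c: c == 2,
    "three": lambda c: c == 3,
    "many":  lambda c: c >= 5,
    "most":  lambda c: c >= 6,
    "all":   lambda c: c == 10,
    "some":  lambda c: c >= 1,
}

table = {m: [int(f(c)) for c in CONDITIONS] for m, f in SEMANTICS.items()}
for m, row in table.items():
    print(f"{m:<6}", *row)
```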

Gricean speakers

the literal listener picks a literal interpretation (uniformly at random):

\[ P_{LL}(c \mid m) = \text{Uniform}(c \mid \{ c' \mid m \text{ is true in } c' \} ) \]

utility for true \(c\) and interpretation \(c'\):

\[ U(c, c' \ ; \ \pi) = \exp(- \pi \ (c - c')^2 ) \]

expected utility:

\[ \text{EU}(m, c \ ; \ \pi) = \sum_{c'} P_{LL}(c' \mid m) \ U(c, c' \ ; \ \pi) \]

Gricean speakers choose maximally informative/useful messages:

\[ m \in \arg \max_{m' \in M} \text{EU}(m', c \ ; \ \pi) \]

(cf. Frank & Goodman, 2012, Science; Franke, 2014, Proceedings of CogSci)
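The chain from literal listener to Gricean speaker can be sketched directly from the formulas above: \(P_{LL}\) is uniform over the conditions where a message is true, utility decays with squared distance, and the speaker keeps every message that maximizes expected utility. The truth sets are copied from the semantics table; the function names and the value of \(\pi\) in the usage lines are hypothetical. As in the formula, the argmax ranges over all of \(M\), including messages that are literally false in \(c\).

```python
import math

CONDITIONS = list(range(11))
# conditions in which each message is literally true (from the semantics table)
TRUE_IN = {
    "none": [0], "one": [1], "two": [2], "three": [3],
    "many": list(range(5, 11)), "most": list(range(6, 11)),
    "all": [10], "some": list(range(1, 11)),
}

def p_ll(c_prime, m):
    # literal listener: uniform over the conditions in which m is true
    states = TRUE_IN[m]
    return 1.0 / len(states) if c_prime in states else 0.0

def utility(c, c_prime, pi):
    # U(c, c'; pi) = exp(-pi * (c - c')^2)
    return math.exp(-pi * (c - c_prime) ** 2)

def eu(m, c, pi):
    # EU(m, c; pi) = sum_c' P_LL(c' | m) * U(c, c'; pi)
    return sum(p_ll(cp, m) * utility(c, cp, pi) for cp in CONDITIONS)

def gricean_choices(c, pi, messages=tuple(TRUE_IN)):
    # all messages in the argmax of expected utility for condition c
    scores = {m: eu(m, c, pi) for m in messages}
    best = max(scores.values())
    return [m for m, s in scores.items() if math.isclose(s, best)]

print(gricean_choices(0, 1.0))  # ['none']
print(gricean_choices(4, 1.0))  # ['three']
```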

pragmatic felicity

scaled expected utility given set \(X\) of entertained alternatives:

\[ \text{EU}^*(c , X \ ; \ \pi) = \frac{\text{EU}(\textit{some}, c \ ; \ \pi) - \min_{m \in X} \text{EU}(m, c \ ; \ \pi)}{\max_{m \in X} \text{EU}(m, c \ ; \ \pi) - \min_{m \in X} \text{EU}(m, c \ ; \ \pi)} \]

salience of alternatives \(m \in M \setminus \{ \textit{some} \}\):

\[ s_m \sim \text{Beta}(1,1) \]

probability of entertaining \(X \subseteq M\) (crudely assume independence!):

\[ P(X \mid \vec{s}) = \prod_{m \in X} s_m \prod_{m \in M \setminus X} \ (1-s_m) \]

expected relative felicity:

\[ \text{F}(c \ ; \ \vec{s}, \pi) = \sum_X P(X \mid \vec{s}) \ \text{EU}^*(c , X \ ; \ \pi) \]
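The expected relative felicity can be computed by brute force over all \(2^7 = 128\) subsets of the seven alternatives to \(\textit{some}\). The formula for \(\text{EU}^*\) leaves two cases open, so this sketch makes an explicit assumption: the min and max are taken over \(X\) together with \(\textit{some}\) itself, and \(\text{EU}^* = 1\) when that range is degenerate (for example when \(X\) is empty). This is an illustrative reading, not necessarily how the original JAGS model handles these cases; all names are hypothetical.

```python
import itertools
import math

# conditions in which each message is literally true (from the semantics table)
TRUE_IN = {
    "none": [0], "one": [1], "two": [2], "three": [3],
    "many": list(range(5, 11)), "most": list(range(6, 11)),
    "all": [10], "some": list(range(1, 11)),
}
ALTERNATIVES = [m for m in TRUE_IN if m != "some"]

def eu(m, c, pi):
    # uniform literal listener, so EU is the mean utility over m's true conditions
    states = TRUE_IN[m]
    return sum(math.exp(-pi * (c - cp) ** 2) for cp in states) / len(states)

def scaled_eu(c, X, pi):
    # ASSUMPTION: min/max over X plus "some"; EU* = 1 if the range is degenerate
    scores = [eu(m, c, pi) for m in list(X) + ["some"]]
    low, high = min(scores), max(scores)
    if math.isclose(high, low):
        return 1.0
    return (eu("some", c, pi) - low) / (high - low)

def felicity(c, salience, pi):
    # F(c; s, pi) = sum_X P(X | s) * EU*(c, X; pi), alternatives entertained independently
    total = 0.0
    for included in itertools.product([True, False], repeat=len(ALTERNATIVES)):
        X = [m for m, inc in zip(ALTERNATIVES, included) if inc]
        p = 1.0
        for m, inc in zip(ALTERNATIVES, included):
            p *= salience[m] if inc else 1.0 - salience[m]
        if p > 0.0:
            total += p * scaled_eu(c, X, pi)
    return total

half = {m: 0.5 for m in ALTERNATIVES}  # hypothetical salience values
print([round(felicity(c, half, 0.5), 3) for c in range(11)])
```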


results

MCMC set up

model implemented in JAGS (Plummer 2003)
10,000 samples after 10,000 burn-in steps (2 chains, every second sample used)
convergence checked visually and by \(\hat{R}\) (Gelman & Rubin 1992)
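For reference, a sketch of the basic Gelman-Rubin potential scale reduction factor for one parameter, given \(m\) chains of equal length \(n\): with between-chain variance \(B\) and mean within-chain variance \(W\), \(\hat{V} = \frac{n-1}{n} W + \frac{B}{n}\) and \(\hat{R} = \sqrt{\hat{V}/W}\). Values close to 1 indicate that the chains agree. Software such as coda's `gelman.diag` may add corrections, so numbers can differ slightly from this sketch.

```python
import statistics

def rhat(chains):
    # chains: list of m equally long lists of draws for one parameter
    m = len(chains)
    n = len(chains[0])
    means = [statistics.fmean(ch) for ch in chains]
    grand = statistics.fmean(means)
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)   # between-chain variance
    w = statistics.fmean([statistics.variance(ch) for ch in chains])  # within-chain
    v_hat = (n - 1) / n * w + b / n
    return (v_hat / w) ** 0.5

print(rhat([[0.1, 0.3, 0.2, 0.4], [0.2, 0.1, 0.3, 0.3]]))
print(rhat([[0, 1, 0, 1], [10, 11, 10, 11]]))  # chains disagree: far above 1
```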


posteriors: link function parameters

logistic function

threshold-Phi model

posteriors: model parameters

posteriors: salience

posteriors: salience differences

posteriors: pragmatic felicity

posteriors: felicity differences

posterior predictive checks

conclusions

the idea that the truth-value and rating-scale tasks measure the same thing is tenable
measure: scaled relative expected utility under variably salient alternatives
this measure is influenced by the presence/absence of alternatives
theory-driven probabilistic modeling can advance the methodological debate
it is important to make link functions an explicit part of the full data-generating model