overview

recap

  • 3 types of Bayesianism:
    • Bayesian versions of classical statistical tests
    • Bayesian modeling of data-generating process
    • cognitive models that assume that people "are Bayesian"

dummy

today

  • subjective beliefs for models of "Bayesian cognition"

what, where from, where to?

the "priors consortium"

  • Erin D. Bennett
  • Fabian Dablander
  • Judith Degen
  • Michael Franke
  • Noah Goodman
  • Justine Kao
  • Anthea Schoeller
  • Michael Henry Tessler
  • Ciyang Qing
  • … ??? …

dummy

dummy

disclaimer

  • this is a mess in progress
  • all follies are mine

background

"Bayesian cognition"

Bayes for learning

  • prior beliefs on what "structure" can be expected
    • (possibly innate) learning biases
  • Bayes rule to learn from sparse input
    • language acquisition
    • conceptualization, generalization

Bayes for performance

  • model task behavior as (partly) Bayesian inference
    • rationalistic explanation, not necessarily mechanism/process
  • subjects' beliefs about everyday events (task items) play a role
    • reasoning from premises, future predictions, etc.
    • language use and interpretation
      • vagueness, non-literal interpretation etc.

example: vague many

sentence

Joe eats many burgers.

cardinal surprise reading

Joe eats more burgers than we would expect of him.

theory

  • look at expectation \(p\) about burger consumption for the "kind of guy" that Joe is
    • \(p(n)\) is the probability that a guy like Joe eats \(n\) burgers
    • let \(P\) be the cumulative distribution of \(p\)
  • let \(b\) be the number of burgers that Joe actually eats
  • "Joe eats many burgers" is true iff \(P(b)\) is bigger than a fixed threshold
    • that threshold is the same for all sentences of this kind
    • contextual vagueness from uncertainty about \(p\)

(e.g., Schoeller & Franke 2015)
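A minimal R sketch of this threshold idea; the Gaussian expectation, the threshold, and Joe's burger count are all made up for illustration:

n     <- 0:30                          # possible numbers of burgers
p     <- dnorm(n, mean = 8, sd = 4)    # assumed expectation about "a guy like Joe"
p     <- p / sum(p)                    # normalize over the discrete domain
P     <- cumsum(p)                     # cumulative distribution of p
theta <- 0.9                           # hypothetical fixed threshold
b     <- 14                            # number of burgers Joe actually eats
P[n == b] > theta                      # "Joe eats many burgers" is true iff this holds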

enter subjective beliefs

desideratum

  • compact, versatile representation of subjective beliefs
  • non-expert beliefs about possibly mundane affairs
  • cheap, fast and reliable ways of assessment

previous literature

  • take real-world frequencies as stand-in for subjective beliefs
  • measure beliefs experimentally, e.g.:
    • infer parameters of a Gaussian from judgements of cumulative probability (Manski, 2004)
    • infer parameters and type of distribution from predictions (Tauber & Steyvers 2013)
    • binned slider ratings (Kao et al., 2014)

binned slider ratings

manyPriors

  • discretize domain into bins; let subjects rate each bin
  • take a relative measure: normalize slider ratings per participant
  • average the (normalized) slider ratings to reflect population-level belief
    • "wisdom of the crowds"

(e.g., Schoeller & Franke 2015)
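A sketch of this aggregation step in R; the matrix raw and its layout are made up for illustration:

# raw[i, k]: raw slider rating of participant i for bin k of a single item
raw <- rbind(c(1, 3, 5, 3, 1),
             c(0, 2, 6, 2, 0),
             c(2, 2, 2, 2, 2))
norm <- sweep(raw, 1, rowSums(raw), "/")   # normalize per participant
colMeans(norm)                             # population-level belief ("wisdom of the crowds")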

agenda

issues

  • wanted: reliable & practical way of measuring beliefs
    • consistency across measures
  • what is the relation between subjects' answers, their beliefs, and the population-level aggregate?
    • what about between-subject variance in the task data?

dummy

approach

  • belief elicitation with different tasks (within-subject)
  • Bayesian hierarchical model to infer latent subjective & "population-level beliefs"
    • "population-level belief" => central tendency of subjective beliefs
    • think: "mean of a Gaussian hyperprior"

experiments

overview

  • 20 participants recruited via MTurk
  • each saw every condition of every task
  • 8 items (from previous research)
  • 3 task types:
    • slider ratings
    • number estimates
    • binary comparison (a.k.a. "lightning round")

items

  1. "X has just fetched himself a cup of coffee from the office vending machine."
    • "What do you think the temperature of his coffee is?"
  2. "X commuted to work yesterday."
    • "How many minutes do you think she spent commuting yesterday?"
  3. "X told a joke to N kids."
    • "How many of the kids do you think laughed?"
  4. "X bought a laptop."
    • "How much do you think it cost?"
  5. "X threw N marbles into a pool."
    • "How many of the marbles do you think sank?"
  6. "X just went to the movies to see a blockbuster."
    • "How many minutes long do you think the movie was?"
  7. "X watched TV last week."
    • "How many hours do you think he spent watching TV last week?"
  8. "X bought a watch."
    • "How much do you think it cost?"

slider ratings

dummy

priorsslider

number estimates

dummy

priorsnumbers

binary choices

dummy

priorslightning

results

slider ratings

dataslider

number estimates

dataslider

lightning round

dataslider

model

questions

general

  • is there a consistent population-level average that explains participants' choices?
  • is the average slider rating a good approximation of a latent population-level average?
  • how much individual variance is there?

dummy

specific (technical)

  • hierarchical group-level prior for subjective beliefs?
  • link functions for task types?

slider data

\(s_{ijk} \in [0,1]\) - normalized slider rating of subject \(i\) for item \(j\) and bin \(k\)

dim(y.slider)
## [1] 20  8 15
y.slider[1,1:4,1:5]
##             [,1]        [,2]       [,3]       [,4]       [,5]
## [1,] 0.025085519 0.022805017 0.02394527 0.02394527 0.05131129
## [2,] 0.001091703 0.004366812 0.05676856 0.05676856 0.08842795
## [3,] 0.001331558 0.034620506 0.03728362 0.03595206 0.03994674
## [4,] 0.104519774 0.139830508 0.13841808 0.10875706 0.09180791

number data

\(n_{ij} \in \{1, \dots, 15\}\) - bin chosen by subject \(i\) for item \(j\) in the number-estimate task

y.number[1:12,]
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
##  [1,]    7    7   11    3   15    8    3    4
##  [2,]   10    4    9    2   15    7    4    3
##  [3,]    7    3   12    3   15    9    4    5
##  [4,]    9    6   14    3   15    8    5    3
##  [5,]   10   11   13    2   15   11    4   11
##  [6,]   10    3   13    2   15    6    7    4
##  [7,]   10    6   13    2   15    8    6    2
##  [8,]   10    4   11    2   15    8    4    9
##  [9,]    7    9   12    3   15    8    4    5
## [10,]   13    5   12    1   15   10    7    1
## [11,]   10    3   11    2   15    8    5    3
## [12,]    6    3   12    7   15    6    9    3

lightning choice data

\(c_{ijl} \in \{ 0, 1\}\) - whether subject \(i\) chose the higher bin in lightning round \(l \in \{ 1, \dots, 5\}\) for item \(j\)

y.choice[1,,]
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    1    1    0    0
## [2,]    1    1    0    0    0
## [3,]    1    1    1    1    1
## [4,]    1    0    0    0    0
## [5,]    1    1    1    1    1
## [6,]    1    1    1    0    0
## [7,]    1    1    0    0    0
## [8,]    1    0    0    0    0

model

modelGraph

hierarchical population prior

  • \(w \sim \text{Gamma}(2,0.1)\)
  • \(Q_{j} \sim \text{Dirichlet}(1,\dots, 1)\)
  • \(P_{ij} \sim \text{Dirichlet}(w Q_j)\)

dummy

w = 20

w = 200
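A sketch in base R of how \(w\) controls the spread of subjective beliefs around the population-level belief; the rdirichlet helper and the seed are only for illustration:

rdirichlet <- function(n, alpha) {
  # Dirichlet draws via normalized Gamma variates
  g <- matrix(rgamma(n * length(alpha), shape = alpha), nrow = n, byrow = TRUE)
  g / rowSums(g)
}

set.seed(123)
K  <- 15                                     # number of bins
Qj <- as.vector(rdirichlet(1, rep(1, K)))    # population-level belief for one item
P.loose <- rdirichlet(5, 20  * Qj)           # w = 20: subjective beliefs scatter widely
P.tight <- rdirichlet(5, 200 * Qj)           # w = 200: subjective beliefs hug Q_j
round(rbind(Qj, P.loose[1, ], P.tight[1, ]), 2)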

link function: sliders

link function: numbers

link function: lightning
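The actual link functions are given on the corresponding slides; as a placeholder, here is one plausible guess at their shape in R (entirely an assumption, not necessarily the model as implemented; reuses the rdirichlet helper from the sketch above):

link.slider <- function(P_ij, gamma = 50) {
  # sliders as a noisy copy of the subjective belief, e.g. Dirichlet around P_ij
  rdirichlet(1, gamma * P_ij)
}
link.number <- function(P_ij) {
  # number estimate as a bin sampled in proportion to its subjective probability
  sample(seq_along(P_ij), size = 1, prob = P_ij)
}
link.lightning <- function(P_ij, lo, hi) {
  # probability of choosing the higher of two presented bins (Luce choice rule)
  P_ij[hi] / (P_ij[hi] + P_ij[lo])
}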

model

modelGraph

Bayesian inference

set-up

  • implemented in JAGS
  • 50,000 samples after a burn-in of 100,000
  • convergence checks: visual inspection and \(\hat{R}\)
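A sketch of this set-up with rjags; the model file name and the monitored variable names are assumptions:

library(rjags)
library(coda)

jags.data <- list(y.slider = y.slider, y.number = y.number, y.choice = y.choice)
m <- jags.model("beliefs_model.jags",      # hypothetical file containing the JAGS model
                data = jags.data, n.chains = 2)
update(m, n.iter = 100000)                 # burn-in
samples <- coda.samples(m, variable.names = c("w", "Q", "P"), n.iter = 50000)
gelman.diag(samples, multivariate = FALSE) # R-hat convergence check
traceplot(samples[, "w"])                  # visual convergence check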

posterior over parameters

postParameters

population-level beliefs \(Q_j\)

postPriors

red: averaged normalized slider ratings; black: posterior mean of \(Q_j\) with 95% HDIs

individual vs. population-level beliefs

postSubjPriors

black: posterior mean of \(Q_j\) with 95% HDIs; dark gray: posterior mean of \(P_{ij}\)

impressions

from visual inspection

  • subjective beliefs differ from population-level mean (good!)
  • averaged normalized slider ratings nicely approximate the posterior mean of \(Q_j\) (excellent!)

model criticism

PPC averaged normalized slider

ppcSlider

the averaged normalized slider ratings we would expect from the model under the posterior distribution over parameters are virtually indistinguishable from the observed ones

PPC number choice

ppcNumber

some frequencies of number choices are surprising for the trained model (but: little data to go on; round or salient numbers may play a role)

PPC lightning round

ppcChoice

miserable failure: the model cannot predict that "no marble sank" should be more likely than "one marble sank" (alternative explanation: subjects revise their beliefs and assume homogeneous "wonkiness" of the marbles)

to do

  • posterior predictive checks for individual-level \(P_{ij}\)

  • posterior predictive p-value with test statistics:
    • entropy of normalized slider ratings per subject
    • mean of normalized slider ratings per subject
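A sketch of these two test statistics in R, using the y.slider array shown above; reading "mean" as the expected bin under the normalized ratings is an assumption:

slider.entropy <- function(s) {
  # entropy of one subject's normalized slider ratings for one item
  s <- s[s > 0]
  -sum(s * log(s))
}
slider.mean <- function(s) {
  # expected bin value under the normalized slider ratings
  sum(seq_along(s) * s)
}

apply(y.slider[, 1, ], 1, slider.entropy)   # per-subject entropy, item 1
apply(y.slider[, 1, ], 1, slider.mean)      # per-subject mean, item 1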

conclusions

conclusions

dummy

  • averaged normalized slider ratings appear to be a practical and reliable measure

dummy

  • slider ratings closely track the population-level central tendency of individual beliefs

dummy

  • hierarchical modeling of population-level beliefs is possible

dummy

  • link functions for task choices seem reliable (enough)