overview

recap

  • 3 types of Bayesianism:
    • Bayesian versions of classical statistical tests
    • Bayesian modeling of data-generating process
    • cognitive models that assume that people "are Bayesian"

dummy

today

  • subjective beliefs for models of "Bayesian cognition"

what, where from, where to?

the "priors consortium"

  • Erin D. Bennett
  • Fabian Dablander
  • Judith Degen
  • Michael Franke
  • Noah Goodman
  • Justine Kao
  • Anthea Schoeller
  • Michael Henry Tessler
  • Ciyang Qing
  • … ??? …

dummy

dummy

disclaimer

  • this is a mess in progress
  • all follies are mine

background

"Bayesian cognition"

Bayes for learning

  • prior beliefs on what "structure" can be expected
    • (possibly innate) learning biases
  • Bayes rule to learn from sparse input
    • language acquisition
    • conceptualization, generalization

Bayes for performance

  • model task behavior as (partly) Bayesian inference
    • rationalistic explanation, not necessarily mechanism/process
  • subjects' beliefs about everyday events (task items) play a role
    • reasoning from premises, future predictions, etc.
    • language use and interpretation
      • vagueness, non-literal interpretation etc.

example: vague many

sentence

Joe eats many burgers.

cardinal surprise reading

Joe eats more burgers than we would expect of him.

theory

  • look at expectation \(p\) about burger consumption for the "kind of guy" that Joe is
    • \(p(n)\) is the probability that a guy like Joe eats \(n\) burgers
    • let \(P\) be the cumulative distribution of \(p\)
  • let \(b\) be the number of burgers that Joe actually eats
  • "Joe eats many burgers" is true iff \(P(b)\) is bigger than a fixed threshold
    • that threshold is the same for all sentences of this kind
    • contextual vagueness from uncertainty about \(p\)

(e.g., Schoeller & Franke 2015)
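A minimal R sketch of this threshold idea; the Gaussian expectation, the threshold, and Joe's burger count are all made up for illustration:

n     <- 0:30                          # possible numbers of burgers
p     <- dnorm(n, mean = 8, sd = 4)    # assumed expectation about "a guy like Joe"
p     <- p / sum(p)                    # normalize over the discrete domain
P     <- cumsum(p)                     # cumulative distribution of p
theta <- 0.9                           # hypothetical fixed threshold
b     <- 14                            # number of burgers Joe actually eats
P[n == b] > theta                      # "Joe eats many burgers" is true iff this holds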

enter subjective beliefs

desideratum

  • compact, versatile representation of subjective beliefs
  • non-expert beliefs about possibly mundane affairs
  • cheap, fast and reliable ways of assessment

previous literature

  • take real-world frequencies as stand-in for subjective beliefs
  • measure beliefs experimentally, e.g.:
    • infer parameters of a Gaussian from judgements of cumulative probability (Manski, 2004)
    • infer parameters and type of distribution from predictions (Tauber & Steyvers 2013)
    • binned slider ratings (Kao et al., 2014)

binned slider ratings

manyPriors

  • discretize domain into bins; let subjects rate each bin
  • take a relative measure: normalize slider ratings per participant
  • average the (normalized) slider ratings to reflect population-level belief
    • "wisdom of the crowds"

(e.g., Schoeller & Franke 2015)
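A sketch of this aggregation step in R; the matrix raw and its layout are made up for illustration:

# raw[i, k]: raw slider rating of participant i for bin k of a single item
raw <- rbind(c(1, 3, 5, 3, 1),
             c(0, 2, 6, 2, 0),
             c(2, 2, 2, 2, 2))
norm <- sweep(raw, 1, rowSums(raw), "/")   # normalize per participant
colMeans(norm)                             # population-level belief ("wisdom of the crowds")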

agenda

issues

  • wanted: reliable & practical way of measuring beliefs
    • consistency across measures
  • what is the relation between subjects' answers, their beliefs, and the population-level aggregate?
    • what about between-subject variance in the task data?

dummy

approach

  • belief elicitation with different tasks (within-subject)
  • Bayesian hierarchical model to infer latent subjective & "population-level beliefs"
    • "population-level belief" => central tendency of subjective beliefs
    • think: "mean of a Gaussian hyperprior"

experiments

overview

  • 20 participants recruited via MTurk
  • each saw every condition of every task
  • 8 items (from previous research)
  • 3 task types:
    • slider ratings
    • number estimates
    • binary comparison (a.k.a. "lightning round")

items

  1. "X has just fetched himself a cup of coffee from the office vending machine."
    • "What do you think the temperature of his coffee is?"
  2. "X commuted to work yesterday."
    • "How many minutes do you think she spent commuting yesterday?"
  3. "X told a joke to N kids."
    • "How many of the kids do you think laughed?"
  4. "X bought a laptop."
    • "How much do you think it cost?"
  5. "X threw N marbles into a pool."
    • "How many of the marbles do you think sank?"
  6. "X just went to the movies to see a blockbuster."
    • "How many minutes long do you think the movie was?"
  7. "X watched TV last week."
    • "How many hours do you think he spent watching TV last week?"
  8. "X bought a watch."
    • "How much do you think it cost?"

slider ratings

dummy

priorsslider

number estimates

dummy

priorsnumbers

binary choices

dummy

priorslightning

results

slider ratings

dataslider

number estimates

dataslider

lightning round

dataslider

model

questions

general

  • is there a consistent population-level average that explains participants' choices?
  • is the average slider rating a good approximation of a latent population-level average?
  • how much individual variance is there?

dummy

specific (technical)

  • hierarchical group-level prior for subjective beliefs?
  • link functions for task types?

slider data

\(s_{ijk} \in [0,1]\) - normalized slider rating of subject \(i\) for item \(j\) and bin \(k\)

dim(y.slider)
## [1] 20  8 15
y.slider[1,1:4,1:5]
##             [,1]        [,2]       [,3]       [,4]       [,5]
## [1,] 0.025085519 0.022805017 0.02394527 0.02394527 0.05131129
## [2,] 0.001091703 0.004366812 0.05676856 0.05676856 0.08842795
## [3,] 0.001331558 0.034620506 0.03728362 0.03595206 0.03994674
## [4,] 0.104519774 0.139830508 0.13841808 0.10875706 0.09180791

number data

\(n_{ij} \in \{1, \dots, 15\}\) - bin chosen by subject \(i\) for item \(j\) in the number-estimate task

y.number[1:12,]
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
##  [1,]    7    7   11    3   15    8    3    4
##  [2,]   10    4    9    2   15    7    4    3
##  [3,]    7    3   12    3   15    9    4    5
##  [4,]    9    6   14    3   15    8    5    3
##  [5,]   10   11   13    2   15   11    4   11
##  [6,]   10    3   13    2   15    6    7    4
##  [7,]   10    6   13    2   15    8    6    2
##  [8,]   10    4   11    2   15    8    4    9
##  [9,]    7    9   12    3   15    8    4    5
## [10,]   13    5   12    1   15   10    7    1
## [11,]   10    3   11    2   15    8    5    3
## [12,]    6    3   12    7   15    6    9    3

lightning choice data

\(c_{ijl} \in \{ 0, 1\}\) - whether subject \(i\) chose the higher bin in lightning round \(l \in \{ 1, \dots, 5\}\) for item \(j\)

y.choice[1,,]
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    1    1    0    0
## [2,]    1    1    0    0    0
## [3,]    1    1    1    1    1
## [4,]    1    0    0    0    0
## [5,]    1    1    1    1    1
## [6,]    1    1    1    0    0
## [7,]    1    1    0    0    0
## [8,]    1    0    0    0    0

model

modelGraph

hierarchical population prior

  • \(w \sim \text{Gamma}(2,0.1)\)
  • \(Q_{j} \sim \text{Dirichlet}(1,\dots, 1)\)
  • \(P_{ij} \sim \text{Dirichlet}(w Q_j)\)

dummy

w = 20

w = 200
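A sketch in base R of how \(w\) controls the spread of subjective beliefs around the population-level belief; the rdirichlet helper and the seed are only for illustration:

rdirichlet <- function(n, alpha) {
  # Dirichlet draws via normalized Gamma variates
  g <- matrix(rgamma(n * length(alpha), shape = alpha), nrow = n, byrow = TRUE)
  g / rowSums(g)
}

set.seed(123)
K  <- 15                                     # number of bins
Qj <- as.vector(rdirichlet(1, rep(1, K)))    # population-level belief for one item
P.loose <- rdirichlet(5, 20  * Qj)           # w = 20: subjective beliefs scatter widely
P.tight <- rdirichlet(5, 200 * Qj)           # w = 200: subjective beliefs hug Q_j
round(rbind(Qj, P.loose[1, ], P.tight[1, ]), 2)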

link function: sliders

link function: numbers

link function: lightning
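The actual link functions are given on the corresponding slides; as a placeholder, here is one plausible guess at their shape in R (entirely an assumption, not necessarily the model as implemented; reuses the rdirichlet helper from the sketch above):

link.slider <- function(P_ij, gamma = 50) {
  # sliders as a noisy copy of the subjective belief, e.g. Dirichlet around P_ij
  rdirichlet(1, gamma * P_ij)
}
link.number <- function(P_ij) {
  # number estimate as a bin sampled in proportion to its subjective probability
  sample(seq_along(P_ij), size = 1, prob = P_ij)
}
link.lightning <- function(P_ij, lo, hi) {
  # probability of choosing the higher of two presented bins (Luce choice rule)
  P_ij[hi] / (P_ij[hi] + P_ij[lo])
}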

model

modelGraph

Bayesian inference

set-up

  • implemented in JAGS
  • 50,000 samples after a burn-in of 100,000
  • convergence checks: visual inspection and \(\hat{R}\)
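A sketch of this set-up with rjags; the model file name and the monitored variable names are assumptions:

library(rjags)
library(coda)

jags.data <- list(y.slider = y.slider, y.number = y.number, y.choice = y.choice)
m <- jags.model("beliefs_model.jags",      # hypothetical file containing the JAGS model
                data = jags.data, n.chains = 2)
update(m, n.iter = 100000)                 # burn-in
samples <- coda.samples(m, variable.names = c("w", "Q", "P"), n.iter = 50000)
gelman.diag(samples, multivariate = FALSE) # R-hat convergence check
traceplot(samples[, "w"])                  # visual convergence check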

posterior over parameters

postParameters

population-level beliefs \(Q_j\)

postPriors

red: averaged normalized slider ratings; black: posterior mean of \(Q_j\) with 95% HDIs

individual vs. population-level beliefs

postSubjPriors

black: posterior mean of \(Q_j\) with 95% HDIs; dark gray: posterior mean of \(P_{ij}\)

impressions

from visual inspection

  • subjective beliefs differ from population-level mean (good!)
  • averaged normalized slider ratings nicely approximate the posterior mean of \(Q_j\) (excellent!)

model criticism

PPC averaged normalized slider

ppcSlider

the averaged normalized slider ratings we would expect from the model under the posterior distribution over parameters are virtually indistinguishable from the observed ones

PPC number choice

ppcNumber

some frequencies of number choices are surprising for the trained model (but: little data to go on; round or salient numbers may play a role)

PPC lightning round

ppcChoice

miserable failure: the model cannot predict that "no marble sank" should be more likely than "one marble sank" (alternative explanation: subjects revise their beliefs and assume homogeneous "wonkiness" of the marbles)

to do

  • posterior predictive checks for individual-level \(P_{ij}\)

  • posterior predictive p-value with test statistics:
    • entropy of normalized slider ratings per subject
    • mean of normalized slider ratings per subject
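A sketch of these two test statistics in R, using the y.slider array shown above; reading "mean" as the expected bin under the normalized ratings is an assumption:

slider.entropy <- function(s) {
  # entropy of one subject's normalized slider ratings for one item
  s <- s[s > 0]
  -sum(s * log(s))
}
slider.mean <- function(s) {
  # expected bin value under the normalized slider ratings
  sum(seq_along(s) * s)
}

apply(y.slider[, 1, ], 1, slider.entropy)   # per-subject entropy, item 1
apply(y.slider[, 1, ], 1, slider.mean)      # per-subject mean, item 1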

conclusions

conclusions

dummy

  • averaged normalized slider ratings appear to be a practical and reliable measure

dummy

  • slider ratings closely track the population-level central tendency of individual beliefs

dummy

  • hierarchical modeling of population-level beliefs is possible

dummy

  • link functions for task choices seem reliable (enough)