recap

Bayes rule for data analysis:

\[\underbrace{P(\theta \, | \, D)}_{posterior} \propto \underbrace{P(\theta)}_{prior} \times \underbrace{P(D \, | \, \theta)}_{likelihood}\]

Markov Chain Monte Carlo

  • generates a sequence of representative samples from \(X \sim P\)
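
With such samples, expectations can be approximated by simple sample averages (standard Monte Carlo estimation):

\[E[f(X)] \approx \frac{1}{n} \sum_{i=1}^{n} f(x^{(i)}), \qquad x^{(i)} \sim P\]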

JAGS

  • declarative language to specify data-generating model
    • model = prior + likelihood function

key notions

  • fun with JAGS models
    • hierarchical models / latent mixture models
    • ideas & notation
  • shrinkage
  • examples, examples, examples

first example

example (recap)

library(rjags)    # R interface to JAGS

modelString = "
model{
  theta ~ dbeta(1,1)
  k ~ dbinom(theta, N)
}"
# prepare for JAGS
dataList = list(k = 14, N = 20)
# set up and run model
jagsModel = jags.model(file = textConnection(modelString), 
                       data = dataList,
                       n.chains = 2)
update(jagsModel, n.iter = 5000)   # burn-in
codaSamples = coda.samples(jagsModel, 
                           variable.names = c("theta"),
                           n.iter = 5000)

example (recap)

library(ggplot2)
library(ggmcmc)   # ggs() converts coda samples into a tidy data frame for plotting
ms = ggs(codaSamples)
ggs_density(ms) + 
  stat_function(fun = function(x) dbeta(x, 15,7), color = "black") 
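
The black reference curve is the analytically known posterior: the Beta(1,1) prior is conjugate to the binomial likelihood, so with \(k = 14\) successes in \(N = 20\) trials

\[P(\theta \mid D) \propto \theta^{14} (1-\theta)^{6} \quad \Rightarrow \quad \theta \mid D \sim \text{Beta}(15, 7)\]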

graph notation

model{
  theta ~ dbeta(1,1)
  k ~ dbinom(theta, N)
}

[figure: graphical model]

example (extended)

model{
  b ~ dgamma(1,1)
  a ~ dgamma(1,1)
  theta ~ dbeta(a,b)
  k ~ dbinom(theta, N)
}

[figure: graphical model]

example (extended): (hyper-)priors

# plot the Gamma(1,1) hyper-prior used for a and b
ggplot(data.frame(x = c(0.0001,10)), aes(x)) +
  stat_function(fun = function(x) dgamma(x,1,1))

example (extended): posteriors

hierarchical models

hierarchical models

  • latent variables can depend on other latent variables
  • likelihood can be a (direct) function of only some latent variables


\[ \begin{align*} P(\theta, a, b \, | \, D) & \propto P(\theta, a, b) \ P(D \, | \, \theta, a, b)\\ & = P(a) \ P(b) \ P(\theta \mid a, b) \ P(D \, | \, \theta) \end{align*} \]

[figure: graphical model]

deterministic dependencies

[figure: graphical model]

\[ \begin{align*} P(\theta, s \, | \, D) & \propto P(\theta, s) \ P(D \, | \, \theta, s)\\ & = P(s) \ \underbrace{P(\theta \mid s)}_{\text{degenerate!}} \ P(D \, | \, \theta) \end{align*} \]

model{
  s ~ dbern(0.5)                       # latent indicator for which theta applies
  theta <- ifelse(s == 1, 0.75, 0.25)  # deterministic dependency
  k ~ dbinom(theta, N)
}
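
For intuition, the posterior over the latent indicator s can also be computed by hand. A minimal R sketch, assuming the same data as in the first example (k = 14, N = 20):

# likelihood of the data under s = 0 (theta = 0.25) and s = 1 (theta = 0.75)
lh = dbinom(14, 20, c(0.25, 0.75))
# the flat prior P(s) = 0.5 cancels after normalization
post_s = lh * 0.5 / sum(lh * 0.5)
post_s   # P(s = 0 | D), P(s = 1 | D)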

graph conventions

  • type of variable:
    • continuous: circle
    • discrete: square
  • information status:
    • observed: grey shade
    • latent: white
  • dependency:
    • stochastic: single line
    • deterministic: double line

Lee & Wagenmakers (2013)

ready, steady, go!

[figure: graphical model]

priors in hierarchical models

  • priors are complex objects
  • should make sense at every level in the hierarchy:
    • check the implied priors on intermediate latent variables (see the sketch below)!
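
A simple way to inspect the implied prior on an intermediate variable like \(\theta\) is forward sampling from the prior. A minimal R sketch for the extended example (a, b ~ Gamma(1,1), theta ~ Beta(a,b)):

# forward-sample the implied prior on theta
nSamples = 10000
a = rgamma(nSamples, shape = 1, rate = 1)
b = rgamma(nSamples, shape = 1, rate = 1)
theta = rbeta(nSamples, a, b)
hist(theta, breaks = 50, freq = FALSE,
     main = "implied prior on theta", xlab = "theta")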

example (revisited)

model{
  b ~ dgamma(1,1)
  a ~ dgamma(1,1)
  theta ~ dbeta(a,b)
  k ~ dbinom(theta, N)
}

[figure: graphical model]

example: priors

reparameterization

  • \(\text{Beta}(a,b)\) has mode \(\omega\) and concentration \(\kappa\): \[ \begin{align*} \omega & = \frac{a-1}{a+b-2} & \kappa &= a + b \end{align*} \]
  • reformulate (see the sketch below): \[ \begin{align*} a & = \omega(\kappa - 2) + 1 & b & = (1-\omega)(\kappa-2)+1 \end{align*} \]
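
A quick sanity check of this reparameterization in R (the helper name is just illustrative):

# shape parameters from mode omega and concentration kappa
betaShapesFromModeKappa = function(omega, kappa) {
  list(a = omega * (kappa - 2) + 1,
       b = (1 - omega) * (kappa - 2) + 1)
}
shapes = betaShapesFromModeKappa(omega = 0.7, kappa = 12)   # a = 8, b = 4
with(shapes, (a - 1) / (a + b - 2))                         # recovers the mode 0.7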

reparameterized model

model{
  omega ~ dbeta(1,1)             # group-level mode
  kappaMin2 ~ dgamma(0.01,0.01)  # group-level concentration minus 2
  theta ~ dbeta(omega*kappaMin2+1,
                (1-omega)*kappaMin2+1)
  k ~ dbinom(theta, N)
  kappa <- kappaMin2+2
}

[figure: graphical model]

reparameterized model: priors

reparameterized model: posteriors

latent mixture models

latent mixtures

  • hierarchical models with latent parameters:
    • individual-level differences
    • item-level differences

therapeutic touches

[figure: therapeutic touch experiment]

  • TT practitioners claim to sense energy fields
  • task: sense which of their hands is near another person's body

(Kruschke, chapter 9.2.4)

TT data

library(dplyr)     # for %>%, group_by and summarize
TTdata = read.csv('data/TherapeuticTouchData.csv') %>% 
         group_by(s) %>% 
         summarize(k = sum(y), N = length(y))
# k - number of successes per subject
# N - number of trials per subject
show(head(TTdata)) 
## Source: local data frame [6 x 3]
## 
##     s k  N
## 1 S01 1 10
## 2 S02 2 10
## 3 S03 3 10
## 4 S04 3 10
## 5 S05 3 10
## 6 S06 3 10

TTdata

ggplot(TTdata, aes(x = s, y = k/N)) + geom_col()

latent mixture model

model{
  omega ~ dbeta(1,1)
  kappaMin2 ~ dgamma(0.01,0.01)
  for (i in 1:28){                 # 28 subjects in the TT data
    theta[i] ~ dbeta(omega*kappaMin2+1,
                     (1-omega)*kappaMin2+1)
    k[i] ~ dbinom(theta[i], N[i])
  }
}

[figure: graphical model]
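
To obtain the samples analyzed below, the model is run just like the recap example. A minimal sketch, assuming modelString now holds the model code above:

dataList = list(k = TTdata$k, N = TTdata$N)   # per-subject successes and trials
jagsModel = jags.model(file = textConnection(modelString),
                       data = dataList, n.chains = 2)
update(jagsModel, n.iter = 5000)              # burn-in
codaSamples = coda.samples(jagsModel,
                           variable.names = c("omega", "kappa", "theta"),
                           n.iter = 5000)
ms = ggs(codaSamples)                         # tidy samples for plotting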

latent mixture model

latent mixture model: posteriors

show(ggs_density(ms, family = "omega|kappa")) 

latent mixture model: posteriors

msTheta = filter(ms, ! Parameter %in% c("omega", "kappa"))
msTheta$subject = with(msTheta,
                       factor(regmatches(Parameter, regexpr("[0-9]+", Parameter)), 
                              levels = 1:28))  
ggplot(msTheta, aes(x = subject, y = value)) + geom_boxplot() 

shrinkage

what's going on?

shrinkage vs. population-level generalization

  • "shrinkage" is a misleading term:
    • shrinkage of variance in low-level parameters, compared to a non-hierarchical model (see the sketch after this list)
    • variance can also increase, depending on the particular model
  • hierarchical modeling "binds together" low-level parameters
    • this is an additional assumption (can be good or bad)
  • hierarchical "binding" reduces "freedom" of free variables
    • can guard us against overfitting
  • hierarchical group-level binding gives us population-level estimates
    • generalize to new observations
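
To see this in the TT example, compare the raw proportions k/N with the hierarchical posterior means of theta. A minimal sketch, reusing msTheta and TTdata from above (and assuming subjects appear in the same order):

thetaMeans = msTheta %>%
  group_by(subject) %>%
  summarize(postMean = mean(value))
comparison = data.frame(raw = TTdata$k / TTdata$N,
                        hierarchical = thetaMeans$postMean)
head(comparison)   # hierarchical estimates are typically pulled toward the group mode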

double mixture classifier

two country quiz

Candidates from Thailand and Moldova take part in a quiz. Questions are about the history of each country. Here are the data showing who got which question right.

show(CountryQuizData)
##    QA QB QC QD QE QF QG QH
## P1  1  0  0  1  1  0  0  1
## P2  1  0  0  1  1  0  0  1
## P3  0  1  1  0  0  1  0  0
## P4  0  1  1  0  0  1  1  0
## P5  1  0  0  1  1  0  0  1
## P6  0  0  0  1  1  0  0  1
## P7  0  1  0  0  0  1  1  0
## P8  0  1  1  1  0  1  1  0

We want to infer:

  1. which questions are (likely) about the history of the same country, and
  2. which candidates are (likely) from the same country.

Lee & Wagenmakers (2013, Chapter 6.4)

double mixture model

[figure: graphical model]

double mixture model

model{
  alpha ~ dunif(0,1)    # chance correct if candidate's country matches the question's country
  beta ~ dunif(0,alpha) # chance correct on a mismatch (at most alpha)
  for (i in 1:nx){      # nx: number of candidates (passed as data)
    x[i] ~ dbern(0.5)   # latent country of candidate i
  }
  for (j in 1:nz){      # nz: number of questions (passed as data)
    z[j] ~ dbern(0.5)   # latent country of question j
  }
  for (i in 1:nx){
    for (j in 1:nz){
      theta[i,j] <- ifelse(x[i] == z[j], alpha, beta)
      k[i,j] ~ dbern(theta[i,j])
    }
  }
}
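
A minimal sketch for running this model, assuming modelString holds the code above and CountryQuizData is the 0/1 matrix shown earlier:

dataList = list(k = as.matrix(CountryQuizData),
                nx = nrow(CountryQuizData),   # number of candidates
                nz = ncol(CountryQuizData))   # number of questions
jagsModel = jags.model(file = textConnection(modelString),
                       data = dataList, n.chains = 2)
update(jagsModel, n.iter = 5000)
codaSamples = coda.samples(jagsModel,
                           variable.names = c("alpha", "beta", "x", "z"),
                           n.iter = 5000)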

samples from prior

posteriors

posterior over types

## Source: local data frame [18 x 2]
## 
##    Parameter       mean
## 1      alpha 0.88056598
## 2       beta 0.06017908
## 3       x[1] 0.00000000
## 4       x[2] 0.00000000
## 5       x[3] 1.00000000
## 6       x[4] 1.00000000
## 7       x[5] 0.00000000
## 8       x[6] 0.00000000
## 9       x[7] 1.00000000
## 10      x[8] 1.00000000
## 11      z[1] 0.00000000
## 12      z[2] 1.00000000
## 13      z[3] 1.00000000
## 14      z[4] 0.00000000
## 15      z[5] 0.00000000
## 16      z[6] 1.00000000
## 17      z[7] 1.00000000
## 18      z[8] 0.00000000

summary

summary

  • hierarchical models: chains of dependencies between variables
  • information from data percolates "upward":
    • stronger impact on nearby nodes
    • less impact on higher nodes (convergence checks!)
  • high-level nodes can unify estimates for lower-level variables
    • sometimes called "shrinkage"
    • better: group-level homogeneity assumption
  • complex data generating models:
    • prior + likelihood
    • our assumptions about how the data could have been generated

homework for next week