Bayesian data analysis: Homework set 4

Please read this: Name your file homework3_lastname_firstname. This set is due 12/16 before class. The submission should be in Rmarkdown, and should go to Fabian with subject BDA: Homework 4 (submit both .Rmd and .pdf). We encourage discussing exercises with fellow students, but only individual solutions will be accepted.

1. Estimating a model’s evidence by clever sampling (8)

Use the method of clever sampling introduced in the lecture to estimate the evidence of the exponential and the power model of forgetting curves, where we use an approximation of the posterior (using a Gamma distribution, just like in class) for aligning distribution \(h\). The models are as before with priors \(a,b,c,d \sim \text{Unif}(0,1.5)\). Let’s assume that the data is:

y = c(.93, .81, .39, .25, .24, .15)
t = c(  1,   3,   6,   9,  12,  18)
obs = y*100

Here’s what you need to do, step by step (once for each model):

Get the posteriors for the new data, using JAGS and the implementation of the models from the lecture.

To be concise, we can write both models as a hierarchical model where \(m\) is the model index. Conveniently, \(m\) will also indicate the posterior probability of – in our case – the exponential model. Thus we have a sanity check for our ‘clever sampling’.

library('rjags')
library('ggmcmc')

runmodel <- function(dat) {
  
  trans = '
  model{
    m ~ dbern(.5)
    a ~ dunif(0,1.5)
    b ~ dunif(0,1.5)
    c ~ dunif(0,1.5)
    d ~ dunif(0,1.5)
    for (i in 1: 6){
      pT[i] = ifelse(m == 1, a*exp(-t[i]*b), c*t[i]^(-d))
      p[i] = min(max(pT[i], 0.0001), 0.9999)
      obs[i] ~ dbinom(p[i], 100)
    }
  }'
  
  params = c('m', 'a', 'b', 'c', 'd')
  model = jags.model(textConnection(trans), data = dat, n.chains = 3, quiet = TRUE)
  update(model, 5000)
  samples = coda.samples(model, variable.names = params, n.iter = 50000)
  samples
}

We note something interesting here. The exponential model dominates completely! In fact, our hierarchical model never spends time estimating the power model; this means that the posterior for parameters in the power model are simply the prior.

dat = list('t' = t, 'obs' = obs)
ss.exp = runmodel(dat)
ggs_density(ggs(ss.exp)) # density estimation of m is pretty useless

Thus in order to test our ‘clever sampling’, we will need to write code estimating the power model seperately … but we don’t have to! We simple pass \(m = 0\) into our data list.

dat = list('t' = t, 'obs' = obs, 'm' = 0)
ss.pow = runmodel(dat)
ggs_density(ggs(ss.pow))

Find Gamma-approximations for the posterior, using function getGammaApprox from the lecture. Notice that you have to massage the samples into the right format (as done in the lecture, where the function getGammaApprox was introduced.)

getGammaApprox = function(samples){
  s = sd(samples)
  m = mean(samples)
  ra = (m + sqrt( m^2 + 4*s^2)) / (2 * s^2)
  sh = 1 + m * ra
  c(shape = sh, rate = ra)
}

# join the chains
ss.exp = do.call('rbind', ss.exp[, c(1, 2)]) # join a and b
ss.pow = do.call('rbind', ss.pow[, c(3, 4)]) # join c and d

gamma.exp = rbind(getGammaApprox(ss.exp[, 1]), getGammaApprox(ss.exp[, 2]))
gamma.pow = rbind(getGammaApprox(ss.pow[, 1]), getGammaApprox(ss.pow[, 2]))

rownames(gamma.exp) = c('a', 'b')
rownames(gamma.pow) = c('c', 'd')

gamma.exp

##       shape     rate
## a 1178.8719 1111.123
## b  199.4524 1514.751

gamma.pow

##       shape      rate
## c 2189.3657 2326.1547
## d  232.4501  468.4177

Use enough JAGS samples from the posterior distribution to calculate, where \(h\) is approximated by your Gamma distributions:

\[ \begin{align*} \frac{1}{D} & \approx \frac{1}{n} \sum^{n}_{\theta_i \sim P(\theta \mid D)} \frac{h(\theta)}{P(D \mid \theta) P(\theta)} \end{align*} \]

We do not care about the prior because, due to Bayes updating, the posterior has (as an upper bound) the same support as the prior (every likelihood value outside of the support of the prior gets multiplied with 0). Since the prior is uniform, all values get multiplied with the same value (in our case \(1 / 1.5\)), so we can just drop it, for the purposes of computing the Bayes factor.

h = function(ss, gam) {
  (dgamma(ss[, 1], shape = gam[1, 1], rate = gam[1, 2]) * # e.g. parameter a
   dgamma(ss[, 2], shape = gam[2, 1], rate = gam[2, 2]))  # e.g. parameter b
}

LL = function(expr, w1, w2, t = dat$t) {
  p = eval(expr)
  p[p <= 0.0] = 1.0e-5
  p[p >= 1.0] = 1-1.0e-5
  # exp-sum-log trick for numerical stability
  exp(sum(dbinom(x = obs, prob = p, size = 100, log = TRUE)))
}

LL_vec = Vectorize(LL)
powLL = expression(w1 * t^(-w2))
expLL = expression(w1 * exp(-w2 * t))

pow_evidence = mean(h(ss.pow, gamma.pow) / (LL_vec(powLL, ss.pow[, 1], ss.pow[, 2])))
exp_evidence = mean(h(ss.exp, gamma.exp) / (LL_vec(expLL, ss.exp[, 1], ss.exp[, 2])))

Calculate the Bayes factor from the estimated evidences and interpret the result.

(BF = pow_evidence / exp_evidence) # in favour of exp model (because we computed inverse of evidence)

## [1] 49451.61

Assuming equal prior odds, these are our posterior odds in favour of the exponential model. We can convert posterior odds into probabilities, which is what our estimate of \(m\) before was.

BF / (1 + BF) # very similar to our estimate m = 1

## [1] 0.9999798

2. Transdimensional MCMC (6)

Our data are, as before, \(k=7\) and \(N=24\). Look at the two models that we considered in class for this case:

\(M_1\) has \(\theta \sim \text{Beta}(1,1)\) and \(k \sim \text{Binomial}(\theta, N)\)
\(M_0\) has \(\theta = 0.5\) and \(k \sim \text{Binomial}(0.5, N)\)

Use transdimensional MCMC to compare the models. Compute the posterior model odds, given equal priors for each model. We did calculate the Bayes factor in class, so you should be able to verify your results.

ms = '
model {
  m ~ dbern(.5)
  theta ~ dbeta(1, 1)
  p = ifelse(m == 0, .5, theta) # if 0 => sample M_0, if 1 => sample M_1
  k ~ dbinom(p, N)
}'

datt = list('k' = 7, 'N' = 24)
model = jags.model(textConnection(ms), data = datt, n.chains = 3, quiet = TRUE)
update(model, 5000)
samples = coda.samples(model, variable.names = 'm', n.iter = 100000)

m = mean(do.call('rbind', samples))
m # posterior probability of M1

## [1] 0.6613067

((1 - m) / m) # Bayes factor in favour of M0

## [1] 0.5121577

3. A night at the club (4)

Jack & Jill debate whether they should enter club Whatever tonight. They only want to go if the women/men ratio is about even. Otherwise the sphere just isn’t, like, right, you know. Before they decide to pay, they screen the crowd that is lining up at the entrance. It is now 1:30 and there’s 28 people in line, of which 9 female, 16 men, and three unidentifiable. How should they test whether the women/men ratio is even in this particular case? Should they use estimation, model comparison or \(p\)-value testing? If you cannot decide, discuss pros and cons for each. (Think about how they should construe a likelihood function, whether a reasonable prior is at hand, whether the sampling procedure is reasonably clear, whether model complexity plays a role, etc. Suppose for the sake of argument, that computational complexity is not a problem: Jack & Jill have brought their laptops, of course.)

I would vote against using \(p\) values here, because it is not clear what the sampling plan is. Model comparison seems more reasonable. For ease of exposure, we would use a Binomial model, \(N = 25\), \(y = 9\); ignoring the missing values for now. Their favoured model is \(\theta = .5\). Let’s discuss the estimation approach first.

I would say that the ratio of women to men in a club usually is \(1:2\), which means our prior peaks at \(\omega = 1/3\). I am somewhat unsure about this belief, so I specify \(\kappa = 10\). Remember that this is an alternative specification of the Beta distribution, with \(\omega\) being the mode, and \(\kappa\) the sample size

\[ \begin{align*} \omega &= \frac{a - 1}{a + b - 2} \\ \kappa &= a + b \end{align*} \]

giving us \(a = \omega (\kappa - 2) + 1\) and \(b = \kappa - \omega (\kappa - 2) -1 = (1 - \omega)(\kappa - 2) + 1\). Bayesian updating yields the following

dbeta2 <- function(x, w, k, y = 0, N = 0) {
  a = w * (k - 2) + 1
  b = (1 - w) * (k - 2) + 1
  dbeta(x, a + y, b + N - y)
}

plot(function(x) dbeta2(x, 1/3, 10, 9, 25), main = 'Bayesian updating',
     col = 'darkorchid1', lwd = 2, ylab = 'density') # posterior

plot(function(x) dbeta2(x, 1/3, 10),
     col = 'skyblue', lwd = 2, lty = 2, add = TRUE) # prior

plot(function(x) dbeta(x, 9, 16), col = 'darkorange',
     lwd = 2, lty = 3, add = TRUE) # likelihood

legend('topright', c('Prior', 'Likelihood', 'Posterior'),
       col = c('skyblue', 'darkorange', 'darkorchid1'),
       lty = c(2, 3, 1), bty = 'n', lwd = c(2, 2, 2))

The 95% HDI of our posterior can be calculated with the help of function HDIofICDF from Kruschke’s DBDA2E-utilities.R.

a = 12.66
b = 22.33

 # because posterior is asymmetric
HDIofICDF(qbeta, shape1 = a, shape2 = b)

## [1] 0.2084699 0.5194886

which seems to indicate that there are more men in the club.

Another approach is to compare two models, \(M_0: \theta = .5\) and \(M_1: \theta \sim \mathcal{Beta}(3.66, 6.33)\). Using Savage Dickey we get

(BF01 = dbeta(.5, 12.66, 22.33) / dbeta(.5, 3.66, 6.33)) # using a and b

## [1] 0.7054403

in favour of the null model. However, this is just the evidence brought by the data. To yield a conclusion, we have to factor our prior odds in. For me, the model positing an equal proportion of men and women is pretty unlikely. Certainly, this specific point hypothesis is almost always wrong; but Jack and Jill said ‘about even’, so let’s cut them some slack and go with \(1/5\) odds against equality

PM01 = BF01 * 1/5
PM01

## [1] 0.1410881

So our posterior odds for equality are somewhat bleak. However, it seems clear that the point null hypothesis here is rather awkward. Instead, we could have used an interval null hypothesis; that is, we also compute an integral for the null model, with the limits being choosen by our feeling of ‘about even’. Say

\[ BF_{01} = \frac{\int_{.4}^{.6} p(y|\theta, M_1)p(\theta|M_1)\mathrm{d}\theta}{\int_0^1 p(y|\theta, M_2)p(\theta|M_2)\mathrm{d}\theta} \]

which, using Monte Carlo methods, yields

n <- 100000

prior <- rbeta(n, 3.66, 6.33)
selected <- prior[prior >= .4 & prior <= .6] # integral boundaries
mean(dbinom(9, 25, selected)) / mean(dbinom(9, 25, prior))

## [1] 0.9450132

The Bayes factor is higher compared to the point null hypothesis, and I feel more comfortable specifying equal prior odds. Still, the posterior odds would favour the alternative model (slightly).

Side Note. Above we simply ignored the missing values. There is a very active research field concerning itself with estimating likely values for the NAs (see e.g. here). While Jack and Jill are too lazy to read up on the details, they instead use their model and predict likely values for the NAs.

library('rjags')

ms = '
model {
  theta ~ dbeta(3.66, 6.33)
  for (i in 1:28) {
    x[i] ~ dbern(theta)
    xpred[i] ~ dbern(theta)
  }
}'

x = c(rep(1, 9), rep(0, 16), rep(NA, 3))
model = jags.model(textConnection(ms), data = list('x' = x), quiet = TRUE)
samples = coda.samples(model, variable.names = c('theta', 'xpred'), n.iter = 10000)

samples = as.matrix(samples)
missing = samples[, 27:29]
apply(missing, 2, mean)

## xpred[26] xpred[27] xpred[28] 
##    0.3615    0.3621    0.3586

which indicates that, by excluding the three unrecognizable people, their model is rather conservative.

Wrap up. The estimation approach does not have this problem associated with point null hypotheses. Finally, we have to note that this is a decision problem. Merely solving the inference problem is not enough. Jack and Jill have to specify a loss function, specifying the penalty of (i) going into the club and finding out that the women/men ratio is not 1, (ii) going into the club and finding out the ratio is 1.