Please read this: Name your file homework3_lastname_firstname. This set is due 12/2 before class. The submission should be in Rmarkdown, and should go to Fabian with subject BDA: Homework 3 (submit both .Rmd and .pdf). We encourage discussing exercises with fellow students, but only individual solutions will be accepted.
Check out the following code.
require('rjags')
require('ggmcmc')
require('reshape2')
set.seed(1638)
coolData = data.frame(A = rnorm(100, mean = 100, sd = 10),
B = rnorm(100, mean = 110, sd = 15))
coolData = melt(coolData, variable.name = "Drug", value.name = "HeartRate")
# specify model
modelString = "
model{
# priors
muA ~ dunif(30,200)
muB ~ dunif(30,200)
varA ~ dunif(0, 1000)
varB ~ dunif(0, 1000)
tauA = 1/varA
tauB = 1/varB
# likelihood
for (i in 1:200){
HeartRate[i] ~ dnorm(ifelse(Drug[i] == 0, muA, muB),
ifelse(Drug[i] == 0, tauA, tauB))
}
muDiff = muB - muA
varDiff = varB - varA
}
"
# prepare for JAGS
dataList = list(Drug = ifelse(coolData$Drug == "A", 0, 1),
HeartRate = coolData$HeartRate)
# set up and run model
jagsModel = jags.model(file = textConnection(modelString), data = dataList, n.chains = 3, quiet = TRUE)
codaSamples = coda.samples(jagsModel, variable.names = c("muDiff", "varDiff"), n.iter = 10000)
Before we even start, let us first plot the raw data to get a better grip of what is going on. This example is instructive, in that it points out the shortcomings of bar graphs, see also Weissgerber et al. (2015). Data visualization is extremely important.
ggplot(coolData, aes(x = Drug, y = HeartRate)) +
geom_bar(stat = "summary", fun.y = mean) +
stat_summary(fun.y = mean_cl_boot, fun.ymin = min,
fun.ymax = max, colour = "red",
geom = "errorbar", width = .2) # rather uninformative plot
ggplot(coolData, aes(x = Drug, y = HeartRate)) +
geom_violin(aes(fill = Drug)) + geom_jitter() # great plot!
Based on the plots above, we immediately see that the biggest difference between the two groups is in dispersion. Concretely, Drug B seems to exhibit higher variance in heartrate than Drug A. It looks like Drug B also increases the mean heartrate more strongly, but given the high variability this is hard to pin down. The above JAGS code makes these intuitive statements more precise. What we attempt here is the solve a problem of parameter estimation. What variance and mean values are credible, after we see this data? Of particular interest are the differences in both mean and variance. Assuming that both are normally distributed, we specify non-informative priors (taking biologically unplausible values – we’re leaving money on the table!).
summary(codaSamples)
##
## Iterations = 1001:11000
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## muDiff 7.607 1.791 0.01034 0.01301
## varDiff 150.136 37.041 0.21386 0.29266
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## muDiff 4.064 6.408 7.623 8.81 11.11
## varDiff 85.377 124.056 147.523 173.05 230.66
ss = ggs(codaSamples)
ggs_density(ss)
ggs_autocorrelation(ss)
gelman.diag(codaSamples)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## muDiff 1 1
## varDiff 1 1
##
## Multivariate psrf
##
## 1
effectiveSize(codaSamples)
## muDiff varDiff
## 18996.43 16044.75
To get a better grip of the analysis problem, I have computed 95% credible intervals below, and included a standardized difference measure. Based on them, we can credibly believe that the mean effect and the variance of Drug B is higher than that of Drug A. In effect, the drug seems to increase the mean heartrate – but it might have quite different effects on the individual participants, thus creating high variability. However, note that we are being unprincipled here: estimating parameters is not testing hypotheses. If we were to specifically ask: Does Drug B increase heartrate compared to Drug A? Does Drug B further increase dispersion around the mean compared to Drug A? If we were interested in that, we would need to specify a prior distribution over this difference, and then, for each model, marginalize out its parameters. The ratio of this marginal likelihood yields the Bayes factor, and provides us with a principled mechanism of answering the hypothesis testing question.
combined = do.call('rbind', codaSamples)
combined = cbind(combined, combined[, 1] / sqrt(combined[, 2])) # standardize difference
coda::HPDinterval(coda::mcmc(combined)) # combine all samples and return 95% HDI
## lower upper
## muDiff 4.1582310 11.1878245
## varDiff 80.2233850 223.5609278
## 0.3079799 0.9749059
## attr(,"Probability")
## [1] 0.95
We will work with data about the speed of a car and the distance needed to stop.
# install.packages('car') # if you don't have this package yet
require('car')
head(cars) # this is the data set we work with
## speed dist
## 1 4 2
## 2 4 10
## 3 7 4
## 4 7 22
## 5 8 16
## 6 9 10
It is intuitive that there should be some kind of functional dependency between the speed that a car travels at and the distance it takes for it to come to a halt when breaking. Indeed, we can verify this by plotting ‘distance’ against ‘speed’ and then adding a simple linear regression line, obtained here with the R function lm.
We could get this regression line directly from ggplot, but then we might not realize that this is actually a regression analysis. We also want to know the values of slope and intercept of the regression line later on.
coeff = coef(lm(formula = dist ~ speed, data = cars)) # get slope & intercept of linear model
qplot(cars[,'speed'], cars[,'dist']) + xlab('speed') + ylab('distance') + theme_bw() +
geom_abline(intercept = coeff[1], slope = coeff[2], color = "skyblue")
We are going to write a simple Bayesian linear estimator for ‘distance’ as a function of ‘speed’. Concretely, we want to implement the following Bayesian model, where \(x_{i}\) is the \(i\)-th entry in the data frame for variable speed and \(y_{i}\) is the \(i\)-th entry for variable distance:
\[\beta_0 \sim \text{Norm}(0, 1000)\] \[\beta_1 \sim \text{Norm}(0, 1000)\] \[\sigma^2_{\epsilon} \sim \text{Unif}(0, 1000)\]
\[\mu_i = \beta_0 + \beta_1 x_i\] \[y_i \sim \text{Norm}(\mu_i, \sigma^2_{\epsilon})\]
To explain, \(\beta_1\) is the slope, \(\beta_0\) is the intercept. We assume that these give us a mean \(\mu_i\) for each \(x_i\). The value \(y_i\) is then assumed to be normally distributed around this \(\mu_i\).
# specify model
modelString2 = "
model{
# priors
var_e ~ dunif(0, 1000)
tau_e = 1/var_e
b0 ~ dnorm(0, 1/1000)
b1 ~ dnorm(0, 1/1000)
for (i in 1:50){
mu[i] = b0 + b1 * speed[i]
dist[i] ~ dnorm(mu[i], tau_e)
}
}
"
# prepare for JAGS
dataList = as.list(cars)
# set up and run model
params <- c('b0', 'b1', 'var_e')
jagsModel = jags.model(file = textConnection(modelString2), data = dataList, n.chains = 3, quiet = TRUE)
update(jagsModel, 5000) # 5000 burn-in samples
codaSamples = coda.samples(jagsModel, variable.names = params, n.iter = 20000)
ms = ggs(codaSamples)
ggs_autocorrelation(ms)
ggs_traceplot(ms)
gelman.diag(codaSamples)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## b0 1 1
## b1 1 1
## var_e 1 1
##
## Multivariate psrf
##
## 1
effectiveSize(codaSamples)
## b0 b1 var_e
## 3541.576 3534.060 23705.202
All convergence checks give satisfying indication that our chains have converged to the stationary distribution (the posterior distribution). However, the autocorrelation for our predictors is quite high, and the effective sample size thus quite low.
ggs_density(ms)
summary(codaSamples)
##
## Iterations = 6001:26000
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 20000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## b0 -16.67 6.793 0.02773 0.114219
## b1 3.88 0.419 0.00171 0.007057
## var_e 257.67 55.902 0.22822 0.364124
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## b0 -29.943 -21.198 -16.690 -12.16 -3.260
## b1 3.054 3.599 3.881 4.16 4.695
## var_e 170.296 218.008 250.069 289.24 386.707
library('broom')
m = lm(dist ~ speed, data = cars)
tidy(m)
## term estimate std.error statistic p.value
## 1 (Intercept) -17.579095 6.7584402 -2.601058 1.231882e-02
## 2 speed 3.932409 0.4155128 9.463990 1.489836e-12
It seems that the Bayesian analysis and the frequentist analysis give similar results. Their differ, however, because our prior specification is not quite so uninformative. If we were to increase the variance of the predictors by a factor of 10, the mean of the posterior and the classical estimate would agree. However, if we have prior knowledge, we would be silly not to include it. Including it would increase prediction accuracy.
We get a whole posterior distribution quantifying our uncertainty with respect to the parameter, instead of a point estimate and a confidence interval. We can answer several inference questions that are unanswerable within the frequentist paradigm (see here).
We introduced a latent-mixture model for the therapeutic touch data in class (following Kruschke). Here, we would like to explore a related model, which assumes that there are three types of touchers: (i) suckers, (ii) guessers, (iii) sensors. Suckers have a low chance of success, guessers operate at around chance level, and sensors have an above average chance. We would like to use the data to infer the latent touch-type of each subject.
More concretely, for each subject i
we are going to infer a latent type type[i]
. The prior for type type[i]
is an unbiased categorical distribution (each type is equally likely): type[i] ~ dcat(c(1/3, 1/3, 1/3))
. Touch-types come from three disinct populations, each with its own mode, for \(j \in \{1,2,3\}\) the type of toucher, omega[j]
of a beta distribution for the success chance theta
. Suckers have omega[1] = 0.25
, guessers have omega[2] = 0.5
and sensors have omega[3] = 0.75
. The type of a subject thus determines the mode of a beta distribution: if i
is of type type[i]
then the mode of the beta distribution from which i
’s theta[i]
is drawn is omega[type[i]]
. We assume that there is a single concentration kappa
that is shared among all three type-populations. Given mode (for each subject) and concentration parameter, we determine a theta[i]
(for each subject) as in the previous touch-model from class. The theta[i]
feeds into a binomial likelihood function as before.
TherapeuticTouchData.csv
, which is part of DBDA2Eprograms.zip
, which is in our Dropbox in folder Kruschke_2015
. Reformat it just like we did in class.TTdata = read.csv('TherapeuticTouchData.csv') %>%
group_by(s) %>% dplyr::summarize(k = sum(y), N = length(y))
modelString = "
model{
omega[1] = 0.25 # suckers
omega[2] = 0.5 # guessers
omega[3] = 0.75 # sensors
kappaMin2 ~ dgamma(0.01,0.01)
kappa = kappaMin2+2
for (i in 1:28){
type[i] ~ dcat(c(1/3,1/3,1/3))
a[i] = omega[type[i]]*kappaMin2+1
b[i] = (1-omega[type[i]])*kappaMin2+1
theta[i] ~ dbeta(a[i],b[i])
k[i] ~ dbinom(theta[i], N[i])
}
}"
# prepare for JAGS
dataList = as.list(TTdata)
# set up and run model
jagsModel = jags.model(file = textConnection(modelString),
data = dataList, n.chains = 2, quiet = TRUE)
update(jagsModel, 25000)
codaSamples = coda.samples(jagsModel, n.iter = 50000,
variable.names = c("theta","type", "kappa"))
ggplot(TTdata, aes(x = s, y = k/N)) + geom_histogram(stat='identity') # raw data
ms = ggs(codaSamples)
msType = ms %>% filter(substr(x = Parameter, start = 1, stop = 4) == "type") %>%
mutate(subject = factor(regmatches(Parameter, regexpr("[0-9]+", Parameter)),
levels = 1:28))
ggplot(msType, aes(x = subject, fill = factor(value))) + geom_bar()
I think this is pretty cool. The posterior probabilities of belonging to the individual groups (suckers, guessers, sensors) does make sense, given the plot of the raw data. Specifically, the higher the amount of correct answers the therapeutic toucher makes, the higher the probability is that he belongs to the “sensors”.
I wonder if the individual therapeutic touchers would object to hierarchical modeling. It is true that, when estimating multiple means (at least more than 3, to be precise), that hierarchical modeling is more efficient than simply taking the mean (this is called Stein’s paradox, e.g. Efron & Morris, 1977); however, similar to research in psychophysics, we are primary interested in the individual. More self-servingly, the “sensors” might object to be modeled jointly with the “suckers”, shrinking their means towards the grand mean.
Further, we have to acknowledge that this is a problem of hypothesis testing. Does theurapeutic touching exist? Clearly, by positing three different latent groups and estimating the touchers posterior probability of belonging to them, we are not answering this question. A skeptic wouldn’t be convinced – we have presupposed the existance of therapeutic touching! A skeptic might use an uninformative prior for the alternative hypothesis, and then compute Bayes factor:
\[ \begin{align*} p(y|H_1) &= \int {N \choose k} \theta^k (1 - \theta)^{n-k} \cdot \frac{1}{Beta(1, 1)} \theta^{1 - 1} (1 - \theta)^{1 - 1} d\mathrm{\theta} \\[1.5ex] &= {N \choose k} \cdot \frac{Beta(k + 1, N - k + 1)}{Beta(1, 1)} \end{align*} \]
binom_BF <- function(k, N) {
H0 <- dbinom(k, N, .5) # likelihood of null hypothesis
H1 <- choose(N, k) * beta(k + 1, N - k + 1)
H1 / H0
}
TTdata = TTdata %>% mutate(BF10 = binom_BF(k, N))
ggplot(TTdata, aes(x = k/N, y = BF10)) + geom_histogram(stat = 'identity')
We see that even the “strongest sensor” has only about a Bayes factor of 2.5 in his favour. Interestingly, the “weakest sucker” has a higher Bayes factor! This is because we did an undirected test. If we were to test the directed hypothesis that therapeutic touchers operate above chance, we need to test exactly that! Without going to far (see e.g. here), the Bayes factor in favour of the directed hypothesis is just the proportion of posterior samples greater than \(\theta = .5\).
Noting that the posterior for our weakest sucker is
plot(function(x) dbeta(x, 2, 10), col = 'blue', lwd = 2, main = 'posterior weakest sucker')
We calculate the Bayes factor in favour of the hypothesis \(H_1: \theta > .5\)
BF10 = integrate(function(x) dbeta(x, 2, 10), .5, Inf)
1 / BF10$value # in favour of the null
## [1] 170.6667
Also note that the Bayes factor is just the evidence that is in the data. To make a conclusion, I would need to specify my prior belief about therapeutic touching; how certain am I that it exists? Concretely,
\[ \frac{p(H_1|D)}{p(H_0|D)} = \frac{P(D|H_1)}{P(D|H_0)} \frac{P(H_1)}{P(H_0)} \]
which for me personally yields \(\frac{1}{1700000} = \frac{1}{170} \frac{1}{10000}\) odds in favour of the alternative hypothesis, i.e. that therapeutic touching exists (is not merely placebo etc.).
Wrapping up, I very much like the flexibility of JAGS, allowing us to easily estimate these kinds of latent classes models. However, in this specific case, I think they answer the wrong question, and Bayes factors would be more appropriate. The question also is, of course, what it even means to perform worse than chance.