Please read this: Name your file homework2_lastname_firstname. This problem set is due 11/18 before class. The submission should be written in R Markdown and sent to Fabian with the subject line “BDA: Homework 2” (submit both the .Rmd and the .html / .pdf). We encourage you to discuss the exercises with fellow students, but only individual solutions will be accepted.

1. Why most published research findings are false

Read the blog post on p-value logic.

  1. Explain in your own words what “power” is in this context.

  2. Write a function, ppv, that gives the positive predictive value, and a function, fdr, that gives the false discovery rate. Both functions take as arguments the prior probability of \(H_1\), the nominal \(\alpha\) rate, and the statistical power \(1 - \beta\).

ppv <- function(prior, alpha, power) {
    # TODO
}

fdr <- function(prior, alpha, power) {
    # TODO
}
  3. Plot the positive predictive value for a range of interestingly different values of \(P(H_1)\) and \(\beta\). What do you observe? How much do your results depend on \(P(H_1)\) and \(\beta\)? Why is most published research false? As a hint, the positive predictive value and the false discovery rate are intimately related (see the blog post), and the former is given by Bayes’ rule (a sketch of one possible implementation follows the formula below):

\[ P(H_1|p < \alpha) = \frac{P(p < \alpha|H_1)P(H_1)}{P(p < \alpha)} \]
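
A minimal sketch of one way the formula could fill in the two stubs above, under the usual reading that \(P(p < \alpha \mid H_1)\) is the power \(1 - \beta\) and \(P(p < \alpha \mid H_0)\) is the nominal \(\alpha\); the values in the call at the end are purely illustrative.

ppv <- function(prior, alpha, power) {
    # sketch: P(H1 | p < alpha) via Bayes' rule, with
    # P(p < alpha) = power * prior + alpha * (1 - prior)
    (power * prior) / (power * prior + alpha * (1 - prior))
}

fdr <- function(prior, alpha, power) {
    # sketch: the false discovery rate as the complement of the PPV
    1 - ppv(prior, alpha, power)
}

ppv(prior = 0.1, alpha = 0.05, power = 0.8)  # illustrative values only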

2. MLE vs. Bayes

Given three coin flips, all coming up heads, what is the probability that the next coin flip will show heads? Give the prediction via maximum likelihood and – assuming an uninformative prior – Bayesian inference (use the mean, median, and mode of the posterior). Discuss your findings.

Note: Make every step explicit. How do you derive the MLE? How do you derive the posterior? Etc.
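
If you want to sanity-check your analytical derivation numerically, here is a minimal sketch; it assumes a uniform Beta(1, 1) prior as the uninformative prior, which is one common but not the only defensible choice.

nHeads <- 3
nFlips <- 3
# under a Beta(1, 1) prior the posterior is Beta(1 + heads, 1 + tails)
posteriorSamples <- rbeta(1e6, shape1 = 1 + nHeads, shape2 = 1 + nFlips - nHeads)
mean(posteriorSamples)    # Monte Carlo estimate of the posterior mean
median(posteriorSamples)  # Monte Carlo estimate of the posterior median
# the posterior mode can be derived analytically or read off a density plot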

3. General Linear Model

Let’s look at data on sales after spending various amounts on advertising on TV, on the radio, or in newspapers. We are interested in which advertising outlet “significantly” influences sales. Download the data from here, and import it like so:

dat <- read.csv('Advertising.csv', header = TRUE)[, -1]
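
A quick sanity check of the import can’t hurt; this sketch only assumes the columns are named Sales, TV, Radio and Newspaper, as in the models below.

str(dat)   # variable types and dimensions
head(dat)  # first few rows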
  1. Use the built-in R function lm to run a linear model that predicts sales, our dependent variable, as a function of the money spent on advertising on TV, on the radio, and in newspapers, our independent variables. Inspect the output using the function summary and explain the most relevant aspects of the returned output table. What do you conclude about the effects of the different advertising outlets on sales figures?
classicalModel <- lm(Sales ~ TV + Radio + Newspaper, data = dat)
summary(classicalModel)
  2. Use the BayesFactor package to run a Bayesian regression; predict Sales using TV. Print the model summary and interpret the resulting Bayes factor (a sketch for extracting the TV-only Bayes factor follows the code below).
library('BayesFactor')

dat <- read.csv('Advertising.csv', header = TRUE)[, -1]
BayesModels <- regressionBF(Sales ~ TV + Radio + Newspaper, data = dat)
summary(BayesModels)
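
If you want just the Bayes factor for the TV-only model against an intercept-only null (rather than the comparison over all sub-models that regressionBF returns), the package’s lmBF function is one option; a sketch, assuming the same data frame as above:

tvOnly <- lmBF(Sales ~ TV, data = dat)  # single model vs. intercept-only null
tvOnly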
  3. Explain what the main conceptual difference is between the classical and the Bayesian approach. Hint: one is about factors within a model, the other about comparisons between models.

4. MCMC

Look at this code whose MCMC sample chain output we inspected in class:

require('coda')

fakeData = rnorm(200, mean = 0, sd = 1)

f = function(mu, sigma){
  # unnormalized target density: uniform priors on mu and sigma times the likelihood
  if (sigma <= 0){
    return(0)
  }
  priorMu = dunif(mu, min = -4, max = 4)
  priorSigma = dunif(sigma, min = 0, max = 4)
  likelihood = prod(dnorm(fakeData, mean = mu, sd = sigma))
  return(priorMu * priorSigma * likelihood)
}

MH = function(f, iterations = 50, chains = 2, burnIn = 0){
  # 3-d output array: chains x retained iterations x parameters (mu, sigma)
  out = array(0, dim = c(chains, iterations - burnIn, 2))
  dimnames(out) = list("chain" = 1:chains,
                       "iteration" = 1:(iterations - burnIn),
                       "variable" = c("mu", "sigma"))
  for (c in 1:chains){
    # random starting values drawn from the prior support
    mu = runif(1, min = -4, max = 4)
    sigma = runif(1, min = 0, max = 4)
    for (i in 1:iterations){
      # propose a candidate state by perturbing the current one
      muNext = mu + runif(1, min = -1.25, max = 0.25)
      sigmaNext = sigma + runif(1, min = -0.25, max = 0.25)
      rndm = runif(1, 0, 1)
      # accept the candidate or keep the current state
      if (f(mu, sigma) < f(muNext, sigmaNext) | f(muNext, sigmaNext) >= f(mu, sigma) * rndm) {
        mu = muNext
        sigma = sigmaNext
      }
      # record the current state once the burn-in period is over
      if (i > burnIn){
        out[c, i - burnIn, 1] = mu
        out[c, i - burnIn, 2] = sigma
      }
    }
  }
  return(mcmc.list(mcmc(out[1,,]), mcmc(out[2,,])))
}

out = MH(f, iterations = 60000, chains = 2, burnIn = 10000)
# save(out, file = "out.Rdata")
  1. Explain what this code is doing. What algorithm is implemented here? How is it implemented? What is approximated here?
  2. Does this code differ substantially from the version of the algorithm that we discussed in class?
  3. Mess with the proposal distribution in such a way that the algorithm becomes less efficient. Use appropriate plots and metrics (trace plots, R-hat, effective sample size) to show that, all else being equal, your messed-up proposal distribution is worse than what’s in the code above. A sketch of the relevant coda diagnostics follows below.
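
A minimal sketch of the coda diagnostics mentioned in part 3, run on the output above; outMessed and MHMessed are only hypothetical names for a second run produced with your modified proposal, not objects defined in this code.

plot(out)           # trace plots and marginal densities per parameter
gelman.diag(out)    # potential scale reduction factor (R-hat); values near 1 suggest convergence
effectiveSize(out)  # effective sample size; lower values indicate a less efficient sampler

# hypothetical second run with the modified proposal, for side-by-side comparison:
# outMessed = MHMessed(f, 60000, 2, 10000)
# gelman.diag(outMessed); effectiveSize(outMessed)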