Please read this: Name your file homework5_lastname_firstname. This set is due 2/17. The submission should be written in R Markdown and sent to Fabian with the subject BDA: Homework 5 (submit both the .Rmd and the .pdf). We encourage discussing the exercises with fellow students, but only individual solutions will be accepted.

1. Conjugate priors (4)

  1. Read up on the Poisson distribution. You are given the Poisson likelihood

\[ p(x|\lambda) = \frac{1}{x!} \lambda^{x} e^{-\lambda} \]

Assuming independent and identically distributed data points, show that the following gamma prior is conjugate:

\[ p(\lambda|\alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta \lambda} \]

  2. How can you interpret the parameters of the prior distribution?
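
If you want to double-check your algebra, a quick numerical comparison can help: approximate the posterior on a grid and compare it with the gamma posterior you derive. This is only a sketch with simulated data and arbitrary hyperparameters of my choosing; the placeholders for the posterior parameters are yours to fill in.

set.seed(1)
x = rpois(20, lambda = 3)                    # simulated Poisson data
alpha = 2; beta = 1                          # arbitrary prior hyperparameters
lam = seq(.01, 10, length.out = 1000)        # grid over lambda
post = dgamma(lam, alpha, beta) * sapply(lam, function(l) prod(dpois(x, l)))
post = post / sum(post * (lam[2] - lam[1]))  # normalize the grid approximation
# plot(lam, post, type = 'l')
# lines(lam, dgamma(lam, <your posterior alpha>, <your posterior beta>), col = 'red')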

2. Logistic regression (10)

  1. Implement a logistic regression, predicting admission as a function of all other variables. Justify your prior specification, and interpret your results.
library('rjags')
dat = read.csv('http://www.ats.ucla.edu/stat/data/binary.csv')
head(dat)
##   admit gre  gpa rank
## 1     0 380 3.61    3
## 2     1 660 3.67    3
## 3     1 800 4.00    1
## 4     1 640 3.19    4
## 5     0 520 2.93    4
## 6     1 760 3.00    2
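
One possible JAGS specification is sketched below; the weakly informative normal priors, the standardization of gre and gpa, and treating rank as a numeric predictor are my assumptions rather than requirements, so adapt and justify your own choices.

logreg_string = "
model {
  for (i in 1:n) {
    admit[i] ~ dbern(p[i])
    logit(p[i]) <- b0 + b[1]*gre[i] + b[2]*gpa[i] + b[3]*rank[i]
  }
  b0 ~ dnorm(0, 1/10^2)                       # JAGS parameterizes normals by precision
  for (j in 1:3) { b[j] ~ dnorm(0, 1/10^2) }
}
"

jags_data = list(admit = dat$admit, gre = scale(dat$gre)[, 1],
                 gpa = scale(dat$gpa)[, 1], rank = dat$rank, n = nrow(dat))

m_logreg = jags.model(textConnection(logreg_string), data = jags_data, n.chains = 4)
samples = coda.samples(m_logreg, variable.names = c('b0', 'b'), n.iter = 5000)
summary(samples)
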
  2. Sanity-check your result using classical estimation (R function glm).
freq = glm(...)
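
For instance (a sketch; I standardize gre and gpa here only so that the coefficients are on the same scale as in the JAGS sketch above):

freq = glm(admit ~ scale(gre) + scale(gpa) + rank, data = dat, family = binomial)
summary(freq)
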
  3. Fit your model on the first 350 observations only (the training set). Use JAGS to predict the other 50 (the test set). You can do this by writing
dat[351:400, 1] <- NA

and then estimating as usual (with ‘y’ as a parameter you additionally estimate). You will get \(x\) predictions for each held-out value, where \(x\) is the total number of iterations of your chains (i.e., samples from the posterior predictive distribution). Average them, predict \(y = 1\) if \(\hat y > .5\), and compute the mean prediction error (i.e., the proportion of test observations for which your prediction does not match the observed value).
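
A sketch of the bookkeeping involved is below; the object names (truth, samples) and the monitored node name admit are assumptions on my part and depend on how you set up your model.

truth = dat$admit[351:400]          # save the held-out outcomes first
dat[351:400, 1] <- NA               # then blank them out for JAGS

# ... refit the model, additionally monitoring 'admit', e.g.
# samples = coda.samples(m_logreg, variable.names = c('b0', 'b', 'admit'), n.iter = 5000)

draws = as.matrix(samples)                          # iterations x monitored nodes
pred  = draws[, paste0('admit[', 351:400, ']')]     # posterior predictive draws
y_hat = as.numeric(colMeans(pred) > .5)             # average the draws, threshold at .5
mean(y_hat != truth)                                # mean prediction error on the test set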

  4. Use the same procedure as above, but remove the GRE score as a predictor. How does your prediction accuracy change? What do you conclude? Is it in stark contrast to your posterior inference about the GRE predictor?

3. Multilevel modeling (10)

  1. Read the (very short) paper by Gelman (2006). What is the difference between no pooling, complete pooling, and partial pooling? Why is the latter preferred (see Figure 1)?

  2. We will use a package that has just been released by the Stan team. It is a-w-e-s-o-m-e. Before you get started fitting your models, take a look at the documentation on “How to Use rstanarm”.

library('rstanarm')

# browseVignettes('rstanarm') # opens documentation

radon = read.csv('http://www.math.chalmers.se/Stat/Grundutb/CTH/mve186/1415/radon.csv')
radon = radon[, c('log_radon', 'basement', 'county_code', 'Uppm')]
radon$basement = as.numeric(radon$basement) - 2 # NO is 0
radon$Uppm = log(radon$Uppm) # log transform uranium (otherwise convergence issues)

head(radon)
##    log_radon basement county_code       Uppm
## 1 0.83290912        0           0 -0.6890476
## 2 0.83290912        1           0 -0.6890476
## 3 1.09861229        1           0 -0.6890476
## 4 0.09531018        1           0 -0.6890476
## 5 1.16315081        1           1 -0.8473129
## 6 0.95551145        1           1 -0.8473129

Below I have fitted the multilevel model as described in Gelman (2006) to the radon data.

CORES = 4 # adjust for your machine!
m = stan_lmer(log_radon ~ basement + (Uppm|county_code),
              data = radon, cores = CORES,
              prior = student_t(df = 5, location = 0),
              prior_intercept = student_t(df = 5, location = 0))

Make sure you understand what is going on. I use basement as a house-level predictor, while Uppm enters as a county-level predictor. Here are a few questions (tip: use ‘print(m)’ and ‘coef(m)’):

  1. Think about the priors specified in the model above. Is anything problematic about them? Why or why not? How do they differ from the priors we employed in previous exercises?

  2. Plot the posterior predictive distribution (tip: there is a function for that in the rstanarm package). You might want to look at our \(9^{th}\) session from class (and this). What is the posterior predictive \(p\) value? Compare it to the Bayes factor; what are the differences?
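
As a starting point (a sketch; the exact arguments are up to you), rstanarm's pp_check() produces posterior predictive plots directly from the fitted object:

pp_check(m)                                      # replicated vs. observed log_radon densities
# pp_check(m, plotfun = 'stat', stat = 'mean')   # distribution of a test statistic across draws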

4. Poisson regression (10)

  1. Implement a Poisson regression in JAGS, predicting the number of published papers as a function of the other variables. What type is each predictor (categorical, ordered, continuous)?
library('pscl') # from the excellent Jackman (2009)

head(bioChemists) # type ?bioChemists for info about the dataset
##   art   fem     mar kid5  phd ment
## 1   0   Men Married    0 2.52    7
## 2   0 Women  Single    0 2.05    6
## 3   0 Women  Single    0 3.75    6
## 4   0   Men Married    1 1.18    3
## 5   0 Women  Single    0 3.75   26
## 6   0 Women Married    2 3.59    2
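
A minimal JAGS sketch is given below; the priors and the dummy coding of fem and mar are assumptions of mine, so adjust them as you see fit (rjags was loaded earlier).

pois_string = "
model {
  for (i in 1:n) {
    art[i] ~ dpois(mu[i])
    log(mu[i]) <- b0 + b[1]*fem[i] + b[2]*mar[i] + b[3]*kid5[i] + b[4]*phd[i] + b[5]*ment[i]
  }
  b0 ~ dnorm(0, 1/5^2)
  for (j in 1:5) { b[j] ~ dnorm(0, 1/5^2) }
}
"

bc = bioChemists
bc$fem = as.numeric(bc$fem == 'Women')     # 1 = Women, 0 = Men
bc$mar = as.numeric(bc$mar == 'Married')   # 1 = Married, 0 = Single

m_pois = jags.model(textConnection(pois_string),
                    data = c(as.list(bc), n = nrow(bc)), n.chains = 4)
post = coda.samples(m_pois, variable.names = c('b0', 'b'), n.iter = 5000)
summary(post)
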
  2. Interpret the coefficients. What are the 95% HDIs? How do the predictors relate to the number of papers published?
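
If you keep your draws in a coda object, one way to get the 95% HDIs (a sketch, assuming the object post from the model sketch above) is:

HPDinterval(post, prob = .95)   # one matrix of intervals per chain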

  3. Why did we fit a Poisson model here? Why not a normal linear regression? Was it a good choice? Are there any issues with this model (tip: overdispersion)?

mpois = stan_glm(art ~ fem + mar + kid5 + phd + ment,
                 data = bioChemists,
                 prior = student_t(df = 7),
                 prior_intercept = student_t(df = 7),
                 family = poisson(link = 'log'), cores = CORES)
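
As a rough first check for overdispersion (a sketch; a posterior predictive comparison via pp_check would be more principled), you can compare the sample mean and variance of the outcome, which a Poisson model assumes to be equal:

c(mean = mean(bioChemists$art), variance = var(bioChemists$art))
# pp_check(mpois, plotfun = 'stat', stat = 'var')   # replicated variances vs. the observed one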

Additionally, check out the Shiny app that comes with rstanarm, which allows you to explore various aspects of your fitted model.

launch_shinystan(mpois) # this will blow your mind