Please read this: Name your file homework5_lastname_firstname. This set is due 2/17. The submission should be written in R Markdown and sent to Fabian with the subject line "BDA: Homework 5" (submit both the .Rmd and the .pdf). We encourage you to discuss the exercises with fellow students, but only individual solutions will be accepted.
\[ p(x|\lambda) = \frac{1}{x!} \lambda^{x} e^{-\lambda} \]
Assuming independent and identically distributed data points, show that the following Gamma prior is conjugate:
\[ p(\lambda|\alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta \lambda} \]
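As a hint, write down the posterior kernel for \(n\) i.i.d. observations \(x_1, \ldots, x_n\) and collect the terms in \(\lambda\):

\[ p(\lambda \mid x_1, \ldots, x_n) \propto \left( \prod_{i=1}^{n} \lambda^{x_i} e^{-\lambda} \right) \lambda^{\alpha - 1} e^{-\beta \lambda} = \lambda^{\alpha + \sum_{i=1}^{n} x_i - 1} \, e^{-(\beta + n) \lambda} \]

which you should recognize as the kernel of another Gamma distribution; your job is to make this argument precise and read off its parameters.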
library('rjags')
dat = read.csv('http://www.ats.ucla.edu/stat/data/binary.csv')
head(dat)
##   admit gre  gpa rank
## 1     0 380 3.61    3
## 2     1 660 3.67    3
## 3     1 800 4.00    1
## 4     1 640 3.19    4
## 5     0 520 2.93    4
## 6     1 760 3.00    2
freq = glm(admit ~ gre + gpa + rank, data = dat, family = binomial) # frequentist logistic regression, for comparison
dat[351:400, 1] <- NA # mask the admit values of the last 50 rows; these form the test set
and then just estimate as usual (with ‘y’ as a parameter you additionally estimate). You will get \(x\) predictions for each value, where \(x\) is the total number of iterations of your chains (i.e., samples from the posterior predictive distribution). Average them, predict \(y = 1\) if \(\hat y > .5\), and compute the mean prediction error (i.e., check if your predicted values equal the observations in your test set).
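Schematically, the last step looks as follows. The names here are hypothetical stand-ins: `pred` plays the role of your draws-by-50 matrix of posterior predictive samples, `truth` the admit values you masked; the simulated data only serve to make the sketch runnable.

```r
set.seed(1)
pred  <- matrix(rbinom(4000 * 50, 1, 0.3), nrow = 4000) # stand-in: 4000 draws x 50 held-out cases
truth <- rbinom(50, 1, 0.3)                             # stand-in for the masked admit values
yhat  <- colMeans(pred)          # average the posterior predictive samples per held-out case
ypred <- as.numeric(yhat > .5)   # predict y = 1 if the posterior predictive mean exceeds .5
mean(ypred != truth)             # mean prediction error on the test set
```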
Read the (very short) paper by Gelman (2006). What is the difference between no pooling, complete pooling, and partial pooling? Why is the latter preferred (see Figure 1)?
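As a pointer while reading: the multilevel (partial-pooling) estimate of the mean for county \(j\) in Gelman (2006) is a precision-weighted average of the county mean (no pooling) and the overall mean (complete pooling),

\[ \hat{\alpha}_j^{\text{multilevel}} \approx \frac{\frac{n_j}{\sigma_y^2} \, \bar{y}_j + \frac{1}{\sigma_\alpha^2} \, \bar{y}_{\text{all}}}{\frac{n_j}{\sigma_y^2} + \frac{1}{\sigma_\alpha^2}} \]

so counties with little data (\(n_j\) small) are pulled toward the overall mean, while counties with much data stay close to their own mean.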
We will use a package that has just been released by the Stan team. It is a-w-e-s-o-m-e. Before you get started fitting your models, take a look at the documentation on “How to Use rstanarm”.
library('rstanarm')
# browseVignettes('rstanarm') # opens documentation
radon = read.csv('http://www.math.chalmers.se/Stat/Grundutb/CTH/mve186/1415/radon.csv')
radon = radon[, c('log_radon', 'basement', 'county_code', 'Uppm')]
radon$basement = as.numeric(radon$basement) - 2 # NO is 0
radon$Uppm = log(radon$Uppm) # log transform uranium (otherwise convergence issues)
head(radon)
##    log_radon basement county_code       Uppm
## 1 0.83290912        0           0 -0.6890476
## 2 0.83290912        1           0 -0.6890476
## 3 1.09861229        1           0 -0.6890476
## 4 0.09531018        1           0 -0.6890476
## 5 1.16315081        1           1 -0.8473129
## 6 0.95551145        1           1 -0.8473129
Below I have fitted the multilevel model as described in Gelman (2006) to the radon data.
CORES = 4 # adjust for your machine!
m = stan_lmer(log_radon ~ basement + (Uppm|county_code),
              data = radon, cores = CORES,
              prior = student_t(df = 5, location = 0),
              prior_intercept = student_t(df = 5, location = 0))
Make sure you understand what is going on. I use basement as a house-level predictor, while Uppm enters as a county-level predictor. Here are a few questions (tip: use ‘print(m)’ and ‘coef(m)’):
Think about the priors specified in the model above. Is anything problematic about them? Why, why not? How do they differ from the priors we employed in previous exercises?
Plot the posterior predictive distribution (tip: there is a function for that in the rstanarm package). You might want to look at our \(9^{th}\) session from class (and this). What is the posterior predictive \(p\) value? Compare it to the Bayes factor; what are the differences?
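One way to compute a posterior predictive \(p\) value for a single test statistic (the mean, purely as an illustration; other statistics are often more interesting) can be sketched as follows, assuming the fitted model `m` from above. `posterior_predict()` is the rstanarm function that draws replicated data sets from the posterior predictive distribution.

```r
yrep <- posterior_predict(m)       # draws x n matrix of replicated log_radon values
stat_rep <- apply(yrep, 1, mean)   # test statistic computed on each replicated data set
stat_obs <- mean(radon$log_radon)  # the same statistic computed on the observed data
mean(stat_rep >= stat_obs)         # posterior predictive p value for this statistic
```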
library('pscl') # from the excellent Jackman (2009)
head(bioChemists) # type ?bioChemists for info about the dataset
##   art   fem     mar kid5  phd ment
## 1   0   Men Married    0 2.52    7
## 2   0 Women  Single    0 2.05    6
## 3   0 Women  Single    0 3.75    6
## 4   0   Men Married    1 1.18    3
## 5   0 Women  Single    0 3.75   26
## 6   0 Women Married    2 3.59    2
Interpret the coefficients. What are the 95% HDIs? How do the predictors relate to the number of papers published?
Why did we fit a Poisson model here? Why not a normal linear regression? Was it a good choice? Are there any issues with this model (tip: overdispersion)?
mpois = stan_glm(art ~ fem + mar + kid5 + phd + ment,
                 data = bioChemists,
                 prior = student_t(df = 7),
                 prior_intercept = student_t(df = 7),
                 family = poisson(link = 'log'), cores = CORES)
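A crude first probe of the overdispersion hint: a Poisson distribution has equal mean and variance, so compare the two in the raw counts (a sketch only; a fuller check would compare the observed variance against the variances of data sets drawn with `posterior_predict(mpois)`).

```r
library('pscl')       # provides the bioChemists data
mean(bioChemists$art) # sample mean of the article counts
var(bioChemists$art)  # sample variance; noticeably larger than the mean here
```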
Additionally, check out the Shiny app that comes with rstanarm and allows you to explore various aspects of your fitted model.
launch_shinystan(mpois) # this will blow your mind