disclaimer

  • slides are not self-explanatory
  • you need to come to class:
    • read ahead
    • think along
    • take notes
    • ask questions
    • recap lesson later the same day:
      • rethink
      • reread
      • take notes
      • prepare questions

key notions

  • joint distributions & marginalization
  • conditional probability & bayes rule
  • highest density interval
  • beta distribution
  • conjugate priors

recap

definition of conditional probability:

\[P(X \, | \, Y) = \frac{P(X \cap Y)}{P(Y)}\]

definition of Bayes rule:

\[P(X \, | \, Y) = \frac{P(Y \, | \, X) \times P(X)}{P(Y)}\]

version for data analysis:

\[\underbrace{P(\theta \, | \, D)}_{posterior} \propto \underbrace{P(\theta)}_{prior} \times \underbrace{P(D \, | \, \theta)}_{likelihood}\]

bayes rule in multi-D

proportions of eye & hair color

Joint probability distribution as a two-dimensional matrix:

##       blond brown  red black
## blue   0.03  0.04 0.00  0.41
## green  0.09  0.09 0.05  0.01
## brown  0.04  0.02 0.09  0.13

Marginal distribution over eye color:

##  blue green brown 
##  0.48  0.24  0.28

Conditional probability given black hair:

##  blue green brown 
##  0.75  0.02  0.24
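
a minimal R sketch of marginalization and conditionalization on this table (the matrix name `joint` is mine):

# joint distribution: eye color in rows, hair color in columns
joint <- matrix(c(0.03, 0.04, 0.00, 0.41,
                  0.09, 0.09, 0.05, 0.01,
                  0.04, 0.02, 0.09, 0.13),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("blue", "green", "brown"),
                                c("blond", "brown", "red", "black")))
rowSums(joint)                               # marginal over eye color
joint[, "black"] / sum(joint[, "black"])     # conditional P(eye | black hair)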

model & data

model of a coin flip:

  • bias \(\theta\) is the probability of heads on a single trial
  • we consider five possibilities \(\theta \in \{0, \frac{1}{3}, \frac{1}{2}, \frac{2}{3}, 1\}\)
  • flat prior beliefs: \(P(\theta) = .2\,, \forall \theta\)

model likelihood \(P(D \, | \, \theta)\):

##       t=0 t=1/3 t=1/2 t=2/3 t=1
## heads   0  0.33   0.5  0.67   1
## tails   1  0.67   0.5  0.33   0

weighting in the prior \(P(\theta)\), i.e., the joint distribution \(P(\theta) \times P(D \, | \, \theta)\):

##       t=0 t=1/3 t=1/2 t=2/3 t=1
## heads 0.0  0.07   0.1  0.13 0.2
## tails 0.2  0.13   0.1  0.07 0.0

back to start: this is again a joint probability distribution as a 2D matrix

model, data & bayesian inference

bayes rule: \(P(\theta \, | \, D) \propto P(\theta) \times P(D \, | \, \theta)\)

##       t=0 t=1/3 t=1/2 t=2/3 t=1
## heads 0.0  0.07   0.1  0.13 0.2
## tails 0.2  0.13   0.1  0.07 0.0

posterior probability \(P(\theta \, | \, \text{heads})\) after a toss with heads:

##   t=0 t=1/3 t=1/2 t=2/3   t=1 
##  0.00  0.13  0.20  0.27  0.40
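
a minimal R sketch of this update (vector names `theta`, `prior`, `lh_heads` are mine):

theta     <- c(0, 1/3, 1/2, 2/3, 1)   # the five candidate biases
prior     <- rep(0.2, 5)              # flat prior P(theta)
lh_heads  <- theta                    # P(heads | theta) = theta
posterior <- prior * lh_heads / sum(prior * lh_heads)
round(posterior, 2)                   # 0.00 0.13 0.20 0.27 0.40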

fun with coins

likelihood function for several tosses

conventions:

  • heads is 1; tails is 0
  • pair \(\langle n_h, n_t \rangle\) is an outcome with \(n_h\) heads, \(n_t\) tails
  • \(n = n_h + n_t\) is the total number of flips

probability of outcome \(\langle n_h, n_t \rangle\) is given by the binomial distribution:

\[P(\langle n_h, n_t \rangle \, | \, \theta) = {{n}\choose{n_h}} \theta^{n_h} \, (1-\theta)^{n_t}\]

##       t=0 t=1/3 t=1/2 t=2/3 t=1
## (0,2)   1  0.44  0.25  0.11   0
## (1,1)   0  0.44  0.50  0.44   0
## (2,0)   0  0.11  0.25  0.44   1
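
this table can be reproduced with R's built-in dbinom (a sketch; variable names are mine):

theta <- c(0, 1/3, 1/2, 2/3, 1)
# rows: outcomes (0,2), (1,1), (2,0), i.e. 0, 1, 2 heads out of n = 2 flips
lh <- sapply(theta, function(t) dbinom(0:2, size = 2, prob = t))
round(lh, 2)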

excursion: Kruschke's approach

conventions:

  • heads is 1; tails is 0
  • outcomes are arbitrary sequences of flip results, e.g.: \(1001\)
  • each flip result is assumed to be independent

probability of a flip sequence with \(n_h\) heads and \(n_t\) tails is given by the "Kruschke-Bernoulli distribution":

\[P(\langle n_h, n_t \rangle \, | \, \theta) = \theta^{n_h} \, (1-\theta)^{n_t}\]

e.g., likelihood function for all flip sequences of at most length 2:

##    t=0 t=1/3 t=1/2 t=2/3 t=1
## 0    1  0.67  0.50  0.33   0
## 1    0  0.33  0.50  0.67   1
## 00   1  0.44  0.25  0.11   0
## 01   0  0.22  0.25  0.22   0
## 10   0  0.22  0.25  0.22   0
## 11   0  0.11  0.25  0.44   1
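
a minimal sketch of the same computation, dropping the binomial coefficient (the function name `seq_lh` is mine):

theta <- c(0, 1/3, 1/2, 2/3, 1)
# probability of one specific flip sequence with n_h heads and n_t tails
seq_lh <- function(n_h, n_t) theta^n_h * (1 - theta)^n_t
round(seq_lh(1, 1), 2)   # sequence "01" or "10": 0.00 0.22 0.25 0.22 0.00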

influence of sample size on posterior

[figure: Kruschke (2015), Fig. 5.2]

influence of sample size on posterior

[figure: Kruschke (2015), Fig. 5.3]

highest density intervals

highest density interval

Given a distribution \(P(x)\) over \(X\), the 95% highest density interval is a subset \(Y \subseteq X\) such that:

  1. \(P(Y) = .95\), and
  2. no point outside of \(Y\) is more likely than any point within \(Y\).


Intuition: the range of values we are justified to believe in (categorically).
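
a grid-based sketch of how a 95% HDI can be found numerically (the Beta(3, 9) example is mine, not from the slides; it assumes a unimodal density):

x   <- seq(0, 1, length.out = 1e4)        # discretize [0, 1]
px  <- dbeta(x, 3, 9); px <- px / sum(px) # normalized grid probabilities
ord <- order(px, decreasing = TRUE)       # rank points by density
in_hdi <- ord[cumsum(px[ord]) <= 0.95]    # keep the most probable 95%
range(x[in_hdi])                          # approximate HDI endpoints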

examples

[figure: Kruschke (2015), Fig. 5.3]

welcome infinity

continuous biases

what if \(\theta\) is allowed to take any value \(\theta \in [0;1]\)?

(at least) two problems:

  1. how to specify \(P(\theta)\) in a concise way?
  2. how to compute the normalizing constant \(\int_0^1 P(D \, | \, \theta) \times P(\theta) \, \text{d}\theta\) in Bayes rule?


one solution:

  • use beta distribution to specify prior \(P(\theta)\) with some handy parameters
  • since this is the conjugate prior to our likelihood function, computing the posterior becomes trivial

[figure: Kruschke (2015), Fig. 5.3]

beta distribution

two shape parameters \(a, b > 0\), defined over the domain \([0;1]\)

\[\text{Beta}(x \, | \, a, b) \propto x^{a-1} \, (1-x)^{b-1}\]
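
a quick R sketch to see how a and b shape the distribution (the parameter values are arbitrary examples):

x <- seq(0.001, 0.999, length.out = 200)
plot(x, dbeta(x, 1, 1), type = "l", ylim = c(0, 3), ylab = "density")  # flat: Beta(1, 1)
lines(x, dbeta(x, 4, 2), lty = 2)        # skewed towards high values
lines(x, dbeta(x, 0.5, 0.5), lty = 3)    # u-shaped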

[figure: Kruschke (2015), Fig. 6.1]

conjugate distributions

If the prior \(P(\theta)\) and the posterior \(P(\theta \, | \, D)\) are probability distributions of the same family, they are called conjugate distributions; in particular, the prior \(P(\theta)\) is then called the conjugate prior for the likelihood function \(P(D \, | \, \theta)\) from which the posterior \(P(\theta \, | \, D)\) is derived.

To show: the beta distribution is the conjugate prior of Kruschke's likelihood function.

Unravel definitions & rewrite:

\[ \begin{align*} P(\theta \, | \, \langle n_h, n_t \rangle) & \propto P(\langle n_h, n_t \rangle \, | \, \theta) \, \text{Beta}(\theta \, | \, a, b) \\ & \propto \theta^{n_h} \, (1-\theta)^{n_t} \, \theta^{a-1} \, (1-\theta)^{b-1} \\ & = \theta^{n_h + a -1} \, (1-\theta)^{n_t +b -1} \end{align*} \]

Hence, by definition:

\[ P(\theta \, | \, \langle n_h, n_t \rangle) = \text{Beta}(\theta \, | \, n_h + a, n_t + b)\]
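
a sketch of the conjugate update in R (the prior shape values and data counts are made up for illustration):

a <- 1; b <- 1        # flat Beta(1, 1) prior
n_h <- 7; n_t <- 3    # observed: 7 heads, 3 tails
# posterior is Beta(a + n_h, b + n_t); no integration needed
curve(dbeta(x, a + n_h, b + n_t), from = 0, to = 1,
      xlab = "theta", ylab = "posterior density")
(a + n_h) / (a + n_h + b + n_t)   # posterior mean = 8/12, roughly 0.67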

example applications

[figure: Kruschke (2015), Fig. 6.4]

the road ahead

bda more generally

problems:

  • conjugate priors are not always available:
    • likelihood functions can come from unwieldy beasts:
      • complex hierarchical models (e.g., regression)
      • custom-made stuff (e.g., probabilistic grammars)
  • even when available, they may not be what we want:
    • prior beliefs could be different from what a conjugate prior can capture


solution:

  • approximate posterior distribution by smart numerical simulations

Next sessions

  • compare bda with NHST (Fabian)
  • introduce R (Fabian)
  • MCMC methods (Michael)
  • even more fun with coins (Michael)

Homework

  • read Kruschke (2015) chapters 5 and 6 to recap this class
  • read Wagenmakers (2007) in preparation for next class
  • start on first homework set, due November 4th
    • ask questions asap if anything is unclear