- slides are not self-explanatory
- you need to come to class:
    - read ahead
    - think along
    - take notes
    - ask questions
- recap the lesson later the same day:
    - rethink
    - reread
    - take notes
    - prepare questions
definition of conditional probability:
\[P(X \, | \, Y) = \frac{P(X \cap Y)}{P(Y)}\]
definition of Bayes' rule:
\[P(X \, | \, Y) = \frac{P(Y \, | \, X) \times P(X)}{P(Y)}\]
version for data analysis:
\[\underbrace{P(\theta \, | \, D)}_{\text{posterior}} \propto \underbrace{P(\theta)}_{\text{prior}} \times \underbrace{P(D \, | \, \theta)}_{\text{likelihood}}\]
Joint probability distribution as a two-dimensional matrix:
##       blond brown  red black
## blue   0.03  0.04 0.00  0.41
## green  0.09  0.09 0.05  0.01
## brown  0.04  0.02 0.09  0.13
Marginal distribution over eye color:
##  blue green brown
##  0.48  0.24  0.28
Conditional probability given black hair:
##  blue green brown
##  0.75  0.02  0.24
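A minimal R sketch of these two computations (the matrix `joint` is copied from the table above):

```r
# joint distribution over eye color (rows) and hair color (columns)
joint <- matrix(
  c(0.03, 0.04, 0.00, 0.41,
    0.09, 0.09, 0.05, 0.01,
    0.04, 0.02, 0.09, 0.13),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("blue", "green", "brown"),
                  c("blond", "brown", "red", "black"))
)
rowSums(joint)                             # marginal over eye color
joint[, "black"] / sum(joint[, "black"])   # conditional given black hair
```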
model of a coin flip:
model likelihood \(P(D \, | \, \theta)\):
##       t=0 t=1/3 t=1/2 t=2/3 t=1
## heads   0  0.33   0.5  0.67   1
## tails   1  0.67   0.5  0.33   0
weighing in the prior \(P(\theta)\) (here uniform: \(P(\theta) = 0.2\) for each value) gives the joint distribution \(P(\theta) \times P(D \, | \, \theta)\):

##       t=0 t=1/3 t=1/2 t=2/3 t=1
## heads 0.0  0.07   0.1  0.13 0.2
## tails 0.2  0.13   0.1  0.07 0.0
back to the start: the joint probability distribution as a two-dimensional matrix again
Bayes' rule: \(P(\theta \, | \, D) \propto P(\theta) \times P(D \, | \, \theta)\)

##       t=0 t=1/3 t=1/2 t=2/3 t=1
## heads 0.0  0.07   0.1  0.13 0.2
## tails 0.2  0.13   0.1  0.07 0.0
posterior probability \(P(\theta \, | \, \text{heads})\) after a toss with heads:
##  t=0 t=1/3 t=1/2 t=2/3  t=1
## 0.00  0.13  0.20  0.27 0.40
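The whole chain prior → joint → posterior in a few lines of R (a sketch; `theta`, `prior`, and the likelihood are read off the tables above):

```r
theta      <- c(0, 1/3, 1/2, 2/3, 1)
prior      <- rep(0.2, length(theta))   # uniform prior over the five values
likelihood <- theta                     # P(heads | theta)
joint      <- prior * likelihood        # P(theta, heads)
posterior  <- joint / sum(joint)        # normalize: P(theta | heads)
round(posterior, 2)                     # 0.00 0.13 0.20 0.27 0.40
```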
conventions: we write \(\langle n_h, n_t \rangle\) for an outcome with \(n_h\) heads and \(n_t\) tails out of \(n = n_h + n_t\) flips
probability of outcome \(\langle n_h, n_t \rangle\) is given by the binomial distribution:
\[P(\langle n_h, n_t \rangle \, | \, \theta) = {{n}\choose{n_h}} \theta^{n_h} \, (1-\theta)^{n_t}\]
##       t=0 t=1/3 t=1/2 t=2/3 t=1
## (0,2)   1  0.44  0.25  0.11   0
## (1,1)   0  0.44  0.50  0.44   0
## (2,0)   0  0.11  0.25  0.44   1
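These values can be reproduced with R's built-in binomial density `dbinom`:

```r
theta <- c(0, 1/3, 1/2, 2/3, 1)
round(dbinom(0, size = 2, prob = theta), 2)  # (0,2): 1.00 0.44 0.25 0.11 0.00
round(dbinom(1, size = 2, prob = theta), 2)  # (1,1): 0.00 0.44 0.50 0.44 0.00
round(dbinom(2, size = 2, prob = theta), 2)  # (2,0): 0.00 0.11 0.25 0.44 1.00
```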
conventions:
probability of flip sequence with \(n_h\) heads and \(n_t\) tails is given by the "Kruschke-Bernoulli distribution":
\[P(\langle n_h, n_t \rangle \, | \, \theta) = \theta^{n_h} \, (1-\theta)^{n_t}\]
e.g., likelihood function for all flip sequences of at most length 2:
##    t=0 t=1/3 t=1/2 t=2/3 t=1
## 0     1  0.67  0.50  0.33   0
## 1     0  0.33  0.50  0.67   1
## 00    1  0.44  0.25  0.11   0
## 01    0  0.22  0.25  0.22   0
## 10    0  0.22  0.25  0.22   0
## 11    0  0.11  0.25  0.44   1
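A quick R check of these values (a sketch; 1 codes heads, 0 codes tails, and the order of flips does not matter for the formula):

```r
theta   <- c(0, 1/3, 1/2, 2/3, 1)
seq_lik <- function(n_h, n_t) theta^n_h * (1 - theta)^n_t
round(seq_lik(1, 0), 2)  # sequence "1":          0.00 0.33 0.50 0.67 1.00
round(seq_lik(1, 1), 2)  # sequences "01", "10":  0.00 0.22 0.25 0.22 0.00
round(seq_lik(2, 0), 2)  # sequence "11":         0.00 0.11 0.25 0.44 1.00
```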
Given a distribution \(P(x)\) over \(X\), the 95% highest density interval is a subset \(Y \subseteq X\) such that:
\[ P(x \in Y) = 0.95 \quad \text{and} \quad P(y) > P(y') \ \text{for all} \ y \in Y, y' \notin Y \]
Intuition: the range of values we are justified to believe in (categorically).
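A minimal grid-based sketch of this idea in R, assuming a discretized standard normal density for illustration (collect the highest-density points until they carry 95% of the probability mass):

```r
x    <- seq(-4, 4, length.out = 1e4)
dens <- dnorm(x)
p    <- dens / sum(dens)                 # normalize the grid to probabilities
ord  <- order(dens, decreasing = TRUE)   # visit highest-density points first
hdi  <- ord[cumsum(p[ord]) <= 0.95]      # stop once 95% of the mass is covered
range(x[hdi])                            # approx. -1.96 1.96
```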
what if \(\theta\) is allowed to take any value \(\theta \in [0;1]\)?
(at least) two problems:
- there are infinitely many values of \(\theta\), so neither prior nor posterior can be written down as a finite table
- every single value of \(\theta\) has probability 0, so we must work with probability densities instead
one solution: use a parametric family of probability densities over \([0;1]\), the Beta distribution:
2 shape parameters \(a, b > 0\), defined over domain \([0;1]\)
\[\text{Beta}(x \, | \, a, b) \propto x^{a-1} \, (1-x)^{b-1}\]
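The normalized Beta density is built into R as `dbeta`; e.g., for a Beta(2, 2):

```r
theta <- seq(0, 1, by = 0.25)
dbeta(theta, shape1 = 2, shape2 = 2)  # 0.000 1.125 1.500 1.125 0.000
```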
If the prior \(P(\theta)\) and the posterior \(P(\theta \, | \, D)\) are probability distributions of the same family, they are called conjugate; in particular, the prior is then called a conjugate prior for the likelihood function \(P(D \, | \, \theta)\) from which the posterior \(P(\theta \, | \, D)\) is derived.
To show: the Beta distribution is the conjugate prior of Kruschke's likelihood function.
Unravel definitions & rewrite:
\[ \begin{align*} P(\theta \, | \, \langle n_h, n_t \rangle) & \propto P(\langle n_h, n_t \rangle \, | \, \theta) \, \text{Beta}(\theta \, | \, a, b) \\ & \propto \theta^{n_h} \, (1-\theta)^{n_t} \, \theta^{a-1} \, (1-\theta)^{b-1} \\ & = \theta^{n_h + a -1} \, (1-\theta)^{n_t +b -1} \end{align*} \]
Hence, by definition:
\[ P(\theta \, | \, \langle n_h, n_t \rangle) = \text{Beta}(\theta \, | \, n_h + a, n_t + b)\]
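So the posterior update requires no integration at all; a sketch in R, assuming a uniform Beta(1, 1) prior and hypothetical data of 7 heads and 3 tails:

```r
a <- 1; b <- 1      # Beta(1, 1) prior, i.e., uniform over [0, 1]
n_h <- 7; n_t <- 3  # hypothetical observed flips: 7 heads, 3 tails
# posterior is Beta(n_h + a, n_t + b) by conjugacy
curve(dbeta(x, n_h + a, n_t + b), from = 0, to = 1,
      xlab = "theta", ylab = "posterior density")
```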
problems:
- conjugate priors exist only for special pairs of prior and likelihood function; most realistic models do not admit a closed-form posterior
solution:
- approximate the posterior distribution, e.g., by sampling from it