Probability Theory/Conditional probability

Basics and multiplication formula
Using multiplicative notation, we could have written
 * $$P_A(B) := \frac{P(BA)}{P(A)}$$

instead.

This definition is intuitive, since the following lemmata are satisfied:

Lemma 3.2:


 * $$A \subseteq B \Rightarrow P_A(B) = 1$$

Lemma 3.3:


 * $$P_A(B + C) = P_A(B) + P_A(C)$$ for disjoint $$B, C \in \mathcal F$$ (as usual, the notation $$B + C$$ denotes a union of disjoint sets)

Each lemma follows directly from the definition and the axioms holding for $$P$$ (definition 2.1).

From these lemmata, we obtain that for each $$A \in \mathcal F$$, $$(\Omega, \mathcal F, P_A)$$ satisfies the defining axioms of a probability space (definition 2.1).
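The claim can be checked numerically on a small discrete space. The following sketch uses a fair die (an illustrative example, not from the text) and verifies normalization, additivity (Lemma 3.3), and Lemma 3.2 for the conditional measure $$P_A$$:

```python
from fractions import Fraction

# A small discrete probability space: a fair six-sided die
# (illustrative example; the die and events are not from the text).
omega = range(1, 7)
P = {w: Fraction(1, 6) for w in omega}

def prob(event):
    """P(E) as the sum of the point masses in E."""
    return sum(P[w] for w in event)

def cond(event, given):
    """P_A(B) = P(A ∩ B) / P(A), defined whenever P(A) > 0."""
    return prob(set(event) & set(given)) / prob(given)

A = {2, 4, 6}                      # "even outcome"
# P_A is normalized: P_A(Omega) = 1.
assert cond(set(omega), A) == 1
# P_A is additive on disjoint events (Lemma 3.3).
B, C = {2}, {4, 6}
assert cond(B | C, A) == cond(B, A) + cond(C, A)
# A ⊆ B implies P_A(B) = 1 (Lemma 3.2).
assert cond({1, 2, 4, 6}, A) == 1
```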

With this definition, we have the following theorem (the multiplication formula): for all $$A_1, \ldots, A_n \in \mathcal F$$ with $$P(A_1 \cdots A_{n-1}) > 0$$,
 * $$P(A_1 A_2 \cdots A_n) = P(A_1) P_{A_1}(A_2) \cdots P_{A_1 \cdots A_{n-1}}(A_n)$$.

Proof:

From the definition, we have
 * $$P_A(B) P(A) = P(AB)$$

for all $$A, B \in \mathcal F$$. Thus, as $$\mathcal F$$ is an algebra, we obtain by induction:
 * $$\begin{align} P(A_1 A_2 \cdots A_n) & = P((A_1 A_2 \cdots A_{n-1}) A_n) \\ & = P_{A_1 \cdots A_{n-1}}(A_n) P(A_1 \cdots A_{n-1}) \\ & = P_{A_1 \cdots A_{n-1}}(A_n) P_{A_1 \cdots A_{n-2}}(A_{n-1}) \cdots P_{A_1}(A_2) P(A_1). \end{align}$$
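A minimal numerical sketch of the multiplication formula, using a hypothetical example not from the text: drawing three spades in a row from a standard 52-card deck without replacement. The chained conditional probabilities agree with the direct count:

```python
from fractions import Fraction
from functools import reduce
from math import comb

# Chain rule: P(A1 A2 A3) = P(A1) * P_{A1}(A2) * P_{A1 A2}(A3).
# A_k = "the k-th card drawn is a spade" (hypothetical example).
p_chain = [Fraction(13, 52), Fraction(12, 51), Fraction(11, 50)]
p_all_spades = reduce(lambda x, y: x * y, p_chain)

# Cross-check by direct counting: choose 3 of 13 spades out of
# 3 of 52 cards (the orderings cancel).
assert p_all_spades == Fraction(comb(13, 3), comb(52, 3))
```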

Bayes' theorem
We first prove the formula of total probability: if $$A_1, \ldots, A_n \in \mathcal F$$ are disjoint with $$\sum_{j=1}^n A_j = \Omega$$, then for all $$B \in \mathcal F$$
 * $$P(B) = \sum_{j=1}^n P(A_j) P_{A_j}(B)$$.

Proof:


 * $$\begin{align} \sum_{j=1}^n P(A_j) P_{A_j}(B) & = \sum_{j=1}^n P(A_j) \frac{P(A_j \cap B)}{P(A_j)} \\ & = \sum_{j=1}^n P(A_j B) \\ & = P\left( \sum_{j=1}^n A_j B \right) \\ & = P\left( \left( \sum_{j=1}^n A_j \right) B \right) \\ & = P(\Omega B) \\ & = P(B), \end{align}$$

where we used that the sets $$A_1 B, \ldots, A_n B$$ are all disjoint, the distributive law of the algebra $$\mathcal F$$ and $$\Omega \cap B = B$$.
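The formula of total probability can be sketched on a standard two-urn example (the urns and their contents are illustrative, not from the text): a fair coin picks urn 1 (3 red, 1 blue ball) or urn 2 (1 red, 3 blue), then one ball is drawn; $$B$$ is the event "red ball".

```python
from fractions import Fraction

# The partition A_1, A_2: which urn the coin selects (hypothetical).
P_A = [Fraction(1, 2), Fraction(1, 2)]          # P(A_j)
P_B_given_A = [Fraction(3, 4), Fraction(1, 4)]  # P_{A_j}(B)

# Total probability: P(B) = sum_j P(A_j) * P_{A_j}(B).
P_B = sum(p * q for p, q in zip(P_A, P_B_given_A))
assert P_B == Fraction(1, 2)
```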

The basic version of Bayes' theorem states that for $$A, B \in \mathcal F$$ with $$P(A), P(B) > 0$$,
 * $$P_B(A) = \frac{P(A) P_A(B)}{P(B)}$$.

Proof:


 * $$\frac{P(A) P_A(B)}{P(B)} = \frac{P(A) \frac{P(A \cap B)}{P(A)}}{P(B)} = \frac{P(A \cap B)}{P(B)} = P_B(A)$$.
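A minimal numerical sketch of the basic formula, with illustrative numbers not taken from the text:

```python
from fractions import Fraction

# Hypothetical values; any choice with P(A)P_A(B) <= min(P(A), P(B))
# is consistent.
P_A = Fraction(2, 5)          # P(A)
P_B = Fraction(1, 2)          # P(B)
P_B_given_A = Fraction(3, 4)  # P_A(B)

# Basic Bayes: P_B(A) = P(A) * P_A(B) / P(B).
P_A_given_B = P_A * P_B_given_A / P_B
assert P_A_given_B == Fraction(3, 5)
```

Note that the numerator $$P(A) P_A(B)$$ is exactly $$P(A \cap B)$$ (here $$3/10$$), so the formula just re-measures the intersection against the new reference magnitude $$P(B)$$.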

This formula may look somewhat abstract, but it actually has a nice geometrical meaning. Suppose we are given two sets $$A, B \in \mathcal F$$, already know $$P(A)$$, $$P(B)$$ and $$P_A(B)$$, and want to compute $$P_B(A)$$. The situation is depicted in the following picture:

We know the ratio of the size of $$A \cap B$$ to that of $$A$$, but what we actually want to know is how $$A \cap B$$ compares to $$B$$. Hence, we change the reference set by multiplying with $$P(A)$$, the old reference magnitude, and dividing by $$P(B)$$, the new reference magnitude.

The general version of Bayes' theorem states that if $$A_1, \ldots, A_n \in \mathcal F$$ are disjoint with $$\sum_{k=1}^n A_k = \Omega$$, then for all $$B \in \mathcal F$$ with $$P(B) > 0$$ and each $$j$$,
 * $$P_B(A_j) = \frac{P_{A_j}(B) P(A_j)}{\sum_{k=1}^n P(A_k) P_{A_k}(B)}$$.

Proof:

From the basic version of the theorem, we obtain
 * $$P_B(A_j) = \frac{P_{A_j}(B) P(A_j)}{P(B)}$$.

Using the formula of total probability, we obtain
 * $$P_B(A_j) = \frac{P_{A_j}(B) P(A_j)}{\sum_{k=1}^n P(A_k) P_{A_k}(B)}$$.
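The general formula can be sketched on the standard two-urn setup (illustrative numbers, not from the text): a fair coin picks urn 1 (3 red, 1 blue ball) or urn 2 (1 red, 3 blue), one ball is drawn, and $$B$$ is "red ball". Observing red makes urn 1 three times as likely as urn 2:

```python
from fractions import Fraction

# The partition A_1, A_2: which urn was selected (hypothetical example).
P_A = [Fraction(1, 2), Fraction(1, 2)]          # P(A_j)
P_B_given_A = [Fraction(3, 4), Fraction(1, 4)]  # P_{A_j}(B)

# General Bayes: P_B(A_j) = P_{A_j}(B) P(A_j) / sum_k P(A_k) P_{A_k}(B).
denom = sum(p * q for p, q in zip(P_A, P_B_given_A))
posterior = [p * q / denom for p, q in zip(P_A, P_B_given_A)]

assert posterior[0] == Fraction(3, 4)  # urn 1 is likely given red
assert sum(posterior) == 1             # the posteriors form a distribution
```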