Probability/Conditional Probability

Motivation
In the previous chapter, we have actually only dealt with unconditional probability. To be more precise, in the problems encountered in the previous chapter, the sample space is defined initially, and all probabilities are assigned with respect to that initial sample space. However, in many situations, after defining the initial sample space for a random experiment, we may get some new information about the random experiment. Hence, we may need to update the sample space based on that information. The probability based on this updated sample space is known as a conditional probability.

To illustrate how we get the new information and update the sample space correspondingly, consider the following example:

From the example above, we are able to calculate the (conditional) probability "reasonably" through some arguments (see (b)) when the sample points in the initial sample space are equally likely. Furthermore, we can notice that the condition should be the occurrence of an event, which involves the sample points in the sample space. When the condition does not involve the sample points at all, it is irrelevant to the random experiment. For example, the condition "the poker deck costs $10" is clearly not an event in the sample space and does not involve the sample points, so it is irrelevant to this experiment.

To motivate the definition of conditional probability, let us consider more precisely how we obtain the (conditional) probability in (b). In (b), we are given that an ace is drawn out from the poker deck beforehand. This means that ace can never be drawn in our draw. This corresponds to the occurrence of the event (with respect to original sample space) $$\{\text{not drawing that ace from the poker deck}\}$$ (denoted by $$B$$) which consists of 51 sample points, resulting from excluding that ace from the original 52 sample points. Thus, we can regard the condition as the occurrence of event $$B$$. Now, under this condition, the sample space is updated to be the set $$B$$, that is, only points in $$B$$ are regarded as legit sample points now.

Consider part (b) again. Let us denote by $$A$$ the event $$\{\text{an ace is drawn from the deck}\}$$ (with respect to the original sample space).

Now, only some of the points in the set $$A$$ (those that also lie in the set $$B$$) are regarded as legit sample points. All other points in set $$A$$ are no longer legit sample points under this condition. In other words, only the points in both sets $$A$$ and $$B$$ (i.e., in set $$A\cap B$$) are legit sample points in event $$A$$ under this condition.

In part (b) above, only the three aces remaining in the deck (in both sets $$A$$ and $$B$$, and hence in set $$A\cap B$$) are considered to be legit sample points. The other ace in set $$A$$ (the ace that is drawn out in the condition) is not considered to be a legit sample point, since that ace is not in the deck at all!

To summarize, when we want to calculate the conditional probability of event $$A$$, we do the following:
 * 1) We update the sample space to the set $$B$$.
 * 2) We only regard the sample points in set $$A\cap B$$ to be the (valid) sample points of event $$A$$.

In the above example, we encounter a special case where the sample points in the initial sample space (assumed to be finite) are equally likely (and hence the sample points in the updated sample space $$B$$ are also equally likely). In this case, using the result about combinatorial probability (in the previous chapter), the conditional probability, denoted by $$\mathbb P(A|B)$$, is given by $$\mathbb P(A|B)=\frac{|A\cap B|}{|B|}=\frac{\text{number of (valid) sample points in }A}{\text{number of sample points in updated sample space }B}.$$ (Notice that "$$\mathbb P(A|B)$$" is just a notation ($$\mathbb P(\cdot|B)$$ is a function, $$A$$ is the "input"). "$$A|B$$" on its own means nothing. In particular, "$$A|B$$" is not an event/set.)
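As a small illustrative sketch (the document itself contains no code, so Python and all names here are my own choices): in the card example, one specific ace has already been removed, so the updated sample space $$B$$ is the remaining 51 cards, and we simply count the valid sample points of $$A$$ among them.

```python
from fractions import Fraction

# Build a standard 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = [(r, s) for r in ranks for s in suits]

# Condition: one particular ace was drawn out beforehand (assumed here to be
# the ace of spades, purely for illustration).
removed_ace = ("A", "spades")
B = [card for card in deck if card != removed_ace]    # updated sample space
A_cap_B = [card for card in B if card[0] == "A"]      # valid points of A in B

# Equally likely case: P(A|B) = |A ∩ B| / |B|
p = Fraction(len(A_cap_B), len(B))
print(p)   # 3/51 = 1/17
```

The counting matches the argument in (b): three aces remain among 51 equally likely cards.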

More generally, when the sample points are not necessarily equally likely, we can apply a theorem in the previous chapter for constructing a probability measure on the updated sample space $$B$$. (Here, we assume that $$B$$ is countable.) In particular, since we are only regarding the sample points in set $$A\cap B$$ as the (valid) sample points of event $$A$$, the (naive) "conditional probability" of $$A$$ given the occurrence of event $$B$$ should be given by $$\mathbb P(A\cap B)=\sum_{\omega\in{\color{darkgreen}A\cap B}}^{}\mathbb P(\{\omega\})$$ according to that theorem (where $$\mathbb P$$ is the probability measure in the probability space $$(\Omega,\mathcal F,\mathbb P)$$).

However, when we apply the original probability measure $$\mathbb P$$ (in the original probability space) to every singleton event in the new sample space $$B$$, we face an issue: the sum of those probabilities is just $$\sum_{\omega\in B}^{}\mathbb P(\{\omega\}) =\mathbb P(B), $$ which is not 1 in general! But that theorem requires this sum to be 1! A natural remedy to this problem is to define a new probability measure $$\mathbb P(\cdot|B):\mathcal F\to[0,1]$$, based on the original probability measure and the above (naive) "conditional probability" $$\mathbb P(A\cap B)$$, such that the sum is 1. After noticing that $$1=\frac{\mathbb P(B)}{\mathbb P(B)}$$, a natural choice of such probability measure is given by $$\mathbb P(A|B)=\frac{\mathbb P(A\cap B)}{\mathbb P(B)},$$ for every $$A\in\mathcal F$$. The probability $$\mathbb P(B)$$ can be interpreted as the normalizing constant: every (naive) "conditional probability" (as suggested previously) is scaled by a factor of $$\frac{1}{\mathbb P(B)}$$. (It turns out that the probability measure $$\mathbb P(\cdot|B):\mathcal F\to [0,1]$$ defined in this way also satisfies all the probability axioms. We will prove this later.)
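The formula $$\mathbb P(A|B)=\mathbb P(A\cap B)/\mathbb P(B)$$ can be sketched directly on a finite sample space with non-uniform probabilities. The loaded-die weights below are assumptions chosen only for illustration:

```python
from fractions import Fraction

def cond_prob(A, B, prob):
    """P(A|B) = P(A ∩ B) / P(B), where `prob` maps each sample point
    to its probability. Raises ZeroDivisionError when P(B) = 0,
    mirroring the fact that P(A|B) is undefined in that case."""
    P_B = sum(prob[w] for w in B)
    P_AB = sum(prob[w] for w in A & B)
    return P_AB / P_B

# A loaded die: rolling 6 is three times as likely as any other face.
prob = {1: Fraction(1, 8), 2: Fraction(1, 8), 3: Fraction(1, 8),
        4: Fraction(1, 8), 5: Fraction(1, 8), 6: Fraction(3, 8)}
A = {6}            # event: rolling a six
B = {2, 4, 6}      # event: rolling an even number
print(cond_prob(A, B, prob))   # (3/8) / (5/8) = 3/5
```

Note how $$\mathbb P(B)=5/8$$ acts as the normalizing constant: the naive probability $$3/8$$ is rescaled so that the updated probabilities over $$B$$ sum to 1.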

Comparing this formula with the formula for the equally likely sample points case, the two formulas actually look quite similar. In fact, we can express the formula for the equally likely sample points case in the same form as this formula (since the equally likely sample points case is actually just a special case of the theorem we are considering): $$\frac{|A\cap B|}{|B|}=\frac{|A\cap B|/|\Omega|}{|B|/|\Omega|}=\frac{\mathbb P(A\cap B)}{\mathbb P(B)}.$$

So, we have now developed a reasonable and natural formula to calculate the conditional probability $$\mathbb P(A|B)$$, both when the outcomes are equally likely (applicable to a finite sample space) and when the outcomes are not equally likely (for a countable sample space). It is thus natural to also use the same formula when the sample space is uncountable. This motivates the following definition of conditional probability:

Definition
The following is a generalization of the formula $$\mathbb P(A\cap B)=\mathbb P(A|B)\mathbb P(B)$$. It is useful when we calculate the probability of the occurrence of multiple events together, by considering the events "one by one".
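As a quick numerical sketch of multiplying conditional probabilities "one by one" (the card-drawing scenario is my own illustration, not taken from the text): the probability of drawing three aces in a row from a standard 52-card deck without replacement is $$\mathbb P(E_1)\mathbb P(E_2|E_1)\mathbb P(E_3|E_1\cap E_2)$$.

```python
from fractions import Fraction

# P(E1): first card is an ace        -> 4/52
# P(E2|E1): second is an ace too     -> 3/51 (one ace and one card gone)
# P(E3|E1 ∩ E2): third is also an ace -> 2/50
p = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)
print(p)   # 1/5525
```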

Two important theorems related to conditional probability, namely law of total probability and Bayes' theorem, will be discussed in the following section.

Law of total probability and Bayes' theorem
Sometimes, it is not an easy task to assign a suitable unconditional probability to an event. For instance, suppose Amy will perform a COVID-19 test, and the result is either positive or negative (it is impossible to have invalid results). Let $$P=\{\text{Amy tests positive at the COVID-19 test}\}$$. What should $$\mathbb P(P)$$ be? It is actually quite difficult to answer directly, since this probability is without any condition. In particular, it is unknown whether Amy gets infected by COVID-19 or not, and clearly the infection will affect the probability assignment quite significantly.

On the other hand, it may be easier to assign/calculate related conditional/unconditional probabilities. Now, let $$I=\{\text{Amy gets infected by COVID-19}\}$$. The probability $$\mathbb P(P|I)$$, called the sensitivity of the test, may be known based on the research on the COVID-19 test. Also, the probability $$\mathbb P(P^c|I^c)$$, called the specificity of the test, may also be known based on the research. Besides, the probability $$\mathbb P(I)$$ may be obtained according to studies on COVID-19 infection for Amy's place of living. Since $$\mathbb P(P)=\mathbb P(P\cap I)+\mathbb P(P\cap I^c),$$ by the definition of conditional probability, we have $$\mathbb P(P)=\mathbb P(P|I)\mathbb P(I)+\mathbb P(P|I^c)\mathbb P(I^c).$$ Since the conditional probability satisfies the probability axioms, we have the relation $$\mathbb P(P|I^c)=1-\mathbb P(P^c|I^c)$$, and thus the value of $$\mathbb P(P|I^c)$$ can be obtained. The remaining terms in the expression can also be obtained, as suggested above. Thus, we can finally obtain the value of $$\mathbb P(P)$$.
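A small numerical sketch of this computation (the sensitivity, specificity, and infection probability below are hypothetical values chosen only for illustration, not real COVID-19 data):

```python
# Hypothetical inputs (assumptions, not real data):
sensitivity = 0.90    # P(P | I)
specificity = 0.95    # P(P^c | I^c)
p_I = 0.02            # P(I), infection probability for Amy's area

# P(P | I^c) = 1 - P(P^c | I^c), since P(.|I^c) satisfies the axioms.
p_P_given_not_I = 1 - specificity

# Law of total probability: P(P) = P(P|I)P(I) + P(P|I^c)P(I^c)
p_P = sensitivity * p_I + p_P_given_not_I * (1 - p_I)
print(round(p_P, 4))   # 0.067
```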

This shows that conditional probabilities can be quite helpful for calculating unconditional probabilities, especially when we condition appropriately so that both the conditional probabilities and the probability of the condition are known in some way.

The following important theorem relates unconditional probabilities and conditional probabilities, as in the above discussion.

Now, suppose Amy has performed a COVID-19 test, and the result is positive! So now Amy is worrying about whether she really gets infected by COVID-19, or it is just a false positive. Therefore, she would like to know the conditional probability $$\mathbb P(I|P)$$ (the conditional probability of getting infected given testing positive). Notice that the conditional probability $$\mathbb P(P|I)$$ may be known (based on some research). However, it does not equal the conditional probability $$\mathbb P(I|P)$$ in general. (These two probabilities refer to two different things.) So, we are now interested in whether there is a formula relating these two probabilities, which have somewhat "similar" expressions. See the following exercise for deriving the relationship between $$\mathbb P(A|B)$$ and $$\mathbb P(B|A)$$:
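Continuing the COVID-19 sketch (same hypothetical numbers as before, chosen only for illustration), $$\mathbb P(I|P)=\mathbb P(P|I)\mathbb P(I)/\mathbb P(P)$$ turns out to be far from $$\mathbb P(P|I)$$:

```python
# Hypothetical inputs (assumptions, not real data):
sensitivity = 0.90    # P(P | I)
specificity = 0.95    # P(P^c | I^c)
p_I = 0.02            # P(I)

# Denominator via the law of total probability.
p_P = sensitivity * p_I + (1 - specificity) * (1 - p_I)

# Bayes' theorem: P(I|P) = P(P|I) P(I) / P(P)
p_I_given_P = sensitivity * p_I / p_P
print(round(p_I_given_P, 3))   # 0.269
```

Even with a 90% sensitivity, a positive result here gives only about a 27% chance of infection, because the infection itself is rare. This is exactly why $$\mathbb P(P|I)$$ and $$\mathbb P(I|P)$$ must not be confused.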

The following theorem is the generalization of the above result.

The following is a famous problem.

Independence
From the previous discussion, we know that the conditional probability of event $$A$$ given the occurrence of event $$B$$ can be interpreted as the probability of $$A$$ where the sample space is updated to event $$B$$. In general, through this update, the probability of $$A$$ should be affected. But what if the probability is somehow the same as the one before the update?

If this is the case, then the occurrence of a particular event $$B$$ does not actually affect the probability of event $$A$$. Symbolically, it means $$\mathbb P(A|B)=\mathbb P(A)$$. If this holds, then we have $$\mathbb P(B|A)=\frac{\mathbb P(A|B)\mathbb P(B)}{\mathbb P(A)}=\frac{\mathbb P(A)\mathbb P(B)}{\mathbb P(A)}=\mathbb P(B)$$. This means the occurrence of event $$A$$ also does not affect the probability of event $$B$$. This result matches our intuitive interpretation of the independence of two events, so it seems quite reasonable to define the independence of events $$A$$ and $$B$$ as follows: "Two events $A$ and $B$ are independent if $\mathbb P(A|B)=\mathbb P(A)$ (or equivalently, $\mathbb P(B|A)=\mathbb P(B)$)."

However, this definition has some slight issues. If $$\mathbb P(A)=0\text{ or }\mathbb P(B)=0$$, then some of the conditional probabilities involved may be undefined. So, for some events, we may not be able to tell whether they are independent using this "definition". To deal with this, we consider an alternative definition that is equivalent to the above when $$\mathbb P(A)\ne 0$$ and $$\mathbb P(B)\ne 0$$. To motivate that definition, we can see that $$\mathbb P(A|B)=\mathbb P(A)\text{ or }\mathbb P(B|A)=\mathbb P(B)\iff \mathbb P(A\cap B)=\mathbb P(A)\mathbb P(B)$$ when both $$\mathbb P(A)$$ and $$\mathbb P(B)$$ are nonzero. This results in the following definition:
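The product condition $$\mathbb P(A\cap B)=\mathbb P(A)\mathbb P(B)$$ can be checked mechanically on a finite sample space. A minimal sketch (the fair-die events are my own illustration):

```python
from fractions import Fraction

def independent(A, B, prob):
    """Check P(A ∩ B) == P(A)·P(B). Unlike the conditional-probability
    version, this works even when P(A) or P(B) is 0."""
    P = lambda E: sum(prob[w] for w in E)
    return P(A & B) == P(A) * P(B)

# Fair six-sided die: every face has probability 1/6.
prob = {w: Fraction(1, 6) for w in range(1, 7)}
A = {2, 4, 6}       # rolling an even number, P(A) = 1/2
B = {1, 2, 3, 4}    # rolling at most 4,      P(B) = 2/3
print(independent(A, B, prob))   # True: P(A ∩ B) = 1/3 = (1/2)·(2/3)
```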

But what if there are more than two events involved? Intuitively, we may consider the following as the general "definition" of independence: "The events $E_1,E_2,\dotsc,E_n$ are independent if $\mathbb P(E_1\cap E_2\cap\dotsb\cap E_n)=\mathbb P(E_1)\mathbb P(E_2)\dotsb\mathbb P(E_n)$. ($n$ is an integer that is at least 2.)"

But we will get some strange results by using this as the "definition":

From this example, merely requiring $$\mathbb P(E_1\cap E_2\cap\dotsb\cap E_n)=\mathbb P(E_1)\mathbb P(E_2)\dotsb\mathbb P(E_n)$$ cannot ensure that all of the events involved are independent. So, this suggests another definition: "The events $E_1,E_2,\dotsc,E_n$ are independent if all the pairs of events involved are independent."

However, we will again get some strange results by using this as the "definition":
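A classic illustration of this failure (my own worked example, not taken from the text) uses two fair coin tosses: let $$A$$ be "first toss heads", $$B$$ be "second toss heads", and $$C$$ be "both tosses agree". Every pair is independent, yet the triple product fails:

```python
from fractions import Fraction

# Sample space of two fair coin tosses, each outcome with probability 1/4.
omega = [("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")]
prob = {w: Fraction(1, 4) for w in omega}
P = lambda E: sum(prob[w] for w in E)

A = {w for w in omega if w[0] == "H"}     # first toss heads
B = {w for w in omega if w[1] == "H"}     # second toss heads
C = {w for w in omega if w[0] == w[1]}    # both tosses agree

# All pairs are independent:
print(P(A & B) == P(A) * P(B))            # True
print(P(A & C) == P(A) * P(C))            # True
print(P(B & C) == P(B) * P(C))            # True
# ...but the triple product fails: P(A ∩ B ∩ C) = 1/4 ≠ 1/8.
print(P(A & B & C) == P(A) * P(B) * P(C)) # False
```

Intuitively, knowing that $$A$$ and $$B$$ both occur forces $$C$$ to occur, so the three events should not count as independent.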

The above examples suggest that we actually need both of the following requirements in the definition of independence of events $$A,B,C$$ for the definition to "make sense":
 * 1) $$\mathbb P(A\cap B\cap C)=\mathbb P(A)\mathbb P(B)\mathbb P(C)$$. (Together with 2), this ensures $$\mathbb P(C|A\cap B)=\mathbb P(C),\mathbb P(A|B\cap C)=\mathbb P(A)\text{ and }\mathbb P(B|A\cap C)=\mathbb P(B)$$. That is, if we condition on the intersection of two of the events, the probability of the remaining one is not affected.)
 * 2) All the pairs of events $$A,B,C$$ are independent. (That is, $$\mathbb P(A\cap B)=\mathbb P(A)\mathbb P(B),\mathbb P(B\cap C)=\mathbb P(B)\mathbb P(C)\text{ and }\mathbb P(A\cap C)=\mathbb P(A)\mathbb P(C)$$.)

Similarly, the independence of four events $$A,B,C,D$$ should require
 * 1) $$\mathbb P(A\cap B\cap C\cap D)=\mathbb P(A)\mathbb P(B)\mathbb P(C)\mathbb P(D)$$.
 * 2) All the triples of events are independent.
 * In other words, we need the probability of the intersection of any three and of any two of the events to "split as a product of probabilities of single events", as above.

This leads us to the following general definition:

In general, we have the following result.

Conditional independence
Conditional independence is a conditional version of independence, and has the following definition, which is similar to that of independence.