Probability Theory/Probability spaces

Anyone trying to reason with incomplete information must deal with the following fact: There are many possible ways the world could be, and we don't know exactly which one of these ways is true. Imagine Alice's friend Bob flips a coin. He catches it in his right hand and slaps it down onto a table. He does not lift his hand, so Alice cannot see how the coin landed. Then from Alice's perspective there are two ways the world could be: Either the coin is sitting on the table with its heads side up, or it's sitting with its tails side up. This set of possibilities is denoted $$\Omega=\{h, t\}$$, where $$h$$ denotes a heads outcome, and $$t$$ denotes a tails outcome. (In reality, more cases are possible. For example, through sleight of hand, Bob might have slipped the coin into his pocket so that there is no coin sitting on the table at all. For most examples in this book, we will ignore such edge cases, but it's important to remember that they are possible.)

Depending on how finely one examines the world, there may be many possible configurations of the world that correspond to heads. So $$h_1, h_2, h_3, \dots$$ might all correspond to the coin landing heads, but for each one of them, the air molecules in the room will be moving around in a slightly different pattern. In that case, we would have $$\Omega=\{h_1, h_2, h_3, \dots, t_1, t_2, t_3, \dots\}$$. We shall have to construct our theory so that someone who cares about the coin but doesn't care about the air molecules will get the same answer either way.

Let's suppose the coin and the flip were both fair. Then since there is no obvious difference between heads and tails, they are intuitively both equally likely outcomes. In probability theory, we represent how likely something is by a single real number between 0 and 1, called the probability. A higher number means that something is more likely; a lower number means it is less likely. Something that's certainly true has probability 1, the largest possible probability, while something that's certainly false has probability 0, the smallest possible probability. We can imagine a function $$p$$ that gives us the probability of a statement. A statement is like a yes/no question, or a declarative sentence that can be either true or false. "The coin came up tails" would be an example of a statement. Here's the formal definition:

A statement $$S$$ is like a sentence: $$S\subset\Omega$$ is the subset of all configurations of the world where that sentence is true. Likewise, a statement $$S$$ is like a yes/no question: it is the subset of all configurations of the world where the answer to that question is yes. "The coin came up tails" would thus correspond to the set $$\{t\}$$ (or $$\{t_1, t_2, t_3, \dots\}$$ if we're letting air molecules distinguish our configurations).

The probability measure is a function $$p$$ that takes a statement, which is a subset of $$\Omega$$, and returns a real number between 0 and 1. So its type signature is $$p:\mathcal P (\Omega) \nrightarrow [0, 1]$$. Note that we used the symbol $$\nrightarrow$$ instead of a normal arrow. This is because $$p$$ is a partial function, which means that even if we give it some subset of $$\Omega$$ as input, it's allowed to not give us an answer. This part of the definition becomes necessary when dealing with some infinite configuration spaces, where certain statements end up having undefined probabilities.

Because it's sometimes nice to have $$p$$ be a total function instead of a partial function, we can let $$\mathcal F \subset \mathcal P(\Omega)$$ be the set of subsets of $$\Omega$$ that have defined probabilities. So we can also say that $$p$$ has the type signature $$p:\mathcal F \to [0, 1]$$.

Kolmogorov came up with a set of axioms that define the rules of probability theory:


 * 1) Non-negativity: $$\forall X\in\mathcal F; p(X) \geq 0$$
 * 2) Normalization: $$p(\Omega) = 1$$
 * 3) Closure under complement: $$\forall X \in \mathcal F; (\Omega - X) \in \mathcal F$$
 * 4) σ-additivity: For any countable sequence of disjoint statements $$[X_1, X_2, X_3, \dots] \in \mathcal F^{\mathbb N}$$, we have $$p\left(\bigcup_{n=1}^\infty X_n\right) = \sum_{n=1}^\infty p(X_n)$$

Note that axiom 4 imposes a constraint on $$\mathcal F$$: If the statements $$X_1, X_2, X_3, \dots$$ are disjoint, and if all of them are in $$\mathcal F$$, then their union must also be in $$\mathcal F$$.