# Probability
## Learning Objectives

After working through this section, you should be able to:

- State the sample space, events, and the three axioms of probability.
- Compute conditional probabilities and apply the product rule, the total probability theorem, and Bayes’ theorem.
- Distinguish discrete from continuous random variables and work with PMFs, PDFs, CDFs, expectations, and variances.
- Recognize the Bernoulli, Gaussian, and Beta distributions and their typical use cases.
## Summary Table

The tables below summarize the key results of the Probability section: the axioms and rules of probability, discrete and continuous random variables, and three common distributions.
### Axiomatic Probability
Concept | Description | Example |
---|---|---|
Sample Space (S) | Set of all possible outcomes of an experiment | S = \{Heads, Tails\} |
Event (A) | A subset of the sample space | A = \{1, 3, 5\} when rolling a die |
Outcome | A single result from the sample space | Heads \in \{Heads, Tails\} |
Axiom 1: Nonnegativity | Probability of any event is ≥ 0 | P(A) \geq 0 |
Axiom 2: Normalization | Probability of the sample space is 1 | P(S) = 1 |
Axiom 3: Additivity | For disjoint events, the probability of their union is the sum of parts | P(A \cup B) = P(A) + P(B) if A \cap B = \emptyset |
Conditional Probability | Probability of A given that B has occurred | P(A \mid B) = \frac{P(A \cap B)}{P(B)} |
Product Rule | Probability of an intersection via conditional probability | P(A \cap B) = P(A \mid B)P(B) |
Total Probability Theorem | Compute a probability by conditioning on a partition of the sample space | P(B) = \sum_i P(B \mid A_i)P(A_i) |
Bayes’ Theorem | Reverse a conditional probability using prior and likelihood (worked numerically in the sketch after this table) | P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B \mid A)P(A) + P(B \mid A^c)P(A^c)} |
Independent Events | Events that do not affect each other’s probabilities | P(A \cap B) = P(A)P(B) if A \perp B |
Conditioning and Independence | Independence may break down when conditioning on a third event | A \perp B might not imply A \perp B \mid C |
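
To make the total probability and Bayes’ rows concrete, here is a minimal Python sketch with illustrative numbers (the 1% prior, 95% true-positive rate, and 10% false-positive rate are assumptions for the example, not values from the tables):

```python
# Minimal sketch of the total probability theorem and Bayes' theorem.
# All numbers below are illustrative assumptions.

p_A = 0.01           # prior P(A)
p_B_given_A = 0.95   # likelihood P(B | A)
p_B_given_Ac = 0.10  # likelihood P(B | A^c)

# Total probability theorem: P(B) = P(B|A)P(A) + P(B|A^c)P(A^c)
p_B = p_B_given_A * p_A + p_B_given_Ac * (1 - p_A)

# Bayes' theorem: P(A|B) = P(B|A)P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B

print(f"P(B)     = {p_B:.4f}")          # 0.1085
print(f"P(A | B) = {p_A_given_B:.4f}")  # 0.0876
```

Despite the high likelihood P(B | A), the small prior keeps the posterior under 9%; this trade-off between prior and likelihood is exactly what the theorem captures.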
### Random Variables
Concept | Discrete Random Variables | Continuous Random Variables |
---|---|---|
Sample Space | Countable outcomes | Uncountable outcomes |
Domain of Variable | x \in \mathbb{Z} (or another countable set) | x \in \mathbb{R} |
Mapping | X: \text{Outcome} \rightarrow x \in \mathbb{Z} | X: \text{Outcome} \rightarrow x \in \mathbb{R} |
Probability Function | Probability Mass Function (PMF): p_X(x) = P(X = x) | Probability Density Function (PDF): f_X(x) \geq 0 |
Total Probability | \sum_x p_X(x) = 1 | \int_{-\infty}^{\infty} f_X(x) \, dx = 1 |
CDF (Cumulative Distribution) | F_X(x) = \sum_{k \leq x} p_X(k) | F_X(x) = \int_{-\infty}^{x} f_X(u) \, du |
Probability of Exact Value | P(X = x) = p_X(x) > 0 is possible | P(X = x) = 0 for any exact x |
Expectation | \mathbb{E}[X] = \sum_x x \, p_X(x) | \mathbb{E}[X] = \int_{-\infty}^{\infty} x \, f_X(x) \, dx |
Variance | \text{Var}[X] = \sum_x (x - \mathbb{E}[X])^2 \, p_X(x) | \text{Var}[X] = \int_{-\infty}^{\infty} (x - \mathbb{E}[X])^2 \, f_X(x) \, dx |
Joint Distribution | Joint PMF: p_{X,Y}(x, y) = P(X = x, Y = y) | Joint PDF: f_{X,Y}(x, y) |
Marginal Distribution | p_X(x) = \sum_y p_{X,Y}(x, y) | f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dy |
Conditional Distribution | p_{X \mid Y}(x \mid y) = \frac{p_{X,Y}(x,y)}{p_Y(y)} | f_{X \mid Y}(x \mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} |
Conditional Expectation | \mathbb{E}[X \mid Y=y] = \sum_x x \, p_{X \mid Y}(x \mid y) | \mathbb{E}[X \mid Y=y] = \int x \, f_{X \mid Y}(x \mid y) \, dx |
Independence | p_{X,Y}(x,y) = p_X(x) \cdot p_Y(y) | f_{X,Y}(x,y) = f_X(x) \cdot f_Y(y) |
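
As a quick numeric check of the expectation and variance rows above, the sketch below evaluates the discrete formulas exactly for a fair six-sided die and approximates the continuous ones for a standard normal density on a truncated grid (the die, the grid bounds, and the step size are illustrative choices):

```python
import numpy as np

# Discrete: fair six-sided die, p_X(x) = 1/6 for x in {1, ..., 6}.
xs = np.arange(1, 7)
pmf = np.full(6, 1 / 6)
mean_d = np.sum(xs * pmf)               # E[X] = 3.5
var_d = np.sum((xs - mean_d)**2 * pmf)  # Var[X] = 35/12 ≈ 2.9167

# Continuous: standard normal PDF, with the integral over (-inf, inf)
# approximated by a Riemann sum on [-10, 10].
u = np.linspace(-10.0, 10.0, 200_001)
du = u[1] - u[0]
pdf = np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)
mean_c = np.sum(u * pdf) * du               # ≈ 0
var_c = np.sum((u - mean_c)**2 * pdf) * du  # ≈ 1

print(mean_d, var_d)  # 3.5 2.9166...
print(mean_c, var_c)  # ~0.0 ~1.0
```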
### Distributions
Distribution | Type | Support | Parameters | PDF / PMF | Common Use Case |
---|---|---|---|---|---|
Bernoulli | Discrete | x \in \{0, 1\} | p (success probability) | p_X(x) = \begin{cases} p & \text{if } x = 1, \\ 1 - p & \text{if } x = 0 \end{cases} | Binary outcomes (e.g., success/failure, yes/no) |
Gaussian (Normal) | Continuous | x \in (-\infty, \infty) | \mu (mean), \sigma^2 (variance) | f_X(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}}e^{-\frac{(x-\mu)^2}{2\sigma^2}} | Modeling natural phenomena; the limiting distribution in the CLT |
Beta | Continuous | x \in (0, 1) | \alpha, \beta | f_X(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha - 1}(1 - x)^{\beta - 1} | Bayesian priors for probabilities, modeling proportions |
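
To connect the table to code, the sketch below draws samples from each distribution with NumPy and compares sample statistics to the theoretical values (the parameters p = 0.3, mu = 1, sigma = 2, alpha = 2, beta = 5 and the seed are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 100_000

# Bernoulli(p): E[X] = p, sampled as a single binomial trial.
p = 0.3
bern = rng.binomial(1, p, size=n)
print(bern.mean(), "vs p =", p)

# Gaussian(mu, sigma): E[X] = mu, Var[X] = sigma^2.
mu, sigma = 1.0, 2.0
gauss = rng.normal(mu, sigma, size=n)
print(gauss.mean(), gauss.var(), "vs", mu, sigma**2)

# Beta(alpha, beta): E[X] = alpha / (alpha + beta), support (0, 1).
a, b = 2.0, 5.0
beta = rng.beta(a, b, size=n)
print(beta.mean(), "vs", a / (a + b))
```

With 100,000 draws, the sample statistics should land close to the theoretical values in each case.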