Difference-in-Means

The Population Probability Space

We begin as always by defining the population probability space.

$$ \big(\Omega, \mathcal{F}, \mathbb{P} \big) $$

On this space, we can define the potential outcome random variables

$$ \tilde{Y}: \{0,1\} \to \Omega \to \mathcal{R} $$

And our parameter of interest

$$ \theta = \int \big(\tilde{Y}_1 - \tilde{Y}_0\big) \, d\mathbb{P} $$

Note, we do not need to (although we can) define the treatment variable on this probability space.

The Sample Probability Space

We can define the sample probability space as follows:

$$ \big(\Omega_n, \mathcal{F}_n, \mathbb{P}_n\big) $$

On this space, we can define the potential outcome random variables (recall that they are not explicitly observed), as well as the treatment, outcome, and control variables. Note that we place no restrictions on the control space $\mathcal{X}$.

$$ \begin{align*}\tilde{Y} &: N \to \{0,1\} \to \Omega_n \to \mathcal{R} \\ D &: N \to \Omega_n \to \{0, 1\} \\ Y &: N \to \Omega_n \to \mathcal{R} \\ X &: N \to \Omega_n \to \mathcal{X}\end{align*} $$
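To make these objects concrete, here is a minimal sketch that draws one sample of size $n$. Everything about the data-generating process is an assumption made for illustration: the function name `simulate_sample`, the normal draws, the constant treatment effect of 2.0, and the coin-flip assignment. It also assumes the usual link between observed and potential outcomes, $Y_i = \tilde{Y}_i(D_i)$, which the definitions above do not spell out.

```python
import numpy as np

def simulate_sample(n, seed=0):
    """Draw one hypothetical sample of size n (illustrative DGP, not from the notes)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)               # control variable X_i (here a real-valued control)
    y0 = x + rng.normal(size=n)          # potential outcome under d = 0
    y1 = y0 + 2.0                        # potential outcome under d = 1 (assumed effect of 2.0)
    y_tilde = np.column_stack([y0, y1])  # column d holds the potential outcome under d
    d = rng.integers(0, 2, size=n)       # treatment indicator in {0, 1}, assigned by coin flip
    y = y_tilde[np.arange(n), d]         # observed outcome: Y_i = Y~_i(D_i) (assumed link)
    return y_tilde, d, y, x

y_tilde, d, y, x = simulate_sample(5)
```

Only `d`, `y`, and `x` would be available in real data. The array `y_tilde` exists here only because the sample is simulated, and for each unit exactly one of its two columns is ever observed.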

Difference-in-Means

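The difference-in-means compares the expected observed outcome among the treated with the expected observed outcome among the untreated.

$$ \mathbb{E}[Y \vert D=1] - \mathbb{E}[Y \vert D=0] $$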
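On a finite sample, the natural plug-in for this quantity replaces each conditional expectation with the corresponding group average. The sketch below is one way to compute it; the function name `difference_in_means` and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np

def difference_in_means(y, d):
    """Mean of y among units with d == 1 minus mean of y among units with d == 0."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d)
    treated = d == 1
    control = d == 0
    if not treated.any() or not control.any():
        # A conditional mean is undefined for an empty group.
        raise ValueError("both treatment arms must be non-empty")
    return y[treated].mean() - y[control].mean()

# Toy data: treated mean is (3 + 5) / 2 = 4.0, control mean is (1 + 2) / 2 = 1.5.
y = [3.0, 5.0, 1.0, 2.0]
d = [1, 1, 0, 0]
print(difference_in_means(y, d))  # 2.5
```

The function uses only the observed outcome and the treatment indicator. Whether the number it returns says anything about $\theta$ depends on how treatment was assigned.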

Randomized Control Trial

In an RCT, treatment is assigned at random, so the potential outcomes $(\tilde{Y}_0, \tilde{Y}_1)$ are independent of $D$ by construction, and therefore the following holds.