In This Note We’re Going to
At the individual level, we’ve shown that causal inference is fundamentally a missing data problem: a person either receives or does not receive treatment. We never observe both potential outcomes and therefore don’t know an individual’s treatment effect. This motivates us to work with datasets: observations of multiple people. Here we’re using people to stand in for any type of unit of observation (schools, firms, states, etc.). As we’ll show in this note, working with datasets doesn’t overcome the missing data problem so much as it introduces a new problem: selection bias.
In the previous note, we talked about the notion of causality, gave a quick refresher on functions, and then introduced the potential outcome function. This gave us a mathematically precise way to express the quantity we’re interested in, the average treatment effect:
$$ \mathbb{E}[\tilde{Y}_i(1) - \tilde{Y}_i(0)] $$
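To see why this expectation can't simply be computed from data, here is a minimal sketch. The units and their numbers are made up for illustration, and the names `units`, `y1`, `y0`, and `individual_effect` are hypothetical. Each unit has two potential outcomes, but only the one matching its treatment status is recorded.

```python
# Hypothetical units (made-up numbers). y1 / y0 are the potential outcomes
# with and without treatment; None marks the one we never get to observe.
units = [
    {"treated": 1, "y1": 48.0, "y0": None},
    {"treated": 1, "y1": 55.0, "y0": None},
    {"treated": 0, "y1": None, "y0": 36.0},
    {"treated": 0, "y1": None, "y0": 41.0},
]


def individual_effect(unit):
    """Return y1 - y0, or None when either potential outcome is missing."""
    if unit["y1"] is None or unit["y0"] is None:
        return None
    return unit["y1"] - unit["y0"]


effects = [individual_effect(u) for u in units]
computable = [e for e in effects if e is not None]
print(f"{len(computable)} of {len(units)} individual effects are computable")
```

Because every unit is missing one potential outcome, no individual effect can be computed directly, so the average of those effects has to be estimated some other way.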
Now, we want to turn our attention to how we might estimate or approximate this term from data. To begin, we’ll make one simplifying assumption: that we have access to the entire population of interest. For instance, continuing with our housing voucher example, we’ll assume for the moment that we have data on each person in public housing: their income, family size, voucher status, etc.
So the question is, if we were to observe the entire population of interest, how might we go about estimating the average treatment effect?
Probably the simplest idea, and the first that comes to mind, is to compare the outcomes of those who are treated with the outcomes of those who aren’t (i.e., those in the control group). For example, we could compare the average income of those who are offered a voucher $(\mathbb{E}[Y_i \vert D_i=1])$ to the average income of those who aren’t $(\mathbb{E}[Y_i \vert D_i=0])$.
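The comparison just described can be sketched in a few lines of Python. This is only an illustration: the records are invented, and the field names `offered_voucher` and `income_k` and the helper `difference_in_means` are hypothetical rather than taken from any real dataset or library.

```python
from statistics import fmean

# Hypothetical population (made-up values): D_i is offered_voucher,
# Y_i is income_k, measured in thousands of dollars.
population = [
    {"offered_voucher": 1, "income_k": 48.0},
    {"offered_voucher": 1, "income_k": 55.0},
    {"offered_voucher": 1, "income_k": 44.0},
    {"offered_voucher": 0, "income_k": 36.0},
    {"offered_voucher": 0, "income_k": 41.0},
    {"offered_voucher": 0, "income_k": 37.0},
]


def difference_in_means(rows, treatment_key, outcome_key):
    """Estimate E[Y | D=1] - E[Y | D=0] using group averages."""
    treated = [r[outcome_key] for r in rows if r[treatment_key] == 1]
    control = [r[outcome_key] for r in rows if r[treatment_key] == 0]
    return fmean(treated) - fmean(control)


diff = difference_in_means(population, "offered_voucher", "income_k")
print(f"Difference in means: {diff:.1f}K")
```

With these invented numbers the treated group averages 49.0K and the control group 38.0K, so the sketch prints a difference of 11.0K. Note that the function only uses observed outcomes; whether that difference says anything about the average treatment effect is a separate question.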
Worked Example
If we were given the following dataset containing the population of interest, the average income in the treated group would be $\$50.3K$ and in the control group $\$38.3K$. The difference-in-means is therefore $\$12K$.

Population Data Set
The first question to ask is: is this a good idea? Does the difference in average earnings between these groups approximate the average treatment effect well?