Causal inference is concerned with the interpretation of statistical results in various contexts. The key word here is interpretation, in the sense of what you do when you’re walking around an art gallery. Causal inference is a lot like walking around an art gallery where the portraits and paintings are statistical results.

I like this metaphor (simile?) for a number of reasons. It makes clear that causal inference is not concerned with trying to convince someone of something. The “best” you can do is to explain to your audience how you interpret something and provide the necessary background details so that the audience can form their own interpretation. Maybe they see it like you. Maybe they don’t. Again, it’s a lot like interpreting a piece of art.
Why are there multiple interpretations? There is a general trade-off between questions that are important and questions that can be credibly answered. We’ll try to expose you to a number of papers this semester so that you can develop your own empirical understanding of why this is. As a starting point, though, we’ll illustrate through a worked example that a randomized control trial, which is the most credible method, is typically limited in what it can answer.

As our worked example, let’s consider Raj Chetty’s paper on Moving to Opportunity. The paper is typically understood as using a randomized control trial to show that children benefit more from moving to a low-poverty neighborhood at a younger age. As the authors state in the abstract of the paper:
<aside>
“The decline in the gains from moving with the age when children move suggests that the duration of exposure to better environments during childhood is an important determinant of children’s long-term outcomes.”
</aside>
Okay, maybe not the cleanest sentence. The point the authors are trying to make is that the benefit from moving to a low-poverty neighborhood decreases with the age at which the child moves. Moving to a low-poverty area at the age of $4$ is, on average, better for the kid than moving at the age of $14$.

Maybe you are excited by this result. Maybe not. Either way, that’s not actually our focus at the moment. What I’d like to highlight instead is that this conclusion doesn’t rest only on the variation introduced by the randomized control trial (talk about a tough sentence!). There’s an implicit assumption that the authors are making to get to this conclusion. We’ll examine this paper in greater detail later in the course when we cover mechanisms, but I think the analysis can be understood as is by breaking it into a couple of parts.

In the paper, families randomly receive an offer for a housing voucher to move to a low-poverty neighborhood. Because the offer is randomly assigned, the authors can estimate the impact of the offer on downstream outcomes like income, education, and marriage. This is fine. This is good. For the moment, don’t worry about the difference between an offer and actually moving.
The authors then compare the treatment effects across two age cohorts (children younger than 13 and children between 13 and 18) and conclude that children would benefit more by moving at a younger age.
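
To make the two steps above concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the effect sizes, the outcome scale, the cohort labels `under_13` and `13_to_18`, and the function name `estimate_offer_effects` are made up and are not taken from the paper. It only shows why a randomly assigned offer lets a simple difference in means recover the effect of the offer within each cohort.

```python
import random
import statistics

# Made-up effect sizes for illustration only; these are not estimates from the paper.
TRUE_OFFER_EFFECT = {"under_13": 2.0, "13_to_18": 0.5}


def estimate_offer_effects(n_children=40_000, seed=0):
    """Simulate a randomized voucher offer and estimate its effect within each cohort."""
    rng = random.Random(seed)
    cohorts = list(TRUE_OFFER_EFFECT)
    # outcomes[cohort][offered] collects the simulated outcomes for each cell.
    outcomes = {cohort: {True: [], False: []} for cohort in cohorts}
    for _ in range(n_children):
        cohort = rng.choice(cohorts)
        # Family background affects the outcome but is unrelated to the offer.
        background = rng.gauss(0.0, 1.0)
        # The offer is assigned by coin flip, as in a randomized trial.
        offered = rng.random() < 0.5
        outcome = (
            10.0
            + 3.0 * background
            + TRUE_OFFER_EFFECT[cohort] * offered
            + rng.gauss(0.0, 1.0)
        )
        outcomes[cohort][offered].append(outcome)
    # Within each cohort, compare mean outcomes of offered and non-offered children.
    return {
        cohort: statistics.mean(cells[True]) - statistics.mean(cells[False])
        for cohort, cells in outcomes.items()
    }


estimates = estimate_offer_effects()
for cohort, estimate in estimates.items():
    print(f"{cohort}: estimated effect of the offer = {estimate:.2f}")
gap = estimates["under_13"] - estimates["13_to_18"]
print(f"difference between cohorts = {gap:.2f}")
```

Randomization is what justifies each within-cohort estimate here. It does not, by itself, say anything about why the two estimates differ.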

[Figure: Estimated Treatment Effects]

But is that necessarily true? To understand whether the younger cohort benefits more from a housing voucher offer at a younger age than they would at an older age, we need to know how the younger cohort would respond if they were still in public housing at an older age and received the offer then. Re-reading that sentence, it’s clear that this latter treatment effect isn’t observed in the data: the children in the younger cohort only ever receive the offer while they are young.
To “get around” this missing data problem, the authors assume that the treatment effect observed for the older cohort is the same as the treatment effect the younger cohort would experience if they remained in public housing and received the offer when they were older. When we introduce mathematical notation later in the semester, this will be much easier to express. Bear with me! The question is: is this a good approximation?
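
As a rough preview of how notation could tidy this up, here is a sketch. The symbols are placeholders assumed for illustration; they are not the paper’s notation and not necessarily what we’ll use later in the course. Let $\tau_c(a)$ be the average effect of the offer on cohort $c$ when the offer arrives at age $a$.

$$
\begin{aligned}
\text{Estimated from the experiment:} \quad & \tau_{\text{young}}(\text{early}), \quad \tau_{\text{old}}(\text{late}) \\
\text{Quantity of interest:} \quad & \tau_{\text{young}}(\text{early}) - \tau_{\text{young}}(\text{late}) \\
\text{Assumption:} \quad & \tau_{\text{young}}(\text{late}) = \tau_{\text{old}}(\text{late}) \\
\text{Implied by the assumption:} \quad & \tau_{\text{young}}(\text{early}) - \tau_{\text{young}}(\text{late}) = \tau_{\text{young}}(\text{early}) - \tau_{\text{old}}(\text{late})
\end{aligned}
$$

The right-hand side of the last line uses only quantities the experiment can estimate. The third line is what has to hold for that substitution to be legitimate, and it is not something the randomization delivers.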