In This Set of Notes We’re Going to
As we discussed in a previous note, identification strategies differ in how they account for selection bias. The central idea behind the difference-in-difference approach is to use pre-treatment data to approximate the selection bias.
Consider the example of estimating the impact of legal assistance in an eviction case on housing stability. Let’s say that a state has decided to provide legal assistance in the counties that experience the highest rates of eviction. Lawyers are costly, so the state decides to target tenants who are thought to be most in need of assistance.
The difference-in-means approach would compare eviction rates between tenants in counties where legal aid is available ($D_i=1$) and tenants in counties where it’s not ($D_i=0$). The concern is that tenants (and perhaps landlords?!) in these two groups of counties differ in ways that are related to housing instability. More specifically, we might imagine that in the absence of legal aid, the eviction rate is higher in the counties that the state decides to target for assistance.
$$ \mathbb{E}[\tilde{Y}_i(0) \vert D_i =1] - \mathbb{E}[\tilde{Y}_i(0) \vert D_i =0] > 0 $$
To make things concrete, let’s say that the observed difference in means is $5$ p.p. That is, tenants in counties with legal aid lawyers have a probability of getting evicted that is 5 percentage points greater than tenants in counties without legal assistance. This difference captures both the average treatment effect on the treated (which we likely assume is negative, right? Lawyers should presumably decrease the likelihood of an eviction) and selection bias (which we think is positive).
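Written out, the split of the observed difference into these two pieces looks like the following. This is a sketch that assumes the usual potential-outcomes notation, where $\tilde{Y}_i(1)$ (a symbol not defined above) denotes tenant $i$'s outcome with legal aid, alongside the $\tilde{Y}_i(0)$ already used:

$$ \underbrace{\mathbb{E}[Y_i \vert D_i=1] - \mathbb{E}[Y_i \vert D_i=0]}_{5\text{ p.p.}} = \underbrace{\mathbb{E}[\tilde{Y}_i(1) - \tilde{Y}_i(0) \vert D_i=1]}_{\text{treatment effect on the treated } (<0)} + \underbrace{\mathbb{E}[\tilde{Y}_i(0) \vert D_i=1] - \mathbb{E}[\tilde{Y}_i(0) \vert D_i=0]}_{\text{selection bias } (>0)} $$

So a positive 5 p.p. gap is perfectly compatible with lawyers reducing evictions, as long as the selection bias is larger than 5 p.p.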
One idea is to approximate the selection bias by making use of pre-treatment data. That is, prior to the rollout of the legal assistance, we observe the eviction rate for individuals in the counties that will receive legal assistance ($D_i=1$) and in those that will not ($D_i=0$). We can use this difference as an estimate of the selection bias.
$$ \mathbb{E}[Y_{i,t-1} \vert D_i=1] - \mathbb{E}[Y_{i,t-1} \vert D_i=0] \approx \mathbb{E}[\tilde{Y}_{i,t}(0) \vert D_i =1] - \mathbb{E}[\tilde{Y}_{i,t}(0) \vert D_i =0] $$
If we take the difference in means and subtract the pre-treatment difference, this difference-in-difference estimator should capture the average treatment effect on the treated, which in our example is $-5$ p.p.
$$ \underbrace{\Big(\mathbb{E}[Y_{i,t}\vert D_i=1] - \mathbb{E}[Y_{i,t}\vert D_i=0]\Big)}_{5\text{ p.p.}} - \underbrace{\Big(\mathbb{E}[Y_{i,t-1}\vert D_i=1] - \mathbb{E}[Y_{i,t-1}\vert D_i=0]\Big)}_{10\text{ p.p.}} $$
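The subtraction can be sketched in a few lines of Python. The four eviction rates below are hypothetical, picked only so that the post-period gap is 5 p.p. and the pre-period gap is 10 p.p. as in the running example; the names `eviction_rate` and `gap` are likewise illustrative.

```python
# Hypothetical group-level eviction rates, in percentage points.
# These numbers are NOT data; they are chosen to reproduce the
# 5 p.p. (post) and 10 p.p. (pre) gaps from the example.
eviction_rate = {
    ("treated", "pre"): 30.0,
    ("control", "pre"): 20.0,
    ("treated", "post"): 27.0,
    ("control", "post"): 22.0,
}


def gap(rates, period):
    """Treated-minus-control difference in eviction rates for one period."""
    return rates[("treated", period)] - rates[("control", period)]


post_gap = gap(eviction_rate, "post")  # treatment effect on the treated + selection bias
pre_gap = gap(eviction_rate, "pre")    # used as a stand-in for the selection bias
did = post_gap - pre_gap               # difference-in-difference estimate

print(f"post gap {post_gap:+.1f} p.p., pre gap {pre_gap:+.1f} p.p., DiD {did:+.1f} p.p.")
```

With these made-up rates the estimate comes out to $-5$ p.p. It is only as credible as the approximation that the pre-treatment gap equals the selection bias in the treatment period.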