In This Set of Notes We’re Going to
Most undergraduate statistics courses focus on probability mass functions and probability density functions, and for good reason: these functions act on individual data points, so we can plot them, add them, and so on.
For instance, if the variable is discrete, like the number of months between a tenant’s lease violation and the landlord’s eviction filing, the probability mass function expresses the probability associated with each number of months.
The Probability Mass Function
$$ p_X(x) \in [0, 1], \quad \sum_x p_X(x) = 1 $$
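As a minimal sketch of a probability mass function, the snippet below builds an empirical one from a small list of months-to-filing values. The data and the names `months_to_filing` and `empirical_pmf` are hypothetical, made up purely for illustration.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical data: months between a lease violation and the eviction filing.
months_to_filing = [1, 1, 2, 2, 2, 3, 4, 4]


def empirical_pmf(values):
    """Map each observed value to its relative frequency in `values`."""
    counts = Counter(values)
    n = len(values)
    return {x: Fraction(c, n) for x, c in sorted(counts.items())}


pmf = empirical_pmf(months_to_filing)

# The input is an individual point (a number of months), the output a probability.
print(pmf[2])             # 3/8
print(sum(pmf.values()))  # 1
```

Note that the function is evaluated one point at a time, which is exactly the feature that distinguishes it from the probability measures discussed later.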
Or, if the variable is continuous, like the distance a household moves following an eviction filing, the probability density function expresses the density associated with each point.
The Probability Density Function
$$ \int f(x)\,dx = 1, \quad f(x) \geq 0 $$
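To make the two conditions concrete, here is a rough numerical check using an exponential density for moving distance. The exponential form, the rate `RATE`, and the helper `integrate` are assumptions chosen for illustration, not a claim about how moving distances are actually distributed.

```python
import math

RATE = 0.5  # assumed rate per mile (hypothetical), so the mean move is 2 miles


def pdf(x):
    """Exponential density f(x) = RATE * exp(-RATE * x) for x >= 0, else 0."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0


def integrate(f, lower, upper, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lower, upper]."""
    width = (upper - lower) / steps
    return sum(f(lower + (i + 0.5) * width) for i in range(steps)) * width


# The density integrates to (approximately) one; the upper limit of 100 miles
# truncates a negligible tail.
total = integrate(pdf, 0.0, 100.0)

# Probabilities come from integrating the density over an interval,
# not from reading off the density at a single point.
prob_1_to_3 = integrate(pdf, 1.0, 3.0)
```

Here `prob_1_to_3` approximates the probability that a household moves between 1 and 3 miles, which for this density equals $e^{-0.5} - e^{-1.5}$.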
In empirical economic work, though, you have to make decisions. You have to make judgements. You have to evaluate tradeoffs. So to really understand what’s going on, we want to get more fundamental than probability mass functions and probability density functions. We want to talk about probability measures.
A probability measure is arguably a difficult concept to grasp at first. It’s a function, yes. But it doesn’t take individual data points as inputs. Instead, it assigns probability to collections of data points, which we refer to as subsets.
In every economics paper, there are two probability measures of interest. The first is defined on the population, that is, over the set of people or firms that you care about. Let’s say that you’re interested in studying an eviction prevention policy. Then the set of people that you care about is all the people facing an eviction. A subset of interest would be a collection of people facing an eviction with some shared attribute. For instance, one subset is the set of tenants with a verbal lease. A probability measure, $\mathbb{P}$, can tell us how likely a given tenant in the population is to have a verbal lease.
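The idea that a probability measure takes subsets as inputs can be sketched on a tiny finite population. Everything here is hypothetical: the five tenants, their lease types, and the assumption that each tenant is equally likely to be drawn, so that the measure of a subset is simply its share of the population.

```python
from fractions import Fraction

# Hypothetical population of tenants facing an eviction, with their lease type.
population = {
    "tenant_a": "verbal",
    "tenant_b": "written",
    "tenant_c": "verbal",
    "tenant_d": "written",
    "tenant_e": "written",
}


def prob(subset):
    """Probability measure P: takes a SUBSET of the population, returns its share.

    Assumes every tenant is equally likely (an illustrative assumption).
    """
    subset = set(subset)
    if not subset <= set(population):
        raise ValueError("subset must consist of members of the population")
    return Fraction(len(subset), len(population))


verbal = {t for t, lease in population.items() if lease == "verbal"}
written = {t for t, lease in population.items() if lease == "written"}

print(prob(verbal))            # 2/5
print(prob(verbal | written))  # 1, the whole population
print(prob(set()))             # 0, the empty subset
```

Because `verbal` and `written` are disjoint, `prob(verbal) + prob(written)` equals `prob(verbal | written)`, which is the additivity property that a probability measure must satisfy.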
The second probability measure of interest in any paper is defined on the sampling probability space. Here, the sample space consists of all possible datasets.
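A small sketch can make "all possible datasets" tangible. It assumes a hypothetical population of four tenants and simple random sampling of two tenants without replacement, so that every possible dataset is equally likely; real sampling designs will generally differ.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population and lease types.
population = {
    "tenant_a": "verbal",
    "tenant_b": "written",
    "tenant_c": "verbal",
    "tenant_d": "written",
}
SAMPLE_SIZE = 2  # assumed sample size

# The sample space: every dataset (set of tenants) that could be drawn.
sample_space = [frozenset(d) for d in combinations(sorted(population), SAMPLE_SIZE)]


def sampling_prob(event):
    """Sampling measure: takes a set of DATASETS, returns its probability.

    Assumes each dataset is equally likely (simple random sampling).
    """
    event = set(event)
    if not event <= set(sample_space):
        raise ValueError("event must be a collection of possible datasets")
    return Fraction(len(event), len(sample_space))


# Event: the drawn dataset contains at least one tenant with a verbal lease.
has_verbal = {d for d in sample_space
              if any(population[t] == "verbal" for t in d)}

print(len(sample_space))          # 6
print(sampling_prob(has_verbal))  # 5/6
```

The points of this sample space are entire datasets rather than individual tenants, so an event is a collection of datasets.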