Motivation

There seems to be a fundamental tradeoff in causal inference between models that are flexible and models that are interpretable (although perhaps "generalizable" is a better word than "flexible"). To see this tradeoff, consider estimating the average treatment effect using a linear model versus a fine-tuned large language model. Thinking about linear regression in terms of residual variation, we have a sense of what it is capturing. For the large language model, though, we don't understand the structure/inductive biases that the model relies on to make its predictions (see this note for some background on LLMs).

Linear Model

$$ y_i = \beta d_i + \theta ^Tx_i $$
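
A minimal sketch of this benchmark, assuming arrays `y`, `d`, and `X` holding the outcome, treatment indicator, and covariates (these names, and the use of ordinary least squares, are illustrative rather than taken from the note):

```python
# Minimal sketch: estimate beta in y_i = beta * d_i + theta^T x_i by OLS.
# The arrays y, d, X are assumed inputs, not part of the original note.
import numpy as np

def estimate_ate_ols(y: np.ndarray, d: np.ndarray, X: np.ndarray) -> float:
    """Return the OLS estimate of beta, the coefficient on the treatment d."""
    Z = np.column_stack([d, X])                   # treatment first, then covariates
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)  # least-squares fit
    return float(coef[0])                         # coefficient on d
```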

LLM

Notation: In this note, we're going to think of an LLM as a function that takes as arguments a prompt and a textual description of the individual observation. When we learn the weights of the LLM, we'll make this explicit by parameterizing the function with a subscript $\theta$.
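
As a minimal sketch of this notation, writing the prompt as $p$ and the text for observation $i$ as $t_i$ (symbols chosen here for illustration), the model's prediction is

$$ \hat{y}_i = \mathrm{LLM}_\theta(p, t_i) $$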

So the question is: to what extent can we bridge this divide? Can we introduce estimation techniques that generalize well and are more interpretable than current benchmarks?

Approach

To introduce our approach, let's consider the typical framing of an estimation problem. The objective function is usually the sum of some measure of how well the model fits the data and a penalty term.

$$ \underset{\theta}{\text{minimize}} \ \overset{\text{Empirical loss}}{f(\theta)} + \overset{\text{Penalty term}}{\lambda g(\theta)} $$

In our approach, the choice parameter $\theta$ is the set of weights of a large language model, for example the open weights of OpenAI's gpt-oss models.
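
As a minimal sketch of this objective, assuming a HuggingFace-style causal language model whose forward pass returns a cross-entropy loss (the model, batch, and choice of penalty below are placeholders, not a specific implementation):

```python
# Minimal sketch of the objective f(theta) + lambda * g(theta) when theta
# are the weights of an LLM. `model` is assumed to be a HuggingFace-style
# causal LM whose forward pass returns a loss, and `batch` a tokenized
# batch with labels; both are placeholders.
import torch

def l2_penalty(params):
    """One concrete choice of g(theta): squared L2 norm of the trainable weights."""
    return sum(p.pow(2).sum() for p in params if p.requires_grad)

def penalized_loss(model, batch, lam: float) -> torch.Tensor:
    outputs = model(**batch)                  # empirical loss f(theta)
    penalty = l2_penalty(model.parameters())  # g(theta)
    return outputs.loss + lam * penalty       # f(theta) + lambda * g(theta)
```

The same decomposition applies whether $\theta$ indexes a handful of regression coefficients or the open weights of a gpt-oss model.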