Why is society the way it is? The problem of infinity DAGs

Consider two questions:
1. Does racial bias in law enforcement in the U.S. occur? (assuming the answer is yes)…
2. Why does racial bias in law enforcement in the U.S. occur?

Ezra Klein wrote a piece in December on the danger of controlling for large numbers of variables in analysis because we could end up ‘controlling out’ key parts of our effect of interest (no, he doesn’t appear to be an epidemiologist or even any type of researcher, but nevertheless seems to have a better understanding of confounding than many with those titles).

In DAG language:
[Figure: DAG]

As he aptly recognizes, researchers know this. As we’ve been taught, there are also ways of dealing with it: base your variables on substantive knowledge, do not adjust for mediators, and if you do adjust for mediators, know what you’re doing [see: mediation analysis].

Klein’s problem with over-controlling is philosophically grounded in question number 2 above. He suggests that controlling for effects of the exposure prevents us from knowing why phenomena occur. Once you control for location of drug use, black people end up far less likely to be arrested for drug crimes than white people. This is because they are more likely to use drugs in urban settings, and police are more likely to make arrests there. So in controlling for location, we lose the ‘why.’
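To make the over-control point concrete, here is a minimal sketch of a toy DAG X → M → Y (exposure, mediator, outcome). It assumes the outcome depends on the exposure only through the mediator, and every probability in it is invented for illustration; none of the numbers come from Klein’s piece or from real data.

```python
# Toy DAG: X -> M -> Y, with no direct X -> Y arrow (an assumption of this sketch).
# All probabilities below are hypothetical, chosen only to illustrate the idea.
P_MEDIATOR_GIVEN_EXPOSURE = {1: 0.7, 0: 0.3}    # P(M=1 | X)
P_OUTCOME_GIVEN_MEDIATOR = {1: 0.20, 0: 0.05}   # P(Y=1 | M)


def outcome_risk(exposure, mediator=None):
    """Return P(Y=1 | X=exposure), optionally also conditioning on M=mediator."""
    if mediator is not None:
        # Once M is fixed, X carries no further information about Y in this DAG.
        return P_OUTCOME_GIVEN_MEDIATOR[mediator]
    p_m = P_MEDIATOR_GIVEN_EXPOSURE[exposure]
    return (p_m * P_OUTCOME_GIVEN_MEDIATOR[1]
            + (1 - p_m) * P_OUTCOME_GIVEN_MEDIATOR[0])


# Unadjusted (total) risk difference between exposed and unexposed.
total_effect = outcome_risk(1) - outcome_risk(0)

# Risk difference within each stratum of the mediator ("controlling for" M).
adjusted_effects = {m: outcome_risk(1, m) - outcome_risk(0, m) for m in (0, 1)}

print(f"total risk difference: {total_effect:.3f}")
print(f"risk difference within mediator strata: {adjusted_effects}")
```

In this toy setup the unadjusted comparison shows a difference between groups, while the comparison within each level of the mediator shows none: the adjustment has removed exactly the pathway that produced the difference.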

But there is a distinction between questions 1 and 2. The first is complicated enough: teasing out whether an association exists, and how strong it is, is undoubtedly ‘epidemiology.’ It’s also quantitative ‘sociology’ with some ‘economics,’ and probably any science. It involves describing the world as it is.

Consider the (highly simplified) reality of why people who are black are potentially more likely to be arrested (Y= e.g. Arrest):

[Figure: DAG of causes of Y (Arrest)]

There are a lot of cumulative, intertwined reasons ‘why’ racial bias might exist in U.S. law enforcement. The particular letter (i.e. variable) we choose to study is somewhat arbitrary (A on Y? B on Y? C on Y?…). Say we look at the effect of C on Y. There are ancestors (A and B…) and mediating effects (D and E…). Such is the case no matter where our study sits on the causal path. In other words, there are infinity letters behind and ahead of our letter of choice.
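As a rough sketch of that idea, the snippet below encodes a simple chain A → B → C → D → E → Y and lists what sits behind and ahead of whichever letter we pick. The chain itself is an assumption (the real diagram is surely more tangled), and the helper names are made up for this example.

```python
# Hypothetical chain A -> B -> C -> D -> E -> Y, using the letters from the text.
# Each key maps a variable to the variables it directly causes.
EDGES = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["E"], "E": ["Y"], "Y": []}


def descendants(graph, node):
    """All variables reachable from `node` by following arrows forward."""
    seen = set()
    stack = list(graph[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


def ancestors(graph, node):
    """All variables that have `node` somewhere downstream of them."""
    return {other for other in graph if node in descendants(graph, other)}


def mediators(graph, exposure, outcome):
    """Variables lying on a directed path from `exposure` to `outcome`."""
    return {n for n in descendants(graph, exposure)
            if outcome in descendants(graph, n)}


print("ancestors of C:", sorted(ancestors(EDGES, "C")))
print("mediators between C and Y:", sorted(mediators(EDGES, "C", "Y")))
```

Swap "C" for any other letter and the picture is the same: some variables behind, some ahead, and the choice of where to stand is ours.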

Figuring out why society is the way it is is entirely relative. When viewed as ‘yes’ or ‘no,’ (‘Have you ever been the target of racism?’) we can measure these things. But the cut-point is arbitrary. When viewed as a cumulative sum of experiences, the DAG possibilities approach infinity; the ‘why’ is less and less measurable (‘So, what’s it like to be Black in America?’).

As Klein suggests, we shouldn’t over-control or adjust for mediators. But perhaps the problem with this has more to do with biasing our analysis away from some true effect (i.e. the effect of the letter we arbitrarily choose to study) than with Ezra Klein’s suggestion that it prevents us from knowing why. Are epidemiological studies of social phenomena meant to answer ‘why’? Can they? The rigour in their methods comes from their ability to figure out what is. We know black people are more frequently arrested for similar crimes to white people. Why? We can only adjust so much, calculate the effect along so many possible pathways, and collect so much data. And we probably still wouldn’t know.
