About that Roland Fryer study and conceptual-level differences in statistical probabilities

(I swear I have corrected and corrected this post, darn it, and I keep finding typos and skipped words. Sorry.)

Roland Fryer, Jr. is a brilliant economist, and I’ve always enjoyed reading his work on education. When he produced a study on police shootings, the combination of Roland Fryer, Harvard, and New York Times coverage resulted in a ton of press for it. Here is the paper at NBER. Here is the original NYT piece, which I thought did a nice job writing up the study. It’s super irritating to me that what people have highlighted about the study is that he finds no statistically significant differences in shooting deaths between white and black suspects. For some reason, THAT is getting the headlines. But he finds disparities in _every other_ aspect of police treatment.

Taser use (ow) and rough treatment consistently show disparities. These conclusions are drawn from Stop and Frisk data from NYC and the Police-Public Contact Survey (national data). The data on officer-involved shootings were solicited by the author from Boston, Camden, NYC, Philadelphia, Austin, Dallas, Houston, Los Angeles, six Florida counties, and Tacoma, Washington.

There is a very detailed discussion of the data collection process from police narratives, where Fryer and his team coded and back-coded nearly 200 variables from these cities. They then do a separate set of codings on Houston, and I’m not sure why, other than what Fryer reports: the Houston data have more detail than the others. I guess the differences in the data were enough to make Fryer think they might find something different in Houston than in the other 10 cities, so they analyzed them separately. I probably wouldn’t have done that; I probably would have kept the coding the same for all the cities and simply had empty cells for concepts missing in the other cities. It’s not clear, to me anyway, what he gets out of the second coding around Houston.

Like any good economist, he beats on the data pretty hard; he does robustness check after robustness check and finds really no evidence in the data that in individual interactions with police, there is a difference by race or ethnicity in the odds that deadly force will be used.

Now, that’s an interesting and important finding, but it’s limited, and people are not listening overmuch to Fryer as he points this out. Fryer’s data are used to model an interaction game among individuals. He’s not able to answer some of the questions that BLM has raised. There is a substantive difference between these two statistical propositions:

1) that, when a police officer has encountered an individual, they use deadly force. This is modeled as an odds ratio that examines the difference by officer demographics, some context variables, and the race of the suspect. (If f is force and e is an encounter, we have the conditional probability P(f|e).)

2) that a police officer encounters an individual and then uses deadly force: the joint probability of the two events, which is P(e and f) = P(e) × P(f|e). (It is the intersection of the two events, not the union.)

It’s the second he doesn’t have, and that’s important. The first can tell us whether or not, in the statistical sense, individual police officers make racist choices when they have encounters with suspects in various situations. You can envision Fryer’s data as he does: as a series of conditional probabilities that begin to unfold at e. That’s a good thing to know. Whether an individual officer is a member of the Aryan Nations or not–that is, whether the individual police officer is explicitly racist and making explicitly racist choices in individual interactions–does not seem to be moving Fryer’s findings. (It still makes such an individual officer somebody I really, truly do not want having state-sanctioned capability to use deadly force, but the “bad apple making bad choices” idea does not seem to be driving the numbers.)

Fryer does not really have P(e)–but his precinct data are suggestive–and that’s a problem. He discusses it over and over in the paper, and then again in his discussion with readers in this very nice NYT follow up. Disproportionality–the idea that relative to their population percentages, African Americans are disproportionately represented in police encounters/arrests/violence–could enter into the probability in proposition #2 at either point (e) or (f), and without (e), we can’t use Fryer’s study except as a partial answer to BLM critiques of US policing. What we can conclude from Fryer’s study is that the disproportionality in the aggregate statistics is not likely due to P(f|e).* And that’s important–it’s way more than I’ve accomplished lately.
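For anyone who wants to see the arithmetic, here is a minimal sketch of how the two pieces combine. It uses the multiplication rule, P(e and f) = P(e) × P(f|e). Every number in it is hypothetical, made up purely for illustration and not taken from Fryer’s paper or any other data; the function name and the group labels are mine, too.

```python
def joint_probability(p_encounter: float, p_force_given_encounter: float) -> float:
    """Multiplication rule: P(e and f) = P(e) * P(f | e)."""
    return p_encounter * p_force_given_encounter


# HYPOTHETICAL inputs, invented for illustration only.
# Both groups face the same chance of force once an encounter happens...
p_f_given_e = 0.001
# ...but one group is encountered by police three times as often.
p_e_group_a = 0.10
p_e_group_b = 0.30

joint_a = joint_probability(p_e_group_a, p_f_given_e)
joint_b = joint_probability(p_e_group_b, p_f_given_e)

# Ratio of the joint probabilities; the equal P(f | e) cancels out,
# so any gap here comes entirely from P(e).
ratio = joint_b / joint_a

print(f"P(e and f), group A: {joint_a:.6f}")
print(f"P(e and f), group B: {joint_b:.6f}")
print(f"Ratio B/A: {ratio:.2f}")
```

With these made-up inputs, the group that is encountered three times as often ends up with three times the joint probability of experiencing force, even though the per-encounter probability is identical. That is the sense in which a null result on P(f|e) cannot, by itself, settle what is happening in the aggregate.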

But anyway:

Fryer argues in the follow up that we should be able to understand whether P(e) is an issue somewhat in instances where police are called to a specific situation. I think that’s a good argument, but not a great one, because I don’t think we can treat race as exogenous in police calls or in police responses to calls. Who gets called on, what types of behaviors prompt calls, how quickly police are able to access the scene of the report (and thus, encounter a suspect), etc.: those are all points where race and place may influence whether a suspect is encountered at all. For instance, one reason his rates of deadly force use among whites may be relatively high compared to those of African Americans might be that white behaviors have to be extreme in some way before the police are called in the first place, and that extremeness, or interpretations of it, could prompt use of deadly force once police arrive. Police are likely to cluster geographically, and so is crime, and so are background populations; race and ethnicity are not geographically random.

* Well, back up. We can’t use one social science study, no matter how good (and this is a good study), as the answer. Social science evidence has to accrue across many, many high-quality studies before we should start deciding we know what’s going on. Here’s another good study that finds significant bias, but the data are aggregate.