Attention conservation notice: 2,000 words on why “impact” is over-rated. Of course, bean-counting deans and administrators think otherwise. While the media cavorts over “an Excel error”, I want to talk about unconventional weighting and cherry picking the data, cherry picking papers to treat as definitive, and why working at a policy school causes me despair. To quote Hemingway: “There are some things which cannot be learned quickly and time, which is all we have, must be paid heavily for their acquiring.”
Paul Krugman’s first of many posts on the topic gives a nice explanation of the Rogoff-Reinhart dealio–there is much that drives me crazy about Paul Krugman, but you can’t complain that he doesn’t know how to explain economics to a broad audience–because he most certainly does. The deal goes something like this: There are two recent macro papers that have purported to provide the empirical basis for tax cuts to produce growth. One is by Alberto Alesina and Silvia Ardagna:
Large Changes in Fiscal Policy: Taxes versus Spending, Alberto Alesina, Silvia Ardagna, in Tax Policy and the Economy, Volume 24 (2010), The University of Chicago Press
This paper is in a much less influential journal than the one that’s causing all the kerfuffle, which is this one:
Reinhart, Carmen M., and Kenneth S. Rogoff. (2010) “Growth in a Time of Debt.” American Economic Review 100.2: 573–78
American Economic Review is the gold standard of econ journals. Mike Konczal at the Next New Deal blog summarizes the paper:
In 2010, economists Carmen Reinhart and Kenneth Rogoff released a paper, “Growth in a Time of Debt.” Their “main result is that…median growth rates for countries with public debt over 90 percent of GDP are roughly one percent lower than otherwise; average (mean) growth rates are several percent lower.” Countries with debt-to-GDP ratios above 90 percent have a slightly negative average growth rate, in fact.
This has been one of the most cited stats in the public debate during the Great Recession. Paul Ryan’s Path to Prosperity budget states their study “found conclusive empirical evidence that [debt] exceeding 90 percent of the economy has a significant negative effect on economic growth.” The Washington Post editorial board takes it as an economic consensus view, stating that “debt-to-GDP could keep rising — and stick dangerously near the 90 percent mark that economists regard as a threat to sustainable economic growth.”
Oh, yeah. That’s what economists say. All of the smart ones, right. Reinhart and Rogoff’s paper *from 2010* is so scientifical in the minds of WashPo editors that it’s now economic consensus. Mmmmmkay.
But that’s not how social science works. It takes time, and a lot of subsequent study, to find a result we should treat as definitive. But that isn’t what politicians or the public want to hear. And…it’s so very, very tempting to give the people what they want. It’s one way you get to the Kennedy School.
Well, what’s wrong with that? A great deal, it turns out, in terms of the original paper’s content, methods, and conclusions. The story becomes ugly pretty fast, though that is not surprising to those of us who watch influence peddling/pandering happen all day every day in the policy analysis machine of academic life. In that machine, Harvard is to the academy what Google is to search engines: there is only one in the minds of most people. Most people are too lazy to use more than one search engine, and why would you when that one gives you what you want with so little effort?
After quite some nagging, apparently, Thomas Herndon (a PhD student in Econ), Michael Ash, and Robert Pollin, all researchers at the University of Massachusetts Amherst, finally got Rogoff and Reinhart to share their data, after first trying unsuccessfully to replicate the results with data they had compiled themselves. When the UM researchers tried to replicate the findings with Rogoff and Reinhart’s numbers, they discovered a systematic coding error that, when corrected, shows the original conclusion–that tax cuts were expansionary during times of debt–was simply not supported by the extant data or the subsequent analysis.
Which makes me wonder about the AER as the gold standard. First, I thought you always had to share your data to get into AER, and I thought reviewers were supplied WITH YOUR DATA at the time they review. I’ve had to do that for some of the journals I’ve published in. That’s what a gold standard looks like to me. Am I missing a part of the story here?
The media, of course, is eating this up, largely because there is a delicious David versus Goliath aspect to the review and the chance that Hahhhhhvard folks might be wrong and a wee graduate student right. I strongly suspect that if this were an assistant professor at Princeton the finding would have been largely ignored in media because it would seem like academic in-fighting instead of the sexy, aw-shucks, disempowered-grad-student-makes-good story it is. I’m waiting for the next iteration of the story–or the Hollywood version–about how some meanypants proffie tried to steal credit for this brilliant result, but young economics stud pulled out an AK-47 during a research meeting while his faithful, brilliant-but-not-as-brilliant-as-he-is girl leans on his masculine shoulder.
I’m sounding a little bitter, which I am actually not, about the review and the success it bestowed upon a graduate student. The attention is a good thing, and it’s wonderful to see a young person do a replication study and get so much impact out of it–usually, replication studies are treated with less respect than they deserve. Again, this is a problem with the academy. Why do careful replication studies if the point is to be out there chasing your own Freakonomics/WOWEEZOWEE LOOKYHERE moment? But I am annoyed at the way the whole thing has been discussed in the media, as though this review strikes down the whole hypothesis that tax cuts might foster growth when government indebtedness is at stake.
It doesn’t. There is another paper out there, for one thing, and for another: did I not just say that social science doesn’t work like that? Yes, there are seminal papers, but it takes a long time to get to the point where we can truly call something ‘seminal.’ As usual, Richard Green and Mark Thoma have the real deal analytical problems of sussing out this question. Richard Green is here in Forbes, discussing the particulars of this particular, thorny, empirical question. Mark Thoma has well-reasoned insights about the larger problems in macro over at Economist’s View.
Krugman is careful to point out that you can’t conflate the problems with the AER paper with the subsequent, high-profile book: This Time is Different: Eight Centuries of Financial Folly.
But I kind of can–and here’s why. For all the media froth about the coding error, which is pretty bad when we are talking AER level, there are two other issues raised in the Herndon-Ash-Pollin study that are straight up signs of analysis-fiddling to get the results you want. What are they? Michael Konczal explains the issues way better than I can:
Selective Exclusions. Reinhart-Rogoff use 1946-2009 as their period, with the main difference among countries being their starting year. In their data set, there are 110 years of data available for countries that have a debt/GDP over 90 percent, but they only use 96 of those years. The paper didn’t disclose which years they excluded or why. [Emphasis mine: WTH AER????]
Herndon-Ash-Pollin find that they exclude Australia (1946-1950), New Zealand (1946-1949), and Canada (1946-1950). This has consequences, as these countries have high-debt and solid growth. Canada had debt-to-GDP over 90 percent during this period and 3 percent growth. New Zealand had a debt/GDP over 90 percent from 1946-1951. If you use the average growth rate across all those years it is 2.58 percent. If you only use the last year, as Reinhart-Rogoff does, it has a growth rate of -7.6 percent. That’s a big difference, especially considering how they weigh the countries.
Unconventional Weighting. Reinhart-Rogoff divides country years into debt-to-GDP buckets. They then take the average real growth for each country within the buckets. So the growth rate of the 19 years that England is above 90 percent debt-to-GDP are averaged into one number. These country numbers are then averaged, equally by country, to calculate the average real GDP growth weight.
In case that didn’t make sense let’s look at an example. England has 19 years (1946-1964) above 90 percent debt-to-GDP with an average 2.4 percent growth rate. New Zealand has one year in their sample above 90 percent debt-to-GDP with a growth rate of -7.6. These two numbers, 2.4 and -7.6 percent, are given equal weight in the final calculation, as they average the countries equally. Even though there are 19 times as many data points for England.
Now maybe you don’t want to give equal weighting to years (technical aside: Herndon-Ash-Pollin bring up serial correlation as a possibility). Perhaps you want to take episodes. But this weighting significantly reduces the average; if you weight by the number of years you find a higher growth rate above 90 percent. Reinhart-Rogoff don’t discuss this methodology, either the fact that they are weighing this way or the justification for it, in their paper [Again, emphasis mine, and again WTH AER????]
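To see how much the weighting choice matters, here is a minimal sketch in Python of the two-country example Konczal describes. It is not the Reinhart-Rogoff data set or their spreadsheet: it uses only the two figures quoted above, and it assumes (for illustration) that each of England's 19 years sits exactly at the 2.4 percent average. The function names are my own.

```python
# Illustrative only: the two-country example quoted above,
# NOT the actual Reinhart-Rogoff data.
# Assumption: each of England's 19 years is set to its 2.4 percent average.
country_years = {
    "England": [2.4] * 19,    # 19 country-years above 90 percent debt-to-GDP
    "New Zealand": [-7.6],    # the single year included in the sample
}


def mean(values):
    """Plain arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def country_weighted_mean(data):
    """Average each country first, then average the country averages equally."""
    return mean([mean(rates) for rates in data.values()])


def year_weighted_mean(data):
    """Pool every country-year together and average them all at once."""
    all_years = [rate for rates in data.values() for rate in rates]
    return mean(all_years)


print(f"{country_weighted_mean(country_years):.2f}")  # prints -2.60
print(f"{year_weighted_mean(country_years):.2f}")     # prints 1.90
```

Same twenty data points, and the answer swings from shrinking at 2.6 percent to growing at 1.9 percent depending on a choice the paper never explains. Neither weighting is automatically the right one, which is exactly why you show both in a robustness check.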
Keep in mind that every_single_day at USC I have AER shoved in my face as the holiest of all that is holy when it comes to scholarly rigor, and HOLY SCREAMING MEEMIES, bunnypants, there are three big, honking things here that should have come out in peer review. First, how do you get away with not disclosing which countries you are leaving out? Second, how do you get away with not explaining your weighting? And third, why didn’t anybody demand to see the consequences of these major analytical choices in robustness checks? These are not Excel errors. These are not esoteric things that only economists can understand. These are basics of modeling research.
Austin Frakt over at Washington Monthly offers up some rationales for why all this happened–as a function of the risks associated with public intellectualism–which…I just honestly don’t buy:
It’s also true that some (not all) journalists tend to promote risk. They push for a clear position. They might deny this, but I’m on the other end of the line, and I know what I’m being pressured or cajoled into saying. If one wants to stay (even appropriately) nuanced, it’s hard. The safest thing to do is to shut up.
And then what? Your work, or the good work of the community, is either promoted by others — perhaps in ways you don’t agree with — or not promoted at all. And what is the point of work that receives no notice?
There’s no easy answer here. The best I can say is it’s a continuum. You can, to some extent, choose how much risk you take.
Um, there ARE TOO easy answers here: Do the best work you can as carefully as you can, and don’t diddle your analysis to come up with OMIGOD THIS IS A BIG DEAL HEADLINER findings…if your findings aren’t all that. If the problem were just a coding error, I’d be willing to wave my hands and say “that’s the risk of being a human being.” But it’s not just a coding error; there are big, undefended analytical choices that skew the results quite a bit. This is an indictment of the way prestige and the fame hierarchy of economics seep into peer review.
Here’s the deal: If two professors from Appalachian State had handed in that paper without robustness checks and with no explanation for major analytical choices, I doubt they would have gotten the paper *reviewed* at AER, let alone THROUGH review at AER. But no, this was Harvard, with one person on the editorial board at AER. Rogoff and Reinhart have their own little publicity machines: read their Wikipedia entries or their home pages for some cringe-worthy highlights of self-written biographical material. These are the type of ambitious, self-promotional folks who wind up at Harvard and, like Google, they are shorthand for “excellence” when what they really are is shorthand for “elite” in multiple dimensions, only one of which is “scholarly excellence.” They are insiders, and the rules are different for insiders, as we know from every context of social life.
This is not to say they are poor scholars–I’m not qualified to judge that, as this is not my field–but the problem boils down to a) the notion that there are easy answers out there in policy-ville and b) the notion that, instead of spending years doing careful work carefully discussed, your job as a policy scholar is to “make a splash” and “have impact.” As long as that’s the incentive and there are exemptions granted to you because of your status, this stuff is going to happen. It’s not “risk”; it’s inevitability. Peer review is supposed to act as a check on this kind of thing, and it clearly failed to do so here.
And won’t it be a wonderful day when you are taught by a recording of a “star” professor via your MOOC? After all, you only need one big, bad, high-prestige voice to define what is and what isn’t in the world, right?