The dangers of data journalism

The World Bank: “Our mission is to fight poverty with passion and professionalism for lasting results,” it says. It also provides shelter from the rain for that little panda.

My previous post on data journalism might have conveyed the impression that I think it will cure all the problems of the press-release-rewriting style of journalism that readers of the Metro, for example, experience. Following several emails, I think I need to clarify.

I praised BBC Radio’s More or Less, but Matt Berkley emailed to criticise the programme’s feature on the World Bank’s global poverty stats, which he thinks “misleads in several important aspects”. Matt’s comment interested me (not least because I have, in another life, done some research on global poverty statistics), so I had another look. Feel free to read his complaint to the BBC and compare it to the published story, or the podcast.

Data doesn’t remove the room for debate; it just shifts the debate on to different territory. A data journalist will still make value judgements – but those should, where possible, be informed by statistical analysis, not by an appeal to authority.

Now, attempting to report world poverty in a newspaper article sets the bar extremely high: even the meaning of the word “poverty” is a value judgement.

We can do better than “world poverty is decreasing because the World Bank says it is”, which is a simple appeal to authority: those guys are the experts, so they must be correct.

Given the World Bank report, journalists may ask:

  • Why do we pick a certain income level to indicate poverty? Even if we accept that far fewer people now live on $1.25 a day or less, there are almost as many people surviving on $2 or less as there were before. The poverty line may be defined as not starving, as not having some defined “basic needs” met, or as not being among the poorest 20 per cent in your country. These all give different numbers, and all are used by economists. Note: you can’t eradicate the last type of poverty, in case you were wondering.
  • Should we correct an arbitrary poverty line for the relative price of the things that poor people buy in different countries? And how do we decide what those things are? The poor in different countries eat different food and have different habits, which may make some parts of the world seem richer when the quality of life is no better.
  • Do we measure earned income, or what people can eat or trade? The urban poor may have a bit more cash than the rural poor but don’t keep domestic animals, for example, so they might spend more but eat less. This is very difficult to measure.
  • Most seriously, is the data being used to manipulate the headline? If you have done the rest of the analysis, this becomes clearer. Governments (or World Banks) are sometimes accused of picking a threshold, or a measurement process, to suit a carefully chosen good-news agenda.

An example of the final point: the government of Cynicalia wants to claim that it has abolished poverty, with the poverty line defined as $1.25 a day (as the World Bank defines it). A million working-class Cynicalians earn on average $1 a day, a million middle-class Cynicalians earn on average $3 a day, and the president and his family earn $100,000 a day. The government might squeeze the middle class so that two million people earn $2 a day, while leaving the president’s wealth, which is hidden in Switzerland, untouched. It can now send out a press release claiming that no one is poor, and that more than half the country is at least as well off as before the reform.

A journalist can check the numbers of poor people at different poverty lines (maybe even using different measurements of income), investigate how the poverty line is calculated, or examine the effect of different redistribution policies. The figures exist, though working out how they were calculated can be a headache. All this takes time and some expertise, which is a problem.

Or the newspaper can just give up, and tell the journalist to repeat the government’s claim that Poverty is History. In which case that journalist is a loyal Cynic.
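A few lines of Python are enough to run the first of those checks on the Cynicalia example: count the people living at or below several poverty lines, before and after the reform. This is a sketch using only the made-up figures above; the president’s family is counted as a single earner, and the $2.50 line is an extra threshold added purely for illustration.

```python
# Made-up incomes from the Cynicalia example: daily income in dollars -> number of people.
# The president's family is counted as one earner, for simplicity.
BEFORE = {1.00: 1_000_000, 3.00: 1_000_000, 100_000.00: 1}
AFTER = {2.00: 2_000_000, 100_000.00: 1}


def poor_count(incomes: dict[float, int], poverty_line: float) -> int:
    """Number of people living on the poverty line or less."""
    return sum(people for income, people in incomes.items() if income <= poverty_line)


for line in (1.25, 2.00, 2.50):
    print(f"${line:.2f} a day or less: "
          f"before {poor_count(BEFORE, line):,}, after {poor_count(AFTER, line):,}")
```

At the official $1.25 line, poverty is abolished; at $2, the number of poor Cynicalians doubles. Same reform, opposite headlines.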

The article that Matt criticises covers many of the assumptions on poverty lines in some detail, and highlights their shortcomings. He feels the BBC should have done better.

I don’t agree with most of Matt’s complaint, for two editorial reasons. The first is that, where assumptions are made, I think they are clearly and accurately spelt out. The second is that this feature does not attempt to support a conclusion, merely to investigate how we calculate it (I also disagree with his analysis for a couple of economic reasons, but this is not the forum to air that discussion).

Data journalism is becoming trendy. I wish I’d written about Nate Silver in 2008, before I looked like a bandwagon jumper. But here’s the point: statistics do not resolve all arguments. A data journalist needs to understand how the data was collected, how it is presented, and whether the conclusions are justified by the data. The journalist also needs to resist overclaiming based on the emotional appeal of what the data seems to say.

I can show you plenty of examples of bad data journalism, where a little understanding can be as bad as none at all: I’ll leave it to you to ask.

Nate Silver’s numbers game

Nate Silver: hard to believe this man is a statistician

For the second US election in a row, the winner is a guy called Nate Silver, who might be the future of intelligent journalism. He rescues us from the tyranny of columnists who simply write about the contents of their own heads.

Nate blogs at Fivethirtyeight, which has been a New York Times blog since 2010. He analyses opinion polls, but he does it very, very well. He is entertaining and readable, even if you don’t care who just won the election in the US.

I discovered Nate’s analysis by accident in 2008 when I was looking for some statistics to undermine one of the nuttier blog opinions by data-lite controversialist Melanie Phillips (which made it very nutty indeed). Fivethirtyeight has a rigour that journalism seems to have mislaid in the internet era in a search for sensation. He does a seemingly simple thing extremely well: when an opinion poll is released, he adds it to a model which creates an aggregate. If the model is well-constructed, this has smaller margins for error and less chance of systematic bias. It is more likely to reflect the true state of the world.

The clever part is that he doesn’t just produce an average. He weights the polls according to their sample size, the way the information was obtained, the historical accuracy of the polling company, when the poll was conducted, the exact question that was asked, and so on. He looks for statistical bias – a consistent under- or over-reporting of a candidate’s popularity. He adjusts his own model if he finds evidence that it is biased. Importantly, he writes nerdy blog posts about what he is doing, explaining his reasoning and pointing out possible flaws in his work.

The result is that “outliers” – polls that, through random sampling, produce a freak result – have little importance on Fivethirtyeight – while on the internet and the news channels they tend to dominate the agenda, albeit fleetingly. This means his reporting is less shouty, but it has proved to be stunningly accurate for two elections in a row: at the time of writing, his analysis has correctly predicted the result in every state for the 2012 US presidential election, and the electoral college vote too.
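To make the weighting idea concrete, here is a toy weighted poll average in Python. It is not Fivethirtyeight’s model, whose precise parameters are not published: the weighting scheme (square root of sample size, a hypothetical pollster rating between 0 and 1, and a two-week half-life for older polls) is an assumption chosen for illustration, and it leaves out the bias adjustments described above. The three polls are invented.

```python
import math
from dataclasses import dataclass


@dataclass
class Poll:
    share: float            # candidate's share of the vote, per cent
    sample_size: int        # number of respondents
    pollster_rating: float  # hypothetical quality score, 0 (useless) to 1 (excellent)
    days_old: int           # days since the poll was conducted


def poll_weight(poll: Poll, half_life_days: float = 14.0) -> float:
    """Hypothetical weight: larger, better-rated, more recent polls count for more."""
    size = math.sqrt(poll.sample_size)                 # sampling error shrinks roughly with sqrt(n)
    recency = 0.5 ** (poll.days_old / half_life_days)  # weight halves every half_life_days
    return size * poll.pollster_rating * recency


def weighted_average(polls: list[Poll]) -> float:
    weights = [poll_weight(p) for p in polls]
    return sum(w * p.share for w, p in zip(weights, polls)) / sum(weights)


polls = [
    Poll(share=50.0, sample_size=1600, pollster_rating=0.9, days_old=2),
    Poll(share=51.0, sample_size=1000, pollster_rating=0.8, days_old=5),
    Poll(share=58.0, sample_size=400, pollster_rating=0.4, days_old=20),  # the outlier
]
simple_mean = sum(p.share for p in polls) / len(polls)
print(f"simple mean {simple_mean:.1f}, weighted {weighted_average(polls):.1f}")
```

The small, poorly rated, stale outlier drags the simple mean up to 53 per cent, but it barely moves the weighted figure, which stays below 51.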

Having a model doesn’t necessarily mean you will be correct – there are plenty of other statistical models which predicted the election less accurately. Fivethirtyeight carefully spells out the steps in its analytical process (though not the precise parameters of the model), so we can make an informed judgement on the quality of the findings. Any model is open to criticism from other statisticians – but this means they can have an adult, public conversation about what might be improved, or what the impact of a flaw in the analysis might be. We can learn from this, too.

This wouldn’t be important if it was just a different way to present the same news; but this type of analysis creates fresh insight. By polling day 2012, the model predicted a greater than 90 per cent chance of an Obama victory; and yet organisations like the BBC and the FT were using lazy phrases like “too close to call” and “on a knife edge”. If newspapers are prepared to do this type of analysis routinely, I suggest, it offers huge potential for creating an open, analytical type of serious journalism led by numbers and observed reality, not opinions.
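Why can a “close” race still be a near-certain one? A toy calculation shows the idea: if the estimated lead is treated as normally distributed around the polling average, a modest lead with a small enough error becomes a high probability of winning. The 2-point lead and 1.5-point standard error below are hypothetical, and a real US forecast would work through the electoral college state by state rather than from a single national number.

```python
import math


def win_probability(lead: float, std_error: float) -> float:
    """Probability that the true lead is above zero, assuming the estimate is
    normally distributed with the given standard error (both in percentage points)."""
    z = lead / std_error
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z


print(f"{win_probability(2.0, 1.5):.0%}")
```

A lead that a pundit might describe as “on a knife edge” comes out at roughly a 91 per cent chance of winning under these assumptions.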

Old jokes department: “And what do you do?”

Not every journalist can be a stats geek, though I think they should have more compulsory education in how to interpret data, and would prefer that newspapers enforced an in-house ban on reporting surveys that are statistical nonsense – which, in my experience, is most of them (I’ve written those survey-based articles in the past, and reported lots of rubbish data as if it were spotless, which I regret).

Newspapers and magazines are cutting back on conventional journalism. Budgets are tight. It’s probably too much to hope that we can create a new type of data-journalist, or that newspapers will suddenly grow a statistical conscience. It needn’t be expensive: a laptop and some specialist software are perfectly adequate for the statistical research that can validate the claims that powerful people make. It’s the job of the media to investigate these claims – not just talk to one person who agrees, and another who disagrees. On Radio 4, More or Less does an entertaining job of validating reported statistics (download the podcasts, they are excellent). Ben Goldacre’s Bad Science posts are also a model of this approach.

It’s patronising to assume that readers can’t cope with statistical analysis. Clearly, many don’t like it, and some misunderstand it; but that’s true of any type of journalism that goes beyond the obvious. The conclusions (especially those that go against gut feel or conventional wisdom) may be unpopular: just read the critical comments on Nate Silver’s blog. It’s also true that science isn’t the last word on a subject, just a powerful way of testing an assumption. Statistics involves making value judgements in how you treat the numbers, in the same way as a journalist makes a judgement about how much credibility to give any source. But in statistics there is the opportunity to be explicit about those judgements, and then go where the numbers take us.

This type of insight is a fundamental tool, in an increasingly complex world, if we want to make informed decisions. The alternative is to just place trust in the conclusions of “experts”, of which there seem to be an ever-increasing number quoted on TV or in newspapers.

I’ll leave the conclusion to one of Nate’s commenters, who explains it better than I do:

Rather than cheer for Nate because we all like his Obama forecasts, how about cheering for him because he might believe in a world where numbers and rational analysis are vital to how we make decisions, even in those cases where we don’t like what the numbers imply?… It’s not about hoping you will win at Vegas. It’s about understanding how the Vegas game works.

For one day only

The budget cuts in local radio were starting to bite

6pm update: here’s a link to my interview on Ireland’s Newstalk. It starts about 35 minutes into this stream, but I thoroughly recommend the 10 minutes that precede my segment too: it’s an interview with the inventor of a special bag that you pee into when you can’t find a toilet. It was easily the most amusing discussion of urine that I’ve listened to before the watershed.

Today (Friday), any of you who are pretending to work from home might get the bonus of Talk Normal on your radio. That really is me! Or, if you’ve come here because you just heard me on the radio, that really was me!

Or, indeed, if you’re planning your Friday and are wondering what to do until happy hour, then residents of Coventry & Warwickshire, Leeds, Cumbria, Belfast, Antrim, Omagh, Kent, Stockport (and Congleton) and Norfolk (and some more) should tune in to local radio. During the day I’ll be chatting to all of you about a survey of jargon.

Here’s an article in the Mail online about the jargon survey. I’m quoted near the bottom. My quote about communicating in a way that people could understand was, when I checked, next to the headline “Precocious Honey Boo Boo stumbles over Spanish… and bursts into tears as ‘pageant good luck charm’ pet pig Glitzy is sent back”, which I can only assume was written by a duck pointing its bill at random words in an old copy of Hello!.

This is, of course, also how new copies of Hello! are written. It takes a lot of ducks but, crucially, not many journalists.

For first-time Talknormalists: now you're here, have a look around. Find out my views about penguins on conference calls or discover my intimate connection to Katie Price's breasts. Residents of South Ribble, I know your secrets.


Nearly famous now

We’re all winners here at Talk Normal, but today I’m a tiny bit more of a winner than you are.

I haven’t actually won anything, you understand. Yet.

A very pleasant person from the Plain English Campaign told me that Talk Normal has been nominated for a Plain English Champion award.

It’s not the first time I have earned a nomination on merit, of course: in 1990 I was nominated for redundancy.

I don’t find out if I’m a winner until the end of the year but, in the proud tradition of companies who haven’t actually won but don’t want you to notice, I intend to squeeze this particular orange for all the juice I can get. Maybe I will leverage my reputation enhancement strategy by putting news of this not-quite-award in a giant email signature, with the word “nominated” in tiny tiny tiny yellow type.

Meanwhile, put your weight behind the Plain English Campaign, not least because it invented the name ploddledygook for police jargon.

Cliché-ridden rubbish

For insomniacs, rugby fans and those of us known as morning people, listening to ITV’s World Cup Rugby commentator Phil Vickery is a buttock-clenching lesson in Talknormalism. Obviously uncomfortable in his new job, he flips between the banal (“he uses his feet to run”), strained silence (“If the art of commentary is silence, Vickery is its Rembrandt” – The Daily Telegraph), and waffly overtalking (in Vickery’s commentary you don’t play rugby, you “get a game of rugby under your belt”).

Online rugby fans are not content (“truly, truly awful”, “needs to be cattle-prodded”, “master of platitudes”, “cliché-ridden rubbish”, “the worst commentator in the history of sport”, and that’s just the kinder ones). Searching Twitter for the word “Vickery” during an England rugby game is more entertaining than watching the team play.

It can’t be easy to be a commentator but, on the other hand, it’s his job. We might expect a certain level of expertise. This is a surprisingly common problem in British televised sport, where often the guy in the second seat seems to be doing it for a bet. In the last football World Cup Chris Coleman also gave the impression that he was just filling in until the real commentator’s taxi showed up, stringing together every football cliché from “he’s got good touch for a big man” to “you often see a team concede soon after scoring a goal”, delivered when we’d just seen a team concede soon after scoring a goal.

Expertise in doing something does not guarantee expertise in explaining it to others, but that expertise can be trained, developed, measured and rewarded. This doesn’t just apply to sport.

Last week I spoke at a conference of the Chartered Management Institute to encourage more slavish obedience to my borderline fanaticism. A member of the audience asked how managers should solve the Vickery-Coleman communication problems in their companies. I suggested they start by formally assessing how well those managers speak and write, with compulsory training for the ones who don’t do it well, and rewards for the ones who do. Anecdotally I find that, when I work to help companies with a waffle problem, junior staff are often just copying a Vickery-Coleman boss. We catch waffle from each other like we catch a nasty cold.

Waffle infection could explain the moment towards the end of this weekend’s game when Nick Mullins, Vickery’s co-commentator, informed us that England full back Ben Foden “always has his eyes open, and is always ready to pin back his ears.” I think I remember that torture scene from one of the Saw films.

Maybe Mullins caught a nasty case of platitudes from Vickery. Although, thinking about it, he has always been rubbish too.

Week 39: sell joy, buy gloom

Seeing as the Western economies are all going to hell in a handcart by the end of 2011, I thought I’d take a look and see how much residual optimism is left.

To do this, I constructed the TN Joy Index by taking the number of articles that mention the word “joy” and dividing it by the number that mention the word “gloom”. In this case, I’m showing the results from news sources in the US (omitting sport, where both emotions are cheapened commodities, and obituaries, which might skew the data). I figured the US is the bellwether economy for joy. It is still the world’s largest manufacturer and exporter of optimism, though not all of it is of the highest quality these days. For example, only the US could have produced the following three books, demonstrating how competitive the market in misplaced optimism used to be:

They'll be correct, just not yet

For would-be students of the TN Joy Index, I present three results. The first is that newspapers are still, on balance, happy places. Not one of my results contained a month where there were more articles mentioning gloom than joy. I wouldn’t go so far as to recommend a newspaper to cheer yourself up at the moment, unless your personal Joy Index is low indeed. If that is the case, buy the official Talk Normal book instead. That’ll make at least one of us happy.

The second result is that, despite a lack of concrete reasons to be cheerful, the US has been steadily recovering the joy it abruptly lost in 2007 and 2008. In 2011 joy has been up to almost pre-crash levels of exuberance. I suspect that joy is more in evidence among high earners. Still, if you’re unemployed or in foreclosure, look at this and you might be encouraged:

Not for long though. The Weekly TN Joy Index is plunging like an overworked plumber. In week 32*, beginning 8 August, we reached historically low levels of joyfulness, with only 1.62 joys for every gloom. For comparison: in the week that Lehman Brothers collapsed in the US, the index was at 1.97. In the week after 9/11, it was at 2.24.
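The index itself is simple arithmetic once you have the article counts. Here is a minimal sketch in Python; the weekly counts are made up purely for illustration (they are not the data behind the figures above), and gathering the counts from a news database is left out.

```python
def joy_index(joy_articles: int, gloom_articles: int) -> float:
    """TN Joy Index: articles mentioning "joy" per article mentioning "gloom"."""
    if gloom_articles == 0:
        raise ValueError("no gloom articles: the index is undefined")
    return joy_articles / gloom_articles


# Made-up counts: ISO week number -> (articles mentioning joy, articles mentioning gloom).
weekly_counts = {30: (300, 150), 31: (280, 160), 32: (250, 160)}

weekly_index = {week: joy_index(joy, gloom) for week, (joy, gloom) in weekly_counts.items()}
gloomiest_week = min(weekly_index, key=weekly_index.get)
print(f"lowest index: week {gloomiest_week}, {weekly_index[gloomiest_week]:.2f} joys per gloom")
```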

The short-term market for optimism seems to have collapsed, but I refuse to be downhearted. I may write a book called “Joy at 100,000!!!” predicting a time when gloom is all but forgotten and a sub-2 index seems unthinkable. It’s about as likely to happen in the near future as the Dow at 40,000 – but, when the market turns, there will be money in unrealistic optimism once again. I want my cut.

* TN TruFact: Since 15 June 1988, there has been an International Standard for week numbering, to give management consultants something to report on when they visit wall chart manufacturers. It is defined in ISO-8601 and, according to Epoch Converter, “The first week of the year is the week that contains that year’s first Thursday.” If that doesn’t restore your faith in the ability of developed economies to create jobs out of thin air, nothing will.
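The first-Thursday rule is easy to turn into code. Python’s standard library already follows ISO 8601 in date.isocalendar() and date.fromisocalendar(); the hand-rolled function below applies the quoted rule directly, as a sketch, and checks week 32 of 2011 from the post above.

```python
from datetime import date, timedelta


def iso_week_monday(year: int, week: int) -> date:
    """Monday that starts ISO 8601 week `week` of `year`.

    Week 1 is the week containing the year's first Thursday; ISO weeks start on Monday.
    """
    jan1 = date(year, 1, 1)
    first_thursday = jan1 + timedelta(days=(3 - jan1.weekday()) % 7)  # weekday(): Monday=0, Thursday=3
    week1_monday = first_thursday - timedelta(days=3)
    return week1_monday + timedelta(weeks=week - 1)


print(iso_week_monday(2011, 32))          # 2011-08-08
print(date(2011, 8, 8).isocalendar()[1])  # 32
print(date.fromisocalendar(2011, 32, 1))  # 2011-08-08, the standard library's answer
```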

The future of broadcast news

Following a positive reaction to my proposal to reclassify many apparent crises as palavers, or even kerfuffles, I have taken a few minutes to blue-sky the Kerfufflometer, which I believe will add to the Talknormalist content of broadcast news. I know that many influential broadcasters are avid readers of this blog. You know where to find me.

Cut out your waffle: buy my book

