The dangers of data journalism

The World Bank: “Our mission is to fight poverty with passion and professionalism for lasting results” it says. It also provides shelter from the rain for that little panda.

My previous post on data journalism might have conveyed the impression that I think it will cure all the problems of the press-release-rewriting style of journalism that readers of the Metro, for example, experience. Following several emails, I think I need to clarify.

I praised BBC Radio’s More or Less, but Matt Berkley emailed to criticise the programme’s feature on the World Bank’s global poverty stats, which he thinks “misleads in several important aspects”. Matt’s comment interested me (not least because I have, in another life, done some research on global poverty statistics), so I had another look. Feel free to read his complaint to the BBC and compare it to the published story, or the podcast.

Data doesn’t remove the room for debate, it just shifts the debate on to different territory. A data journalist will still make value judgements – but those should, where possible, be informed by statistical analysis, not an appeal to authority.

Now, attempting to report world poverty in a newspaper article sets the bar extremely high: even the meaning of the word “poverty” is a value judgement.

We can do better than “world poverty is decreasing because the World Bank says it is”, which is a simple appeal to authority: those guys are the experts, so they must be correct.

Given the World Bank report, journalists may ask:

  • Why do we pick a certain income level to indicate poverty? Even if we accept that far fewer people now live on $1.25 or less, there are almost as many people surviving on $2 or less as there were before. The poverty line may be defined as not starving, or not having some defined “basic needs” met, or not being among the poorest 20 per cent in your country. These are all different numbers, and all used by economists. Note: you can’t eradicate the last type of poverty, in case you were wondering.
  • Whether we correct an arbitrary poverty line for the relative price of the things that poor people buy in different countries (also, how do we decide what those things are? The poor in different countries eat different food, and have different habits, which may make some parts of the world seem richer, when the quality of life is no better).
  • Do we use a measure of earned income, or of what those people can eat or trade? The urban poor may have a bit more cash than the rural poor, but don’t have domestic animals, for example, so they might spend more but eat less. This is very difficult to measure.
  • Most seriously, do the statistics use data to manipulate the headline? If you have done the rest of the analysis, this becomes clearer. Governments (or World Banks) are sometimes accused of picking a threshold, or a measurement process, to suit a carefully-chosen good news agenda.

An example of the final point: the government of Cynicalia wants to claim that it has abolished poverty, with the poverty line defined as $1.25 a day (as the World Bank defines it). There are a million working class Cynicalians earning on average $1 a day, and a million middle class Cynicalians earning on average $3 a day, and the president and his family earn $100,000 a day. The government might squeeze the middle so that there are two million people earning $2 a day, while redistributing none of the president’s wealth, which is hidden in Switzerland. It can now send a press release claiming that no one is poor, and that more than half the country is as well off as, or better off than, before the reform.
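The Cynicalia arithmetic is easy to check for yourself. Here is a rough sketch in Python using the figures above. It assumes, for simplicity, that everyone in a group earns exactly the group average and that the president's family counts as a single household; neither assumption comes from a real dataset.

```python
# Cynicalia, using the figures from the example above.
# Simplification (an assumption): everyone in a group earns exactly the
# group average, and the president's family counts as one household.
# Each entry is (number of earners, income in dollars per day).
BEFORE = [(1_000_000, 1.0), (1_000_000, 3.0), (1, 100_000.0)]
AFTER = [(2_000_000, 2.0), (1, 100_000.0)]


def headcount(population, poverty_line):
    """Count people whose daily income is at or below the poverty line."""
    return sum(people for people, income in population if income <= poverty_line)


def total_income(population):
    """Total daily income across all groups."""
    return sum(people * income for people, income in population)


# Compare the same reform at two different poverty lines.
for line in (1.25, 2.00):
    print(f"Line ${line:.2f}: before={headcount(BEFORE, line):,} "
          f"after={headcount(AFTER, line):,}")

# Nothing was added to the economy: the money only moved around the bottom.
print("Total income unchanged:", total_income(BEFORE) == total_income(AFTER))
```

At the $1.25 line the reform makes poverty vanish; at a line of $2 or less it doubles the number of poor people. Total income is the same in both cases, so the choice of threshold is doing all the work in the headline.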

A journalist can check the numbers of poor people at different poverty lines (maybe even using different measurements of income), investigate how the poverty line is calculated, or examine the effect of different redistribution policies. The figures exist, though working out how they were calculated can be a headache. All this takes time and some expertise, which is a problem.

Or the newspaper can just give up, and tell the journalist to repeat the government’s claim that Poverty is History. In which case that journalist is a loyal Cynic.

The article that Matt criticises covers many of the assumptions on poverty lines in some detail, and highlights their shortcomings. He feels the BBC should have done better.

I don’t agree with most of Matt’s complaint, for two editorial reasons. The first is that, where assumptions are made, I think they are clearly and accurately spelt out. The second is that this feature does not attempt to support a conclusion, merely to investigate how we calculate it (I also disagree with his analysis for a couple of economic reasons, but this is not the forum to air that discussion).

Data journalism is becoming trendy. I wish I’d written about Nate Silver in 2008, before I looked like a bandwagon jumper. But here’s the point: statistics do not resolve all arguments. A data journalist needs to understand how the data was collected, how it is presented, and whether the conclusions are justified by the data. The journalist also needs to resist overclaiming, based on the emotional appeal of what the data seems to say.

I can show you plenty of examples of bad data journalism, where a little understanding can be as bad as none at all: I’ll leave it to you to ask.

Nate Silver’s numbers game

Nate Silver: hard to believe this man is a statistician

For the second US election in a row, the winner is a guy called Nate Silver, who might be the future of intelligent journalism. He rescues us from the tyranny of columnists who simply write about the contents of their own heads.

Nate blogs at Fivethirtyeight, which has been a New York Times blog since 2010. He analyses opinion polls, but he does it very, very well. He is entertaining and readable, even if you don’t care who just won the election in the US.

I discovered Nate’s analysis by accident in 2008 when I was looking for some statistics to undermine one of the nuttier blog opinions by data-lite controversialist Melanie Phillips (which made it very nutty indeed). Fivethirtyeight has a rigour that journalism seems to have mislaid in the internet era in a search for sensation. He does a seemingly simple thing extremely well: when an opinion poll is released, he adds it to a model which creates an aggregate. If the model is well-constructed, this has smaller margins for error and less chance of systematic bias. It is more likely to reflect the true state of the world.

The clever part is that he doesn’t just produce an average. He weights the polls, depending on their sample size, the way the information was obtained, the historical accuracy of the polling company, when it was conducted, the exact question that was asked, and so on. He looks for statistical bias – a consistent under- or over-reporting of a candidate’s popularity. He adjusts his own model if he finds evidence that it is biased. Importantly, he writes nerdy blog posts about what he is doing, explaining his reasoning, and pointing out possible flaws in his work.

The result is that “outliers” – polls that, through random sampling, produce a freak result – have little importance on Fivethirtyeight – while on the internet and the news channels they tend to dominate the agenda, albeit fleetingly. This means his reporting is less shouty, but it has proved to be stunningly accurate for two elections in a row: at the time of writing, his analysis has correctly predicted the result in every state for the 2012 US presidential election, and the electoral college vote too.
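To see why weighting tames outliers, here is a toy sketch in Python. It is emphatically not Fivethirtyeight's model, whose parameters are not published: the polls are invented, and the weighting formula (square root of sample size, a 14-day recency half-life and a made-up pollster rating between 0 and 1) is my own assumption, chosen only to illustrate the idea.

```python
import math

# Invented polls, for illustration only. Each entry is:
# (candidate share in %, sample size, age of poll in days, pollster rating 0..1)
POLLS = [
    (51.0, 1200, 2, 0.9),
    (50.0, 900, 5, 0.8),
    (52.0, 1500, 1, 0.9),
    (44.0, 300, 20, 0.4),  # a small, old poll from a weak pollster: the "outlier"
]


def poll_weight(sample_size, age_days, rating, half_life_days=14.0):
    """Hypothetical weight: bigger, fresher, better-rated polls count for more."""
    recency = 0.5 ** (age_days / half_life_days)  # halves every half_life_days
    return math.sqrt(sample_size) * recency * rating


def simple_mean(polls):
    """Unweighted average of the reported shares."""
    return sum(share for share, _, _, _ in polls) / len(polls)


def weighted_average(polls):
    """Average of the reported shares, weighted by poll_weight."""
    weights = [poll_weight(n, age, rating) for _, n, age, rating in polls]
    shares = [share for share, _, _, _ in polls]
    return sum(w * s for w, s in zip(weights, shares)) / sum(weights)


print(f"Simple mean:      {simple_mean(POLLS):.2f}")
print(f"Weighted average: {weighted_average(POLLS):.2f}")
```

The simple mean is dragged down by the freak poll. The weighted average stays close to the three large, recent polls, because the outlier earns only a small weight.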

Having a model doesn’t necessarily mean you will be correct – there are plenty of other statistical models which predicted the election less accurately. Fivethirtyeight carefully spells out the steps in its analytical process (though not the precise parameters of the model), so we can make an informed judgement on the quality of the findings. Any model is open to criticism from other statisticians – but this means they can have an adult, public conversation about what might be improved, or what the impact of a flaw in the analysis might be. We can learn from this, too.

This wouldn’t be important if it was just a different way to present the same news; but this type of analysis creates fresh insight. By polling day 2012, the model predicted a greater than 90 per cent chance of an Obama victory; and yet organisations like the BBC and the FT were using lazy phrases like “too close to call” and “on a knife edge”. If newspapers are prepared to do this type of analysis routinely, I suggest, it offers huge potential for creating an open, analytical type of serious journalism led by numbers and observed reality, not opinions.

Old jokes department: “And what do you do?”

Not every journalist can be a stats geek, though I think they should have more compulsory education in how to interpret data, and would prefer that newspapers enforced an in-house ban on reporting surveys that are statistical nonsense – which, in my experience, is most of them (I’ve written those survey-based articles in the past, and reported lots of rubbish data as if it were spotless, which I regret).

Newspapers and magazines are cutting back on conventional journalism. Budgets are tight. It’s probably too much to hope that we can create a new type of data-journalist, or that newspapers will suddenly grow a statistical conscience. But it needn’t be expensive: a laptop and some specialist software are perfectly adequate to do the statistical research that can validate the claims that powerful people make. It’s the job of the media to investigate these claims – not just talk to one person who agrees, and another who disagrees. On Radio 4, More or Less does an entertaining job of validating reported statistics (download the podcasts, they are excellent). Ben Goldacre’s Bad Science posts are also a model of this approach.

It’s patronising to assume that readers can’t cope with statistical analysis. Clearly, many don’t like it, and some misunderstand it; but that’s true of any type of journalism that goes beyond the obvious. The conclusions (especially those that go against gut feel or conventional wisdom) may be unpopular: just read the critical comments on Nate Silver’s blog. It’s also true that science isn’t the last word on a subject, just a powerful way of testing an assumption. Statistics involves making value judgements in how you treat the numbers, in the same way as a journalist makes a judgement about how much credibility to give any source. But in statistics there is the opportunity to be explicit about those judgements, and then go where the numbers take us.

This type of insight is a fundamental tool, in an increasingly complex world, if we want to make informed decisions. The alternative is to just place trust in the conclusions of “experts”, of which there seem to be an ever-increasing number quoted on TV or in newspapers.

I’ll leave the conclusion to one of Nate’s commenters, who explains it better than I do:

Rather than cheer for Nate because we all like his Obama forecasts, how about cheering for him because he might believe in a world where numbers and rational analysis are vital to how we make decisions, even in those cases where we don’t like what the numbers imply?… It’s not about hoping you will win at Vegas. It’s about understanding how the Vegas game works.

The rich: better than you, but in a nice way

Too many low-value people

I dislike the idea that some of us are “high value” people if that value is based on wealth alone. Yesterday I read that “high value” people may be allowed to pass through UK airports more quickly, because it is somehow wrong that they should stand in a big queue with the rest of us.

It’s a fundamental assumption (though clearly an optimistic one) that society gives all of us the same value, except in specific situations, which means there are expectations which we all share. The social concept of “value” is based in expertise and helpfulness. Doctors and nurses can point to a qualification, and they can show a consistent record of successful intervention when they are needed. Similarly, entrepreneurs may help us by investing in the economy, which would be handy right now. But we share a common set of values. Doctors can’t be racists. An entrepreneur can’t prise the last pint of milk out of my fingers in the queue at the supermarket, or take the last seat on the bus, not even Sir James Dyson. Maybe him, on reflection.

Back to the airport: I’d prefer a country where passports get checked in the order we arrive at the desk.

The offensive idea to fast-track those of us with high value isn’t designed to get firemen and nurses through passport control more quickly. It is clearly a case where “high value” is a feelgood alternative for “rich”. In practice, the “value” which the Borders Agency wants to give us will not be social value. Here’s the Guardian reporting Brian Moore, the departing head of the UK Border Force, describing the plans to define a super-race of people who might get their passports checked before the rest of us:

Moore said it would cover people who were “valuable to the economy and were valued by the airlines”. He said the move was intended to demonstrate that Britain was “open for business”.

Note the sneaky little transition: for the “valuable to the economy” bit, the government would have to tell us all whether we are useful to it or not, which isn’t going to happen for electoral reasons I don’t need to explain. In which case only the second description, “valued by the airlines”, matters. It becomes a frequent-flier perk for business class. The Borders Agency would be moonlighting for the British Airways Executive Club.

So the class system is being disguised as social opportunity. In reality, the government would not know if the members of this commercially-designated super race are of any value at all to the UK economy. But they would get preferential treatment because they’re defined as “high value” by a commercial entity, and the whole thing is given the “open for business” label so we don’t realise that it is basically a regressive perk for the wealthy.

Similar logic applies to the fashionable generic description of rich people as “wealth creators”. I thought that the people who created wealth were the workers, who are paid less than the value of their labour. That profit may improve their lives through more jobs and higher wages, or might be hidden in the Cayman Islands. All we can say with certainty is that the rich are “wealth possessors”. The economic mumbo-jumbo that describes them as “wealth creators” is there to distract us.

Calling someone a wealth possessor doesn’t make us happy though, which is why the phrase wealth creator is becoming more common now that inequality is at its worst since 1940. It’s the sound of the privileged speaking well of themselves, in case the rest of us get all upset and start asking questions about offshore tax havens and equality of opportunity:

Note also that the UK leads the world in using this term. More than half of the English-language articles describing people as “wealth creators” are published in the UK. In the US, the slightly more defendable (though no more economically justifiable) “job creators” is preferred for this elite social class.

We can’t seem to shake off the idea that wealthy people deserve respect for what they are, not what they do. If these mysterious “high value” people can demonstrate that they have been selected because their wealth works for our benefit, not just theirs, maybe they can push in front of me at the supermarket and take my milk. That is, assuming the government doesn’t give them their own line at the till first.

Exclusive: Obama campaign links to South Ribble’s secret Marxists

Some Marxists eat food like this

I used to moan that there was too little debate about politics in the UK. Policy discussion mostly involved making up slogans, and white male politicians boasted about the black people they had met. I wanted more robust debate.

Be careful what you wish for. In the US, a country that I admire for its logical approach to spelling, bizarre yet entertaining sports and excellent comedy and drama that often make British equivalents seem like a school play, political hell now regularly breaks loose, and often it’s a bit barmy. Lately the press has decided to debate the meaning of the word Forward, because that’s the Obama campaign slogan.

It’s definitely a more useful arrangement of seven letters than the unspoofable Australian political slogan We are Us, which just makes no sense at all. The question that the hard-of-thinking political class has been asking: does using the word prove that he’s secretly a communist?

I don’t want to prejudge the issue, other than saying that the Marxism claim is the sort of thing that a smelly drunk guy at a bus stop starts telling you about while people give you furtive sympathetic looks. But read the papers, and they’re sounding more like the smelly guy. The Washington Times is just one of the newspapers which pointed out that the radical left often calls its publications “Forward” too. The journalists who wrote the story even went as far as looking these newspapers up on Wikipedia.

(Note to my American journalist peers: we all occasionally fill up 300 words by cutting and pasting from Wikipedia – but if you admit that you’re doing it, you ruin things for the rest of us. Still, it saved me a job finding the links for you.)

Even a stopped clock is correct twice a day, and so the lazy political hacks of the Washington Times have a small point. Historically, a lot of socialist papers have been called Forward. As a name it certainly has the edge over Sideways, Backwards and The Kingston Whig-Standard.

To help my North American readers decide on Forwardgate, I checked out some of the newspapers called Forward that attempt to brainwash Brits.

In Gateshead, Moving Forward newspaper suspiciously offers “free courses” organised by the Gateshead Housing Company.  It promises you will learn “new” skills and meet “new” people.

Communistic American attendees will be pleased to know that there are interpreters available on these courses, as the Geordie accent can be challenging:

If anyone is innocently thinking of sending their children from the US to Gateshead to take one of these courses, I need only remind you of Obama’s compulsory re-education camps that you were warned about in 2009. Could it be that these imaginary camps have simply relocated to the North-East of England? Well, no, but I’ve never started a conspiracy theory before, so you might want to run with this one for me.

The US has a long tradition of political radicals who prefer to live outside the narrow confines of civilisation in places where the norms of polite society and rule of law don’t apply. The UK equivalent of these places is Preston. It is no surprise to find that local South Ribble Borough Council calls its newspaper Forward as well.

You won’t be surprised to hear that the commies have made this publication carbon neutral, when they could just as easily have published one that used non-socialist carbon stuff instead. Provocative.

“Who will win South Ribble’s Search for a Star Contest?” it asks, innocently. I suggest it wants one of its fellow travellers to inform on that person so that the South Ribble Politburo can authorise its secret police to intern him or her without charge as a warning to those who seek to exercise the cherished capitalist freedom to win talent competitions. Is it a coincidence that previous South Ribble Search for a Star Winners are almost always never heard of again? I think not.

Finally, the latest edition of Forward from Birmingham City Council hides its crypto-communist credentials inside articles titled: State-of-the-art new public pool makes a splash and Fun for all at Big Jubilee Weekend, but it doesn’t fool me.

My warning is especially relevant for America’s easily-fooled liberal East coast metropolitans: this disgraceful radical propaganda sheet boasts that:

Influential critics at the New York Times newspaper have placed Birmingham at number 19 in its ‘Places To Go In 2012’ shortlist thanks to the city’s growing reputation for world-class cuisine.

Don’t fall for it, New Yorkers! If you visit one of the area’s interesting, inexpensive and welcoming Indian restaurants there will probably be some mind-altering Marxist drug in your chicken Balti. How do I know? Well, if the critics from the NYT think there are only 18 better places to visit than Birmingham, someone’s definitely been taking something.

Worst practice

Package of measures

At the weekend I enjoyed reading a review of the latest set of political diaries published by Chris Mullin, former member of parliament and lifelong plain speaker. In the latest volume, which covers the birth of New Labour and the 1997 election, he criticises Gordon Brown – at that point a pushy shadow Chancellor of the Exchequer on the way up the political ladder. In the diary Mullin complains that Brown is spending every weekend trying to get on the TV news, “but having got there he has nothing to say beyond calling for a package of measures.”

The package of measures (PoM) promises so much – until you ask yourself what the person calling for it actually wants, and you realise you’re not sure.

(In one way, perhaps, Brown’s desire for packages of measures was satisfied in the ten years after 1997. An average of 2,685 laws was passed each year, more than in any other period. While Brown was prime minister, 33 criminal offences were created a month, including “Carrying grain on a ship without a copy of the International Grain Code on board”, and not nominating a keyholder for your burglar alarm.)

I checked to see whether Brown continued to be a prolific package-caller in government. Yes:

In the years 1994-1997, Mullin is spot-on. Brown called for (or announced) many more packages than Tony Blair while they were in opposition. After 1997, while Blair was prime minister, Brown showed PoM leadership in most years. Succeeding Blair in the top job, plus a financial meltdown, seems to have inspired a frenzy of late career measure-package-announcing in Brown, if PoMs can come in frenzies.

PoMs are hard to argue against unless you’re a complete contrarian, because they are sold as an outcome, not component by component – a “package of measures to…”, followed by a generally admirable suggestion. They’re the political equivalent of a Talk Normal business jargon favourite, Best Practice (BP). Calling for companies to adopt BP is a no-brainer, in that you don’t need a brain to do it. Claiming you follow BP is an impressive-sounding, though often empty, way to speak well of yourself.

BP-recommending has been on the rise since 1994, at least in the UK (it’s not nearly so popular in the US; I don’t know why). The red best-fit line shows that, since 1994, the rise in claims to use/provide/know/sell BP averages 34 per cent per year:

If you’ve been responsible for this BP inflation, I bring bad news. McKinsey has discovered that companies that adopt it often do worse than those that think for themselves. The optimal response to companies that chunter vacantly about BP might be the same as for a politician who calls for a meaningless package of measures on the weekend news. Switch off.

Softening the impact

HS2: I'm just saying, it could happen

Reading my copy of Private Eye this week, I was interested in a letter (page 13) from Robin Stummer, who was complaining about the government’s feasibility study into HS2, the new high-speed rail line between London and Birmingham – and especially the use of the weasel phrase “physical impact” to describe what will happen to 300 or so listed buildings, conservation areas and woodlands along the route. Here’s an example from the report, which warns us that building HS2 will include:

Adverse physical impacts on two Scheduled Monuments, 14 Grade II listed buildings and 3 Grade II* Registered parks and gardens within the physical impact corridor.

Imagine a man with a clipboard and a peaked hat saying it. I like trains, but I like them less when I read documents like this.

I quote newfound talknormalist Optymystic, commenting on an article about Talk Normal:

Impacts is used as a substitute for causes, influences, bears upon, determines, affects, all of which provide precise ways of expressing the sense clearly, by contrast with which “impacts” is vague.

He could have included stronger words such as decreases or destroys, but he makes a good point: it’s part of a flattening of the language that seems to be assisting in the flattening of listed buildings. You can’t tell what an adverse physical impact is, because it could be anything from having a train tootling by just outside your moat to having one whizzing up your Grade II listed hallway – maybe that’s what they mean by an impact corridor.

“Impact” is a technocratic weasel word that avoids having to explain what the result of the impact is, which is precisely what we need to know. It’s also a successful weasel word, twice as popular as it was 10 years ago:

“Impacts” usually means something bad: rule of thumb from Factiva is that there are two admitted negative impacts in the press for every one described as positive; with the majority left unqualified so that we have to work out for ourselves what people are carefully trying not to tell us.

I’m guessing that the unqualified uses of the word are, in the main, bad news avoided to make sure we don’t get too upset. After all, impact is not an efficient word when used to deliver happiness: no one tells you that you’ve won the lottery by announcing that it will “impact your ability to pay the rent”. But, if I’m working for you and I tell you that creating silly pictures of trains for Talk Normal will “impact my ability to meet your deadline”, then take it from me: I’m going to be late.

Nagging: someone must do something


If you haven't watched all 1001 of them, you clearly deserve to die anyway

During the UK general election, and afterwards, I thought I was reading an unusual number of comment articles telling David Cameron, Nick Clegg and Gordon Brown that they “must” do something. Once I’d spotted it, I couldn’t stop noticing that all of us are constantly being told what we must learn, deliver or promise. Governments were most often the recipients of this nagging, along with religions; and, for less specific nags, “we” are constantly being told by columnists what we must do. And I haven’t even got to the things we must not do yet.

I checked to see if there was an increase in newspaper-based nagging. In British newspapers between 1990 and 1998, the frequency of headlines telling us we “must” do something declined gradually:

Then it started a long, steep climb. Now nags are twice as frequent as they were in 1998:

We must find out why. Someone must take the blame for this. Something must be done. Not that it will be: newspapers run many more opinion pieces than they did in 1998. They use them to attract commenters, which creates advertising revenue. Telling a person or group what to do is a quick way to start an argument and, in this context, all arguments are good.

Alternatively, as we become less patient and increasingly self-obsessed, we can just forget the column underneath the headline (most of us do that already) and personalise the experience. You could sign up to a genuine Daily Me, written by robot columnists, which is just a series of nagging headlines inspired by the newspaper we really care about: our Facebook wall posts.




That’s much more useful than telling me that I must not let slip the opportunity to provide a legacy from the 2012 Olympics. I live next door to the stadium, but I’m pretty sure it’s not me they should be nagging.

Meanwhile columnists are free to tell all sorts of groups what they must do in the certain knowledge that their instructions will be ignored. They are lucky that no one has decided yet that bossy opinion columnists must be paid by results, because they might as well write an article telling ice cream it must not melt.

