When I was a teenager, my friends and I used to waste time at school by talking about the basketball box scores from the night before. (A box score is rows and columns of statistical information: minutes played, rebounds, assists, points scored etc. I think it started as a baseball term. Scores in a box.) We wanted to come up with a formula that measured how good a player was: the Dominance Quotient, we called it, only slightly self-mockingly.
Jeremy Hunt announced last Wednesday that as many as 270 women may have died because an error in a computer algorithm prevented 450,000 women from being invited for routine breast cancer screening appointments. Stories about IT glitches will be increasingly common as artificial intelligence enables more and more healthcare to be automated. As things stand, people are still better than computers at detecting early signs of cancer on mammograms, and the neural networks currently being designed to analyse the images are intended for use as an aid to, rather than a replacement for, human decision-making. The hope is to engineer systems that combine the different strengths of humans and computers, with outcomes that neither is capable of independently. The sad reality is that we seem to end up with systems that combine an all-too-human capacity for error with a computer’s blunt force, and so wreak havoc at an inhuman scale.
The statistics make grim reading. In a 2013 report, Overview of Fatal Incidents Involving Cattle, the Health and Safety Executive notes in its usual lapidary prose that ‘this paper gives an overview of fatal incidents involving cattle to (a) Enable Agriculture Industry Advisory Committee members to consider the current trends in agriculture accidents involving cattle.’ There is no room for complacency. The HSE logs 74 ‘fatalities involving cattle’ in the UK in 2000-15, compared to 53 deaths caused by Islamist terrorism in the same period.
‘Around 6000 people lose their lives every year because we do not have a proper seven-day service in hospitals,’ Jeremy Hunt said on 16 July 2015. ‘You are 15 per cent more likely to die if you are admitted on a Sunday compared to being admitted on a Wednesday.’ A Department of Health statement later clarified that the figures came from an analysis ‘soon to be published in the BMJ’. Nick Freemantle, a professor of epidemiology at UCL, had been invited by Bruce Keogh, the medical director of NHS England, to update a 2012 analysis of hospital data, apparently on the suggestion of Simon Stevens, the new chief executive of NHS England. The resulting paper wasn’t accepted by the BMJ until 29 July, after Hunt’s speech. When it appeared in September, it contained no reference to the 6000 figure.
According to the front page of yesterday’s Guardian, the NHS is to start selling our confidential medical records. Every doctor has a duty to keep patient-identifiable data secure, and to share it only so far as is in the patient’s immediate best interests. At the same time, in order to run healthcare organisations or to carry out medical research, it is necessary to compile statistics about diseases and treatments. It therefore makes sense for some information collected in the course of caring for patients to be made more widely available – shared with managers, bureaucrats and researchers – but only if it is anonymised.
One of the many pieces of bin Laden-related trivia in the news today is the resuscitation of a study by a group of geographers at UCLA, published in 2009, which according to the BBC ‘said there was a high probability Osama Bin Laden was located in the town where he was ultimately killed by US operatives on Sunday’. The BBC report goes on:

The model employed in the study, which is typically used to track endangered species, said there was an 88.9 per cent chance he was in Abbottabad in Pakistan. But geographer Thomas Gillespie at UCLA said the same study gave a 95 per cent chance he was in another town, Parachinar.

There’s clearly something amiss here: if there was an 88.9 per cent chance he was in Abbottabad, there could only have been an 11.1 per cent chance he was anywhere else. Puzzled, I asked a statistician how the numbers could add up, and he said:
The four most ‘informative’ words in Moby-Dick, statistically speaking, are ‘I’, ‘whale’, ‘you’ and ‘Ahab’. Marcello Montemurro and Damian Zanette worked this out by comparing the text of Moby-Dick to all the possible alternatives obtainable by shuffling Melville’s words into random sequences. These are not the four words that are used most often, or that carry the most ‘information’ in the everyday sense of the term, but the words whose positioning in the original, meaningful text differs most from the way they would be scattered in all other permutations. The ‘information’ here is of the mathematical, measurable kind: ‘most informative’ means ‘least randomly distributed’. It may seem a slightly odd way to try to quantify semantic content, as though when Melville wrote Moby-Dick, it wasn’t so much a matter of finding the right words, as of putting them down in the right order.
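The shuffle test can be sketched in a few lines of Python. This is a toy illustration of the general idea rather than Montemurro and Zanette’s actual estimator: it scores a word by comparing the entropy of its spread across equal-sized blocks of the text with the average entropy over random shufflings of the same words. A word that clusters in particular stretches of the text (as ‘whale’ does in the invented sample below) scores higher than one sprinkled evenly throughout. The sample text, block count and shuffle count are all made up for the example.

```python
import random
from collections import Counter
from math import log2

def block_entropy(words, word, n_blocks):
    """Shannon entropy of `word`'s occurrences across equal-sized blocks.
    Low entropy means the word is bunched into few blocks."""
    size = max(1, len(words) // n_blocks)
    counts = Counter(i // size for i, w in enumerate(words) if w == word)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts.values())

def informativeness(words, word, n_blocks=4, n_shuffles=200, seed=0):
    """Mean entropy over random shuffles minus the observed entropy.
    The more a word's real positions differ from a random scattering,
    the larger the score -- the sense in which it is 'informative'."""
    rng = random.Random(seed)
    observed = block_entropy(words, word, n_blocks)
    pool = list(words)
    shuffled_total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(pool)
        shuffled_total += block_entropy(pool, word, n_blocks)
    return shuffled_total / n_shuffles - observed

# Invented sample: 'whale' is bunched in one stretch, 'me' is spread evenly.
text = ("call me ishmael " * 5 + "whale whale whale sea sea " +
        "call me sea " * 5).split()
print(informativeness(text, "whale") > informativeness(text, "me"))  # True
```

On this measure a common, evenly distributed word can even score below zero: it is spread *more* uniformly than chance would predict, which is exactly why workhorse words like ‘the’ carry so little of this kind of information despite their frequency.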