As chest X-rays of Covid-19 patients began to be published in radiology journals, AI researchers put together an online database of the images and started experimenting with algorithms that could distinguish between them and other X-rays. Early results were astonishingly successful, but disappointment soon followed. The algorithms were responding not to signs of the disease, but to minor technical differences between the two sets of images, which were sourced from different hospitals: such things as the way the images were labelled, or how the patient was positioned in the scanner. It’s a common problem in AI. We often refer to ‘deep’ machine learning because we think of the calculations as being organised in layers, and we now use many more layers than we used to, but what is learned is nevertheless superficial.