In November 2017 the China Daily ran a story about a Beijing barbecue restaurant called Liuyedao (‘The Lancet’), which was offering a discount to any customer who could show they had recently published a paper in a scientific journal. The restaurant took the journal’s ‘impact factor’ – a statistic based on the average number of citations received in a given year by the papers the journal published in the previous two years – and converted it into a cash equivalent, to be deducted from the bill.
The impact factor was invented in the 1960s as a way of helping academic librarians decide which journals to hold in their collections. It is a reasonable measure of the strength of a journal. But the distribution of citations is skewed: a small fraction of papers account for most of them, while most papers get few if any. So the impact factor isn’t much use as a guide to the quality of an individual paper: no one should judge a scientific article by the journal in which it appears. Most people working in research know this. Yet because scientific papers are difficult for non-specialists to understand, and because it is so hard to keep on top of the literature, the temptation to assess scientific work by means of a single, simple measurement, even a bogus one, is hard to resist. As a result, the impact factor has become a marker of prestige. Even before anyone has read it, a paper published in the Lancet, which currently has an impact factor of 60, is worth far more than a paper in a specialist outlet such as Clinical Genetics, with its impact factor of 4.4. Unsurprisingly, this has an influence on where researchers submit their work.
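The arithmetic behind the statistic is simple enough to set out in a few lines. This is a minimal sketch, not Clarivate’s exact procedure (which involves rules about what counts as a ‘citable item’); the citation and paper counts are invented for illustration:

```python
def impact_factor(citations_this_year, papers_last_two_years):
    """A journal's impact factor: citations received in a given year to
    the papers it published in the previous two years, divided by the
    number of papers published in those two years."""
    return citations_this_year / papers_last_two_years

# Hypothetical figures: 500 papers over two years attracting 30,000
# citations this year would give an impact factor of 60, roughly the
# Lancet's; 2200 citations to the same output would give 4.4.
print(round(impact_factor(30_000, 500), 1))  # 60.0
print(round(impact_factor(2_200, 500), 1))   # 4.4
```

The point of the sketch is how little the number says about any single paper: the same average is consistent with a handful of blockbusters and hundreds of uncited articles.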
Whether and where a researcher gets hired depends to a large extent on their publication record: how many papers they put their name to, which journals they are published in, how many times their papers are cited in other papers. This has created a system that favours speed of publication, volume of output and – because journals prefer new, eye-catching findings over negative results or replications of previous work – sensationalism. There can be financial incentives too. At the time the barbecue-discount story appeared, many Chinese universities were giving cash bonuses for publications, with higher-impact journals securing bigger rewards for researchers. In a survey of Chinese university policy in 2016, the average bonus for the lead author of a paper in Nature or Science was calculated at $44,000, five times the average professorial salary.
The chief form of pre-publication quality control in science is peer review. Journal editors send submissions to experts, usually two of them. Their job is to judge whether a study’s methods, data and analyses are sound, and whether the evidence backs up the authors’ claims. Their (most often anonymous) reports assess the work’s validity and importance, suggest how it might be improved, and recommend rejection or acceptance, usually with required revisions. Reviewers are increasingly difficult to find, since they are generally not paid or otherwise credited, and must fit the work around their own teaching, research and administrative responsibilities. It is next to impossible, even in the several hours it takes to put together a typical review, to check all of a paper’s methods and analyses. The same goes for detecting simple errors, such as a wrong number typed into a spreadsheet or a mix-up in cell cultures, let alone fabricated, fraudulent data. As a result, reviewers and journals end up taking a lot on trust. Even diligent reviewing is inconsistent, since reviewers may disagree about a paper’s merits, and will have their own intellectual and social biases.
In Science Fictions, Stuart Ritchie explores the problems with this system. The book is a useful account of ten years or more of debate, mostly in specialist circles, about reproducibility: the principle that one purpose of a scientific paper is to make it possible for others to carry out the same work, and that one test of its reliability is whether they get the same result. In recent decades there have been large-scale efforts at replication in several fields, but if an experiment can’t be repeated, it doesn’t necessarily mean the original work was incompetent. Work at the frontier of a discipline is difficult, and skilled hands are an underacknowledged factor in scientific success. Some observations are noteworthy precisely because they are unusual, or depend on their context. Sometimes doing the same experiment and getting a different result reveals something useful. Even so, the findings of these large-scale replication studies have helped to fuel a widespread sense that science is failing on its own terms: in cancer biology, one effort managed to replicate just six out of 53 studies; in psychology about 50 per cent of studies cannot be replicated; in economics, about 40 per cent. In 2016 Nature surveyed researchers across the natural sciences and found that more than half the respondents had been unable to repeat their own work, though less than a third regarded the failure as a sure sign that a study was wrong.
At one end of the replication crisis, as it has become known, there are spectacular frauds. In the early 2000s the South Korean biologist Hwang Woo-suk became a national hero for cloning human stem cells; just a few years earlier, the materials scientist Jan Hendrik Schön was being tipped for a Nobel Prize for papers describing molecular-scale electronic components. Both had made up their results. In surveys, about 2 per cent of researchers admit to fabricating data, though many more suspect their colleagues of doing so. But deliberate malpractice probably accounts for only a small portion of unreliable science. The greater concern is that the rush to publish and the pressure to make a splash push researchers to take short cuts and dodges: low-level fiddles that stop short of fraud but undermine reliability.
As Ritchie shows, every section of a standard scientific paper is a potential source of problems. Many researchers describe their methods so sketchily that it’s impossible for others to repeat them. The language in discussions and abstracts has become increasingly hyperbolic: between 1974 and 2014, the proportion of papers describing their findings as innovative, robust, unprecedented, groundbreaking and so on rose more than eightfold. But the crux of the replication crisis is in the way results are presented and analysed. There may be legitimate reasons for excluding a particular measurement or tidying up an image, but it’s also true that you’re unlikely to be caught if, consciously or otherwise, you do something like this in order to steer your results towards the conclusion you’d like.
Some problems are specific to particular disciplines. In cancer biology, countless papers are based on studies using contaminated or misidentified cell cultures. The advent of Photoshop has made it difficult to tell when images of cell processes and genetic molecules have been enhanced, duplicated, spliced or otherwise manipulated. Other problems are shared by many fields. The use of statistics is a general concern. Statistical methods are used to analyse the patterns in data, with a view to distinguishing random variation from underlying causes. When an experimental result is described as statistically significant, that usually means a statistical test has shown there is a less than 5 per cent chance that the difference between that result and, for example, the corresponding result in a control experiment is attributable to random variation.
There are lots of different statistical tests, each suited to asking a particular question about a particular type of data. Computer software makes it possible to perform such tests without difficulty and without any knowledge of their mathematical foundation. That makes it easy to drift away from testing a hypothesis towards searching for unlikely patterns in data. The more tests you do, the more likely you are to find a result that makes it under the 5 per cent threshold. You can then write up the study as if that was what you were looking for all along – a practice known as HARKing (hypothesising after the results are known), one of a family of dodges collectively called P-hacking.
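The arithmetic of multiple testing shows why this works. Assuming each test is independent and there is no real effect anywhere, the chance of at least one spurious ‘significant’ result at the conventional 5 per cent level grows quickly with the number of tests:

```python
def false_positive_chance(k, alpha=0.05):
    """Probability of at least one false positive among k independent
    tests of true null hypotheses, each run at significance level alpha."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 10, 20):
    print(f"{k:2d} tests: {false_positive_chance(k):.0%}")
# With one test the chance is 5 per cent; with twenty it is about
# 64 per cent - run enough tests on pure noise and a 'significant'
# pattern is more likely than not.
```

Real analyses are rarely fully independent, so the figure is an idealisation, but the direction of the effect is the same.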
Other varieties of P-hacking include continuing to collect data until you find something statistically significant, and excluding outliers or other measurements because they mess up your stats. Between a third and a half of scientists own up to having done something along these lines. Other widespread practices include splitting studies to yield as many publications as possible, and recycling chunks of old work in new papers. Funding also creates a bias towards positive results: experiments and trials paid for by the pharmaceutical industry, for example, tend to show more positive results than equivalent studies funded by governments and charities. The bias towards positive results has become self-perpetuating: when three-quarters or more of the papers published report positive findings, there is a strong incentive to avoid negative results, or to avoid writing them up.
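The first of these dodges – peeking at the data and stopping once significance is reached – can be simulated directly. The sketch below uses pure noise and a simple two-sided z-test with known variance (the batch sizes and sample limits are invented); checking after every batch of ten observations pushes the false-positive rate well above the nominal 5 per cent:

```python
import math
import random

def peeking_false_positive_rate(n_experiments=2000, batch=10,
                                max_n=100, seed=0):
    """Fraction of no-effect experiments declared 'significant' when a
    two-sided z-test (known sd = 1) is rerun after every batch of data,
    stopping as soon as the result crosses the 5 per cent threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        total, n = 0.0, 0
        while n < max_n:
            for _ in range(batch):
                total += rng.gauss(0, 1)  # data with no real effect
            n += batch
            z = (total / n) * math.sqrt(n)
            if abs(z) > 1.96:             # nominal 5 per cent threshold
                hits += 1
                break
    return hits / n_experiments

print(f"{peeking_false_positive_rate():.0%}")  # well above 5 per cent
```

A researcher who had fixed the sample size in advance and tested once would keep the error rate at 5 per cent; the inflation comes entirely from the repeated looks.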
The worry is that scientific processes have been undermined by perverse incentives to the point that it’s difficult to know what to believe. The crisis has hit psychology, Ritchie’s own discipline, and biomedicine especially hard. These are crowded, competitive fields, in which research groups around the world are racing one another to publish on the hottest topics. In these circumstances, haste can win out over care. The data in these fields tends to be noisy, leaving room for interpretation and manipulation in presentation and analysis, and psychologists and biologists tend to be less mathematically expert than their colleagues in the physical sciences. It’s possible, however, that these fields have come in for more than their fair share of investigation: it’s more straightforward to redo a lab experiment than, say, a field study of animal behaviour. At the same time, research in psychology, health and medicine also attracts an unusual degree of scrutiny because its results can have direct effects on our everyday lives. When schools base their teaching practices on experiments in child psychology or tabloids run scare stories about everyday foodstuffs based on a single study, it matters whether or not the original research is repeatable.
In biomedicine, a reproducibility rate of around 50 per cent equates to a lot of money spent on unreliable work – $28 billion a year in the US alone, according to a study from 2015. Pharmaceutical companies put some of the blame for the slow progress and high costs of drug development on the unreliability of basic research. More than 90 per cent of the chemical compounds identified as potential drugs fail to make it to trials; in 2011, Bayer halted two-thirds of its projects on target molecules because its in-house scientists could not reproduce results reported in the literature.
All this is bad enough, yet reproducibility is just one of several intersecting problems resulting from the ever more fierce competition for resources and prestige in science. Both have become harder to secure as the number of people working in research has grown faster than the supply of money or permanent jobs. Science publishes fewer than 7 per cent of the submissions it receives – that’s typical for prestigious journals – while roughly three-quarters of research grant applications fail. Because publication in high-impact international journals is the decisive measure of achievement, it shapes what research is done, and the way it is done. Work that focuses on the local is devalued, especially if it is published in languages other than English, and interdisciplinary and unorthodox approaches are relegated to less visible, lower-status outlets. The most prestigious journals, meanwhile, occupy – indeed, constitute – the mainstream in their fields.
The pressure to churn out papers also drives a culture of overwork – and in some cases bullying – which bears down most heavily on postgraduate and postdoctoral researchers. These are the people who actually do most of the laboratory and fieldwork; they are usually on studentships or contracts lasting between three and five years, and their ability to build a publication record depends heavily on the patronage of the senior researchers in whose labs they work. None of this does anything to encourage a diversity of viewpoints in the scientific workforce, or to challenge biases. If a brutally competitive environment helped the best work rise to the top, there might be an argument that the misery was justified. You might, for example, think that a system which can deliver several highly effective vaccines for a new disease in less than a year must be doing something right. Maybe so, but most research has to fight for funding and attention in a way that work on Covid-19 does not. A junior researcher, browbeaten by her boss and on a contract that’s about to end, may be tempted to cut corners. Funding bodies with low success rates lean towards low-risk, short-term projects. Even when the science is reliable, the knowledge produced may be trivial.
There is widespread agreement that something has to be done about these problems. Funding bodies and governments are showing an increasing willingness to act as regulators. In February 2020, the Chinese government introduced reforms including a ban on cash incentives for publications, a move away from using impact factors in recruitment and promotion, and a requirement that researchers publish at least a third of their work in domestic journals. In the UK, the government is working with the country’s main public research funder, UK Research and Innovation, along with such influential bodies as the Royal Society and the Wellcome Trust, in an effort to improve the culture of research. One proposal is that scientists be given more recognition for all the things they do besides producing papers, such as writing computer code, giving policy advice, communicating with the public, and working with companies and civil society groups. And, given that nearly all research is now collaborative, the focus of evaluation is shifting away from individual achievement towards teams and institutions.
Meanwhile, more than fifty universities have signed up to the UK Reproducibility Network, a bottom-up initiative to improve British researchers’ training and methods. Journals are making it easier for others to check papers by requiring authors to make their raw data publicly accessible, rather than merely reporting their analyses. In an effort to reduce the temptation of P-hacking, many journals have begun to allow researchers to submit their hypotheses, along with plans for the experiments and analyses they intend to carry out, before they start the work. There are efforts, national and international, to make sure that the results of all clinical trials are made public, to prevent the concealment of negative results. And automated tools are now available that can help identify errors in data and images.
All this is welcome, but one issue that those in authority have shown little interest in tackling is how to reduce the reliance of the system on short-term contracts, which far outnumber the permanent jobs available in publicly funded research. Is it even possible to increase job security for researchers at a time when teaching in universities is becoming ever more precarious? A similar question might be asked about universities’ ability to resist the forces that drive competition in research. Can any particular institution or nation afford to opt out unilaterally? What happens if the kinder, gentler universities start to slide in the international rankings, which are based partly on measures of publications and citations?
Such reform as there has been is most evident in academic publishing. Historically, as scientists see it, the largest for-profit publishing companies have taken their free labour as authors and reviewers and made them pay through the nose to read the results. In 2019 Elsevier, the largest of these companies, had a profit margin of 37 per cent. The open access movement, begun by activist scientists around the turn of the century, makes the moral case that readers shouldn’t have to pay to see the results of science that has already been paid for with public money, and the practical case that freely accessible papers would be easier to validate and build on. The movement has made a lot of progress, at least in Europe, where the EU and many other funders, including UKRI and the Wellcome, have now stopped the paywalling of publications resulting from their grants.
Print journals have always been selective because they have a limited number of pages. Traditionally, they have sought to judge not just whether a given paper is sound, but whether its findings are important. Many still do; it’s part of their cachet. But as most journals move entirely online, space is no longer an issue. In the twenty years since the open access movement began, a new business model has emerged in which subscriptions are replaced by publication fees charged to authors. We have seen the rise of open access megajournals, which ask reviewers to judge the validity of results but not their significance, and accept all submissions, including replications and negative findings. This approach is not watertight; in 2016, the Public Library of Science’s megajournal PLOS One, which currently charges a publication fee of $1695 per paper, carried a study reporting that the human hand showed ‘the proper design by the Creator’. But the model has proved popular both with researchers, who get a relatively quick, painless and cheap route to publication, and with publishers, who get a cash cow.
PLOS One has an impact factor of just 2.7. ‘Ideally,’ Ritchie writes, ‘what we want to see is an accurate proportion of null results, and more attempted replications, in the glamorous, high-impact journals.’ But that’s to take for granted that the ‘glamorous journals’ should retain their status. The argument can be made that the scholarly publishing industry is beyond fixing, and we would be better off without journals – perhaps even without pre-publication peer review. Publishers are no longer needed for typesetting or distribution, and social media – including specialist sites such as ResearchGate, which has more than 17 million users – can perform some of their curatorial and marketing functions. What value the journals retain lies in their brands, and in their capacity to organise peer review. But if peer review isn’t working, what is the brand worth? Why not just let researchers make their work public, and let it thrive, or wither?
This approach is feasible thanks to preprint servers. These are online repositories on which papers can be posted without peer review, though most of them do carry out vetting for plagiarism, health and security risks, and general appropriateness. Publishing studies as preprints has long been the norm in many areas of physics and maths, where pretty much everything in journals will already have appeared – for free, in an unreviewed but often very similar form – on a site called arXiv (‘archive’). Biologists were initially slower to publish preprints, partly as a result of the worry that showing their hand would make it easier for others to beat them to a reviewed journal paper. This reluctance had already begun to fade before 2020, but the Covid-19 pandemic has transformed attitudes. The imperative to share findings as quickly and widely as possible, so that others can test and make use of them, is inarguable.
This is science working in the way many would want it to: rapid, open, collaborative, and focused on the benefit to the public. There is a downside: preprint servers and journals have been swamped by shoddy papers, sent out in a hurry to catch the Covid-19 bandwagon. If scientists are the only ones reading the work, this isn’t much of a problem, but over the course of the last year, journalists, patients, cranks and anyone else with an interest have also been monitoring the servers, and they don’t always distinguish between a preprint and a reviewed paper. As a result, some papers that would have been unlikely to make it into a journal have received bursts of publicity. In January last year, for example, a paper appeared on the bioRxiv server, run out of the august Cold Spring Harbor Laboratory on Long Island, claiming that there was an ‘uncanny similarity’ between Sars-CoV-2 and HIV, which was ‘unlikely to be fortuitous’. This fuelled wild talk that the coronavirus was an engineered bioweapon. The article now bears a red ‘withdrawn’ label, but it is still easy to find and free to read. (To preserve the integrity of the literature, retracted papers usually aren’t deleted.) The following month, the Fox News presenter Tucker Carlson cited a preprint posted on ResearchGate claiming that the Wuhan wet market suspected of being the origin of the outbreak was close to a coronavirus research lab. The author later took it down, telling the Wall Street Journal that it ‘was not supported by direct proofs’; Carlson said the paper had been ‘covered up’.
This problem isn’t confined to preprints: plenty of journals have retracted Covid-19 studies that had passed peer review and been published. And science has always been open to misappropriation; making it less accessible, if that were possible, wouldn’t end that. But papers, whether journal articles or preprints, are still the most important interface between scientific research and wider society. The increased potential for the rapid dissemination of bad-faith interpretations of scientific publications gives researchers one more good reason to take pains over their work.