Patients often complain that their GP spends more time typing and looking at a computer screen than listening to them. This isn’t really new: doctors have kept records of their encounters with patients since the time of Hippocrates. But changes in record-keeping practices have both reflected and enabled the development of modern scientific medicine, which is less concerned with patients and more with their diseases. Today’s doctors are taught to get a comprehensive history from each patient and to document it in a highly structured way, beginning with the complaint and the patient’s account of it, followed by the doctor’s observations and assessment of the case and, finally, the management plan, whether that is the initiation of treatment, the ordering of tests or simply a note that the patient was reassured. Recording the history isn’t just a matter of documenting a few facts: it’s part of the process of working out the case. It is also, in a way that might not be immediately obvious, creative work. The doctor’s diagnosis and the information he or she records are new intellectual property – property that can be, and is, bought and sold.
Large collections of medical records are enormously valuable. How useful they can be was made clear in 2004, when researchers from the London School of Hygiene and Tropical Medicine were looking for evidence that might help restore confidence in the MMR vaccine in the aftermath of the alarm caused by the gastroenterologist Andrew Wakefield’s suggestion that the vaccine was linked to autism. They searched an earlier version of the database now known as the Clinical Practice Research Datalink, and identified 1294 children diagnosed with autism or pervasive developmental disorder (‘cases’) and 4469 children without such a diagnosis but matched for age, gender and GP practice (‘controls’). The database included the immunisation records of the two groups in the years leading up to the diagnosis, and the team was able to show that the vaccination rates of cases and controls were roughly comparable. This meant it was highly unlikely that the vaccine was causing autism. The paper appeared in the Lancet and its findings were widely reported in the media as demonstrating that MMR was safe. Thousands of studies have used the CPRD or similar databases to investigate the risks and benefits of common drugs (statins, bisphosphonates, low-dose aspirin), or to examine the relationship between risk factors and outcomes (the way the relationship between blood pressure and cardiovascular disease plays out in different age groups, at different levels of blood pressure or in different forms of cardiac complaint, for example), or to consider whether common diseases affect distinct subgroups of patients who might benefit from different treatments.
The CPRD began as a business idea. In the late 1980s a number of companies began selling IT to GPs. The trade was given an unexpected fillip in the early 1990s when the Thatcher government introduced GP fundholding, making GPs responsible for purchasing elective and non-urgent services for their patients. The new administrative burdens associated with this effectively required GPs to computerise their practice records. One firm, VAMP Health, seemed particularly attractive because it gave away the software and hardware for free. All the GP had to do was to agree that VAMP could collect the practice’s anonymised data about patients, data VAMP could then sell on. In 1993 Reuters Health Information acquired the company, but after discovering that some parts of the business were mired in expensive contracts, it gave the database to the Department of Health. The CPRD is now controlled by the Medicines and Healthcare Products Regulatory Agency. It will supply data, for a substantial fee, to researchers whose proposals must pass a strict test of scientific merit as well as reaching appropriate ethical standards. Commercial organisations can apply but will only be considered if their research is deemed to be in the public interest.
The CPRD holds some or all of the GP records for more than 20 million patients, from more than 800 GP practices in the UK. The data, which is uploaded automatically from the IT systems used by participating GPs, includes everything these GPs record about their patients: diagnoses, hospital referrals, prescriptions, vaccinations, test results, whether they smoke, how much they drink. All this information is added to the database without the patient’s consent. Although that may seem at odds with a patient’s rights, it is crucial to the data’s value: if you ask for consent not everyone will give it, and, worse, the people who do give consent aren’t typical, so the data no longer tells you what you need to know. The legal and ethical justification isn’t, however, based on the value of the resource or the science it enables, but on the idea that since the data is anonymised, the patient no longer has any rights over it.
In 1997 the Department of Health issued guidance to the effect that handing over data to companies, even anonymised data, was a breach of the clinician’s duty of confidentiality. A company called Source Informatics – which had been paying pharmacies for data on GPs’ prescribing habits, and selling it on to pharmaceutical companies – asked the courts to review the guidance, which was first upheld by the High Court but then overturned by the Court of Appeal. Its 1999 judgment is the clearest statement of patients’ rights and doctors’ responsibilities as they affect the disclosure of data in the UK. Although I’ve been talking about a ‘patient’s data’, the law doesn’t allow that the data is the property of the patient. The data is created by the clinician: it’s their intellectual property. The legal arguments turn on the constraints imposed by the clinician’s duty of confidentiality on his or her right to dispose of that property. The Court of Appeal held that removing all identifying information effectively severed the patient’s connection with the data, so that any obligations the clinician had to the patient need no longer affect the way the data was used.
The number of patients entering the CPRD is falling because the GP software system from which most of the data is now acquired – Vision, provided by In Practice Systems Ltd – has been losing market share. Just over half of the GP market is now controlled by a company called EMIS, which earned nearly £100 million from the business in 2016. EMIS encourages its GP customers to submit their patients’ records to a database called QResearch, which is run by a not-for-profit set up in collaboration with Nottingham University. QResearch now has data on 36 million patients. The people running it are more reluctant than the CPRD to share the data, a position they can easily justify as motivated by concern about patient privacy, but one which also allows them to keep control of the asset the data represents: of the top fifty research papers identified when you search Google Scholar for ‘QResearch’, 42 were written by the QResearch team. In practice, if you’re an academic with a limited budget or a company wanting to do research in primary care, you would probably look elsewhere.
There is a third big database in the UK, the Health Improvement Network, owned by IQVIA, a company worth $17.6 billion, which was created from the merger of Quintiles, a contract research organisation, and IMS Health, a data provider. IMS Health was the brainchild of Ludwig Frohlich, a Madison Avenue advertising executive who worked in the pharmaceutical sector. It started out by buying the purchasing records of US pharmacies and publishing market research reports, then began acquiring information from drug wholesalers. By 1993 it was buying up filed prescriptions from thousands of pharmacies and providing sales reps working for pharmaceutical companies with data on the prescribing habits of individual doctors. Although the company was careful to be discreet about its activities, doctors gradually became aware of what was going on and began to lobby legislators to respond. In June 2006, New Hampshire barred the use of prescription records for ‘advertising, marketing, promotion or any activity that could be used to influence sales or market share of a pharmaceutical product’. Maine and Vermont followed suit in 2007.
Tellingly, the American Medical Association chose not to side with the doctors complaining about the traffic in prescription data, but instead joined in the trading, selling the names, addresses and professional histories of 1.4 million US doctors and medical students to IMS Health and its competitors, at a profit of $44 million in 2005. In April 2011, IMS Health – which now claims to have more than 500 million anonymous patient data records – went to the Supreme Court to challenge Vermont’s ban, and won. Justice Anthony Kennedy argued that ‘speech in aid of pharmaceutical marketing … is a form of expression protected by the Free Speech Clause of the First Amendment.’ People tend to make a distinction between the use of patient data for medical research and for the marketing of pharmaceuticals: one seems worthy, the other dodgy. But, as the Supreme Court judges noted, either anonymised data is protected by the right to privacy or it isn’t, and if medical researchers are to be allowed to access such data, clearly it isn’t.
In 2012 the UK’s coalition government unexpectedly included in its Health and Social Care Act a requirement that all GPs should return information on their patients to a new central database, known as care.data. This database would be used by the NHS for planning services, but it would also be made available to researchers and, significantly, exploited commercially. The proposal was a dramatic illustration of how government thinking on the need to extract value from the data possessed by the NHS had changed since 1997, when Labour had sought to prevent its exploitation. While GPs had been sharing anonymised patient data since the early 1990s without much media comment or public concern, compelling GPs to share data with the government, which had to comply with the European Convention on Human Rights, was a different story. The health secretary, Jeremy Hunt, seeking to defuse anxiety about the proposals, promised that ‘if someone has an objection to their information being shared beyond their own care, it will be respected. All they have to do in that case is speak to their GP and their information won’t leave the GP surgery.’ This became known as a Type 1 opt-out. In September 2013 the Health and Social Care Information Centre – the agency running care.data – went further, saying that a patient could also tell their GP if they objected to any confidential information about them leaving the HSCIC in identifiable form. This became known as a Type 2 objection.
By the time the scheme was definitively abandoned in July 2016 around 1.5 million people had taken the trouble to fill in an opt-out form, even though the information leaflet on the initiative which the government felt obliged to distribute didn’t mention care.data by name or give direct instructions on how to opt out. One and a half million is a lot of people, almost on the scale of a social movement. It could be argued that, since the government was determined to extract revenue from the private sector in return for access, this was a protest against the commercialisation of NHS data. Some on the right no doubt saw care.data as an assault by the state on individuals’ right to privacy. Many GPs actively encouraged patients to opt out, attempting perhaps to protect information they saw as their property.
A comparable coalition of privacy campaigners and GPs enjoyed a similar success in Denmark. In September 2014, a Danish newspaper revealed that the state had been building a comprehensive database of GPs’ activities for seven years, without adequate legal authority. The database was created as part of a limited voluntary programme in which GPs shared data on certain diseases in return for feedback on how their prescribing habits, for example, compared to those of other doctors. In 2013, following the breakdown of negotiations with the GPs’ trade union, the government made participation mandatory and threatened to use the data to monitor the accuracy of GPs’ reimbursement claims. The GPs immediately protested, and the press discovered that the agency administering the scheme had been harvesting all the GPs’ data. No legal basis could be found for retaining the information, until, a day before the data was due to be erased, the Danish National Archive stepped in and insisted that it should be preserved as a unique historical and cultural artefact. The minister of culture agreed, arguing that ‘only totalitarian regimes delete the records of their illegal activities.’ In the end, after a parliamentary vote, the data was deleted.
We seem to expect different things of general practice and acute care: we don’t appear to be concerned about records of hospital activity, including details about individual patients, being collected by a central agency. In the UK, NHS Digital, the successor to the HSCIC, compiles Hospital Episode Statistics. In February 2014, the Daily Telegraph reported that the government had sold HES data covering 47 million patients to insurance providers so that they could use the information to refine premiums. The story said that the Institute and Faculty of Actuaries had used the data to investigate the risk factors for critical illnesses. This was not an instance of data about individuals being leaked to their insurers, but of the financial services sector doing research so that it could serve its customers better – or extract profits from them more efficiently, which may be a less reassuring way of looking at it.
A big difficulty in all this is that, in practice, no large collection of information about individuals can be definitively anonymised. For $50 you can buy a dataset called ‘Comprehensive Hospital Abstract Reporting System: Hospital Inpatient Dataset: Clinical Data’, which provides anonymised data on all hospitalisations in Washington State. In 2011 there were 648,384 of them. Each is described using 88 fields, including zip code, age, ethnicity and gender; discharge status; how the bill was paid; and diagnosis and procedure codes. A researcher at Harvard, Latanya Sweeney, matched details from this anonymised data with published news stories. Using an online database she found 81 news stories published in Washington State in 2011 containing the word ‘hospitalisation’. In 35 of them a single individual whose anonymous record featured in the hospital data could be confidently identified as the person named in the report. The hospitalisations were mostly the result of car accidents, and in most cases the hospital data contained few specific details beyond those found in the newspaper report. In ten cases, however, there were references in the hospital data, but not in the news story, to potentially sensitive information such as a patient’s venereal disease, drug dependency, alcohol use or payment issues. Keeping anonymised data anonymous need only mean sharing it carefully – with regulated researchers who can be expected to comply with legislation like the UK’s Data Protection Act – but persuading a sceptical public of this may be hard.
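The kind of linkage Sweeney performed can be sketched in a few lines. The Python below is a minimal illustration, not her actual procedure: it assumes an exact match on three quasi-identifiers (zip code, age and gender), and the class names, fields and records are all invented for the purpose. A news report counts as re-identifying someone only when exactly one hospital record shares its details; the sensitive diagnosis then comes along with the match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HospitalRecord:
    # Hypothetical "anonymised" row: no name, but quasi-identifiers survive.
    zip_code: str
    age: int
    gender: str
    diagnosis: str  # sensitive detail the news story never mentions


@dataclass(frozen=True)
class NewsReport:
    # Hypothetical public report: names the person and repeats a few details.
    name: str
    zip_code: str
    age: int
    gender: str


def link(records, reports):
    """Map each reported name to the single record sharing its quasi-identifiers.

    Reports matching zero or several records are skipped, since they cannot
    be tied to one person with confidence.
    """
    # Index the hospital records by their (zip code, age, gender) combination.
    index = {}
    for rec in records:
        index.setdefault((rec.zip_code, rec.age, rec.gender), []).append(rec)

    matches = {}
    for rep in reports:
        candidates = index.get((rep.zip_code, rep.age, rep.gender), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches[rep.name] = candidates[0]
    return matches


# Invented toy data, for illustration only.
RECORDS = [
    HospitalRecord("98101", 34, "F", "fracture"),
    HospitalRecord("98101", 34, "M", "alcohol dependence"),
    HospitalRecord("98052", 61, "M", "fracture"),
    HospitalRecord("98052", 61, "M", "concussion"),
]
REPORTS = [
    NewsReport("Driver A", "98101", 34, "M"),
    NewsReport("Driver B", "98052", 61, "M"),
]

if __name__ == "__main__":
    for name, rec in link(RECORDS, REPORTS).items():
        print(name, "->", rec.diagnosis)  # prints: Driver A -> alcohol dependence
```

Driver B stays anonymous only because two records happen to share the same details; the fewer people who share a given combination of quasi-identifiers, the more likely a match is to be unique.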
The companies that sold IT systems to hospitals are now exploiting their privileged access to the patient data lodged in their systems. Epic, the largest US supplier, markets a product called Healthy Planet, which uses this data to find information that can be used in public health interventions. The business model is predicated on the introduction of new practices in US healthcare designed to drive down the cost of the system as a whole. Large hospitals are being encouraged to join forces with primary care providers to become Accountable Care Organisations, which are paid not for how much work they do but for how well they meet the health needs of the population, arresting the perverse incentive to do more and more investigations, which is the chief reason US healthcare is so expensive. Cerner, Epic’s main competitor, boasts on its website that its population health management strategy is designed to ‘enable organisations to: know and predict what will happen within a population; engage the person, their family and the care team to take action; and manage outcomes to improve health and care’. In the US, where the employer often foots the bill for healthcare insurance, it isn’t uncommon for employees to be encouraged to join healthy living programmes – which are increasingly built around apps and monitoring devices – designed to lower the cost of premiums. The general goal seems benign, but there is a sleight of hand being performed here. The data – which belongs to the care system – is used by IT companies, which are paid to store it, to generate products for the system to buy back.
In April 2016, New Scientist obtained a copy of a data-sharing agreement between DeepMind, the AI company owned by Google’s parent company, Alphabet, and the Royal Free NHS Trust in London. Under the terms of the agreement, data on 1.6 million patients was to be transferred to a secure data centre and made accessible to DeepMind. The data would include five years’ worth of information on admissions, transfers and discharges, all reports of completed inpatient episodes, all radiology and pathology results, critical care data and A&E data. No patient consent or ethical clearance was sought or obtained. This was justified, the hospital and the company maintained, because the product that DeepMind was developing with the help of the data would directly support patient care. Some observers found this problematic, since Streams, the product being referred to, was a relatively straightforward app that used a published algorithm to identify patients at risk of acute kidney injury. It didn’t seem to require the kind of AI that DeepMind specialises in. Further grounds for scepticism were supplied by some of the statements made by DeepMind, which clearly intends to deliver something more ambitious than Streams. The case was referred to Fiona Caldicott, the national data guardian, and to the Information Commissioner’s Office. The company’s submission to Caldicott maintained that the data was being used to test its algorithm. Her response was that this testing couldn’t be seen as supporting direct patient care, and that therefore the access it had been given was inappropriate. Shortly afterwards, the Information Commissioner’s Office concluded that the law had been breached.
Neither party has been keen to disclose the financial basis for the deal, but the Royal Free has said that although it currently receives free access to Streams, it will have to pay a ‘service fee’ if DeepMind’s support ends up costing the company more than £15,000 a month. This, presumably, seems a fair agreement to the decision-makers at the Royal Free, since the data has no financial value to the trust and the benefits to patients may be considerable. But from the wider perspective of the NHS it is disappointing: companies are being gifted the raw material that enables them to come up with products for which the NHS looks likely to end up paying dearly.
It’s worth thinking for a moment about what the tech companies may be able to charge for tools of this kind. The obvious target for machine learning in the health sphere is medical image interpretation. DeepMind is rumoured to be close to announcing that it can interpret retinal scans more reliably than expert ophthalmologists. Other companies are looking at mammograms and, I assume, pretty much all other classes of medical images. There are plenty of technical challenges, but most people in the field believe that machine learning will shortly outperform radiologists in at least those aspects of the job that involve the straightforward application of pattern recognition. At the moment, the UK employs 3318 consultant radiologists. The cost of employing them, if you add up their salaries and associated costs, is about £633 million a year. A company that successfully automates even a fraction of this activity will have a huge global market for its wares.
When it comes to the data, this is a buyer’s market. If the NHS tried to get a better deal, tech companies would just look elsewhere. There are lots of countries desperate for their investment. In March 2016, the Italian government signed a deal with IBM. The corporation would invest $150 million, $60 million of which would be provided in state subsidies, in a new centre in Milan to develop data-driven healthcare applications. In return, Il Fatto Quotidiano has revealed, IBM expects to have access to the health data of the population of Lombardy, if not the whole of Italy. The paper quotes a confidential document listing the data IBM is hoping to secure: registration and demographic data, historical medical diagnoses, repayments and running costs, medical conditions and procedures, outpatient prescriptions, drug treatments and related costs, emergency room visits, hospital discharge cards, appointment information, time and attendance and other health data. The Italian data protection authority and the European Commission have asked for details of the arrangement, but the government, so far, has ignored them.