Until last summer, hi-tech riots – broadcast on YouTube and organised by BlackBerry – were mostly the preserve of enterprising dissidents in Iran and China. But in June hordes of ice hockey fans in Vancouver, outraged by the local team’s loss to a Boston rival, filmed themselves smashing cars and burning shops. Then it happened here. The crackdowns that follow such riots are equally hi-tech. In both Britain and Canada ordinary members of the public set up Facebook groups to share pictures and videos from the riots, using Twitter to name any identified perpetrators and alert the police. This was cyber-vigilantism at its most creative.
The day after the Vancouver riots, the Insurance Corporation of British Columbia – a state-owned insurance company which also handles drivers’ licences and vehicle registration – offered to help the Vancouver police by running its facial-recognition software on photos from the riots, comparing them with its database, a collection of photos of more than three million individuals, normally used in investigations of fraud and identity theft. Not much came of it: there were no reports of any arrests made thanks to the database. Attempts to automate the process of facial recognition after the British riots failed too: most rioters, after all, didn’t already have their mugshots in police records. Since the UK doesn’t (yet?) have a Canada-style photo database and Canada doesn’t (yet?) have a UK-style CCTV surveillance infrastructure, such efforts in both countries were probably doomed. China and Iran – where excessive surveillance goes hand in hand with excessive documentation requirements and weak or non-existent privacy laws – are a different story. And the technology is improving.
In September 2010, satellite photos of Abbottabad showed a man who looked a lot like Osama bin Laden exercising in a yard; the satellite’s facial recognition system confirmed it was him. It’s said that after shooting him, the Navy Seals ran his picture through another facial recognition system, which reported that there was a 95 per cent chance they had got the right man. Given that half of bin Laden’s face was presumably missing, they must be rather proud of their technology. The Navy Seals may have been using gear similar to the Robocop-style glasses the Brazilian police have developed in preparation for the 2014 World Cup. Fitted with a small camera that sees as far as 12 miles, the glasses can capture 400 images a second and compare them with a central computer database of 13 million faces – or so the police claim. The surest sign that facial recognition technology has made it comes from China, where at last year’s Sex Culture Exhibition in Xi’an a firm called the Love Sex Company presented a £3000 sex doll that speaks in a variety of languages and, thanks to onboard software, can recognise its ‘owner’.
It isn’t easy to teach a computer to recognise a face. Definitions don’t help: if you describe a face as ‘a blob-like region with two eyes, two ears, a nose and a mouth’, you still need to define an eye, a nose, an ear and a mouth. Humans can easily locate a face in a picture even if parts of it aren’t clearly visible; for computers this is very hard. What computers can recognise is the similarity between specific regions in two or more pictures. Given enough computational resources, they can be trained to calculate what a particular segment might look like under certain abnormal conditions – e.g. when the lighting is low or when the person in the photo has aged. As the number of potential differences between any two pictures of the same face is infinite, it’s impossible to write an algorithm that can take account of all such variations. However, even imperfect FRT can be useful.
Suppose you have just photographed a man who claims to be John Smith. How can a computer establish whether he is the same John Smith who exists in your database? First, it needs to find the man’s face in the picture – by looking for blob-like regions with consistent brightness and colour. Then it has to find facial landmarks – nose, mouth, eyes etc (there are more than a hundred significant features). Then the face must be ‘normalised’ by making it look like other images in the database with regard to size, pose, colour intensity and illumination. Finally, the computer has to produce a numerical representation of the face and compare it with the equivalent representation of the picture associated with the John Smith in the database. There are two ways to generate such representations. One is geometric, relying on the shape and position of facial landmarks; the other is photometric, using statistics to distil an image into values.
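The geometric approach can be made concrete with a minimal sketch in Python. It assumes the landmarks have already been located and the pose normalised. The landmark names, the five distances chosen, the coordinates and the 0.1 threshold are all invented for illustration and are not taken from any real system.

```python
import math

# Pairs of landmarks whose separation is measured (an illustrative choice).
PAIRS = [
    ("left_eye", "nose"),
    ("right_eye", "nose"),
    ("nose", "mouth"),
    ("left_eye", "mouth"),
    ("right_eye", "mouth"),
]

def geometric_signature(landmarks):
    """Turn named landmark points into a list of scale-free distances."""
    # Dividing by the pupil-to-pupil distance removes the effect of image size.
    scale = math.dist(landmarks["left_eye"], landmarks["right_eye"])
    return [math.dist(landmarks[a], landmarks[b]) / scale for a, b in PAIRS]

def verify(enrolled_signature, probe_signature, threshold=0.1):
    """One-to-one check: accept when the two signatures are close enough."""
    return math.dist(enrolled_signature, probe_signature) <= threshold

# Hypothetical (x, y) pixel coordinates for the John Smith held in the database.
enrolled = {"left_eye": (30, 40), "right_eye": (70, 40),
            "nose": (50, 60), "mouth": (50, 80)}
# The same face photographed at twice the size.
same_larger = {name: (2 * x, 2 * y) for name, (x, y) in enrolled.items()}
# A different face with a longer nose and a lower mouth.
other = {"left_eye": (30, 40), "right_eye": (70, 40),
         "nose": (50, 70), "mouth": (50, 95)}

print(verify(geometric_signature(enrolled), geometric_signature(same_larger)))
print(verify(geometric_signature(enrolled), geometric_signature(other)))
```

The first check accepts the enlarged photo because the ratios are unchanged; the second rejects the other face. A photometric representation would replace `geometric_signature` with statistics computed from pixel values, but the final comparison would look much the same.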
This kind of verification exercise is one of the simplest tasks in automated facial recognition. But it would be of little use to police investigators after a riot or demonstration. All they have are the photos and footage they shot of protesters, to match against their database of pictures taken at previous protests. They don’t even know if a given rioter is in their database, so if the computer doesn’t find any matches, it’s hard to say whether it has made an error or the matching face is indeed missing. The investigators’ best hope is to generate a similarity score between the new photo and photos in the database, by comparing its mathematical representation with those of previous images. At this point, it may be safest to have a human operator decide whether the face in the new photo actually matches any of the possible candidates in the database. To achieve full automation – to outsource this judgment to the computer completely – would require deciding on an acceptable threshold of error, of which there are two kinds: false positives and false negatives. A high false positive rate means too many innocent suspects having to explain themselves; a high false negative rate means too many actual rioters would be let off the hook. False positives are common in facial recognition; one recent case in the US involved a driver who had to spend ten days wrangling with the authorities after a system used by the Massachusetts Registry of Motor Vehicles mistook him for another driver and revoked his licence.
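The trade-off between the two kinds of error can be shown with a few lines of Python. This is only a sketch: the similarity scores and the ground-truth labels below are invented, and a real evaluation would use thousands of photo pairs.

```python
def error_rates(scores, is_same_person, threshold):
    """Return (false_positive_rate, false_negative_rate) at a given threshold."""
    # A false positive: two different people scored at or above the threshold.
    false_pos = sum(1 for s, same in zip(scores, is_same_person)
                    if s >= threshold and not same)
    # A false negative: the same person scored below the threshold.
    false_neg = sum(1 for s, same in zip(scores, is_same_person)
                    if s < threshold and same)
    negatives = is_same_person.count(False)
    positives = is_same_person.count(True)
    return false_pos / negatives, false_neg / positives

# Invented similarity scores (0 to 1) for eight photo pairs.
scores = [0.95, 0.80, 0.65, 0.40, 0.85, 0.60, 0.30, 0.10]
is_same_person = [True, True, True, True, False, False, False, False]

for threshold in (0.5, 0.7, 0.9):
    fpr, fnr = error_rates(scores, is_same_person, threshold)
    print(f"threshold {threshold}: false positives {fpr:.2f}, "
          f"false negatives {fnr:.2f}")
```

On this toy data, raising the threshold from 0.5 to 0.9 cuts the false positive rate from 0.50 to zero while pushing the false negative rate from 0.25 to 0.75. Choosing the threshold is a decision about which mistake is more tolerable, not a technical detail.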
Given its spotty track record, it’s hard to see why facial recognition technology has so quickly become one of the most widely used forms of biometrics (second only to fingerprints). Kelly Gates’s Our Biometric Future, a thorough exploration of FRT’s relatively short history, provides some clues. Compared to other biometric technologies, FRT has one enormous advantage – it doesn’t require consent, co-operation or even the subject’s knowledge – and many smaller ones. Unlike fingerprinting, it has no criminal connotations. Hand geometry, sometimes suggested as an alternative to fingerprinting, is unreliable, as hand measurements are not unique to individuals. Voice recognition has a significant drawback too – our voices change quite often – while retinal scanning triggers unfounded fears that one’s eyesight may be damaged.
‘The banality of the portrait’, as Gates puts it, has also helped. FRT relies on a ubiquitous medium – photography – that has been part of bureaucratic identification schemes for more than a century (the idea of using images of faces as tokens of identity dates back to the mid-1850s; the first photographic passports were issued around the time of the First World War). The work of Alphonse Bertillon, a police official working in Paris from the 1880s onwards, helped lay the ground for modern FRT. Unlike the eugenicist Francis Galton or the criminologist Cesare Lombroso, who believed it was possible to read a person’s criminal type off his face, Bertillon was mundanely preoccupied with identifying criminals by recording their bodily measurements and taking mugshots under controlled lighting conditions. To that end, he developed a sophisticated system of measurements – Bertillonage. Bertillon’s standardised mugshot was used by police worldwide, but the absence of efficient indexing and the decreasing costs of photography created unanticipated problems. Eventually, there were too many photos to search, organise and analyse. The advent of applied computing in the 1950s promised to change all that. Computers – in theory – could solve the problems that plagued Bertillonage by automating the process.
Woodrow Wilson Bledsoe, a pioneer of artificial intelligence, conducted one of the first experiments with computer-based facial recognition in 1964 (he had already done some significant work on text recognition). He had a human operator mark important facial landmarks on a set of two thousand pictures containing at least two separate images of each test subject. This produced a list of twenty distances for each face – width of mouth, pupil-to-pupil distance between the eyes etc – which were entered into a database next to the subject’s name. The computer was then given a list of distances for a new image of one of the subjects and prompted to find a match. Bledsoe grasped the challenge involved in automating facial recognition: the greater the variation between the compared images, the worse the system’s performance. FRT is particularly sensitive to differences in illumination; shadows and intrusive backgrounds are hard to process. Other variations abound too: people age, grow beards, use make-up or simply turn their heads away from the camera.
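The matching step in an experiment of this kind might look something like the Python sketch below. The record names and measurements are invented, three distances stand in for the twenty, and the nearest-neighbour rule is an assumption made for illustration; it is not a description of Bledsoe's actual procedure.

```python
import math

# Hypothetical records: subject name -> list of measured facial distances.
database = {
    "subject_a": [52.0, 63.0, 41.0],
    "subject_b": [48.0, 60.0, 45.0],
    "subject_c": [55.0, 66.0, 38.0],
}

def closest_match(probe, records):
    """Return (name, gap) for the stored record nearest to the probe."""
    name = min(records, key=lambda n: math.dist(records[n], probe))
    return name, math.dist(records[name], probe)

# Distances measured on a new image of one of the subjects.
name, gap = closest_match([49.0, 61.0, 44.0], database)
print(name, round(gap, 2))
```

A rule like this always returns somebody, which is why variation between images matters so much: a shadow or a turned head changes the measured distances, and the nearest record may then belong to the wrong person.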
Despite a few minor breakthroughs in the early 1970s, computer scientists came to accept that there would be no great improvement in FRT until cheaper computing power, better algorithms and higher-quality images were available. As the whole project of artificial intelligence was increasingly put in question – by computer scientists among others – a less ambitious goal was settled on. It may have been preposterous to think that computers could be taught to ‘see’ like humans, but there were still plenty of ways to profit from what Gates calls ‘human-computer divisions of perceptual labour’. In the 1980s the loose collective of companies and academics working in the field of ‘automated personal identification’ – or biometrics, as it became known – acquired the markings of a fully fledged industry, with its own conventions, associations and newsletters. The US government was its one and only godparent, defining technology standards, handing out lucrative tenders and subsidising research. The first meeting of the Biometric Consortium – a group set up with the aim of fostering closer ties between the government and industry – was organised in 1992 by the research division of the National Security Agency.
Various defence and intelligence agencies funded most of the early work in the field, including Bledsoe’s experiments. The situation hasn’t changed: the FBI is funding a system that can distinguish between the faces of identical twins, while In-Q-Tel, the CIA’s venture capital arm, has also been a significant supporter of FRT. But the most important government contribution to the commercialisation of the technology was administering tests to evaluate the viability of using it for real-world purposes. The tests were first conducted in 1993 and are still held every few years. The results were soon thought to be good enough for FRT vendors to branch out of the defence industry. The public sector was identified as an important target, since, according to the CEO of one FRT vendor, ‘that’s where the money is and the faces are.’ Agencies operating large-scale identification systems – the State and Justice Departments, individual states’ Departments of Motor Vehicles, police departments and prisons – were the unlucky guinea pigs. The systems they bought rarely lived up to the modest promises of the official tests, let alone to the vendors’ overblown claims. (‘In the future,’ one company promised, ‘facial recognition systems could allow drivers to renew their licences at an unattended kiosk in local supermarkets.’)
The possibility of integrating FRT with closed-circuit television cameras – ‘smart CCTV’ – brought on even more hyperbole. One company announced a product that ‘revolutionises the functionality of conventional CCTV’, providing ‘active, real-time identification for today’s passive CCTV systems’. But reality didn’t match the marketing brochures. In 1998 smart CCTV technology was installed in the London borough of Newham. It was superior to humans in many respects: its eyes never got tired and, as the manufacturer pointed out, ‘it never goes to the loo, either.’ Whether the system actually worked seemed to be of secondary importance to the Newham police; according to Newham’s security chief, ‘the need was to reduce the public fear of becoming a victim of crime and increase the criminals’ perception of the chance they would be detected.’ Six years later, Newham’s smart CCTV still hadn’t made any positive identifications, and it failed to spot a Guardian journalist whose picture was in the database and who walked around in front of the cameras in two zones covered by the system. The crime rate in the area had dropped but not because criminals genuinely had anything to fear. A smart CCTV experiment in Tampa, Florida in 2001 brought similar results: no arrests were made and the system was scrapped after only two years of operation. This time the crime rate didn’t drop.
Why were people so ready to believe what the FRT vendors claimed? Perhaps because these companies were brimming with PhD-carrying scientists who, sensing that the US government was getting interested in their field, didn’t hesitate to switch to more lucrative careers. The archetypal figure of scientist turned entrepreneur here is Joseph Atick. In 1994, Atick – along with two other researchers from the Computational Neuroscience Laboratory at Rockefeller University – formed Visionics Corporation; Atick was the CEO. He made the most of his impressive credentials: a child prodigy, he was known to have written a 600-page physics textbook at the age of 16 and earned a PhD from Stanford at 21. But even stars like Atick couldn’t put a positive spin on failures like Tampa. And privacy advocates were finally mobilising against any new deployments of smart CCTV. At that rate, FRT might have died of natural causes just a few years later.
Then came 9/11, presenting FRT vendors with a once-in-a-lifetime marketing opportunity. On 24 September 2001 – two weeks after the attacks – Visionics released a brochure called ‘Protecting Civilisation from the Faces of Terror’, which presented automated facial recognition as a fully functioning technology that should be integrated into airport security systems. Visionics’s technology, the brochure claimed, would allow security officials to ‘rapidly, and in an automated manner, use video feeds from an unlimited number of cameras and search all faces against databases created from various intelligence sources and formats’. On 1 October Atick appeared on CNN: ‘Terror,’ he proclaimed, ‘is not faceless.’ Had systems like his been installed in airports, he said, ‘I can’t help but imagine that we could have identified and intercepted at least some of these [terrorists].’ ‘The faces of terror’ was a popular trope in post-9/11 America. When he announced the FBI’s ‘most wanted terrorists’ list on 10 October, Bush used one of Atick’s talking points: ‘Terrorism has a face, and today we expose it for the world to see.’ A Harris poll taken shortly afterwards found that 86 per cent of respondents favoured ‘the use of facial recognition technology to scan for suspected terrorists at various locations and public events’; six months later, that number still stood at 81 per cent.
Another technology was also on the rise: automated analysis of facial expressions. The system is based on the premise that all emotions trigger facial movements that give the game away. A sophisticated classification system – called Facial Action Coding System – developed in the 1970s by the psychologists Paul Ekman and Wallace Friesen helps to translate facial movements into corresponding emotions. It is currently in use in American airports under a programme called Screening Passengers by Observation Techniques (SPOT): specially trained officers look out for passengers exhibiting abnormal facial expressions and bodily signs. Ekman’s system was designed to be used by human operators, but humans are expensive: training one observer takes more than a hundred hours and to mark up one minute of video according to the system takes about an hour of manual coding. Ekman now wants to replace human operators with machines, and his work has received funding from the National Science Foundation and the CIA. In 2006 he predicted in the Washington Post that ‘within the next year or two, maybe sooner, it will be possible to program surveillance cameras hooked to computers that spit out FACS data to identify anyone whose facial expressions are different from the previous two dozen people’. It hasn’t happened yet, but not for want of money being poured into it.
The wars in Afghanistan and Iraq have been a further blessing for the biometric industry. The need to identify local populations which the occupying armies had little understanding of led to the deployment of portable systems that work with multiple biometrics – such as fingerprint, face and iris – and so are more resistant to fraud. As is typical of innovations produced by the war on terror, the idea of portable biometrics has attracted much interest from law enforcement agencies back in America: police in Massachusetts are already using a biometric system to check people’s identities by taking photos on a slightly modified iPhone. Thanks to 9/11 and the two wars, the market for biometrics has been growing handsomely. Its total size in 2010 was around $4.2 billion, compared with $395 million in 2000. Not all of this money – much of it government money – has been wasted: today’s FRT is far more reliable than that of ten years ago. The 2006 industry test found that the new algorithms were ten times more accurate than those of 2002 and a hundred times more accurate than those of 1995. Some algorithms outperformed humans – especially in telling identical twins apart.
Conventional facial recognition can now also be combined with newer methods such as skin-texture analysis (a patch of skin is captured, then broken into smaller blocks and converted into mathematical space) and 3D facial recognition (which captures information about the shape of the skull and so is immune to changes in lighting). These hybrid systems do much better than either technology when used by itself. Face hallucination, another novel technique, allows a computer to ‘guess’ what a low-resolution picture of the face (or its missing parts) may look like in higher resolution. Other new systems allow for better handling of wrinkles. Casinos – which seek to ban cheats from their premises – have been trying to make FRT work in low light with the help of infrared technology. Scientists at UCLA – with funding from the Chinese government – have built an ‘image to text’ system that automatically produces text summaries of what is taking place in captured video. It means that CCTV footage can easily be searched, in China’s case boosting its sprawling complex of video surveillance.
For all the innovations, there have been few successful real-world deployments of facial recognition. Failure is still far more common. Last year Manchester Airport shut down its facial-recognition scanners after the robot guard let through a couple who had swapped passports. A 2009 research paper found that heavy plastic surgery reduced the success rate of facial recognition systems to just 2 per cent. So why is FRT so ubiquitous when it performs so poorly? The reason is that FRT doesn’t have to be perfect to be useful. For many purposes, having a computer that can guess a person’s gender or age by looking at their face is good enough. This means advertising hoardings that change their ads depending on who – a teenage girl or a middle-aged man – stands in front of them; vending machines that tell you what fizzy drink to buy based on what people who look like you are buying; web services that match the faces of abandoned dogs to those of humans looking for canine companions; car prototypes that fasten your seatbelt and start beeping at you when they suspect you might be drunk or are about to doze off. (Much of this sci-fi stuff originates in Japan, where a company called Omron is selling ‘smile-scan’ technology that allows service industry firms to evaluate the quality of their employees’ smiles.) The Cowcam application developed at the University of Queensland uses FRT-like technology to separate farm animals on their way to water: cattle are good to go while goats and pigs are barred. Leafsnap, a mobile app launched by the Smithsonian Institution, uses FRT to recognise photos of leaves and load detailed information about the leaf’s parent tree onto your iPhone. What self-respecting hipster wouldn’t want a mobile app that tells them the gender ratio – computed in real time – at their favourite bar, based on the pictures gathered by cameras installed at the bar’s entrance and exit? (That would be the popular SceneTap app.) What PowerPoint aficionado wouldn’t want to try teleconferencing software from Alcatel-Lucent that sends a warning when people on the other end of a conference call start looking bored?
Many companies seek to capitalise on the aura of ‘cool’ around such technologies and leverage the intelligence of human users to improve their system’s ability to recognise objects and faces. Until recently Google ran an online game called Image Labeller, in which people competed to find words to describe a picture and got points if their descriptions matched. Everyone wins: humans are mildly diverted while Google generates descriptions for the images it crawls. But Gates worries that projects like this allow institutional users of FRT – security agencies, governments and corporations – to encourage users ‘to contribute their free labour and technical skills to the process of developing more effective forms of image retrieval’. Social networks and photo-sharing sites can be ‘test beds for defining new social uses of facial recognition technology and optimising its functionality’. Gates’s impressive book would have been even stronger had she fully addressed the likely future impact of companies like Facebook, Google and Apple on the industry. The iPhone is a powerful biometric technology: several mobile apps are already capable of tagging photos of one’s friends on the go. SocialCamera, one such app, can even be trained: the more pictures of a friend you tag, the fewer mistakes it makes in the future.
Facebook has an even bigger advantage in FRT: the enormous number of photos it handles (four billion pictures are uploaded to the site every month). Since it knows who your friends are, Facebook can predict the names of people who are likely to appear in your photos; since it knows where you study, live, work and travel, it can predict the most likely backgrounds of your photos. These bits of data – along with your age, gender, sexual orientation and a heap of other facts – may help it build the ultimate facial recognition system or, at least, the ultimate face search engine. Google, too, could revolutionise the field if it chose to. Eric Schmidt, Google’s executive chairman, claimed to find the technology ‘creepy’, but his company has nevertheless acquired several start-ups specialising in various forms of visual recognition. (‘Technically, we can pretty much do all of these things,’ Google’s top image recognition engineer told CNN last year.) Google has also secured several valuable patents, including one to boost the accuracy of facial recognition by tapping into the data from social networks. Last December the company made a major move into the field by introducing some basic FRT functionality to its Google Plus social networking site. If Facebook succeeds in convincing the public that FRT is OK, Google is likely to act too. What was once the stuff of civil libertarians’ nightmares – the integration of one spooky technology (facial recognition) with another (data-mining) – may soon become a reality. It won’t be long before Facebook, Google and others unleash such services on consumers, wrapping them in ‘user empowerment’ rhetoric.