Scraping Faces

Rachel Connolly

The Metropolitan Police has announced it is going to use Live Facial Recognition (LFR) in London. The controversial technique, trialled in South Wales since 2017, involves officers sitting in a public place and filming the people who walk past. Their faces are automatically compared to pictures in a database of wanted criminals and the police are alerted if there is a match. If there is no match, the Met says, ‘the biometric data is immediately, automatically deleted’. A few days earlier, the New York Times reported that a company called ClearView AI has developed a facial recognition tool that allows law enforcement agencies in the US to match images or video footage with photos from the internet.

Both developments suggest it will be difficult to avoid facial recognition software as the surveillance technology becomes more popular with law enforcement. The Met has said it will give advance warning of where and when it’s using LFR, and that anyone can refuse to walk past the cameras. But it’s difficult to see how letting people opt out wouldn’t undermine the technology’s usefulness in catching criminals. ClearView AI has said it will, under certain conditions, remove images of those who do not want to be included in its database, but it’s difficult to see how this will work in practice.

ClearView AI’s database consists of photos scraped from websites like Facebook, YouTube, Twitter and Instagram. When the tool matches footage to a photo in the database, it brings up a link to the site where the photo was found. When a site is scraped on a large scale, the site’s operator may detect it, but the people whose profiles are being scraped will not be alerted. Most of the people whose photos are in ClearView AI’s database will never find out their image has been collected unless the FBI, Homeland Security or police contact them about a crime.
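In outline, a matching tool of this kind works by having a model convert each scraped face into a numeric ‘embedding’, then comparing a query face against every stored embedding and returning the source URL of the closest match. The sketch below illustrates only that comparison step; the URLs, vectors and threshold are invented for illustration, and a real system would use a trained neural network rather than hand-made numbers.

```python
import math

# Hypothetical database: in a real system a neural network maps each scraped
# face image to a high-dimensional vector; these short vectors stand in for
# those embeddings purely to illustrate the lookup.
database = {
    "https://example.com/profile/alice": [0.1, 0.9, 0.3],
    "https://example.com/profile/bob":   [0.8, 0.2, 0.5],
}

def cosine_similarity(a, b):
    """Similarity of two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def match(query, threshold=0.95):
    """Return the source URL of the closest stored face, or None if no
    stored face is similar enough to the query."""
    best_url, best_score = None, threshold
    for url, vec in database.items():
        score = cosine_similarity(query, vec)
        if score >= best_score:
            best_url, best_score = url, score
    return best_url

# A query embedding close to the first stored face matches it and returns
# the page the photo was scraped from.
print(match([0.11, 0.88, 0.31]))
```

The point the sketch makes concrete is that the output of a match is a link back to the site where the photo was found: the database itself need store nothing but vectors and URLs.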

Scraping is prohibited by most websites’ terms of service agreements, but there is no law against it and ClearView AI hasn’t faced any consequences yet. Last September, LinkedIn lost a court case against a company that was scraping publicly available data from its website. ClearView AI’s database may be unsettling, but the company hasn’t really done anything new.

Facial recognition software designed by companies that don’t already hold large datasets of faces (as Facebook and Apple do) is usually trained using sets of scraped photos. Many of them are freely available on the internet for anyone to download. VGGFace2 has more than three million faces scraped from Google image search; the Google facial expression comparison dataset has more than 150,000 different faces scraped from unnamed websites; Flickr-Faces-HQ has 70,000 images scraped from Flickr. If you found your photo in one of these databases you could contact the owner of the set to have it taken down, but there is no easy way of removing it from the computers of everyone who has downloaded and used it already.

ClearView AI’s founder, Hoan Ton-That, told the New York Times that his company only uses publicly available images, so nothing from private accounts would end up in his database. But if your profile has already been scraped, changing your privacy settings now won’t help. He said nothing about how the company will deal with photos uploaded by third parties to accounts that are not private. Even if you are meticulous about your online privacy, the company may still harvest your image from friends who are not.

Ton-That also said his company is working on a tool to let people request images be removed from its database. But it took the New York Times reporter a month to get in touch with him, and she only managed to do so by turning up at his office. It’s hard to imagine that the average person will find it straightforward to lodge a complaint. And since the database is available only to law enforcement agencies, how would we even know if our pictures were in it?

Reading about ClearView AI, I was reminded of a conversation from more than ten years ago with a family friend whose son was being bullied via a fake Facebook page set up by other teenagers. He was trying to do something about it, but having difficulty communicating with Facebook: ‘It’s impossible to get somebody from that company on the phone.’


  • 31 January 2020 at 4:40pm
    staberinde says:
    Great article.

    The key issue is over-reach. It's fine to train the expert system on publicly available pictures of real people, but it's not OK for the deployed system to compare everyone it sees to everyone it has ever seen. It should instead be doing what the police officer would do, which is to compare everyone they see with the photographs of criminals at-large.

    And if the argument is "It's not built like that," the answer is "Well go away and rebuild it right."

  • 2 February 2020 at 1:40pm
    Graucho says:
    One has wondered with a shudder how things would have turned out if Hitler or Stalin had had all the recent advances in IT and AI to hand. Watching developments in China we are probably going to find out.

    • 4 February 2020 at 12:57pm
      Reader says: @ Graucho
      Hitler did pretty well with the Gestapo and the system of institutionalized surveillance going all the way down to block wardens. With Stalin, the aim seems to have been to create an atmosphere of terror. Part of that is the very randomness and uncertainty involved in not knowing whether you will be next for the 4am knock on the door, even if you are innocent.