Fire Underwater

Rachel Connolly · Unsupervised Machine Learning

Which face is real? *

Earlier this year, researchers at the Silicon Valley charity OpenAI announced a new artificial intelligence system, GPT-2, that can finish people’s sentences. GPT-2 has studied text on more than eight million web pages and learned to predict the most likely next word in any given sentence. It uses this method to write whole paragraphs. The resulting text is relatively coherent but, as the researchers note, far from perfect. Word repetition is one problem; describing the impossible (such as a fire underwater) is another; and sentences are prone to strange topic changes.

A few weeks later, two academics at the University of Washington invented a game called Which face is real? Players are presented with one picture of a real human face and another of a fake, AI-generated face and must guess which is which. Again, it is good but not perfect. StyleGAN, the AI behind Which face is real? (developed by researchers at the technology company Nvidia), was trained on tens of thousands of close-ups of faces. After a few goes, you get an idea of what tics to look for: it prefers to render bland backgrounds, it avoids fancy earrings and it sometimes generates faces with a distinctly unnatural asymmetry. My success rate, looking closely, was about 80 per cent on the first try and 100 per cent on the next five.

Both GPT-2 and StyleGAN are applications of a type of artificial intelligence known as unsupervised machine learning, in which a network learns from an unlabelled dataset. This is a significant advance on supervised learning networks, which form the basis of many other text- and image-processing AI systems and require manually labelled datasets. As both sets of researchers note in their papers, these generators are a preliminary step in using unsupervised learning in the fields of text and image processing.

But that wasn’t exciting enough for much of the media. It was reported that StyleGAN could invent a fake perpetrator for a terrorist attack (StyleGAN cannot generate a complete body, or even a torso). OpenAI delayed releasing the full open source GPT-2 model, citing ‘concerns about malicious applications’: the word ‘malicious’ dominated headlines, above articles declaring that GPT-2 was dangerous, capable of making ‘deep-fakes for text’ and generating an ‘army of fake news’. One columnist in a national newspaper announced that if GPT-2 were released we would be ‘hurtling towards the cheering apocalypse’.

‘It will take maybe two to four hours for bad actors to find out where it’s drawing from and start seeding it with propaganda,’ another journalist wrote. The GPT-2 paper clearly explains that it ‘draws from’ the internet forum Reddit. Or, rather, from a static database made up of articles and websites linked to by Reddit users. It is therefore impossible to ‘seed’ GPT-2 because, like most AIs, its training process is complete before it starts working. A few weeks after announcing GPT-2, OpenAI announced it was going to start raising money from investors as a new ‘capped-profit’ company. The free publicity from all the ‘malicious applications’ stories won’t have hurt.

Unlike people, AIs don’t keep learning but must be retrained to improve. GPT-2’s training database contains articles posted up to the end of 2017, and the model performs best on such topics as Brexit and Miley Cyrus, generating convincing sentences around 50 per cent of the time. For more left-field topics, the success rate is much lower. To remain current and useful it would need to be regularly retrained; this is not an inconsequential restriction.

Another weakness is the unavoidable fact that, while people learn from the real world, software learns from databases. StyleGAN used two datasets: CelebA-HQ, which contains 30,000 high-resolution photos of celebrities selected from the 200,000-image CelebA collection, and FFHQ, which contains 70,000 photos taken from Flickr, a hosting site used mainly by professional photographers. That the average celebrity face does not represent the average human face is a point that hardly needs making, and the researchers acknowledged similar biases in the Flickr photos in the StyleGAN paper.

Bias in datasets is unavoidable: a finite database, by definition, represents only a subset of a thing that exists in the real world. Most of the information on the internet is uploaded and used by relatively wealthy people. Photos on social media sites skew towards young people, and most of the text scraped to train GPT-2 was written by men. Algorithms are also biased towards learning English.

The only way to minimise these biases would be to capture data that represents the real world more accurately. This could be done by recording candid images of people going about their lives, and capturing the way they talk and the way they write in private forums – by a level of surveillance, in other words, far beyond what many people already consider too intrusive. This means there is no clear and practical way to make a perfect dataset to ‘feed’ an AI, which in turn means there is no way to make a perfect AI.

* The face on the left is the real one.


  • 16 April 2019 at 1:26pm
    Guernican says:
    Thank you for this.

    It occurs to me that popular culture very rarely presents these things as positives. When we see robots or AI depicted on screen, they're nearly always either malfunctioning in violent ways or designed from the ground up to eviscerate us or send us to extinction. For a lazy journalist, that's the shortest synaptic connection. It's AI. It must be sinister.

    It's always struck me as interesting that the currently accepted test for machine intelligence is whether or not it can sound human enough to fool a human. Were we to create a machine that could genuinely think and learn for itself, why on earth would it think like a human?

  • 16 April 2019 at 6:09pm
    Timothy Rogers says:
    Rachel Connolly’s article brings the reader back to earth in the discussion of the promise and perils of AI. On the one side we are bombarded by techno-utopians who believe in things like the “Singularity” and “machine immortality” as a future repository of human consciousness. Some of that is just wishful thinking coupled to the natural fear of death, while the rest is hype designed to convince investors of “the next big thing” (worthy of your money). The middle ground is that we know from experience a great deal of AI is useful in circumscribed areas, and a great deal of it useless for anything other than commercial purposes, many of which are themselves entirely trivial (thus the profusion of apps to solve problems that aren't really problems, but are somehow "cute" or promise efficiency in the self-obsessed, curated life of AI and gadget lovers).

    Guernican’s comments open up an interesting question. He or she thinks that a genuinely self-instructing machine would wind up not thinking like a human. The devil is in the details here. It might be useful to think of sophisticated AI machines (and their programs) as falling in a range that encompasses the broadest possible variety of earthly organisms and the way they think (and feel). I think we now know that many mammals think like us in some respects (but without words) and share some basic feelings with us (fear, rage, and anxiety come to mind). Behavioral neuroscience depends on these likenesses across species for investigations of how psychoactive drugs work – sometimes the analogy from one species to another is good, at others disappointing (because we are still floundering in a sea of ignorance about such matters, and only making incremental and halting progress toward a better understanding of consciousness – be it that of lab rat or human). And we can see that even "lowly" arthropods can solve problems and make choices, implying "thought" of some kind (built out of sensory capacities plus memory). AI, having been created by humans, would invariably think in most respects like humans – its learning datasets are compiled by humans, the problems it solves are framed by humans, etc.; it uses two “languages” created by humans, the verbal one of everyday discourse and mathematics; and machine logic is merely derivative of the varieties of human logic. What kind of dataset and learning instructions would be able to move it out of this orbit?