Earlier this year, researchers at the Silicon Valley charity OpenAI announced a new artificial intelligence system, GPT-2, that can finish people’s sentences. GPT-2 has studied text on more than eight million web pages and learned to predict the most likely next word in any given sentence. It uses this method to write whole paragraphs. The resulting text is relatively coherent but, as the researchers note, far from perfect. Word repetition is one problem; describing the impossible (such as a fire underwater) is another; and sentences are prone to strange topic changes.
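To make ‘predicting the most likely next word’ concrete, here is a minimal sketch in Python. It is not how GPT-2 works internally (GPT-2 is a large neural network); it is a much simpler stand-in that counts which word most often follows each word in a tiny made-up corpus. The function names and the corpus are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def most_likely_next(followers, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def generate(followers, start, length=8):
    """Extend `start` one word at a time, always taking the likeliest next word."""
    out = [start.lower()]
    for _ in range(length):
        nxt = most_likely_next(followers, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)


# Toy corpus (illustrative only; GPT-2 was trained on millions of web pages).
corpus = "the cat sat on the mat . the cat sat on the rug . the dog ate the fish ."
model = train_bigrams(corpus)
print(generate(model, "the", length=8))
```

Always choosing the single likeliest word makes this toy loop round on itself (‘the cat sat on the cat sat on the’), a crude version of the word repetition the researchers describe.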
A few weeks later, two academics at the University of Washington invented a game called Which face is real? Players are presented with one picture of a real human face and another of a fake, AI-generated face and must guess which is which. Again, it is good but not perfect. StyleGAN, the AI behind Which face is real? (developed by researchers at the technology company Nvidia), was trained on several hundred thousand close-ups of faces. After a few goes, you get an idea of what tics to look for: it prefers to render bland backgrounds, it avoids fancy earrings and it sometimes generates faces with a distinctly unnatural asymmetry. My success rate, looking closely, was about 80 per cent on the first try and 100 per cent on the next five.
Both GPT-2 and StyleGAN are applications of a type of artificial intelligence known as unsupervised machine learning, in which a network learns from an unlabelled dataset. This represents a significant advance on the supervised learning networks that form the basis of many other text- and image-processing AI systems, which require manually labelled datasets. As both sets of researchers note in their papers, these generators are a preliminary step in using unsupervised learning in the fields of word and image processing.
But that wasn’t exciting enough for much of the media. It was reported that StyleGAN could invent a fake perpetrator for a terrorist attack (StyleGAN cannot generate a complete body, or even a torso). OpenAI delayed releasing the full open source GPT-2 model, citing ‘concerns about malicious applications’: the word ‘malicious’ dominated headlines, above articles declaring that GPT-2 was dangerous, capable of making ‘deep-fakes for text’ and generating an ‘army of fake news’. One columnist in a national newspaper announced that if GPT-2 were released we would be ‘hurtling towards the cheering apocalypse’.
‘It will take maybe two to four hours for bad actors to find out where it’s drawing from and start seeding it with propaganda,’ another journalist wrote. The GPT-2 paper clearly explains that it ‘draws from’ the internet forum Reddit. Or, rather, from a static database made up of articles and websites linked to by Reddit users. It is therefore impossible to ‘seed’ GPT-2 because, as with most AIs, its training process is complete before it starts working. A few weeks after announcing GPT-2, OpenAI announced it was going to start raising money from investors as a new ‘capped profit’ company. The free publicity from all the ‘malicious applications’ stories won’t have hurt.
Unlike people, AIs don’t keep learning but must be retrained to improve. The GPT-2 dataset contains articles posted up to the end of 2017, and the model performs best on such topics as Brexit and Miley Cyrus, generating convincing sentences around 50 per cent of the time. For more left-field topics, the rate is much worse. To remain current and useful it would need to be regularly retrained; this is not an inconsequential restriction.
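The point that training is finished before the system is put to work can be sketched in a few lines of Python. This is a hypothetical toy, not GPT-2: the class name `FrozenModel` and the sample texts are invented for illustration, and ‘training’ here is nothing more than counting words in a fixed snapshot.

```python
from collections import Counter


class FrozenModel:
    """Hypothetical stand-in for a trained model: it sees text only at construction."""

    def __init__(self, snapshot):
        # "Training": count the words in a fixed snapshot. Nothing updates afterwards.
        self._counts = Counter(
            word for doc in snapshot for word in doc.lower().split()
        )

    def knows(self, word):
        """True if the word appeared in the snapshot the model was trained on."""
        return self._counts[word.lower()] > 0


# Snapshot taken at training time (invented example text).
snapshot_2017 = ["brexit talks continue", "miley cyrus releases single"]
model = FrozenModel(snapshot_2017)

# Text that appears on the web later never reaches the already-trained model.
later_web = snapshot_2017 + ["new propaganda term appears"]
print(model.knows("brexit"))       # seen during training
print(model.knows("propaganda"))   # published after training

# Only retraining on an updated snapshot changes what the model knows.
retrained = FrozenModel(later_web)
print(retrained.knows("propaganda"))
```

In this sketch, posting new text after training changes nothing in `model`; only building `retrained` from the updated snapshot does, which is the sense in which such a system cannot be ‘seeded’ while it runs.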
Another weakness is the unavoidable fact that, while people learn from the real world, software learns from databases. StyleGAN used two datasets: CelebA-HQ, which contains more than 200,000 photos of celebrities, and FFHQ, which contains around 70,000 photos taken from Flickr, a hosting site used mainly by professional photographers. That the average celebrity face does not represent the average human face is a point that hardly needs making, and the similar biases in the photos on Flickr were acknowledged by the researchers in the StyleGAN paper.
Bias in datasets is unavoidable: a finite database, by definition, represents only a subset of a thing that exists in the real world. The internet is used, and its content uploaded, mainly by relatively wealthy people. Photos on social media sites skew towards young people, and most of the text scraped to train GPT-2 was written by men. Algorithms are also biased towards learning English.
The only way to minimise these biases would be to capture data that represents the real world more accurately. This could be done by recording candid images of people going about their lives, the way they talk, the way they write in private forums – by a level of surveillance, in other words, far beyond what many people already consider too intrusive. This means there is no clear and practical way to make a perfect dataset to ‘feed’ an AI, which in turn means there is no way to make a perfect AI.
* The face on the left is the real one.