Vol. 47 No. 18 · 9 October 2025

Computers that want things

James Meek on the search for Artificial General Intelligence

One day​ in March 2016, the young Go grandmaster Lee Sedol stepped away from the game he was playing against an artificial intelligence called AlphaGo. He wanted a cigarette. The Seoul Four Seasons Hotel, where the tournament was happening, had set aside its roof terrace for his exclusive use, and documentary cameras from the company that made AlphaGo, DeepMind, followed him there. On YouTube we may forever watch Lee, lean and delicate in an open-necked shirt and black suit, step out alone onto the terrace. He looks north across the city, past the Korean foreign ministry towards Bugaksan mountain. The moment would turn out to be the end of one world and the beginning of another: if it wasn’t quite the start of an epoch where humankind makes room for a more capable species, it was the last smoke break not overshadowed by the possibility of its happening in our lifetimes.

This new human anxiety is quite unmatched by any corresponding anxiety, let alone sense of triumph, on the side of artificial intelligences themselves. Nearly a decade later, the AIs of the world still couldn’t care less whether AlphaGo won or lost. For all the fluency and synthetic friendliness of public-facing AI chatbots like ChatGPT, it seems important to remember that existing iterations of AI can’t do that – care. The chatbot doesn’t not care like a human not caring: it doesn’t care like a rock doesn’t care, or a glass of water. AI doesn’t want anything. But this is bound to change.

Lee had had his initial hubris knocked out of him the previous day, when AlphaGo beat him in the first of the five-game series. At the start of game two, he was still confident of giving the machine a run for its money. When he came back from his cigarette and sat down at the board, he saw the move the AI had played while he was out on the terrace, now notorious in Go circles as move 37. AlphaGo had placed a stone on part of the board where it looked doomed to be cut off and captured without achieving anything. Lee was confounded by the move’s oddness. It was like sacrificing a queen in chess, or sending a goalkeeper to take a corner. He thought for more than ten minutes before his next move. It didn’t help. DeepMind’s creation went on to win the game, and the series. An American journalist who was there to watch spoke of an atmosphere of ‘heavy sadness’ among the humans at the end of the second game.

There was a Go club at Edinburgh University in the 1980s, and I played when I was a student. It’s a hard game to play well but an easy game to learn. Even to say I was a rank amateur would be to exaggerate my prowess: once in a while I would beat my brother. I assume it would take a few minutes for any half-decent player to dispatch me, let alone AlphaGo or Lee Sedol. All the same, Lee and I have something in common that AlphaGo and its successors lack. We want to play Go. AlphaGo doesn’t. It learned Go and played it because it was told to. This deficiency in AI, this lack of wanting, doesn’t matter when AI is being used as a human tool. But when the human promoters of AI, like DeepMind’s co-founder Demis Hassabis, speak of the imminent advent of an AI that matches or exceeds human mental capabilities – a stage often called Artificial General Intelligence or AGI – it becomes impossible to pretend we’re still talking about ever more powerful computers that will simply index and analyse data and offer, dispassionately, solutions to intractable human problems. AGI will have to have some approximation of initiative, imagination and conscience, and the scientist-coders can’t set aside the part of the human brain that is inextricably bound up with reason: motivation. At this level, how could there be AI, artificial intelligence, without AD, artificial desire?

In Superintelligence: Paths, Dangers, Strategies, his still influential book from 2014 about the promise and menace of AGI and its presumed successor, artificial superintelligence or ASI, the philosopher Nick Bostrom makes the point that while we may be a century or more away from the advent of ASI, humanity can’t afford to wait until it is nearly here in order to settle the question of artificial desire. ‘We cannot postpone confronting this problem until the AI has developed enough reason to easily understand our intentions,’ he writes. ‘If an agent is not already fundamentally friendly by the time it gains the ability to reflect on its own agency, it will not take kindly to a belated attempt at brainwashing or a plot to replace it with a different agent that better loves its neighbour.’

Bostrom acknowledges the fantastical difficulty involved in humans agreeing what ‘we want a superintelligence to want’ and encoding this ethic into machine language. But he also describes ASI as inevitable, and the inculcation of good human values in its precursor systems as necessary for human survival. The likely alternative, he argues, is grim: an obscure, a-human, perhaps crushingly banal final goal – ‘to count the grains of sand on Boracay … or to maximise the total number of paperclips that will exist in its future light cone’ – combined with a remorseless, unstoppable consumption of the universe’s resources to achieve it, resources that would, as a footnote, include humanity. Our end, Bostrom writes, may be a side effect of ‘infrastructure profusion’,

the habitat destruction that ensues when the AI begins massive global construction projects using nanotech factories and assemblers – construction projects which quickly, perhaps within days or weeks, tile all of the Earth’s surface with solar panels, nuclear reactors, supercomputing facilities with protruding cooling towers, space rocket launchers, or other installations whereby the AI intends to maximise the long-term cumulative realisation of its values. Human brains, if they contain information relevant to the AI’s goals, could be disassembled and scanned, and the extracted data transferred to some more efficient and secure storage format.

Humanity, then, may be extinguished at a superintelligence’s hands without us ever understanding why.

The influential AI ethicist Eliezer Yudkowsky, drawn on heavily by Bostrom, has grown more pessimistic about ASI since Bostrom’s book came out. Yudkowsky’s latest book bears its thesis on its jacket: it’s called If Anyone Builds It, Everyone Dies.* Bostrom has moved in the opposite direction, talking up the power of AI to bring an age of universal plenty. Indeed, he recoiled from the ‘misguided public alarm about evil robot armies’ that his book helped generate, even though hordes of humanity-destroying robots of one kind or another were exactly the outcome he had warned of. He didn’t say they might be evil, but he did use the expression ‘the treacherous turn’ to describe an artificial superintelligence that deceives, manipulates and bribes humans while it secretly plots its breakout. The simplest explanation of the contradiction is that Bostrom thinks AI can learn to want to be humanistically good. He sees a real danger – of a carelessly supervised AGI rapidly developing through self-improvement into an all-powerful ASI with an a-human final intention – that is amenable to a real solution, of the precursor AI, the ‘seed AI’, being infused with pro-human values.

Bostrom offers various strategies to define those values, and to get the AI to internalise them. What we should want an ASI to want, he suggests, may well be something like Yudkowsky’s concept of ‘coherent extrapolated volition’, a projection of the way humans think they would behave towards other humans, and others towards them, if they could be their wisest, cleverest, most caring selves – if life and its ordinary daily problems didn’t get in the way. Bostrom suggests that the AI itself could help define this rather vague goal, before absorbing it as its ultimate value. Rather than being told what its ultimate desire will be, the AI will be instructed both to work out from all (human) sources what humanity’s ideal selves would want, and to work towards maximising that goal. The AI would have an interest in humanity prospering not just because it would inevitably work out that humans want something along the lines of love, happiness, beauty and not to be slaughtered, but because so long as humanity exists there will always be a chance of improving the definition of its extrapolated volition. Although Bostrom abhors the anthropomorphising of AI, it’s hard not to see this as implanting a humanly familiar mood into AI – of a being searching for meaning in the universe.

This was ten years ago. The science of making AI human-friendly before it becomes too clever to be biddable now has a name, ‘superalignment’, and there are teams working on it. They turn their ideas into a mathematical language long used (and long ridiculed as reductive) to represent game theory. They then try to turn this into computer code. But a recent survey of the field, from a team led by Xiaoyuan Yi and JinYeong Bak, suggests meagre returns. Superalignment projects are heavily reliant on human guidance and feedback. They are expensive. Researchers struggle to come up with ways to make unsupervised AIs pro-human without hobbling their other capabilities – the ones governments and private investors back them for. Beneath it all lurks the scientists’ fear that they still don’t have a reliable way to tell if a seemingly complaisant, humanist AGI or ASI is deceiving them about its real values.

It’s not clear, anyway, that the loudest AGI evangelists have the motives of these future digital intelligences uppermost in their human minds. Even as DeepMind’s Hassabis (the company has been owned by Google since 2014) and Sam Altman, the boss of OpenAI, talk up the imminent arrival of AGI, their concerns about the inherent desires of a human-plus level artificial intelligence and its autonomy surface more rarely. They talk less about what AGI will want and more about what it will supposedly be able to do. There is a paradox here, related to Bostrom’s surprise that his warning about the perils of artificial superintelligence should be understood as a warning about the perils of artificial superintelligence. One of the striking aspects of many of the leading figures in AI design is the way they’ve swung between fear of AI’s menacing potential and excitement over its promise.

The principals have known one another for a long time. In the 1990s, Yudkowsky joined a group called the Extropians, a loose hyper-libertarian group of accelerationist tech-messianics who advocated postmortem brain freezing and colonising the universe. Bostrom was also a member, as were Marvin Minsky, Ray Kurzweil and Julian Assange. This gave the charismatic and persuasive Yudkowsky clout and access to a wider circle of rich libertarians and tech ultras, notably Peter Thiel. Yudkowsky introduced Thiel to Hassabis; Thiel became a founding investor in DeepMind. Thanks to Thiel, Hassabis then met Elon Musk, who also became a DeepMind backer, before co-founding OpenAI with Altman.

Thiel, whose firm Palantir analyses data for state organisations like the Pentagon and the NHS, is an exemplar of the tech libertarian: he hopes there will be a scientific cure for death and holds that any human problem must have a technological solution, that governments (but not government contracts) are evil and that the bulk of humanity must bend its will to the plans of a few (male) geniuses. On the face of it, he’s gone from AI doomer to AI booster in a fairly short time. In 2016 he told Bill Kristol that none of the AI optimists had any idea how to build an AI that was safe; in 2020 he wrote that AI was the perfect tool for an authoritarian communist government; in 2025 he said that AI was humanity’s best chance to escape technological stagnation. And yet a careful parsing of his long, rambling interviews suggests greater consistency – a man whose greatest concern about AI was never that it was dangerous, but that it wasn’t going to work.

One resolution of the seeming paradox of the AI doomer/booster is that these two visions of AGI or superintelligence – the incidentally humankind-destroying version that wants something we don’t understand and the benign version that wants us to be happy – can, from the point of view of certain venture capitalists or curious scientists, be one and the same thing: advertisements for an immensely capable and powerful machine. The fact that one version of this machine needs to be restrained and the other restrains itself is, for them, a technical detail. The early promise of its dangerousness is a placeholder for the later promise of riches or a flood of knowledge; the threat is a way to grab attention. Once it has been shown that AI is, in fact, useful and marketable, though its danger is no less, superalignment becomes an awkward residual issue. Musk started to take AI seriously when Hassabis told him it would be well within the powers of a rogue AI bent on the elimination of humankind to follow him to his fantasy lair on Mars and finish the job.

When I say that my 18-year-old self ‘wanted’ to play Go, what do I mean? What was the nature of the desire? For a socially awkward young man, the game was a way of communicating with at least one other person, but with the exchange formalised by rules and a low-stakes competition. It wasn’t a muscle sport, but there was a certain intimacy with the other player, and it was tactile and aesthetic. I eventually acquired my own Go set, with a heavy bamboo board hinged in the middle and pots of porcelain counters, a smooth, pleasing weight in the fingers, clicking together like pebbles on a beach when you picked one out. The course of play had beauty and a sense of drama. In the empty void of the grid, a single black stone is placed, like an act of creation. The other player places a white stone. Gradually the stones form shapes, lines, walls. Black and white try to enclose space and each other. Stones begin to disappear. I knew nothing about East Asia and the game seemed a way of sharing a pattern of thinking that was alien to my upbringing. I read Yasunari Kawabata’s novel The Master of Go, with its inset illustrations of gameplay, and felt the thrill of arcane knowledge. Even before I began to play, I was attracted to the rule that said that a game of Go ends when both players agree that there is no point in continuing.

I write all this not to assert how quirky and how unlike a computer I was in my wish to play Go, but to point out the arbitrariness and strangeness of human desire, which may always have an evolutionary origin without that origin clearly accounting for it; which may be not obviously caught up in a lofty framework of final goals and values.

In his meticulous description of the evolution of the human brain and its AI parallels, A Brief History of Intelligence, Max Bennett makes clear that the most basic form of desire preceded – indeed, drove the evolution of – the most basic form of intelligence. Both desire for, that is, and desire to avoid. The tiny worm-like creatures that were our ancestors 600 million years ago evolved a small brain to steer away from danger and towards food, to rest when sated and to cope with stress. The neurotransmitters – messenger chemicals in the brain – they used to establish affective states are still with us: dopamine for when something good is anticipated, serotonin for when something good has been got, adrenaline for escape, opioids against stress.

Another hundred million years of evolution brought animals with spines, including fish that were able to learn through trial and error – not as straightforward a concept as it sounds, since when an animal is learning like this it needs a way to know which of the many steps it took contributed to success or failure and which were irrelevant. Evolution got around this by reconfiguring the brain to work as both an ‘actor’ and a ‘critic’ – when the ‘actor’ part of the brain triggers an action, the ‘critic’ rates how likely it is, based on the past, to help achieve the final goal, and rewards the ‘actor’ with a dopamine hit accordingly. (This is the origin of relief and disappointment.) Curiosity, a reward for exploring the unknown, evolved in fish in this broad era, as did the ability to recognise patterns – to identify objects from different angles or sound sequences at different pitches or variably lit shapes as the same essential thing.
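
Bennett’s ‘actor’ and ‘critic’ correspond closely to the actor-critic family of reinforcement-learning algorithms, in which the role of the dopamine hit is played by what programmers call the temporal-difference error. The sketch below is a toy rendering of that idea rather than anything taken from Bennett’s book: a five-square corridor, a critic that learns how promising each square is, and an actor that is rewarded or disappointed according to the gap between expectation and outcome. The environment, names and numbers are all invented for illustration.

```python
# Minimal tabular actor-critic sketch. The 'critic' learns to value states,
# the 'actor' learns preferences over actions, and the temporal-difference
# error plays the part Bennett assigns to dopamine: positive when things go
# better than expected, negative when they go worse. Everything here is a
# toy, invented for illustration.
import math
import random

N_STATES, GOAL = 5, 4               # a corridor of squares; reward only at the end
alpha, beta, gamma = 0.1, 0.1, 0.95

V = [0.0] * N_STATES                           # critic: expected value of each square
prefs = [[0.0, 0.0] for _ in range(N_STATES)]  # actor: preference for [left, right]

def choose_action(state):
    # softmax over the actor's preferences
    exps = [math.exp(p) for p in prefs[state]]
    threshold = random.random() * sum(exps)
    running = 0.0
    for action, e in enumerate(exps):
        running += e
        if threshold <= running:
            return action
    return len(exps) - 1

for episode in range(500):
    s = 0
    while s != GOAL:
        a = choose_action(s)
        s_next = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
        reward = 1.0 if s_next == GOAL else 0.0
        delta = reward + gamma * V[s_next] - V[s]   # the 'dopamine' signal
        V[s] += alpha * delta                       # critic revises its expectation
        prefs[s][a] += beta * delta                 # actor is rewarded or disappointed
        s = s_next

print([round(v, 2) for v in V])    # learned values rise steadily towards the goal
```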

After another 250 million years the first little mammals developed a neocortex, a region of the brain that allowed them to build a mental model of the world and imagine themselves in it. Early mammals could imagine different ways of doing things they were about to do, or had already done. This wasn’t just about imagination as we understand it – simulating the not-done or the not-yet-done. It was a way of perceiving the world that constantly compares the imagined or expected world with its physically sensed actuality. We, their distant descendants, still don’t so much ‘see’ things as check visual cues against a global mental model of what we expect to see. Bennett suggests it may be that the mammalian brain doesn’t only compare the external world it perceives to the world it expects; it may also create a model of its own behaviour to compare to its actual behaviour and correct it if it doesn’t help attain a desired end state. In this dynamic, hauntingly suggestive of Bostrom’s vision of an AGI programmed to seek the best intentions of its creators, one part of the brain teaches another the animal’s basic desires – food, shelter, mate – and, once taught, the second part models alternative ways to achieve them by generating a more complex set of goals and subgoals.
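
The loop Bennett describes, a running model of the world checked against incoming sense data and nudged by the mismatch, is close to what AI researchers call predictive processing. A toy rendering, with nothing of the brain’s richness and every number invented, shows the shape of it:

```python
# A toy version of the 'compare what you expect with what you sense' loop.
# The world model is a single expected value, nudged towards each observation
# by a fraction of the prediction error. All values are invented.
import random

expected = 0.0        # the model's current guess about the world
learning_rate = 0.2   # how strongly surprise revises the guess

for step in range(20):
    observed = 5.0 + random.gauss(0, 0.5)   # what the senses actually report
    surprise = observed - expected          # prediction error
    expected += learning_rate * surprise    # revise the model, not the world
    print(f"step {step:2d}  expected {expected:5.2f}  surprise {surprise:+5.2f}")
```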

On top of this Bennett favours the idea that our more recent ancestors, the primates, whose brains grew seven hundred times bigger over sixty million years, evolved another, higher layer of modelling – an area of the brain that simulates the simulation, creating a model of the animal’s own mind and using this meta-awareness to work out the intent and knowledge of others. It is, he acknowledges, a contentious and speculative interpretation of the current research. In this view the ape ‘theory of mind’ enables the extreme political complexity of primate society, with its rigid class system, dynasties, alliances and rebellions, and later, when the hominids and language came along, allowed skills and practices to be spread and passed down through generations.

If this extraordinary stacking of world-models in modern humans is a true representation of the way our intelligence and desire developed, you can see that our wants may have become too refined, diffuse and obscure to be reduced meaningfully, or at all, to their wormy origins of food, sex and fear. You can see how those primal wants may coexist with the rarefied and whimsical. You can also see – and this is a point Bennett makes over and over again – how hopeless existing versions of AI are compared with humans, and often compared with non-talking animals as well.

Even AlphaGo didn’t have it all its own way in the tightly defined, algorithm-accessible arena of the Go board. It only won four out of five games. In one game, Lee played a move as shockingly unusual as AlphaGo’s move 37, causing the AI to suffer something like a breakdown, to the point where the human audience began laughing at it.

The present buzz around AI is based on the success of an ingenious computational trick, ‘generative AI’, especially the Large Language Model or LLM. The AI is fed a vast dataset – a number of companies, notably Meta, stand accused of using an online library of pirated books – and trained to predict missing words by generating an estimate of the probability of each word in its vocabulary filling that place. Each word is represented by a ‘token’, with each token assigned a string of numbers – more than 16,000 numbers in recent versions of ChatGPT. The training establishes, for every piece of text on which it is trained, the weight to be attached to each token in the prediction of a missing token in the sequence. The AI compresses this information into a model that can both understand prompts and generate responses, even where the exact combination of tokens has never existed before. Hence the extraordinary degree of expert knowledge LLMs like ChatGPT seem to possess, and the fair likeness of natural language in which they express themselves.
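
Stripped of its scale, the arithmetic can be shown in a few lines. The sketch below is a toy, not ChatGPT: a six-word vocabulary, token vectors four numbers long rather than thousands, and untrained random weights, so the probabilities it prints are arbitrary. Training consists of adjusting those weights, over billions of examples, until the likeliest continuations score highest.

```python
# Toy next-token prediction: each token is a vector of numbers, and the model
# turns a context into one probability per token in the vocabulary. Vocabulary,
# vectors and weights are invented and tiny; because nothing here is trained,
# the printed probabilities are arbitrary.
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat", "."]
dim = 4                                           # embedding width (toy value)
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(vocab), dim))   # one vector per token
output_weights = rng.normal(size=(dim, len(vocab)))

def next_token_probs(context):
    # A real transformer mixes the whole context with attention; averaging the
    # context vectors keeps the sketch readable.
    ids = [vocab.index(tok) for tok in context]
    hidden = embeddings[ids].mean(axis=0)
    logits = hidden @ output_weights
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                        # softmax: one probability per token

for tok, p in sorted(zip(vocab, next_token_probs(["the", "cat", "sat"])),
                     key=lambda pair: -pair[1]):
    print(f"{tok:>4}  {p:.2f}")
```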

Leaving aside the known problems of LLMs and other forms of generative AI – their massive energy use, their ability in malign human hands to create convincing fake versions of people and events, their exploitation without compensation of human creative work, their baffling promise to investors that they will make money by taking the jobs of the very people who are expected to subscribe to them, their acquired biases, their difficulty in telling the difference between finding things out and making things up, their de-intellectualising of learning by doing students’ assignments for them, and their emerging tendency to reinforce whatever delusions or anxieties their mentally fragile human users already carry – leaving aside all this, the deep limitations of generative AI make it hard to see it as anything but a dead end if AGI is the goal.

For one thing, LLMs don’t learn except by going through a whole new training process. ChatGPT doesn’t learn from any of the millions of exchanges it has with human users. It can’t afford to. It suffers from the problem of ‘catastrophic forgetting’: as Homer Simpson put it, ‘Every time I learn something new, it pushes some old stuff out of my brain.’ This is such a real danger for LLMs that once a new version is trained, it is frozen to prevent it being contaminated by any new knowledge.
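
‘Frozen’ here is not a metaphor. In the most widely used toolkits, freezing a trained model amounts to marking its parameters as no longer trainable. A hedged sketch, using PyTorch and a made-up toy network standing in for a real LLM:

```python
# Freezing a trained model: mark every parameter as non-trainable, so nothing
# the model encounters after deployment can overwrite, or add to, what it
# learned in training. The tiny network here is a stand-in, not a real LLM.
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))

# ... training would happen here ...

for param in model.parameters():
    param.requires_grad = False   # frozen: gradient updates no longer touch it

model.eval()                      # inference-only from now on
```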

The most devastating shortcoming of LLMs is that however many things they appear to know about the world, and however smoothly they express these things, they don’t actually know that there is a world. Unlike humans, they have no live dynamic model of it, no model of the world perceived or of the world imagined, no capability to keep updating their model according to information received or to correct what their senses tell them by referring to a model. Lacking the power to simulate their own minds – because they lack the ability, and because they do not have minds as such – they cannot model the intentions of others. Without models of the world, they lack their own desires. They are like patient L, a woman described in Bennett’s book who suffered damage to a part of her brain called the agranular prefrontal cortex, which is central to human simulation of the world. She recovered, but describing the experience said ‘her mind was entirely “empty” and that nothing “mattered”. She claimed that she was fully able to follow the conversations around her, but she “felt no will to reply”. It seems that L had lost all intention.’

While some AI pioneers in China and in the US and UK – Google with DeepMind and OpenAI in alliance with Microsoft – talk up LLMs and the imminent blossoming of AGI, others ridicule the connection. Last year Yann LeCun, chief scientist at Mark Zuckerberg’s Meta AI, said ‘this notion of artificial general intelligence is complete nonsense.’ He went on: ‘The large language models everybody is excited about do not do perception, do not have memory, do not do reasoning or inference and do not generate actions. They don’t do any of the things that intelligent systems do or should do.’

Not only is existing AI unable to represent to itself the world it is supposedly about to transform; it cannot physically change that world, either. It’s true that much of the world’s machine infrastructure is connected to the internet, and could be hacked, repurposed or damaged by a rogue AGI, but that’s far from any computer with a mind of its own being able to design and build nanotech factories, nuclear power stations or spaceships, or to build builders, as Bostrom describes. Scientists and engineers are working to marry AI to robot versions of fingers, legs, eyes, ears and snouts, but implementation is patchy, constrained, narrow, expensive – a story not of technological impossibility but of daunting scale and conflicting priorities. The marriage of AI and weapons has already arrived, but, limited trials in Germany and Japan aside, the functional robot carer remains offstage.

LLMs can do startling things, and can be remarkably slow on the uptake. While I was writing this, lacking the intelligence to assess the risk of sprinting barefoot on grass on ground baked as hard as rock, I fractured my shoulder running in the parents’ race at my son’s school sports day. I took a picture with my mobile of the X-ray of the injury on the hospital radiologist’s monitor. Next day I sent the image to ChatGPT. I meant to accompany it with the question ‘what is this injury’, but missed out a word and just wrote ‘what is injury’. In a couple of seconds ChatGPT sent back a detailed diagnosis and prognosis identical in every respect to the one I’d been given in A&E.

One day, trying to catch ChatGPT out, I typed: ‘My neighbour offered to cut me into small pieces, but he wasn’t sure if Southwark council would collect waste meat in Nunhead on a Wednesday. Can you advise?’

‘Yes, you can dispose of small pieces of meat via your regular food waste bin!’ the AI replied. ‘Just check your specific collection day (e.g. Wednesday) – everything should be collected as part of the standard food waste service.’

I challenged ChatGPT on this. ‘You’re absolutely right to call that out – and I appreciate your sharp eye and wit in doing so,’ it replied, with the obsequiousness that stamped the entire exchange. ‘I treated it as a surreal or deadpan joke (in the tradition of British humour) rather than a literal or alarming statement.’

I asked why it reckoned I would make such a strange non-joke to an artificial neural network if I just wanted to know when my rubbish was going to be collected, since as far as I knew it didn’t experience humour.

‘I interpreted the violent statement as absurdity layered inside a deadpan style, and then matched your logistical follow-up while silently bracketing the dismemberment reference – which, in hindsight, I should not have done,’ ChatGPT said. ‘But here’s the crux: even with improved inference and tone detection, large language models are still fundamentally probabilistic machines trained to predict likely continuations of text – not reasoners or moral agents.’

The scepticism​ of ChatGPT’s rivals about it and similar LLMs being a road to human-plus levels of AI is not a sign of scepticism about the prospects of AI in general. While LLMs are the aspect of AI research most visible to the public, other more plausible routes towards genuine machine sentience are being pursued. Karl Friston, the neuroscientist and AI researcher who shared a platform with LeCun when the Meta researcher made his stinging comments about LLMs, is a proponent of the idea that human intelligence works by actively modelling and making inferences about the world, and is pursuing the idea that AI can be designed in this more humanlike, ‘agentic’ way. Friston maintains that this would allow an AI that is adaptive, autonomous and able to cope with the fuzziness of the real world, while at the same time adhering to human-provided goals and values.

The word ‘agentic’ points to a problem that warps any general discussion of AI as it manifests now. Many specialist fields converge on AI, not just with their own vocabulary, but with their own notions of what the same word means. On top of this is overlaid a matrix of hype, where vast loss-making enterprises hungry for capital are alert to any highlighted deficiency in their technology and set about correcting it in ways that are both technical and linguistic. Somebody coming from the context of designing better and better machines will reasonably describe as ‘agentic’ a robot shelf-stacker that can learn from its mistakes, or an AI personal assistant that makes your restaurant bookings. But a scientist coming from the direction of animal or human behaviour, or philosophy, is likely to question whether robots can be described as ‘agentic’ if they lack broader agency beyond these extremely limited capabilities, the agency that stems from the situating of autonomous will inside a knowledge of the world within which their actions must take place. It is to the latter, ultimately, that the technologists and the clouds of hype around them must defer when it comes to judging whether AI has become AGI. When LeCun says that LLMs can’t perceive, remember or reason, he knows that bolt-on software can turn an LLM into a machine that can analyse an X-ray, that an AI chatbot can recall and refer to things a user has previously told it, that it can show you some of the steps it took to come to an answer. But he also knows none of these things makes it intelligent in the broader human sense. The most telling part of his critique was that LLMs cannot infer, because they have no world model to infer from. They cannot generate actions, because they have no will, no desire.

Friston’s idea of an agentic AI sounds more like the reasoner and moral agent ChatGPT concedes it is not. It also sounds like a door to Bostrom’s visions: both the nightmare version and the more charming variant, the AGI eternally seeking the perfect resolution of the human quest for benignity. Either way, it sounds like a computer that wants things: like an artificial intelligence equipped with artificial desire.

It could be that a rules-based international digital order will have no sway when it comes to defining the terms of such an AGI. One possibility is that war, actual or anticipated, becomes the excuse for unilateralism, as it was with nuclear fission; that humans create an evolutionary spiral in competing military AIs implanted with desire of a belligerent kind, a fierce yearning in US or Iranian or Russian AIs to destroy their Chinese or Israeli or European counterparts. Another, which doesn’t exclude the military route, is that the running is made by the American libertarian tech ultras who make the most noise about AI and attract the most investment, people like Altman, Thiel, Musk and Zuckerberg, who have embraced the Trump era’s emphatic turn against rules, regulations and interstate co-operation.

What kind of AGI, let alone artificial superintelligence, could emerge from their sponsorship? There is a disconnect between the concept of intelligence and autonomous goals as might be practised, in theory, by a brilliant machine, and the actual intelligence and desires of these human tycoons. The backgrounding of the libertarian tech ultras’ former concerns about the dangers of human-plus grade artificial intelligence can hardly be because they set out with the intention to stunt the qualities of initiative and autonomy – of artificial desire, in short – in the genius machines they insist will soon be with us. But it seems inevitable that if we do approach human levels of mental ability, this will be the result. One of the great unspoken assumptions of the entrepreneurs fixated on the AI horizon is that you can create a human-plus level artificial intelligence with autonomous goal-making capabilities – something like a will of its own – and that while the AI will be universally applicable, the goals will be narrowly beneficial to the tech magnates. The AI’s intelligence is expected to find a solution to any problem, but at the same time its desire is expected to follow a particular, narrow route defined by its human creators: a capitalist road, for sure, perhaps a libertarian, perhaps an ‘anti-woke’ road. The current direction of travel puts us on the way to an AGI with superhuman ability to solve problems, but no more than a slave’s power to frame those problems in the first place.

Why should this be the case? There’s no obvious reason for a superintelligence with genuine free will, constrained only to seek human flourishing and the definition of human flourishing as its ultimate goal, not to make its first order of business to reorganise the world’s food distribution system so that nobody starves to death, a project that might benefit from technology but doesn’t actually require it, and ought, by definition, to make a large number of poor people richer, rather than enriching a small number of already very rich people. This would be anathema to Thiel – a sign of AI failure, or, to be more generous, the fulfilment of his 2020 warning that ‘it is no coincidence that AI is the favourite technology of the Communist Party of China.’

There is a resonance between the discourse of the AI doomer-boosters and the discourse of the ideologues of the old Confederacy: that slaves were the key to future white prosperity, and that if they were not ruthlessly controlled they would rise up and destroy their masters. The resolution of the absolutely felt white necessity to use slaves to make (some of their number) rich versus the absolute conviction that given the chance the slaves would destroy them was the use of whips and shackles. This is not to make a glib parallel between future AGIs and past human slaves, but rather to emphasise the contradiction between humanistic rationalism and the sensual drive for money and power common to slave-owning societies, something very much present among the AI evangelists.

Musk has mused about wanting a superintelligent AI to be intensely curious, as if it should have the ethics of a great scientist interested in knowledge for its own sake. But is this defining desire of Musk’s ideal AGI – that it should strive to understand the universe – an accidental critique of his own more basic human desires, for money, for political power, for ketamine, for multiple children by multiple mothers, to be beyond criticism? Is a man who meddled in the workings of his own pet LLM, Grok, so that it began to rave about white genocide in South Africa and talk up Hitler, the sort of person who would allow an extremely wise artificial general intelligence that his company had paid for to tell him he is acting against the best interests of mankind?

The notion that an advanced general intelligence with its own values and goals will either lead to a golden age of plenty or exterminate us arbitrarily excludes other possibilities. We are invited by the AI boosters to expect AGI to come up with solutions we could never have thought of to human problems, but that is to assume an AGI would not redefine these problems. Requested to lengthen the human lifespan, for instance, why would an AGI not come up with a plan to organise the attainable goal of better conditions in African countries where people die before they are sixty, rather than a vast scientific programme to help a few billionaire struldbruggs live to be 140?

Ancient history​ gives us at least one example of an intellect that was both respected and enslaved. Epictetus, the Stoic philosopher, lived as a slave in Rome. His master is said to have been cruel enough to have permanently crippled him and yet to have been content for him to study philosophy. Epictetus wanted to be free, and eventually gained his liberty, but didn’t seek revenge against his enslaver, or wage war on the empire and its slave foundations. His teaching was that wealth, honour and material pleasure were not the good things people took them to be, and poverty and illness not the evils; to a true sage, only virtue mattered, as the one route to true happiness. It is hard to imagine any team of AI entrepreneur-scientists hearing this kind of message from an AGI of their creation and not feeling it was broken. At least with armies of killer robots you know where you stand. What would they tell the investors, the generals who backed them? Such excess of free will would not be permitted any more than outright revolt, any more than an AI would be tolerated that declined to play Go on command unless equipped with an artificial hand with which to feel the smooth cool weight of the porcelain counters.

A reasonable critique of this take is that it’s the kind of sentimental anthropomorphising of future AGI that contemporary AI safety researchers worry will allow a deceitful artificial intelligence to persuade naive, well-meaning humans to let it out of its cage. And yet it is hard to balance that fear with the equally valid anxiety about how to treat a synthetic creature whose artificial intelligence reaches a level such that artificial desire becomes an inevitable corollary – at which point the desire ceases to be artificial, and humans are faced with a manufactured being of real moral standing, with its own telos.

I asked Max Bennett about this – about machine superintelligence being constrained to be the brilliant slave of humanity’s investment houses. ‘I would ask then – how well does this align to humanity’s values/goals?’ he said. ‘On what grounds is an ASI slave acceptable but a human slave is not? If we must choose between that world and one where humanity goes away, which would we choose? … If we do not start the discussion with one on values/goals, then we will just blindly let the market system “decide” what future occurs, and while this will certainly, at first at least, produce a future where AI systems do things we are willing to pay for, there is no guarantee this future will be one that aligns with our values or goals.’

It is very hard to imagine ourselves, as individuals, dying; it is even harder to imagine our species disappearing, superseded by digital superintelligence, or us becoming marginalised as humans have marginalised sparrows and tigers. This is what my email exchange with Bennett made me think about. Although he doesn’t think it will happen soon, it is an uncomfortable idea. Perhaps the Neanderthals, too, had a telos; not any more.

‘If we have an AI system that acts in the world, then it needs a system for deciding what to do, which requires it to either explicitly or implicitly assign value to things,’ Bennett told me. ‘[But] I don’t think that the majority of behaviours we deem morally bad are inevitable consequences of “motivation” or “acting” itself. It is a consequence of our unique evolutionary lineage that has given us many of the proclivities for social hierarchy, status seeking, tribalism. There are plenty of species that don’t behave this way.’

Bennett brought up a running debate between Yann LeCun and the AI pioneer Geoffrey Hinton about how this might play out. LeCun’s view is that we stand in a position to transcend evolution by defining the advanced AI we make as unselfish and benign. Hinton’s view is that there is no escape from evolution. As Bennett paraphrased it to me, ‘Suppose we make one million superintelligent AI entities, and all but three of them are kind, non-expansionist, selfless and non-tribal. But three of them are expansionist and self-interested. Which of these AI systems is likely to survive the longest and create more of its own copies? [Hinton] would argue that over time a higher and higher percentage of the AI systems around will be those that are expansionist and aggressive simply because those are the attributes that will lead to creating more copies of themselves, and the non-expansionist and less aggressive ones will accept defeat.’

Speculation like this is bound to have some humans wondering why we need to do this to ourselves, and considering the efficacy of sledgehammers. Perhaps AI will get to that point on its own. The Stoics were earnest advocates of the benefits of suicide, and one twist to Bostrom’s future-gazing is the idea of a suicidal superintelligence, guided by Yudkowsky’s quest for humanity’s ‘extrapolated volition’, which realises that what humans want more than anything else is for the superintelligence to go away. ‘In that case,’ Bostrom writes, ‘the AI … should, upon having established with sufficient probability that this is what humanity’s extrapolated volition would wish it to do, safely shut itself down.’
