Alignment Problems

Paul Taylor

The OpenAI website has a page explaining the company’s peculiar structure. It was founded as a non-profit by a couple of tech companies and a bunch of entrepreneurs including Peter Thiel and Elon Musk. The aim was to secure $1 billion in donations to develop artificial general intelligence (AGI) for the benefit of humanity. After securing only around $130 million – nowhere near enough for such a resource-intensive endeavour – the team created a financial vehicle that would allow OpenAI to attract investment capital while remaining more or less true to its founding mission. The for-profit subsidiary, OpenAI Global, would be legally bound to pursue OpenAI’s goals, and

would have caps that limit the maximum financial returns to investors and employees to incentivise them to research, develop and deploy AGI in a way that balances commerciality with safety and sustainability, rather than focusing on pure profit-maximisation.

The company’s legal structure is supposed to ensure that, should it succeed in developing AGI – defined as an autonomous system that outperforms humans at most economically valuable work – the intellectual property will belong to the non-profit for the benefit of humanity. But investors will get a share of any profits generated by innovations created along the way that fall short of AGI. The web page includes a highlighted message:

It would be wise to view any investment … in the spirit of a donation, with the understanding that it may be difficult to know what role money will play in a post-AGI world.

It now seems clear that Microsoft, the most prominent of OpenAI’s backers, does not view its $13 billion investment in this spirit but is keen to make the most of its position in a world where the role of money is well understood. When, last Friday, the OpenAI board sacked the company’s CEO, Sam Altman, accusing him of a lack of candour in his communications with them, Microsoft put pressure on the board to reverse their decision. The board then spent the weekend in negotiation with Altman, but failed to lure him back. On Monday it was announced that Altman, along with a number of OpenAI colleagues, would be joining Microsoft and launching a new AI initiative there.

Commentators have characterised the dispute as a conflict between two tribes at OpenAI. One, around Altman, is excited by the potential for new products and profits; the other, associated with another co-founder, Ilya Sutskever, and his allies on the board, is more focussed on the original mission.

Since the astonishingly successful launch of ChatGPT a year ago, Altman had become the public face of the company and was keen to exploit its leadership in generative AI in as many ways as possible. He was said to be trying to secure a multibillion-dollar investment for the development of new specialised hardware, to be talking to Jonathan Ive (formerly of Apple) about an OpenAI equivalent of the iPhone, and to be the driving force behind the recent announcement of the GPT Store. There was talk of allowing employees to cash in their shares.

Altman dropped out of Stanford in 2005 to launch a social networking company. Sutskever graduated that year from Toronto and went on to study for a PhD under Geoffrey Hinton. He helped develop AlexNet, a transformative approach to image analysis. A start-up company spun out of the research was later acquired by Google. Sutskever, like Hinton, seems both fascinated with the potential of AI and increasingly afraid of its implications. Since AI programs, however intelligent they may be, are still only programs, we ought to be able to rely on them to do as they are told. The difficulty is being sure that we have in fact told them to do what we want them to do – otherwise known as the alignment problem. In July, OpenAI announced that Sutskever’s primary focus as head of research was now ‘superalignment’: over the next four years, 20 per cent of the computing resources they had already secured would be devoted to building a human-level automated alignment researcher.

Sutskever believes that OpenAI is on track to create entities more intelligent than people, and in so doing render almost all forms of work redundant and money superfluous. He is concerned that, in doing this, we may accidentally build software agents beyond our control whose goals are imperfectly aligned with ours, which potentially carries a significant risk of extinction. The proposed solution is to create a form of artificial superintelligence that will be able to spot this happening and intervene to prevent it. What could possibly go wrong?

It should be easy to pick a side here: the conflict is between a CEO driving innovation to make as much money as possible and the board of a non-profit striving to be responsible and give due consideration to the unintended consequences of its technology. Except that the stated goals of the non-profit – the intended consequences of the technology – are mind-bogglingly disruptive, and its approach to thinking responsibly has not been to slow down or to involve other stakeholders in a system of checks and balances, but to invest in a speculative high-risk research project.

It may be that OpenAI’s progress towards AGI is significantly slowed by Altman’s departure. Seven hundred of the company’s 770 employees, some of whom may be more excited by the possibility of getting their hands on a share of OpenAI’s $80 billion valuation than by the prospect of a world without any need for money, have signed a letter threatening to join Altman at Microsoft if he isn’t reinstated at OpenAI. Astonishingly, the letter is signed by Sutskever, who seems to have experienced a temporary difficulty in aligning his goals and his actions.


  • 22 November 2023 at 12:11pm
    Delaide says:
It seems fanciful to me (it is, isn’t it?) that software agents could threaten us with extinction but it’s easy to imagine lesser risks that could adversely affect society. Must we leave management of these risks to wannabe and actual billionaires?

  • 22 November 2023 at 8:01pm
    adamppatch says:
    It is interesting to wonder who is this "we" that may accidentally build software agents beyond our control whose goals are imperfectly aligned with ours, potentially carrying a significant risk of extinction.

    This seems to assume a nice, responsible we, and that provided this we can keep AI under control and align its goals with our own, we'll be just fine. But I'm not sure the problem is really technological.

    Surely, there is a greater risk that we may deliberately build software agents under our control whose goals are perfectly aligned with ours, and which carry a significant risk of extinction.

  • 23 November 2023 at 5:19pm
    Keith Davidson says:
    Maybe a good idea to work out what our goals are, as a species, before trying to align AI with them.