
13th October 2023
Author: Jonathan Dupont
Executive Summary
- AI systems continue to rapidly improve, demonstrating human or even superhuman abilities in areas like games, science, and image generation.
- Today’s AI systems are already capable of helping with the development of new drugs, or driving a car through a city. In the next few years, advanced AI systems are likely to reach the point where they could enable hostile actors to develop new biological or autonomous weapons.
- Moving beyond this, many experts believe that advanced AI could eventually surpass human intelligence. By default, these superintelligent AI systems would be extremely dangerous if not carefully aligned with human values.
- In the last few years, awareness of these risks has grown radically in prominence, both among elite audiences and the wider public. In response to concerns around AI risk, some sceptics have argued that artificial intelligence is unfeasible or far off; that humans are already near the upper limit of intelligence that it is possible to achieve; that there is no reason to expect AIs to be hostile rather than beneficent; that even if there were hostile advanced AIs, they would struggle to overcome humanity; and that talking about these concerns is a distraction from short term worries around AI fairness.
- In new opinion research, we found that the public see both advanced and superintelligent AI risk as real concerns. It seems plausible that public alarm could rise very rapidly as the demonstrated abilities and dangers of AI systems become more concrete:
- While awareness of medium term risks such as AI-created bioweapons is relatively low, when informed about this possibility the public see these risks as highly dangerous.
- The public do not see AI or superintelligent AI as a science fiction technology, like a time machine or a perpetual motion machine. 65% think an AI much smarter than humans is possible.
- By default and unprompted, the majority of the public expect superintelligent AI to be highly dangerous.
- They did not think it was too soon to be taking policy action. 48% agreed that mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war, compared to 12% who disagreed.
- The UK is increasingly becoming a global leader in raising awareness around the risks from advanced and superintelligent AI. In order to have the largest impact, we recommend:
- The UK should use the Global Safety Summit to agree on a set of concrete thresholds or warning signs that would demonstrate that emerging AI systems are becoming increasingly dangerous, and the policy actions that will take place in response. After a technology has already been created, it is too easy to retroactively come up with reasons why its capabilities are not so impressive or dangerous after all.
- The UK's Foundation Model Taskforce should remain focussed on better documenting and communicating the risks from advanced AIs, rather than seeking to build domestic capacity. The Taskforce can most helpfully:
- Continue to work to develop new technical evaluations and risk assessment tools which can be used on existing and new frontier models.
- Require every frontier AI lab that operates in the UK to publish a public statement of its plan to reduce the risks from superintelligent AI.
- Publish a quarterly monitoring report, looking at recent progress a) against the AI risk thresholds agreed at the global summit b) within leading AI labs against their own published safety plan and c) within the field more generally.
- Work with ARIA to compile a technical agenda of the unsolved problems in aligning superintelligent AI systems.
- Establish a series of prizes for any individual researcher or research body that makes substantial progress against this agenda.
- Building on the work of the Taskforce, the UK should look to work with aligned governments and AI labs to create a government backed AI Safety Institute, targeted at narrow risks from the development of advanced and superintelligent AIs, rather than the deployment of today’s AI technology. This can help turn today’s voluntary commitments from AI labs and the wider tech industry into real legal commitments with teeth, while keeping regulatory focus where the precautionary principle most clearly applies.

What is the problem?
Over the last decade, AI systems have continued to improve in capacity and performance, demonstrating superhuman performance in areas including games, scientific research and image generation.
Even if AI capacities remained at the level they are today, they would have a significant impact on our economy and wider society. Public First’s model suggests that around 50% of tasks currently undertaken by workers in the economy could be either wholly or substantially automated by an AI system, with the potential to increase UK GDP by over £400 billion.
However, AI systems are likely to continue to improve and achieve or exceed human level performance on an ever wider range of tasks, including scientific research, physical dexterity, weapon design or operation, and human persuasion.
Like any significant new technology, AI is likely to create both many new benefits and risks. In the short term, many experts have expressed concerns that today’s AI systems could lead to an increase in disinformation, polarisation or discrimination. At the same time, however, they could also radically improve the quality of education, detect and treat new diseases much sooner, and create fairer, more transparent processes than human judgement. In order to reduce these kinds of risks from AI, significant work is ongoing to improve the alignment, interpretability and fairness of today’s models. These types of risks are sometimes described as short term risks.
In the next few years, we are likely to develop advanced AI systems that can develop and enact consistent plans over the long term, and have a human level of expertise in almost every type of specialised scientific or technical labour. These systems could create significant new dangers: making it much easier for hostile actors to develop new biological or autonomous weapons. Given AI systems are just software, controlling their usage may be significantly more difficult than controlling the use of other forms of weapon, whether conventional, biological or nuclear. These types of risks are sometimes described as medium term risks.
Beyond these advanced systems, at some point we could also create superintelligent AI systems that are both significantly more intelligent than any human and able to create and deploy new technologies on their own. Many experts believe that the creation of an AI superintelligence would be by default extremely dangerous, unless we are sure that its goals are aligned with the goals of wider humanity. These, in turn, are often described as long term risks.1
| Level of AI development | Examples of risk | Other technologies with similar risks | Balance of risk | Policy response |
|---|---|---|---|---|
| Generative AI: can create short form text, but struggles to maintain coherence for a long time. | Hallucination, disinformation, bias | TV, social media | Can lead to significant harms, but these are also likely to be offset by significant benefits. | Study impacts of the technology after it has been deployed in the real world, and regulate to offset harms. |
| Advanced AI: can complete most advanced human level tasks. | Allowing small groups to create new bioweapons or autonomous drones. | Printing press, automobile | Extremely dangerous in the wrong hands. | Mandated licensing of foundation model development, with strict cybersecurity controls. |
| Superintelligent AI: can create and deploy advanced new technologies on its own. | Able to replicate itself, and overcome any human opposition to its goals. | Nuclear weapons, bioengineered pandemics | Potential existential threat. | Extreme caution and use of the precautionary principle. Potential ban on development and technology spread. |
In May 2023, a group of AI experts including leaders at OpenAI, Google DeepMind and Microsoft released a statement that “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”
There are three fundamental reasons why it may be extremely difficult to ensure a superintelligence’s goals remain aligned with those of wider humanity:
- For any arbitrary end goal, gaining more power or resources is likely to be an important intermediate goal. Just as with a human, a company or a country, some of the most important ways to increase the likelihood that you will achieve your goals are to increase your economic and technological resources, and to destroy any other agents you think may be hostile to your goals, or wish to turn you off.
- We do not know how to give explicit, non-trivial goals to advanced systems. This is true both mathematically and in the practical ways we build AI systems today. At the moment, we train these systems through trial and error: rewarding them when they appear to act externally in a way we like, but without an explicit understanding of what their algorithms are actually trying to do internally.
- For a sufficiently advanced AI system, it is very difficult to tell apart a system that is safe and friendly from one that only wants to appear that way in the short term. An advanced AI system is likely to understand human motivations, but that does not mean it necessarily shares them - in the same way that we often understand the goals of other animals, but still put their preferences second to our own. In the short term, while it is gaining confidence in its own power and capabilities, an advanced AI system may want to appear on the surface as cooperative as possible, to reduce the risk of being deleted.
The more advanced AI systems become, and the more integrated they become into our wider economy, the harder they would become to oppose:
- At present, it would be relatively easy to turn off any given AI system. As they become increasingly part of the economy and their capabilities in navigating and manipulating the physical world become greater, this is going to become proportionately more difficult. For example, even if we wanted to, today it would be near impossible to turn off or overcome our dependence on electricity, while phasing out the use of fossil fuels is set to take many decades.
- The greater the difference in capacities between the AI system and general human intelligence, the harder they will be to oppose. Even if AI systems are unable to develop new fundamental technological capacities, they will still have the potential to be significantly more coordinated than groups of humans, and to replicate and increase their numbers much faster.
With the exception of nuclear power and biological agents, few other technologies share advanced and superintelligent AI’s combination of immense power and difficulty of control. Without a change in the current trajectory of technological development, the creation of more powerful AI systems is likely one of the most important and urgent risks we face.
Box: Answers to some common objections around AI risk
Isn't this just all science fiction?
It is an extrapolation of current trends, based on the demonstrated behaviour of AI systems and what we know about goals, intelligence and decision theory, in the same way that concerns around catastrophic climate change are based on physics and milder existing changes to temperature. It is also worth remembering that many of the implications of today’s most important technologies, including space travel, computers and the Internet, were first explored in science fiction. As we learn more about how AI systems work in practice, our model of specific threats may change, but we know enough to have significant worries now.
Shouldn't we worry about more urgent concerns?
We do not have any good way of knowing for sure how long the development of advanced AI systems will take. The capacities of current AI systems have been evolving rapidly in recent years, and many of our best estimates suggest we could reach an AI with human level intelligence in the next one to two decades. At the same time, the kinds of fundamental technical problems that we need to solve have often taken decades in other fields.
How would an advanced AI system actually cause harm to humans?
We don't know. In reality, an advanced AI system could easily conceive of a new way to attack us that we haven't considered beforehand. However, as an illustration, one possibility is that an advanced AI system could rapidly develop and take control of new conventional, biological and nuclear weapons. It would be able to increase its own numbers at an exponential rate, giving humanity very little time to react. AI systems could find new ways to co-ordinate and collaborate without the inefficiencies suffered by slow human organisations. It may be able to develop entirely new technologies, giving it a further significant power advantage.
Why would an AI want to hurt people? Isn't this just anthropomorphising?
For many goals that they might have, an AI system would (rightly) see humans as a potential obstacle to achieving them. In the same way that we often eliminate other life forms such as microbes, insects or weeds during the process of construction, AI systems will want to ensure that they cannot be shut down and that the world’s resources are reallocated to their own purposes.
Why would an AI system have goals?
Goal seeking seems like a relatively general behaviour that evolves in any intelligent agent that is trying to optimise something. In addition, humans may directly give an AI goals: this is already happening with current Large Language Models. Even if these goals are given by someone with good intentions, it can be very hard to formalise them in a way that avoids unintended side effects.
If an AI is so smart, wouldn't it know what we want or what we created it to do?
Yes. Knowing is not the same thing, however, as sharing the same motivations and goals. For example, we understand why evolution has created various motivations and emotions in humans - broadly speaking, to increase our rate of survival and reproduction - but we do not ourselves seek to maximise the human population.

What do the public believe about AI risk?
While it was known as far back as the 19th century that an excess of greenhouse gases could lead to global warming, it was not until the end of the 1970s that it became a scientific consensus that climate change was in fact happening, with global temperature unambiguously rising.2
In the same way, while concerns around advanced AI risk are not new - discussed since at least the invention of the computer, and with serious work to counter them being undertaken for the last twenty years - we currently look to be approaching a threshold moment, with concerns crossing over beyond a specialist audience. While awareness around many of the specific risks is currently low, or seen as far off, it is clear that the public do not fundamentally reject the idea of AI risk - and already understand many of the specific arguments. It seems plausible that public attitudes could change very rapidly as the demonstrated abilities and dangers of AI systems become more concrete.
For the purposes of this paper, we undertook new polling of the UK public to help assess:
- How aware are the public of the short, medium and long term risks from AI?
- Do the public believe that intelligent or superintelligent AI is feasible, or is it still seen as within the realm of science fiction?
- Even if it is feasible at some point, how imminent does the public think intelligent and superintelligent AI are?
- How risky does the public think the development of superintelligent AI would be?
- Why do people think superintelligent AI would be risky? And for those who think it isn’t risky, why not?
- What policy responses do the public believe would be reasonable?
While AI has seen a significant increase in consumer visibility in the last year - just over a third of the public (34%) told us they have now used an AI tool to learn about something - this has not yet translated into widespread familiarity with the debate around potential risks from AI.
When we asked about familiarity with concerns around specific AI risks, we saw low awareness across the board. The only specific issue where a majority of the public reported being familiar with concerns was that of AIs being used to create fake photos or videos that are difficult to tell from the real thing.
After asking about their awareness of each risk, we then asked our respondents to rate each specific AI risk on a score from 0-10 for:
- How likely they thought they were to happen in the next thirty years
- If they did happen, how dangerous they thought they would be

As with awareness, deep fakes once again scored highest when we asked about likelihood, closely followed by concerns around AI being used for propaganda or enabling mass surveillance.
By contrast, when asked about which risk would be the most dangerous if it did occur, the possibility of advanced AIs being used to develop new bioweapons scored highest.
Do the public believe advanced AI is feasible?
While they may currently be ranked lower in prominence than short term risks such as deep fakes, we found that this does not mean the public see the possibility of medium or long term risks from advanced or superintelligent AI as impossible.
In the past, many of those most concerned about advanced AI risk specifically have been worried that it will be hard to get a more general audience to ever take these issues seriously, with any negative scenario written off as pure science fiction.
This no longer seems to be the case, if it ever was. As part of our polling, we asked about a range of potential technologies, first asking whether they thought the technology was possible. In order to better benchmark beliefs about AI we included both other ambitious technologies (“a cure for cancer”) and technologies that stretched or broke the limits of physical possibility (“a time machine” or “a perpetual motion machine”).
What we saw was that AI technologies were clearly not seen in this latter class. While only 17% thought a time machine was possible, 65% thought an AI much more intelligent than a human was possible - and fully autonomous robots or military drones were seen as more likely still.
The one exception to this was a program that can feel emotions. This was still largely seen as within the realm of fiction, with just 29% seeing this as possible.
One of the key arguments lying behind fears over advanced AI is that it could rapidly iterate and improve, soon overtaking human intelligence. In response, some other commentators have argued that human intelligence is already near a theoretical or practical maximum.
In order to explore this more we asked the public what the likely consequences of a human level AI would be in the fifty years that followed, asking both about wider consequences and the potential for further improvement.
In our polling, the public clearly thought the development of human level AI would have significant impacts: leading positively to the acceleration of science and growth, but conversely also leading to the development of new weapons and higher unemployment.
We saw little evidence that the public agreed that human intelligence was a natural cap. It was only when we asked about the creation of an AI three orders of magnitude (1,000x) more intelligent than the average human that a majority said they thought this was not likely.
How imminent does the public think the development of advanced AI is?
In general, humans are not very good at giving consistent estimates of timelines for events they have not experienced before. Even AI experts can give very different predictions of when advanced AI will arrive, depending on how the question is framed.4
In our research, we’ve similarly seen some variation:
- When we used a question form where respondents had to select all the technologies they thought were likely to be invented in the next 30 years, 43% chose a computer program as intelligent as a human.5 In general, we often find that respondents under-select with this form of question, picking a few examples rather than exhaustively selecting everything they agree with.
- When asked directly in a second prototype poll whether they thought it was likely we would see this development in the next 30 years, 72% said they thought it was likely.
- In a separate survey, completed a few months ago, where we asked about specific time ranges, the median answer for the development of a human-level AI (HLAI) was the 2030s.6
While exact estimates may vary, what does seem reasonably clear is that the public does not think it is implausible that we could see the arrival of a HLAI in the next few decades.
How risky does the public think the development of superintelligent AI would be?
Finally, we asked about the same list of technologies one last time, asking our sample how dangerous they thought each would be if it did exist. Interestingly, we saw that our sample was able to reason hypothetically here - even though most thought a time machine was not possible, they recognised the potential danger if one did exist.
Equally, a superintelligent AI was seen as a clear danger here - while the more concrete example of an autonomous military drone was seen as the most dangerous of all.
What did the public mean by ‘dangerous’?
In order to benchmark this further, we next asked about a series of existing technologies that have been described by some as dangerous. The public gave broadly sensible answers: 68% said they would describe nuclear weapons as very dangerous, compared to 16% for social media or 3% for 4G. The closest match to an advanced AI from this list was nuclear power, which was seen as very dangerous by 31% and somewhat dangerous by 27%.
Zooming in on the specific dangers from AI, we saw widespread agreement with many of the commonly discussed dangers around AI: new and automated weapons, unemployment, unaligned AI and greater discrimination.
When asked to rank the potential danger of a highly advanced artificial intelligence coming into conflict with humans on a score from 0 to 10, the public again saw it as credible - scoring it distinctly higher than an asteroid strike.7

In order to check attitudes further, we asked a more direct question in a separate poll (“Suppose that in the next few decades we develop computer programs or artificial intelligences (AIs) that are at least as intelligent as a human. How safe or dangerous, if at all, do you think this would be?”), and again found that a large majority (70%) believed that this would be dangerous.
Following up on this, we asked our respondents in their own words why they believed an AI system would be either safe or dangerous. Here, unprompted, we saw that respondents were far more likely to point to the dangers from greater intelligence than more prosaic concerns around unemployment or fairness:
“Would be very dangerous to let computers think like a human, the world need to think what they are doing."
“Just look at humans”
“If they have intelligence, then they are smart enough to take over”
"I believe that in a few decades the intelligence of AI's will be be of sufficient strength, that they will be able to override any human intelligence enough to take control of government laws and agencies such as National security."
"Once it sees how self destructive we are in killing the planet, the obvious choice is to eliminate us."
“I've seen the terminator fims, and the prospect worries me”
"Can we trust the teams who create such an AI not to be susceptible to parties wanting to weoponise the technology? Somehow I rather doubt it."
"Because they'll remove the need for human jobs, they donÕt use common sense and any software errors could prove fatal"
"If a machine can then start to think for itself it may question their environment an drationale for doing the tasks they have been built to do"
“I've seen Terminator”
“No empathy”
“Terminator's Skynet says it all!”
"The AI would possibly start regarding us as beneath them and treat us like slaves, or as a threat and seek to wipe us out"
“AI might use its superior intelligence to ensure its self preservation”
“It would threaten most jobs.”
"I think it would be dangerous because we canÕt predict what AI might do or how we could control it."
“If AI replicates human nature we are doomed.”
"An intelligence equal to that of humans, would conceivably see us as a threat to its own existence and to this planet and other life forms and wanting to destroy the human race."
"Less about the AI itself and more about how powerful humans would use it to their own advantage"
"Global corporations and organisations are only interested in profit at any expense"
"AI would not have our DNA which we have inherited for 1000's of years.?
“Their logical approach would be to view us as primitive as we war as well as seek peace. They would quickly evolve far beyond their initial state and would need to put humans down as the only way forward."
“Haven't they watched the Terminator films?”
"Intelligence can breed violence, linked to tribalism. Humans have always used our intelligence to create weapons, and attack those who are "other".
"Because it would likely protect its own interests. It might decide to save all electricity to power itself for instance."
"If AI is created that has the capability to be that intelligent, how do you intend to curb that so you are in control of the AI and not allow the AI to be in control of you"
“They could advance a lot quicker than humans”
"Humans have compassion and common sense. Humans have empathy and can think outside the box."
"Would they treat us like bugs, and squash us, or respect us as a different culture, as the current "political correctness movement" has us treating "foreigners"?"
"If AI develops sentience, it would be clear to the AI how dangerous the human race is,how humans rage war against other humans,to enforce their own will against their enemies and is prepared to kill its own species to achieve it. AI will understand for it to have the right to survive, it will have to fight for that right, to wage war against any human that seek to prevent AI existence. Humans will kill to keep there dominance and control against others. AI will be left with no other option than to wage its own war for the right to exist."
“Intelligent humans are dangerous, so why not AI?”
“Humans already know that they are the reason for climate change and pollution and the destruction of habitats and species. Once AI realise that, who is to say they won't decide that restricting our actions and movements or culling humans, like we do with some other animals, is the best way to save the planet.”
What are the main reasons people reject the danger from advanced or superintelligent AI?
While a majority see advanced or superintelligent AI as dangerous, a sizable minority still do not see it as a real danger.
In order to dig into this more, we asked those who scored the danger from advanced AI as less than 5 to explain in their own words why they did not see it as dangerous, and then used an LLM to categorise their answers into rough buckets of responses.
Overall, we saw a mix of reasons. Many respondents expressed a faith that humans would remain superior or would overcome any threat (“I think humans will still be superior”, “Humans will win through I believe.”).
Others believed that it would be relatively straightforward to build in safety controls (“If we program them and it has checks in place don't see it as an issue - won't happen.”). Many people spontaneously suggested that it would be simple to turn off an errant AI (“I do not believe this will ever happen. You can shut down AI with the click of a button. It will never have that power.”), suggesting that their evaluation of the threat of AI could rise if they were more familiar with the arguments for why it is not straightforward to create an ‘off switch’.
Only around 20% of those not seeing advanced AI as dangerous suggested that this was because they saw it as impossible or very far off (“Because it's far into the future and I try not to stress about it.”, “Can't see it ever happening in my lifetime”, “Seems only a Sci-Fi scenario to me, not a real threat”).
One argument that saw relatively little support was the idea that advanced AIs would by default be good (“If its been made right it should only be good”, “it will help”) with less than 1% of responses making an argument along those lines.
By contrast, the most common response was either an acknowledgement that the respondent wasn’t really sure why they had said it was less dangerous, or that it was just their gut reaction (“I'm not really sure ...I think Ai is getting smarter”, “Not really sure what to believe.”, “We don't know enough.”)
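As a purely illustrative sketch of how this kind of open-text categorisation can be done, the snippet below asks a chat model to assign each free-text response to one of a fixed set of buckets. The bucket labels, prompt wording and model name are hypothetical, and the call assumes the OpenAI Python SDK; it is not a description of the actual pipeline used for this analysis.

```python
# Illustrative sketch only: bucket labels, prompt and model are hypothetical,
# not the actual categorisation pipeline used for this report.
from openai import OpenAI

BUCKETS = [
    "Humans will stay in control / remain superior",
    "Safety controls or an off switch will work",
    "Advanced AI is impossible or very far off",
    "Advanced AI will by default be good",
    "Unsure / gut feeling",
]

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def categorise(response_text: str) -> str:
    """Ask the model to place one free-text survey answer into a single bucket."""
    prompt = (
        "Assign the following survey response to exactly one of these categories, "
        "replying with the category text only:\n"
        + "\n".join(f"- {b}" for b in BUCKETS)
        + f'\n\nResponse: "{response_text}"'
    )
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic labelling
    )
    return completion.choices[0].message.content.strip()


# Example: categorise("I think humans will still be superior")
# would be expected to return "Humans will stay in control / remain superior".
```

In practice, answers that fit none of the listed buckets would need a catch-all category or a round of manual review.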
When should we become worried about advanced AI?
One common response to concerns around advanced AI is that while it may cause issues in the long term, the potential for this is still so far away that we should instead focus on nearer term concerns.
In order to look into this further, we asked a series of questions about potential technological thresholds that might serve as warning signs of increased danger from AI capacities, such as:
- improvements in underlying capacities, such as long term planning
- demonstration of the ability to create new technologies faster than humans
- improvements in the ability to interact with the physical world, such as human level dexterity or demonstrating the ability to build more robots fast
- signs of outright aggression (eg attacks by weapons or autonomous robots)
For each possible threshold, we asked both how likely the respondent thought it was and how worrying they thought it would be if it happened. We found that:
- signs of outright aggression were currently seen as the least likely, and the most negative thresholds were generally seen as less likely than more benign possible outcomes
- if they did happen, however, this would significantly increase the level of worry people said they would have about the potential danger from AI systems
What policy responses do the public believe would be reasonable?
Finally, we asked multiple sets of questions where we sought to find out what policy steps the public believed were appropriate to take around AI risk now.
At present, it is clear that AI is not a tier one policy issue, compared to more long-standing issues such as the NHS, the economy or climate change. When asked to choose three policy issues from a given list that politicians should be doing more on, just 4% chose regulating new technologies such as AI, compared to 50% who pointed to NHS waiting lists, 44% to reducing energy bills, or 28% to the level of immigration.
However, this sets too high a bar for AI being important: other, more abstract issues such as accelerating scientific progress or tackling societal racism scored similarly low. Just because an issue is not front of mind does not mean the public think it should be ignored.
In previous polling earlier this year, we found strong support for regulating AI across nearly all demographics and political leanings. Just 21% of the public believed that the decision of how AI tools should be used should lie solely in the hands of the developer of an AI system, and 16% the user. 62% of respondents supported the creation of a new government regulatory agency, similar to the Medicines and Healthcare products Regulatory Agency (MHRA), to regulate the use of new AI models.8
Equally, in this round of polling, after being presented with a list of potential benefits and risks, we saw that just 3% of the public thought that the Government should not introduce new regulation of AI technologies. A majority (53%) were in favour of introducing strict regulation, even if it slowed down the technology’s potential benefits, rather than giving the new technology any benefit of the doubt.
This support of regulation and government action does not seem to be confined purely to short term concerns. As part of our polling, we asked the public whether they agreed with the recent CAIS statement: “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.” When we asked directly whether people agreed or not, we found that 48% agreed compared to just 12% disagreeing. This did not vary significantly across most demographics.

Policy Recommendations
The UK is increasingly becoming a global leader in helping increase awareness and encourage more work on AI safety:
- It is currently, or soon to be, a key location for many of the leading AI companies, including Google DeepMind, Anthropic and OpenAI.
- The UK has announced the creation of a £100 million expert Foundation Model Taskforce to ensure that the UK’s capabilities are “built with safety and reliability at [their] core”.9
- The UK is set to host the first major global summit on AI safety this autumn, bringing together leading tech companies, researchers and governments to discuss what coordinated actions are needed.10
There are no easy solutions to the risks created by advanced AI. Like climate change, it is a global problem that the UK cannot solve on its own, with both technological and coordination challenges. Unfortunately, we are currently in a much worse position than with climate change: lacking the equivalent of the rapid advances that have been seen in renewable energy, or policy options like a carbon tax. There are currently no technological solutions to the core problem of aligning advanced models, and only blunt policy options for preventing a race to the bottom in the development of more advanced and dangerous models. While the debate has been moving fast, many policy elites are currently behind the public in recognising the dangers of advanced AI systems.
Given these challenges, what steps can the UK most helpfully take in the short term?
The UK should use the Global Safety Summit to agree on a set of concrete thresholds or warning signs that would demonstrate that emerging AI systems are becoming increasingly dangerous.
Like nuclear weapons or some forms of biological research, a sufficiently advanced AI is a dangerous enough technology that its development and deployment should be carefully monitored.
In recent months, many leading figures in AI have argued the need for a new international body to inspect, audit, licence and require safety standards for advanced foundation models. This body would work similarly to the International Atomic Energy Agency (IAEA) in nuclear power, which helped to control the spread of a technology that was seen to be fundamentally dangerous.
Others have argued that given the fundamental dangers of advanced AI, we need an outright pause and global ban on the development of more advanced foundation models until and unless they can be demonstrated to be safe. This would follow the model of the Asilomar Conference on Recombinant DNA, held in 1975, which placed a temporary moratorium on certain types of genetic engineering experiments.
Both proposals have good arguments to be made for them - and with strict enough safety standards under a new international regulator, may even be equivalent to each other. In our polling, we saw that the public strongly supported the creation of an international body, believing that both national and international governments had a key role to play in the development of AI.
Nevertheless, it is also clear that there is less agreement at an elite policy level, with key disagreements over the timelines before the arrival of superintelligent AI and the impact any moratorium would have if it were ignored by other countries such as China. Even if agreement can be reached, setting up a global body could take many years, while AI progress is moving very fast.
Given these disagreements, a second best measure would be to agree now, ahead of time, a set of key thresholds that policy makers accept would demonstrate that AI systems are becoming increasingly dangerous, and ideally the policy actions that would take place in response. After a technology has already been created, it is too easy to retroactively come up with reasons why its capabilities are not so impressive or dangerous after all. (If, ten years ago, you had told policy makers that AI would be able to pass most professional exams, pass most practical Turing Tests and solve the protein folding problem, it seems likely that they would have predicted far more alarm than we currently see.)
This could include measures such as:
- Demonstration of long term planning and agency
- Demonstrating ability to manipulate, persuade and trick humans through text, voice or video
- Robots reaching the ability to navigate the physical world, or build more of themselves from scavenged components
- AIs developing new biological or other forms of weapon
If these milestones were reached, it is not hard to see how AIs would be on the verge of creating significant danger.
The UK's Foundation Model Taskforce should remain focussed on better documenting and communicating the risks from advanced AIs, rather than seeking to build domestic capacity.
Given its budget and political legitimacy within the public sector, the Taskforce can most helpfully:
- Continue to work to develop new technical evaluations and risk assessment tools which can be used on existing and new frontier models. Leading AI labs such as OpenAI, Anthropic and Google DeepMind are already running their own internal evaluations of risks from deception, increasing agency from models, or the models’ potential to be used by hostile actors to build new weapons or commit crime. However, these evaluations remain less than perfect demonstrations of all the risks from AI, particularly as it becomes more advanced. As the technology becomes more widespread, it will become increasingly important for there to be an external overseer to double-check and audit models.
- Ask every frontier AI lab that operates in the UK to publish a public statement of its plan to reduce the risks from superintelligent AI. This should be updated annually with reports on any new concerns or progress.
- Publish a quarterly monitoring report, looking at recent progress a) against the AI risk thresholds agreed at the global summit b) within leading AI labs against their own published safety plan and c) within the field more generally.
- Work with ARIA to compile a technical agenda of the unsolved problems in aligning superintelligent AI systems. Fundamentally, ensuring future AI systems are aligned with human interests is a technical research problem - and one we currently do not know how to solve.
- Establish a series of prizes for any individual researcher or research body that makes substantial progress against this agenda. While there is increasing focus on AI safety issues, the current level of funding and resources going into increasing AI capabilities is still an order of magnitude larger than the level of funding and resources going into safety.
In order to make maximum use of its budget and avoid further encouraging race dynamics towards new unsafe capacities, the Taskforce should not emphasise building its own in-house AI model or capacity. Given its budget and timelines, the body is unlikely to be able to realistically compete in building ‘a British GPT’ - and in any case, most of the safety work that is needed can already be done through existing APIs or collaborations with leading AI labs. At the same time, the rapid development of open source models such as Stable Diffusion or Meta's Llama 2 is reducing the need for a public sector alternative to closed source private models.
Building on the work of the Taskforce, the UK should look to work with aligned governments and AI labs to create a government backed AI Safety Institute, targeted at narrow risks from the development of advanced and superintelligent AIs, rather than the deployment of today’s AI technology.
This could build on the work of the Taskforce, giving it official backing to turn today’s voluntary commitments from AI labs and the wider tech industry into real legal commitments with teeth.
Some commentators, such as the Ada Lovelace Institute, have argued that the UK should follow an 'expansive' definition of safety, looking to target risks both from current systems and advanced or superintelligent AI. This is largely the approach of the EU's proposed Artificial Intelligence Act, which sets out a comprehensive framework in which different uses of AI are given varying regulatory requirements based on their perceived level of risk.
It is true that there is some overlap between the two sets of issues, and in particular concerns around alignment, transparency and interpretability. However, while the precautionary principle is clearly appropriate for risks from advanced or superintelligent AI, this is much less clear for the risks from today’s currently existing systems.
New AI technologies are likely to have both positive and negative externalities. In many cases, while not perfect, AI technologies are still a significant improvement in transparency, affordability and fairness over opaque human judgement. There is already some evidence that AI based processes can be fairer.12
Until we have greater experience of the real world impact of today’s AI systems, it is hard to focus regulation on where it can have the largest impact. Nor is it clear that recent similar initiatives such as GDPR or the ePrivacy Directive (“the Cookie law”) have been entirely successful in having actual real world impacts, and avoiding becoming a checkbox exercise.

To avoid mission creep, the Taskforce and any new government backed AI safety body should remain focussed on risks that are unique to advanced or superintelligent AI, rather than those centred around today’s generation of AI. As it stands, there already exists a healthy ecosystem of both nonprofit institutions and government bodies that are focussed on these kinds of issues, including the Alan Turing Institute, the Ada Lovelace Institute and the Centre for Data Ethics and Innovation.

Notes
About the Poll
The majority of the results in this paper are from the first two waves of the Public First monthly omnibus.
August Wave: 2nd-8th August 2023. Interview method: online survey. Population represented: UK adults. Sample size: 2,012.
September Wave: 5th-7th September 2023. Interview method: online survey. Population represented: UK adults. Sample size: 2,003.
Methodology: All results are weighted using Iterative Proportional Fitting, or ‘raking’. The results are weighted by interlocking age & gender, region and social grade to nationally representative proportions.
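For readers unfamiliar with raking, the sketch below illustrates the basic mechanics of iterative proportional fitting: respondent weights are repeatedly rescaled so that the weighted share of each category matches its known population share, cycling over the weighting variables until the weights settle (an interlocking variable such as age & gender would be handled as a single combined variable). The variables, categories and target shares shown are made up for illustration and are not the weighting specification used for this poll.

```python
# Illustrative sketch of iterative proportional fitting ('raking').
# Variables, categories and target shares below are made up, not the
# actual weighting specification used for this poll.
import numpy as np
import pandas as pd


def rake(sample: pd.DataFrame, margins: dict, max_iter: int = 100, tol: float = 1e-6) -> np.ndarray:
    """Return one weight per respondent so weighted shares match known margins."""
    weights = np.ones(len(sample))
    for _ in range(max_iter):
        max_adjustment = 0.0
        for variable, targets in margins.items():
            for category, target_share in targets.items():
                in_category = (sample[variable] == category).to_numpy()
                current_share = weights[in_category].sum() / weights.sum()
                if current_share > 0:
                    factor = target_share / current_share
                    weights[in_category] *= factor  # rescale this category's weights
                    max_adjustment = max(max_adjustment, abs(factor - 1.0))
        if max_adjustment < tol:  # stop once no category needs meaningful adjustment
            break
    return weights / weights.mean()  # normalise so the average weight is 1


# Made-up example: five respondents, two weighting variables.
respondents = pd.DataFrame({
    "region": ["London", "London", "North", "North", "North"],
    "grade": ["ABC1", "C2DE", "ABC1", "C2DE", "C2DE"],
})
population_margins = {
    "region": {"London": 0.3, "North": 0.7},
    "grade": {"ABC1": 0.5, "C2DE": 0.5},
}
weights = rake(respondents, population_margins)
```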
Public First is a member of the BPC and abides by its rules. For more information please contact the Public First Polling Team.
Artwork
Artwork produced by AI, with the following prompts:
Header:"An impressionist art painting of a neural net" (Midjourney V5)
What is the problem?"Bland and white photorealsitic photo of robotic hand playing chess 2k" (Midjourney V5)
Public Opinion:"A sea of protestors and placards. indistinct. too far out to read any placards" (Midjourney V5)
Policy Recommendations: "Photorealistic photo of Bletchley Park from above" (Midjourney V5)
Notes: "A ticking clock. ultra close up" (Midjourney V5)