Later edit: I now believe that:
- Multiple ASIs could be intelligent enough to agree on and implement protocols that enable trust, cooperation, and deconfliction, so security does not necessarily depend on there being only one ASI (btw, I'm not a security expert)
- I think the ASI will be able to help most people gain self-love and happiness, reducing the breadth of the security problem
Edit: I now believe the primary control measures should be:
- ensure super-intelligence is narrowly whitelisted, such as "super-intelligent prediction of the weather" or "super-safe driving"
- ban AI from contributing to any risk area, such as "being able to help create pathogens", except under restricted grants of licences (a minimal sketch of this whitelist-plus-licence rule follows this list)
- ban AI in the form of agents, as opposed to simple tools, except in permitted cases
- monitoring, transparency, and treaty, as described below
- the NSA and CIA should consider letting it be known that other agencies should not attempt a race-to-the-bottom against them for power, as it will be remembered unfavourably later
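To make the whitelist-plus-licence idea concrete, here is a minimal sketch of such a rule. The domain names, requesters, and licence registry are purely illustrative assumptions on my part, not a concrete proposal.

```python
# Minimal sketch of a whitelist-plus-licence rule. All names here
# (domains, requesters, the LICENCES registry) are hypothetical.

ALLOWED_NARROW_DOMAINS = {"weather-prediction", "driving-safety"}
BANNED_RISK_AREAS = {"pathogen-design", "cyber-offence"}

# Restricted grants of licences: (requester, risk area) pairs.
LICENCES = {("biosecurity-lab-7", "pathogen-design")}

def is_permitted(requester: str, domain: str) -> bool:
    """Allow only whitelisted narrow domains; ban risk areas unless licensed."""
    if domain in BANNED_RISK_AREAS:
        return (requester, domain) in LICENCES
    return domain in ALLOWED_NARROW_DOMAINS

print(is_permitted("weather-service", "weather-prediction"))  # True
print(is_permitted("random-user", "pathogen-design"))         # False
print(is_permitted("biosecurity-lab-7", "pathogen-design"))   # True
```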
Originally created 9th April 2023
We live in a world where sparks of AGI are possible using just a straightforward inference pass over a learned neural net. We have an f(text) -> text, where we know that ‘f’ itself is always safe to run, yet we get an AGI as a result. An agent does not need to be embodied in some loop of reality to exhibit AGI.
This is good news, if it remains this way.
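As a concrete illustration of the f(text) -> text framing, here is a minimal sketch of a single stateless inference pass using the Hugging Face transformers library; the choice of gpt2 is just an assumption for the example, and any causal LM would do.

```python
# A minimal sketch of the "f(text) -> text" view of LLM inference.
# The model choice (gpt2) is an illustrative assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

def f(text: str, max_new_tokens: int = 32) -> str:
    """One stateless generate pass: text in, text out.

    Nothing persists between calls -- no memory, no environment loop.
    Whatever 'agency' appears lives only within this single call and
    its growing context window.
    """
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(f("The safety implications of large models are"))
```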
(Edit: an LLM AI is embodied already in two ways: firstly, the later layers of internal neurons can develop a static sense of self during training; and secondly, within the execution of a prompt the AI can see its previous tokens and so has a dynamic sense of self for the duration of the context window's execution.)
(Edit: "self-modelling-awareness" might perhaps be a better term for this at first - the meta model that a large model may internally develop about the knowledge modelling which it has already learned in its earlier layers.
But then, when reaching even larger scales a further meta modelling might be learned about that self-awareness-modelling, at which point and beyond a limited static self-awareness may be given. The self-awareness will be exactly as acute insofar as it's helpful for the model to predict in the training task)
We live in a world where training an AGI requires a large amount of compute power, which not everyone can afford. This is good news, if it remains this way.
We live in a world where an AGI doesn't need "its own" psychology in order to be an AGI. This is good news. (Edit: debatable - the AI has a biased subconscious/psychological profile based on its training data)
Bad people now have access to (much) more powerful technology. This is bad news, and all the more so if widely available commercial AGI products increase in power.
Eventually attackers will succeed in getting the weights and training know-how, just as nation states got hold of nuclear bomb designs, and even script kiddies made inroads into nuclear facilities. This is bad news, and is probably the point that everyone should actually be concerned with, rather than AGI self-propagation.
If the model can be interrogated as an oracle of its own weight values, that would also be bad news for proliferation.
If attackers poison the training data of an AGI in ways that are hidden on first inspection, it could undermine safety in all manner of ways.
If attackers surreptitiously alter any AGI weights, it could undermine safety in all manner of ways.
[These are probably not that threatening, with the caveat that there may be other important kinds of vulnerability not listed above]
In summary, AGI self-propagation is among the easiest and the most tractable of the safety problems we currently face.
The two more severe safety concerns are:
- commercially or openly available, high(er)-power AGIs being used by scammers, opinion shapers, and other harmful people
- nation-state (intelligence agency) AGI propagation. Not necessarily as fun as it sounds.
In my opinion, both of these become more problematic as AGI strength increases, so as global societies we should be talking more fully about how we want these to play out, and acting soon. The first problem is more obvious, while the second is less obvious and more existential. Still, the first problem creates instability and distraction, which makes the second problem harder.
AGI alignment is not the problem. Alignment of intelligence agencies is the problem. Countries should be rapidly seeking non-proliferation agreements, and legislatures should be rapidly working to align agencies and militaries to the interests of the world as a whole. The heads of the agencies and the country's president and staff should be included in this. You wouldn't want the CIA to develop a nuclear bomb, even though I'm sure they have the resources to do so. You probably wouldn't want the CIA or the Chinese military to be the first to develop runaway ASI capability. All of this needs to happen now, while we're still dealing with GPT-4 and not GPT-5.
No matter the difference of opinion about the proximate problem, Eliezer and the Open Letter signatories are correct that we need to stop all training of models more powerful than those we already have. There can be no exceptions, including for governments or militaries.
Additionally, intelligence agencies (and inspectors) will need to keep track of all worldwide instances of the following (a toy flagging sketch follows this list):
- GPUs racked together
- Large energy consumption
- People working on AGI
- Companies, organizations and individuals with the resources and/or mandate to acquire those
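Here is a toy flagging rule over a hypothetical registry of compute sites, just to make the monitoring idea concrete. The dataclass fields and thresholds are illustrative assumptions, not figures from any real inspection regime.

```python
# Toy sketch of a compute-site registry and a flagging rule.
# Thresholds and example sites are hypothetical.
from dataclasses import dataclass

@dataclass
class ComputeSite:
    name: str
    gpus_racked: int          # accelerators co-located in one cluster
    power_draw_mw: float      # sustained facility power draw, megawatts
    declared_purpose: str

# Hypothetical reporting thresholds an inspection regime might choose.
GPU_THRESHOLD = 1_000
POWER_THRESHOLD_MW = 10.0

def needs_inspection(site: ComputeSite) -> bool:
    """Flag sites whose scale suggests frontier-model training capacity."""
    return site.gpus_racked >= GPU_THRESHOLD or site.power_draw_mw >= POWER_THRESHOLD_MW

sites = [
    ComputeSite("university lab", 64, 0.3, "research"),
    ComputeSite("undeclared datacentre", 8_192, 25.0, "unknown"),
]
flagged = [s.name for s in sites if needs_inspection(s)]
print(flagged)  # ['undeclared datacentre']
```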
At the moment we can do this, but if AGI power progresses it will become harder to monitor these well. Right now it's the alignment of agency employees, agencies, countries, and the world which is of most concern. There should be agitation from the top down, from the bottom up, and criss-cross to reach this alignment as soon as possible.
Meanwhile, there are still the urgent problems of resource depletion, climate change, and extreme weather to resolve. Both of these problems are societal, not technological.
(Edit: to be clear, on balance I think I may prefer a world where a single security order exists in order to control security risks that I see as inevitable without it. I only hope that such a powerful order would not feel it necessary to be too Stalinist or dictatorial, but I have to grant that it would likely behave very arbitrarily, as one has come to expect from absolute, total power.)
Discussions welcome!
- My later AI control measures, edited in at the top of this page, were more or less described, in various general forms, by Sam Altman in his prior Senate hearing in May.
- However, I'm currently unaware of prior work putting forward:
- a well-defined whitelist/narrow-only superintelligence approach
- a tools-not-agents focus/requirement
- a CIA/NSA ASI primacy assertion, to ward off an arms race
- in fact, Dario Amodei and, IIRC, Sam Altman do reject the statist scenario as having a poor outcome, but they don't go into detail about this, or about why an ASI arms race with control yet to be solved would be best, nor do they explain the improbability of the CIA/NSA not gaining ASI capability, or how the power shift could work smoothly if they did not.
- my later analysis jars with my initial analysis, but in the intervening months the UK AI Safety Summit was a total failure, the US has created saner legislation but still not enough to solve the problem, companies like IBM and Meta are arguing for a fast-and-loose approach, frontier labs are acknowledging that the risk is genuinely existential, and Microsoft and Google are acknowledging they're in an arms race. All of this leads me to believe the private sector does not have the problem in hand, so I prefer the "less good" totalitarian (but also post-scarcity) world to the one where we risk extinction and then capitalists almost certainly attempt to maintain scarcity and subvert democratic power anyway.
- Most of the initial arguments, including monitoring of GPUs, were raised by Sam Altman in his subsequent Senate hearing in May, and at some point he raised the energy-monitoring point as well. These proposals are simple enough that they were almost certainly developed by him and/or OpenAI independently.