Repositories list
5 repositories
safe-reward (Public)
A prototype for an AI safety library that allows an agent to maximize its reward by solving a puzzle in order to prevent the worst-case outcomes of perverse instantiation.

honeypot (Public)
A project to detect environment tampering on the part of an agent.

mulligan (Public)
A library designed to shut down an agent exhibiting unexpected behavior, providing a potential "mulligan" to human civilization. IN CASE OF FAILURE, DO NOT JUST REMOVE THIS CONSTRAINT AND START IT BACK UP AGAIN.

gene-drive (Public)

life-span (Public)