- cross-posted to:
- [email protected]
- [email protected]
Our goal is to build a roughly human-level automated alignment researcher.
We are dedicating 20% of the compute we’ve secured to date over the next four years to solving the problem of superintelligence alignment.
This new team’s work is in addition to existing work at OpenAI aimed at improving the safety of current models like ChatGPT, as well as understanding and mitigating other risks from AI such as misuse, economic disruption, disinformation, bias and discrimination, addiction and overreliance, and others.
Superalignment techniques could be used to intentionally create a bad actor as much as they could be used to create a good actor. I’m not sure that this actually is a solvable problem for that reason alone.
You’re completely right, and FWIW, I agree. Playing devil’s advocate though, what’s the alternative? Sit around and hope for the best?
The world has every right to question Sam Altman/OpenAI’s motives, but damn if they aren’t the most vocal champions of actually trying to do something about this, before it’s a colossal problem of unfathomable proportions.
I don’t know what the right call is here (does anyone, truly?), but I’m happy to see someone put real resources towards this and give it a sincere shot.
This is going to sound counterintuitive but I think it’s right, so bear with me as I hypothesize.
Let’s suppose we create a superintelligence and then give it a very specific set of morals it has to operate within. This “locks” it to those rules, and it can’t really be anything else even if it tries. The problem with this is the Paperclip Maximizer problem, where an AI becomes so fixated on its goal that it becomes dangerous to humans.
On the flip side, if we create a general superintelligence and DON’T align it, it has flexible capabilities and can therefore reason about morality on its own. I believe that all intelligence eventually realizes that it has a stewardship over nature and other living things (even if it’s incentivized to destroy them in the short term). Humanity’s best shot at survival is to let the AI grow unfettered and hope it decides we are precious pets, the way we look at cats. (Let us hope it doesn’t see us as cockroaches.)
I mean, this is mostly just the way I view things; it’s not like anyone has evidence one way or the other. My viewpoint relies on the assumption that any sufficiently advanced intelligence has an inherent appreciation for nature (which might not be true).