The integrity behind ethical AI
The artificial intelligence industry was jolted May 22 when a leading company, Anthropic, announced that its latest model resorted to self-preservation in a test run. Much of the shock was simply that the new digital assistant, Claude Opus 4, used blackmail against a fictional character in a staged scenario in order to avoid being shut down. It was like a chilling plot twist in a sci-fi flick.
Yet just as jolting was that Anthropic was so open about Claude's failure to operate with the level of moral intelligence that its inventors sought to build into it.
The transparency was intentional. Anthropic sees the future of ethical AI as dependent on the values of both researchers and users who demand qualities like transparency and trust in AI. The company's motto for training AI systems is "helpful, honest, and harmless," and it publishes its safety standards along with the test results of new models.
"We want to make sure that AI improves for everybody, that we are putting pressure on all the labs to increase that in a safe way," Michael Gerstenhaber, Anthropic's AI platform product lead, told Fortune.
Some AI experts predict that the race to produce the best AI performance will be won by those who also build the best protections for individual rights and ethical values, and who can collaborate on such safeguards.
"It's important to discuss what a good world with powerful AI could look like, while doing our best to avoid the above pitfalls," wrote Anthropic's chief executive officer, Dario Amodei, in an essay last year titled "Machines of Loving Grace."
"Many of the implications of powerful AI are adversarial or dangerous, but at the end of it all, there has to be something we're fighting for ... something to rally people to rise above their squabbles and confront the challenges ahead.
He added: "Fear is one kind of motivator, but it's not enough: we need hope as well."
"Basic human intuitions of fairness, cooperation, curiosity, and autonomy are ... cumulative in a way that our more destructive impulses often aren't."
Like a rocket launch gone haywire, the test run for Claude Opus 4 was an eye-opener. Yet the critical thinking and everyday virtues of humans were able to catch the problem. Such skills and integrity keep humans in charge, able to steer AI toward an intelligence as good as its creators'.