Jan Leike, a prominent AI researcher who earlier this month resigned from OpenAI before publicly criticizing the company’s approach to AI safety, has joined OpenAI rival Anthropic to lead a new “superalignment” team.
In a post on X, Leike said that his team at Anthropic will focus on various aspects of AI safety and security, specifically “scalable oversight,” “weak-to-strong generalization” and automated alignment research.
A source familiar with the matter tells TechCrunch that Leike will report directly to Jared Kaplan, Anthropic’s chief science officer, and that Anthropic researchers currently working on scalable oversight (techniques to control large-scale AI’s behavior in predictable and desirable ways) will move to report to Leike as Leike’s team spins up.
In some ways, Leike’s team sounds similar in mission to OpenAI’s recently dissolved Superalignment team. That team, which Leike co-led, had the ambitious goal of solving the core technical challenges of controlling superintelligent AI within the next four years, but often found itself hamstrung by OpenAI’s leadership.
Anthropic has often attempted to position itself as more safety-focused than OpenAI.
Anthropic’s CEO, Dario Amodei, was once the VP of research at OpenAI and reportedly split with the company after a disagreement over its direction, namely OpenAI’s growing commercial focus. Amodei brought with him a number of ex-OpenAI employees to launch Anthropic, including OpenAI’s former policy lead Jack Clark.