What’s AGI? Nobody agrees, and it’s tearing Microsoft and OpenAI apart.

The reported $100 billion profit threshold we mentioned earlier conflates commercial success with cognitive capability, as if a system’s ability to generate revenue says anything meaningful about whether it can “think,” “reason,” or “understand” the world like a human.

Sam Altman speaks onstage during The New York Times DealBook Summit 2024 at Jazz at Lincoln Center on December 4, 2024, in New York City.

Credit: Eugene Gologursky via Getty Images


Depending on your definition, we may already have AGI, or it may be physically impossible to achieve. If you define AGI as “AI that performs better than most humans at most tasks,” then current language models potentially meet that bar for certain kinds of work (which tasks, which humans, what counts as “better”?), but agreement on whether that’s true is far from universal. This says nothing of the even murkier concept of “superintelligence,” another nebulous term for a hypothetical, god-like intellect so far beyond human cognition that, like AGI, it defies any solid definition or benchmark.

Given this definitional chaos, researchers have tried to create objective benchmarks to measure progress toward AGI, but these attempts have revealed their own set of problems.

Why benchmarks keep failing us

The search for better AGI benchmarks has produced some interesting alternatives to the Turing Test. The Abstraction and Reasoning Corpus (ARC-AGI), introduced in 2019 by François Chollet, tests whether AI systems can solve novel visual puzzles that require deep and novel analytical reasoning.

“Almost all current AI benchmarks can be solved purely via memorization,” Chollet told Freethink in August 2024. A major problem with AI benchmarks currently stems from data contamination: when test questions end up in training data, models can appear to perform well without truly “understanding” the underlying concepts. Large language models function as master imitators, mimicking patterns found in training data, but they don’t always originate novel solutions to problems.
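To make the contamination problem concrete, here is a minimal sketch (not from the article, and not any lab’s actual pipeline) of the kind of n-gram overlap heuristic researchers sometimes use to flag test questions that may have leaked into training data; the function names and thresholds are illustrative assumptions.

```python
# Toy contamination check: if a large fraction of a test question's 8-grams
# already appear in the training corpus, a high benchmark score may reflect
# memorization rather than reasoning. Illustrative only.

def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def contamination_score(test_item: str, training_corpus: list[str], n: int = 8) -> float:
    """Fraction of the test item's n-grams that also appear somewhere in the corpus."""
    test_ngrams = ngrams(test_item, n)
    if not test_ngrams:
        return 0.0
    corpus_ngrams: set[tuple[str, ...]] = set()
    for doc in training_corpus:
        corpus_ngrams |= ngrams(doc, n)
    return len(test_ngrams & corpus_ngrams) / len(test_ngrams)

# A score near 1.0 suggests the question likely appeared in training data.
corpus = ["the quick brown fox jumps over the lazy dog near the river bank today"]
question = "the quick brown fox jumps over the lazy dog near the river"
print(contamination_score(question, corpus))  # prints 1.0: full overlap
```

A model can ace a “contaminated” question like this without exercising any of the reasoning the benchmark was meant to measure, which is exactly the failure mode Chollet is describing.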

But even sophisticated benchmarks like ARC-AGI face a fundamental problem: They’re still trying to reduce intelligence to a score. And while improved benchmarks are essential for measuring empirical progress in a scientific framework, intelligence is not a single thing you can measure like height or weight; it’s a complex constellation of abilities that manifest differently in different contexts. Indeed, we don’t even have a complete functional definition of human intelligence, so defining artificial intelligence by any single benchmark score is likely to capture only a small part of the whole picture.
