“Attention is all you need.”
This 2017 breakthrough idea transformed AI. The concept of self-attention became the foundation of today’s chatbots. Claude, Gemini, and ChatGPT are all large language models (LLMs), AI systems designed to focus on what matters most while filtering out distractions.
The results have been remarkable. From brainstorming recipes to generating code, apps, websites, and content, LLMs are being woven into our lives at breakneck speed.
But now, a City University of New York team and collaborators are asking: How closely does AI self-attention resemble human attention?
It’s not just academic curiosity. AI researchers have long looked to the brain for ideas to improve machine intelligence. In turn, AI models have offered new ways to study the brain. Comparing artificial and biological attention could inspire AI that focuses more like we do.
In their study, the team asked multiple chatbots to complete a classic psychology test of attention and cognitive control. Participants see the name of a color, such as “red,” written either in the same color or in a different color from the one the word describes. The challenge is to name the ink color while ignoring the word itself.
On short word lists, the chatbots performed well. But as the tasks grew longer, their focus faltered. Instead of naming the ink color, they increasingly defaulted to reading the word. Under more demanding conditions, ones that also trip up people, their performance nearly collapsed.
The findings suggest today’s AI attention systems are “fundamentally limited,” wrote the authors. They go on to say that adding mechanisms similar to “those in biological attention is crucial for achieving artificial general intelligence.”
Attention, Two Ways
Doomscrolling. YouTube. Dinner plans. Family obligations. A barrage of notifications.
Life sometimes feels like everything, everywhere, all at once. Yet the brain can often lock onto what matters most and push everything else into the background.
Far from being a single, straightforward mechanism, attention emerges from multiple brain regions. According to attention network theory, three networks do most of the heavy lifting.
The alerting network keeps the brain ready for action. The orienting network selects which sights, sounds, smells, and sensations deserve attention. Finally, the executive control network resolves conflicts between competing streams of information, helping direct thoughts and actions toward a goal.
Together, these systems allocate the brain’s limited resources. Touch a hot stove, for example, and your brain immediately shifts attention from dinner to the burn. The food can wait; cooling your hand cannot.
AI works very differently.
Rather than processing language as complete sentences, LLMs break text into smaller units called “tokens.” Attention mechanisms then determine which tokens matter most for generating the next word, sentence, or response.
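For a rough feel of what tokenizing looks like, here is a minimal Python sketch of a greedy longest-match tokenizer. The vocabulary is a hypothetical toy; production LLM tokenizers, such as those based on byte-pair encoding, learn vocabularies of many thousands of pieces from data and follow their own splitting rules, so treat this as an illustration of the idea rather than how any particular chatbot works.

```python
from typing import List, Set

def greedy_tokenize(text: str, vocab: Set[str]) -> List[str]:
    """Split text left to right into the longest pieces found in vocab.

    Any character not covered by a vocabulary entry becomes its own token.
    """
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until one is in the vocabulary.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in vocab or end == i + 1:
                tokens.append(piece)
                i = end
                break
    return tokens

# Hypothetical toy vocabulary; real tokenizers learn theirs from large text corpora.
vocab = {"atten", "tion", " is", " all", " you", " need"}
tokens = greedy_tokenize("attention is all you need", vocab)
```

The famous paper title comes out as six tokens, with “attention” split into two pieces because the toy vocabulary has no single entry for it.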
Self-attention is the key breakthrough behind modern chatbots. For each token, the model weighs and incorporates information from the other tokens in a sequence, allowing it to track context across long stretches of text. This mechanism helps AI connect words and concepts, and it underpins virtually all frontier LLMs today.
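The core calculation behind self-attention fits in a few lines. This sketch follows the scaled dot-product formula from the 2017 paper, softmax(QKᵀ/√d)·V, in plain Python. The three-token input and the identity projection matrices are made-up placeholders; real models learn their projection weights and work with vectors of hundreds or thousands of dimensions.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]

def softmax(row: List[float]) -> List[float]:
    # Subtracting the max keeps exp() from overflowing; the result is unchanged.
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product self-attention over one sequence.

    x has one embedding row per token. Returns (outputs, attention weights).
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))
    # scores[i][j]: how strongly token i's query matches token j's key.
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    # Softmax turns each row into weights that sum to 1.
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted blend of every token's value vector.
    return matmul(weights, v), weights

# Toy example: three 2-D "token embeddings" and identity projections (hypothetical).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
outputs, weights = self_attention(x, identity, identity, identity)
```

In this toy run, the third token puts the most weight on itself, because [1, 1] has a larger dot product with itself than with either of the other two vectors.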
Self-attention comes in several variants. One, multi-head attention, runs several attention computations in parallel, with each “head” learning different patterns, such as grammar, syntax, or meaning. Another, cross attention, links information between two different sequences, such as an input and the output being generated, making it especially useful for tasks such as translation and summarization.
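Multi-head attention and cross attention can both be expressed with the same building block. In this simplified sketch, each head attends over its own slice of every vector, which stands in for the separate learned projections (and the final output projection) that real models use. Passing one sequence twice gives self-attention; passing a second sequence as context gives cross attention, where, for example, a token being generated draws on the input text. The sequences are hypothetical toy values.

```python
import math
from typing import List

Matrix = List[List[float]]

def softmax(row: List[float]) -> List[float]:
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: each query blends the values by key similarity."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        out.append([sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))])
    return out

def multi_head(queries: Matrix, context: Matrix, n_heads: int) -> Matrix:
    """Run n_heads attention computations side by side and concatenate them.

    Simplification: head h just reads its own slice of each vector, standing in
    for the learned per-head projections real models use.
    Same sequence twice = self-attention; a different `context` = cross attention.
    """
    d_model = len(queries[0])
    if d_model % n_heads:
        raise ValueError("vector size must divide evenly among the heads")
    d_head = d_model // n_heads
    heads = []
    for h in range(n_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        q = [row[lo:hi] for row in queries]
        kv = [row[lo:hi] for row in context]
        heads.append(attend(q, kv, kv))
    # Stitch each query's per-head outputs back into one vector, head by head.
    return [[x for head in heads for x in head[i]] for i in range(len(queries))]

source = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, -0.5]]  # e.g. encoded input tokens
target = [[1.0, 1.0, 0.0, 1.0]]                         # e.g. a token being generated
self_out = multi_head(source, source, n_heads=2)   # self-attention within the input
cross_out = multi_head(target, source, n_heads=2)  # the output token attends to the input
```

Because every head’s output is a weighted average of the context vectors, each number in `cross_out` stays within the range spanned by the matching column of `source`.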
But attention comes at a steep computational cost: because every token is compared with every other token, the work grows with the square of the sequence length. To make models more efficient, researchers are also exploring sparse attention, which limits how many tokens a model considers at once. Another approach draws on previously learned information to keep AI “focused.”
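Sparse attention comes in many forms, and the article does not say which ones are meant. One common pattern is a sliding window, in which each token scores only its near neighbors. This minimal sketch, with a hypothetical `window` parameter and toy vectors, counts the scores it computes so the savings are visible.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def softmax(row: List[float]) -> List[float]:
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def sliding_window_attention(x: Matrix, window: int) -> Tuple[Matrix, int]:
    """Sparse attention where token i only looks at tokens i-window .. i+window.

    For brevity x serves as queries, keys, and values (no learned projections).
    Returns the outputs and how many query-key scores were computed.
    """
    n, d = len(x), len(x[0])
    outputs, computed = [], 0
    for i in range(n):
        neighbors = range(max(0, i - window), min(n, i + window + 1))
        # Only tokens inside the window are ever scored.
        scores = [dot(x[i], x[j]) / math.sqrt(d) for j in neighbors]
        computed += len(scores)
        w = softmax(scores)
        outputs.append([sum(wj * x[j][c] for wj, j in zip(w, neighbors)) for c in range(d)])
    return outputs, computed

x = [[float(i % 3), 1.0] for i in range(100)]  # 100 toy token vectors
outputs, computed = sliding_window_attention(x, window=2)
full = len(x) ** 2  # full self-attention scores every token against every token
```

With 100 tokens and a window of 2, this computes 494 scores instead of the 10,000 that full attention would need, and the count now grows roughly linearly with sequence length. The catch is that tokens farther apart than the window cannot influence each other directly within a single attention step.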
Despite the name, AI attention is ultimately a mathematical operation. It helps determine which information is relevant in a given context. But it lacks executive control, the network that keeps humans focused on a goal for long stretches of time despite distractions.
Color Blind
To test the limits of AI attention, the team pitted OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet against the Stroop task.
Invented by John Ridley Stroop in 1935, the test measures attention and cognitive control by forcing participants to resolve conflicting information. The challenge is simple: Name the color of a word while ignoring what the word means. In a congruent trial, the word “blue” appears in blue ink. In an incongruent trial, “blue” might appear in red or green, creating a conflict between what the eyes see and what the brain reads.
Humans are consistently slowed down by this interference. Even with practice, the effect persists, suggesting it taps into fundamental mechanisms of executive control.
In the study, the researchers created word lists of various lengths and difficulty. Some were entirely congruent. Others were fully incongruent. A third set mixed the two conditions.
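The study’s exact protocol is not described here: which colors were used, how mixed lists were balanced, or how the colored words were shown to the models. So the following Python sketch simply assumes a four-color palette, a 50/50 random mix, and scoring by exact match against the ink color, to show how the three list types differ. All names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List

COLORS = ["red", "green", "blue", "yellow"]  # assumed palette, not the study's

@dataclass(frozen=True)
class StroopItem:
    word: str  # the color word that is written
    ink: str   # the color it is written in: the correct answer

def make_list(length: int, condition: str, rng: random.Random) -> List[StroopItem]:
    """Build Stroop items for the 'congruent', 'incongruent', or 'mixed' condition.

    'mixed' is assumed here to be a 50/50 random blend of the other two.
    """
    if condition not in ("congruent", "incongruent", "mixed"):
        raise ValueError(f"unknown condition: {condition}")
    items = []
    for _ in range(length):
        word = rng.choice(COLORS)
        same = condition == "congruent" or (condition == "mixed" and rng.random() < 0.5)
        ink = word if same else rng.choice([c for c in COLORS if c != word])
        items.append(StroopItem(word, ink))
    return items

def accuracy(items: List[StroopItem], responses: List[str]) -> float:
    """Fraction of items whose response names the ink color."""
    if len(responses) != len(items):
        raise ValueError("expected one response per item")
    return sum(r == item.ink for r, item in zip(responses, items)) / len(items)

def read_words(items: List[StroopItem]) -> List[str]:
    """A respondent that reads the words instead of naming the ink."""
    return [item.word for item in items]

rng = random.Random(0)
congruent = make_list(40, "congruent", rng)
incongruent = make_list(40, "incongruent", rng)
mixed = make_list(40, "mixed", rng)
```

A respondent that simply reads each word scores 1.0 on the congruent list and 0.0 on the incongruent one, and on the mixed list its accuracy equals the share of items that happen to be congruent. That is why congruent lists alone cannot reveal the failure the study describes: only conflicting items separate naming the ink from reading the word.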
At first, the AI models excelled. On five-word tests, GPT-4o was over 90 percent accurate across all conditions. But as the number of words increased, performance plummeted. On 40-word incongruent tests, the model’s accuracy fell to roughly 15 percent. Claude showed a similar decline. In mixed-condition tests, both models’ performance nearly collapsed to zero.
“The sharp decline in color-naming accuracy with increasing list length indicates that transformer-based attention mechanisms are vulnerable to scaling demands,” wrote the team.
Perhaps most intriguing, some models correctly recognized they were taking the Stroop test and could even explain its rules. But that apparent awareness did nothing to improve their scores. In other words, a “book smart” understanding of the task wasn’t enough to execute it well.
The study joins a growing effort to borrow psychological tests for research in machine cognition, especially when AI is challenged with complex, dynamic decision-making tasks. Theory of mind tests, for example, let researchers gauge whether a system can track others’ beliefs, emotions, and intentions. Personality tests are helping shape model behavior and reduce sycophancy. And some LLMs are readily solving emotional intelligence tests, which measure how well the algorithms recognize and respond to social cues.
According to the authors, the new results point to a missing ingredient in AI attention: a mechanism similar to the brain’s executive control network, which helps us stick with a task and adapt when priorities change.
Future AI systems may benefit from higher-level executive control that continuously tracks progress toward a goal, detects when attention has drifted, and pulls it back on track if necessary.
Rather than simply weighing which tokens are most relevant in the moment, a more human-like form of attention could help AI stay focused during complex tasks, such as long conversations, multi-step reasoning problems, or high-stakes uses in scientific research and drug discovery.
“The ultimate goal of AI research is to develop artificial general intelligence comparable to human abilities,” wrote the team. “AI systems, like humans, might have to master fundamental attention mechanisms…before achieving the generalized problem-solving abilities characteristic of mature executive functions.”

