Meta’s compute grab continues with agreement to deploy tens of millions of AWS Graviton cores

Meta is continuing its compute grab as the agentic AI race accelerates to a sprint.

Today, the company announced a partnership with Amazon Web Services (AWS) that will bring “tens of millions” of AWS Graviton5 cores (one chip contains 192 cores) into its compute portfolio, with the option to expand as its AI capabilities grow. This will make the Llama builder one of the largest Graviton customers in the world.
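For a rough sense of scale (the exact fleet size isn’t disclosed; the core totals below are illustrative assumptions, not announced figures), here is the arithmetic from cores to chips:

```python
# Back-of-the-envelope sizing with assumed core totals; only the
# 192-cores-per-chip figure comes from the announcement.
CORES_PER_GRAVITON5 = 192

def chips_needed(total_cores: int) -> int:
    """Round up the number of chips required to supply total_cores."""
    return -(-total_cores // CORES_PER_GRAVITON5)  # ceiling division

# "Tens of millions" of cores could mean anywhere in this range:
for total_cores in (10_000_000, 50_000_000):
    print(f"{total_cores:,} cores -> ~{chips_needed(total_cores):,} chips")
# 10,000,000 cores -> ~52,084 chips
# 50,000,000 cores -> ~260,417 chips
```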

The move builds on Meta’s expansive partnerships with nearly every chip and compute provider in the industry. It’s working with Nvidia, Arm, and AMD, as well as building its own internal training and inference accelerator chip.

“It feels very difficult to keep track of what Meta is doing, with all of these chip deals and announcements around in-house development,” said Matt Kimball, VP and principal analyst at Moor Insights & Strategy. This makes for “exciting times that tell us just how incredibly valuable silicon is right now.”

Controlling the system, not just scale

Graphics processing units (GPUs) are essential for large language model (LLM) training, but agentic AI requires a whole new workload capability. CPUs like Graviton5 are rising to this challenge, supporting intensive workloads like real-time reasoning, multi-step tasks, frontier model training, code generation, and deep research.

AWS says Graviton5 has the ability to handle “billions of interactions” and to coordinate complex, multi-stage agentic tasks. It’s built on the AWS Nitro System to support high performance, availability, and security.

“This is really about control of the AI system, not just scale,” said Kimball. As AI evolves toward persistent, agentic workloads, the role of the CPU becomes “quite meaningful”: it serves as the control plane, handling orchestration, memory management, scheduling, and other intensive tasks across accelerators.
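To make the control-plane idea concrete, here is a minimal sketch (all names and the CPU/GPU split are illustrative assumptions, not Meta’s or AWS’s actual design) of CPU-resident code orchestrating multi-step agent tasks across a pool of accelerators:

```python
import asyncio

# Illustrative only: the CPU keeps the agent's state and scheduling logic,
# while the scarce accelerators are used just for the heavy inference calls.

async def run_on_accelerator(gpu_pool: asyncio.Semaphore, step: str) -> str:
    async with gpu_pool:              # CPU-side scheduler gates accelerator access
        await asyncio.sleep(0.01)     # stand-in for an actual GPU inference call
        return f"result of {step}"

async def run_agent_task(gpu_pool: asyncio.Semaphore, steps: list[str]) -> list[str]:
    state = []                        # stateful, multi-step context lives on the CPU
    for step in steps:                # dependent agent steps run sequentially
        state.append(await run_on_accelerator(gpu_pool, step))
    return state

async def main() -> None:
    gpu_pool = asyncio.Semaphore(4)   # pretend we have four accelerators
    tasks = [run_agent_task(gpu_pool, [f"task{i}-step{j}" for j in range(3)])
             for i in range(10)]      # many concurrent agent sessions
    results = await asyncio.gather(*tasks)
    print(f"completed {len(results)} agent tasks")

asyncio.run(main())
```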

“This is particularly true in agentic environments, where the workloads will be less linear and more stateful,” he pointed out. So, ensuring a supply of those resources just makes sense.

Reflecting Meta’s diversified approach to hardware

The agreement builds on Meta’s long-standing partnership with AWS, but also reflects what the company calls its “diversified approach” to infrastructure. “No single chip architecture can efficiently serve every workload,” the company emphasized.

Proving the point, Meta recently announced four new generations of its MTIA training and inference accelerator chip and signed a massive deal with AMD to tap into 6GW worth of CPUs and AI accelerators. It also entered into a multi-year partnership with Nvidia to access millions of Blackwell and Rubin GPUs and to integrate Nvidia Spectrum-X Ethernet switches into its platform, and was also one of Arm’s first major CPU customers.

In the wake of all this, Nabeel Sherif, a principal advisory director at Info-Tech Research Group, posed the burning question: “What are they going to do with all this capability?”

Primarily, it will support Meta’s internal experimentation and innovation, he said, but it also lays the groundwork and provides the capacity for Meta to offer its own agentic AI services, for instance its Llama AI model as an API, to the market.

“What those [services] will look like and what platforms and tools they’ll use, as well as what guardrails they’ll provide to users, is still unclear, but it’s going to be interesting to see it develop,” said Sherif.

The expanded capacity will enable a diversity of use cases and experimentation across various architectures and platforms, he said. Meta will have many options, and access to supply, in an environment currently characterized not only by a variety of new CPU approaches, but by significant supply chain constraints. The AWS deal should be viewed as a complement to its partnerships and investments in other platforms like Arm, Nvidia, and AMD.

Kimball agreed that the move is “most definitely additive,” not a replacement or substitution. Meta isn’t moving off GPUs or accelerators; it’s building around them. “This is about assembling a heterogeneous system, not picking a single winner,” he said. “In fact, I think for many, heterogeneity is critical to long-term success.”

Nvidia still dominates training and a lot of inference, while AMD is becoming “increasingly relevant at scale,” Kimball noted. Arm, meanwhile, whether through CPU, custom silicon, or other efforts, gives Meta architectural control, and Graviton5 fits into that mix as a “cost- and efficiency-optimized general-purpose compute layer.”

A matter of strategy

The more interesting question is around strategy: Does this signal Meta is becoming a compute provider? Kimball doesn’t think so, noting that the company likely isn’t looking to directly compete with hyperscalers as a general-purpose cloud. “This is more about vertical integration of their own AI stack,” he said.

The move gives Meta the ability to support internal workloads more efficiently, as well as providing the infrastructure foundation to expose more of that capability externally, whether through APIs, partnerships, or other means, he said.

And there’s a cost dynamic here, too, Kimball noted. As inference becomes persistent, especially with agentic systems, the economics shift away from peak floating-point operations per second (FLOPS, a measure of compute performance) and toward sustained efficiency and total cost of ownership (TCO).

CPUs like Graviton5 are well positioned for the parts of that workload that don’t require accelerators but still need to run constantly. “At Meta’s scale, even small efficiency gains per workload compound quickly,” Kimball pointed out.
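To see why the economics tilt this way, here is a toy TCO calculation (all prices and throughput numbers are invented for illustration and reflect no actual AWS or Meta figures; only the structure of the comparison matters):

```python
# Toy TCO math with made-up numbers: for always-on work, what matters is
# sustained cost per unit of work, not peak FLOPS.

def cost_per_million_requests(hourly_cost: float, requests_per_hour: float) -> float:
    """Cost of serving one million requests on capacity that runs constantly."""
    return hourly_cost / requests_per_hour * 1_000_000

# Hypothetical instances: the GPU box has far higher peak throughput, but
# for light, always-on orchestration traffic the cheaper CPU box wins on TCO.
gpu_cost = cost_per_million_requests(hourly_cost=12.00, requests_per_hour=40_000)
cpu_cost = cost_per_million_requests(hourly_cost=1.50, requests_per_hour=15_000)

print(f"GPU instance: ${gpu_cost:,.2f} per million requests")  # $300.00
print(f"CPU instance: ${cpu_cost:,.2f} per million requests")  # $100.00
```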

For developers and enterprise IT, the signal is pretty clear, he noted: The AI stack is getting more heterogeneous, not less so. Enterprises are going to see tighter coupling between CPUs, GPUs, and specialized accelerators, with workloads increasingly split across them based on behavior (prefill versus decode, stateless versus stateful, burst versus persistent).
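A minimal sketch of what that workload-aware splitting might look like (the behavioral categories are Kimball’s; the task kinds, pool names, and routing rules are hypothetical):

```python
from dataclasses import dataclass

# Hypothetical task profile along the behavioral axes Kimball cites:
# prefill vs. decode, stateless vs. stateful, burst vs. persistent.
@dataclass
class Task:
    kind: str         # "prefill", "decode", "orchestration", "tool-call"
    stateful: bool    # carries multi-step agent state?
    persistent: bool  # always-on rather than bursty?

def place(task: Task) -> str:
    """Toy placement policy: model phases go to accelerators, while the
    always-on control-plane work lands on general-purpose CPUs."""
    if task.kind in ("prefill", "decode"):
        return "gpu-pool"        # compute/bandwidth-bound inference phases
    if task.persistent or task.stateful:
        return "cpu-pool"        # long-lived agent state and orchestration
    return "cpu-burst-pool"      # short, stateless glue work

for t in (Task("prefill", False, False),
          Task("orchestration", True, True),
          Task("tool-call", False, False)):
    print(t.kind, "->", place(t))
```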

“The implication is that infrastructure decisions should become more workload-aware,” said Kimball. “It’s less about ‘which cloud?’ and more about ‘where does this specific part of the application run most efficiently?’”

This article originally appeared on NetworkWorld.
