Google unveils DiffusionGemma, an AI model that breaks free from left-to-right processing – Computerworld

It could also save users money. Technology analyst Carmi Levy noted that existing pay-per-token monetization models “penalize the use of less-than-optimally efficient AI solutions.”

But DiffusionGemma “could herald a brand new generation of task-defined, efficient solutions that may enable expanded compute capability without draining the operations budget,” he said.

A contrast to left-to-right processing

Built on Google’s Gemma 4 family and its Gemini Diffusion research, DiffusionGemma is a 26B-parameter mixture-of-experts (MoE) model designed to maximise text generation throughput.
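In an MoE model, a router sends each token to only a few expert sub-networks, so only a fraction of the total parameters does work on any given token. The minimal top-k routing layer below illustrates that idea. It is a generic sketch, not Gemma's implementation: the four toy experts, the linear router, and the choice of `k=2` are all hypothetical.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale: float) -> Expert:
    """Hypothetical toy expert that scales its input (real experts are small MLPs)."""
    return lambda v: [scale * a for a in v]


def moe_forward(x: Vector, router: List[Vector], experts: List[Expert], k: int) -> Tuple[Vector, int]:
    """Top-k mixture-of-experts layer.

    The router scores every expert, but only the k best-scoring experts are
    evaluated. Returns the gated output and the number of experts that ran.
    """
    # Router logit for expert i: dot product of its router vector with x.
    logits = [sum(w * a for w, a in zip(row, x)) for row in router]
    # Indices of the k highest-scoring experts.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Gate weights are renormalised over the selected experts only.
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    calls = 0
    for gate, i in zip(gates, top):
        y = experts[i](x)  # the unselected experts are never evaluated
        calls += 1
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, calls


# Usage: 4 experts, 2 active per token.
experts = [make_expert(s) for s in (0.5, 1.0, 2.0, 3.0)]
router = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
out, calls = moe_forward([1.0, 2.0], router, experts, k=2)
print(calls)  # 2: only half the experts did any work
```

Routing cuts compute, not storage: every expert's weights still have to sit in memory, so the memory footprint tracks the total parameter count. As rough arithmetic, 26B weights at 4 bits each take about 26 × 10⁹ × 0.5 bytes ≈ 13GB, versus about 52GB at 16 bits, before activations and other runtime overhead.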

It essentially changes how the model uses hardware, giving processors a bigger chunk of work each cycle so it can draft full 256-token paragraphs in sequence. This lets the model generate text up to 4x faster on GPUs, Google claims. It activates only 3.8B parameters during inference and, when quantized, can fit within 18GB of VRAM on high-end consumer GPUs such as the Nvidia RTX 5090.
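The contrast with left-to-right decoding can be sketched with a toy version of confidence-based parallel unmasking, a common approach in published masked diffusion language models. This is an illustrative sketch under stated assumptions, not Google's sampler: the `toy_denoiser` stand-in, the 32-step schedule, and the commit rule are all hypothetical. Only the 256-token block size comes from the article.

```python
import math
from typing import Callable, List, Optional, Tuple

# Hypothetical denoiser: given the partially decoded block (None = masked),
# it proposes a (token, confidence) pair for every position in ONE call.
# In a real diffusion LM this call is a full forward pass of the network.
Denoiser = Callable[[List[Optional[int]]], List[Tuple[int, float]]]


def diffusion_decode(denoiser: Denoiser, block_len: int, steps: int) -> Tuple[List[Optional[int]], int]:
    """Decode one block by iterative parallel unmasking.

    Every position starts masked. Each step runs the denoiser once over the
    whole block and commits its most confident proposals, so the block is
    finished after at most `steps` passes instead of `block_len` passes.
    """
    block: List[Optional[int]] = [None] * block_len
    passes = 0
    for step in range(steps):
        masked = [i for i, tok in enumerate(block) if tok is None]
        if not masked:
            break
        proposals = denoiser(block)
        passes += 1
        # Spread the remaining masked positions evenly over the remaining steps.
        n_commit = math.ceil(len(masked) / (steps - step))
        # Commit the highest-confidence positions, wherever they sit in the block.
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[:n_commit]:
            block[i] = proposals[i][0]
    return block, passes


def toy_denoiser(block: List[Optional[int]]) -> List[Tuple[int, float]]:
    """Hypothetical stand-in model: fixed tokens, confidence not ordered left-to-right.

    Unlike a real model, it ignores the tokens already committed in `block`.
    """
    n = len(block)
    return [(i % 50, ((i * 37) % n) / n) for i in range(n)]


tokens, passes = diffusion_decode(toy_denoiser, block_len=256, steps=32)
print(passes)  # 32 passes for 256 tokens; token-by-token decoding would need 256
```

Fewer passes do not translate one-for-one into wall-clock speed, since each pass processes the whole block and real samplers tune the step count for output quality. The sketch only shows the trade the article describes: more work per pass, fewer sequential passes.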
