Patronus AI lands $50M to build ‘digital worlds’ that stress-test AI agents

AI agents have gotten more sophisticated. They’re evolving from answering questions to autonomously executing complex, multi-step tasks.

But before these agents can be trusted to book trips or conduct financial analysis on behalf of users, model providers and the startups building such agents want to make sure that they perform reliably across an unlimited range of scenarios.

AI labs often use benchmarks to show off their models’ prowess, but a high score, even on an agent-oriented benchmark, doesn’t actually prove that an AI can accomplish a variety of complex, real-world tasks correctly.

Patronus AI, a startup founded in 2023 by former Meta AI researchers Anand Kannappan and Rebecca Qian, helps model makers and companies fine-tune models to do just that by building simulated digital environments in which to gauge the agents’ performance.

The San Francisco-based startup must be solving a very important problem. Virtually every frontier AI lab and many emerging startups are now customers, according to Glenn Solomon, a managing director at Notable Capital, who describes demand for the company’s simulated environments as nearly insatiable.

Patronus’ revenue has grown 15-fold over the past year, fueling significant investor interest. On Thursday, the company announced a $50 million Series B round led by Greenfield Partners, with participation from Notable Capital, Lightspeed, Datadog, and Samsung. The round brings the company’s total funding to $70 million.

Patronus uses what it calls “digital world models” to create replicas of websites and internal systems. In these environments, agents are stress-tested after training using reinforcement learning, which iteratively rewards successful task completion and penalizes errors.

AI labs see great value in these digital simulations because they offer agents a chance to try different, sometimes unpredictable, scenarios. The company compares its approach to how Waymo trained autonomous cars by first building synthetic worlds to test vehicles against rare hazards, such as severe weather or a toddler running after a ball.

The difference with AI agents is that they tend to take shortcuts, which means they fail to complete the task correctly. “Patronus is really good at spotting the hacks and ensuring they’re holding the models accountable,” Solomon said.

Patronus currently offers its simulated digital worlds for software engineering and finance, but these are only the beginning, according to Kannappan.

“Today we’re very focused on the problems that are verifiable, so the problems you can immediately check and verify, but there are a ton more areas that are very non-verifiable or very hard to verify,” he said.

Just because these processes are verifiable doesn’t mean they’re easy. “We want to be able to actually create the environment in which you can operate an agent that can run for 10 hours or 10 days or 10 weeks,” Kannappan said.

As for rivals, Patronus believes it’s primarily competing against the internal teams AI labs have already built to evaluate agent behavior. While human-data firms like Mercor and Surge help model makers with reinforcement learning, Patronus operates differently by evaluating how agents behave without any human involvement.

