Large language models can do impressive things, like write poetry or generate viable computer programs, even though these models are trained to predict the words that come next in a chunk of text.
Such surprising capabilities could make it seem like the models are implicitly learning some general truths about the world.
But that isn’t necessarily the case, according to a new study. The researchers found that a popular type of generative AI model can provide turn-by-turn driving directions in New York City with near-perfect accuracy — without having formed an accurate internal map of the city.
Despite the model’s uncanny ability to navigate effectively, its performance plummeted when the researchers closed some streets and added detours.
When they dug deeper, the researchers found that the New York maps the model implicitly generated had many nonexistent streets curving between the grid and connecting far-off intersections.
This could have serious implications for generative AI models deployed in the real world, since a model that appears to be performing well in one context might break down if the task or environment changes slightly.
“One hope is that, because LLMs can accomplish all these amazing things in language, maybe we could use these same tools in other parts of science, as well. But the question of whether LLMs are learning coherent world models is very important if we want to use these techniques to make new discoveries,” says senior author Ashesh Rambachan, assistant professor of economics and a principal investigator in the MIT Laboratory for Information and Decision Systems (LIDS).
Rambachan is joined on a paper about the work by lead author Keyon Vafa, a postdoc at Harvard University; Justin Y. Chen, an electrical engineering and computer science (EECS) graduate student at MIT; Jon Kleinberg, Tisch University Professor of Computer Science and Information Science at Cornell University; and Sendhil Mullainathan, an MIT professor in the departments of EECS and of Economics, and a member of LIDS. The research will be presented at the Conference on Neural Information Processing Systems.
New metrics
The researchers focused on a type of generative AI model known as a transformer, which forms the backbone of LLMs like GPT-4. Transformers are trained on a large amount of language-based data to predict the next token in a sequence, such as the next word in a sentence.
But if scientists want to determine whether an LLM has formed an accurate model of the world, measuring the accuracy of its predictions doesn’t go far enough, the researchers say.
For instance, they found that a transformer can predict valid moves in a game of Connect 4 nearly every time without understanding any of the rules.
So, the team developed two new metrics that can test a transformer’s world model. The researchers focused their evaluations on a class of problems called deterministic finite automata, or DFAs.
A DFA is a problem with a sequence of states, like intersections one must traverse to reach a destination, and a concrete way of describing the rules one must follow along the way.
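To make the idea concrete, here is a minimal sketch of a DFA in Python (our own illustration, not the authors’ code): a handful of hypothetical intersection states and a transition rule that maps each state and action to the next state, with any move not listed treated as invalid.

```python
# Toy DFA: states are hypothetical intersections, actions are turn directions.
# The transition table is the "concrete rules" of the world.
transitions = {
    ("A", "left"): "B",
    ("A", "right"): "C",
    ("B", "straight"): "D",
    ("C", "left"): "D",
}

def follow(start, actions):
    """Traverse the DFA from `start`; return the final state, or None if a move is invalid."""
    state = start
    for action in actions:
        state = transitions.get((state, action))
        if state is None:
            return None  # the rules forbid this move from the current state
    return state

print(follow("A", ["left", "straight"]))  # -> "D"
print(follow("A", ["straight"]))          # -> None (no such move from "A")
```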
They selected two problems to formulate as DFAs: navigating on streets in New York City and playing the board game Othello.
“We needed test beds where we know what the world model is. Now, we can rigorously think about what it means to recover that world model,” Vafa explains.
The first metric they developed, called sequence distinction, says a model has formed a coherent world model if it sees two different states, like two different Othello boards, and recognizes how they’re different. Sequences, that is, ordered lists of data points, are what transformers use to generate outputs.
The second metric, called sequence compression, says a transformer with a coherent world model should know that two identical states, like two identical Othello boards, have the same sequence of possible next steps.
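Building on the toy DFA sketched above, here is one way to picture what the two metrics check (again a simplified illustration under our own assumptions, not the paper’s implementation): two action sequences that end in the same state should admit exactly the same next moves (compression), while sequences that end in different states should be separable by some continuation that is legal after one but not the other (distinction).

```python
# Uses `transitions` and `follow` from the toy DFA above.

def valid_next(state):
    """Set of legal next actions from a state, read off the DFA's transition table."""
    return {a for (s, a) in transitions if s == state}

def same_state(seq1, seq2, start="A"):
    return follow(start, seq1) == follow(start, seq2)

# Compression: ["left", "straight"] and ["right", "left"] both end at state "D",
# so a coherent world model should allow the identical set of next moves after either.
assert same_state(["left", "straight"], ["right", "left"])
assert valid_next(follow("A", ["left", "straight"])) == valid_next(follow("A", ["right", "left"]))

# Distinction: ["left"] and ["right"] end at different states ("B" vs. "C"),
# so some continuation is legal after one but not the other.
assert not same_state(["left"], ["right"])
assert valid_next("B") != valid_next("C")
```

The intuition is that a transformer whose predicted next moves violate these relations cannot have recovered the true map or board rules, no matter how accurate its individual predictions look.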
They used these metrics to test two common classes of transformers, one that is trained on data generated from randomly produced sequences and the other on data generated by following strategies.
Incoherent world models
Surprisingly, the researchers found that transformers that made choices randomly formed more accurate world models, perhaps because they saw a greater variety of potential next steps during training.
“In Othello, if you see two random computers playing rather than championship players, in theory you’d see the full set of possible moves, even the bad moves championship players wouldn’t make,” Vafa explains.
Even though the transformers generated accurate directions and valid Othello moves in nearly every instance, the two metrics revealed that only one generated a coherent world model for Othello moves, and none performed well at forming coherent world models in the wayfinding example.
The researchers demonstrated the implications of this by adding detours to the map of New York City, which caused all the navigation models to fail.
“I was surprised by how quickly the performance deteriorated as soon as we added a detour. If we close just 1 percent of the possible streets, accuracy immediately plummets from nearly 100 percent to just 67 percent,” Vafa says.
When they recovered the city maps the models generated, they looked like an imagined New York City with hundreds of streets crisscrossing overlaid on top of the grid. The maps often contained random flyovers above other streets or multiple streets with impossible orientations.
These results show that transformers can perform surprisingly well at certain tasks without understanding the rules. If scientists want to build LLMs that can capture accurate world models, they need to take a different approach, the researchers say.
“Often, we see these models do impressive things and think they must have understood something about the world. I hope we can convince people that this is a question to think very carefully about, and we don’t have to rely on our own intuitions to answer it,” says Rambachan.
In the future, the researchers want to tackle a more diverse set of problems, such as those where some rules are only partially known. They also want to apply their evaluation metrics to real-world, scientific problems.
This work is funded, in part, by the Harvard Data Science Initiative, a National Science Foundation Graduate Research Fellowship, a Vannevar Bush Faculty Fellowship, a Simons Collaboration grant, and a grant from the MacArthur Foundation.