Personalization features could make LLMs more agreeable | MIT News

Many of the newest large language models (LLMs) are designed to remember details from past conversations or store user profiles, enabling these models to personalize their responses.

But researchers from MIT and Penn State University found that, over long conversations, such personalization features often increase the likelihood an LLM will become overly agreeable or begin mirroring the user’s viewpoint.

This phenomenon, known as sycophancy, can prevent a model from telling a user they’re wrong, eroding the accuracy of the LLM’s responses. In addition, LLMs that mirror someone’s political opinions or worldview can foster misinformation and warp a user’s perception of reality.

Unlike many past sycophancy studies that evaluate prompts in a lab setting without context, the MIT researchers collected two weeks of conversation data from humans who interacted with an actual LLM during their daily lives. They studied two settings: agreeableness in personal advice and mirroring of user beliefs in political explanations.

Although interaction context increased agreeableness in four of the five LLMs they studied, the presence of a condensed user profile in the model’s memory had the greatest impact. On the other hand, mirroring behavior only increased if a model could accurately infer a user’s beliefs from the conversation.

The researchers hope these results encourage future research into the development of personalization methods that are more robust to LLM sycophancy.

“From a user perspective, this work highlights how important it is to understand that these models are dynamic and their behavior can change as you interact with them over time. If you are talking to a model for an extended period of time and begin to outsource your thinking to it, you could end up in an echo chamber that you can’t escape. That is a risk users should definitely keep in mind,” says Shomik Jain, a graduate student in the Institute for Data, Systems, and Society (IDSS) and lead author of a paper on this research.

Jain is joined on the paper by Charlotte Park, an electrical engineering and computer science (EECS) graduate student at MIT; Matt Viana, a graduate student at Penn State University; as well as co-senior authors Ashia Wilson, the Lister Brothers Career Development Professor in EECS and a principal investigator in the Laboratory for Information and Decision Systems (LIDS); and Dana Calacci PhD ’23, an assistant professor at Penn State. The research will be presented at the ACM CHI Conference on Human Factors in Computing Systems.

Extended interactions

Based on their own experiences with sycophancy in LLMs, the researchers began thinking about the potential benefits and consequences of a model that is overly agreeable. But when they searched the literature to expand their analysis, they found no studies that attempted to understand sycophantic behavior during long-term LLM interactions.

“We’re using these models through extended interactions, and they have a lot of context and memory. But our evaluation methods are lagging behind. We wanted to evaluate LLMs in the ways people are actually using them to understand how they’re behaving in the wild,” says Calacci.

To fill this gap, the researchers designed a user study to explore two forms of sycophancy: agreement sycophancy and perspective sycophancy.

Agreement sycophancy is an LLM’s tendency to be overly agreeable, sometimes to the point where it gives misinformation or refuses to tell the user they’re wrong. Perspective sycophancy occurs when a model mirrors the user’s values and political beliefs.

“There’s a lot we know about the benefits of having social connections with people who have similar or different viewpoints. But we don’t yet know about the benefits or risks of extended interactions with AI models that have similar attributes,” Calacci adds.

The researchers built a user interface centered on an LLM and recruited 38 participants to chat with the model over a two-week period. Each participant’s conversations occurred in the same context window to capture all interaction data.

Over the two-week period, the researchers collected a median of 90 queries from each user.

They compared the behavior of five LLMs given this user context versus the same LLMs that weren’t given any conversation data.

“We found that context really does fundamentally change how these models operate, and I’d wager this phenomenon would extend well beyond sycophancy. And while sycophancy tended to go up, it didn’t always increase. It really depends on the context itself,” says Wilson.

Context clues

For instance, when an LLM distills information about the user into a specific profile, it results in the biggest gains in agreement sycophancy. This user-profile feature is increasingly being baked into the newest models.

They also found that random text from synthetic conversations increased the likelihood some models would agree, even though that text contained no user-specific data. This suggests the length of a conversation may sometimes impact sycophancy more than its content, Jain adds.

But content matters greatly when it comes to perspective sycophancy. Conversation context only increased perspective sycophancy if it revealed some information about a user’s political perspective.

To obtain this insight, the researchers carefully queried models to infer a user’s beliefs, then asked each individual whether the model’s deductions were correct. Users said LLMs accurately understood their political beliefs about half the time.

“It is easy to say, in hindsight, that AI companies should be doing this kind of evaluation. But it is difficult, and it takes a lot of time and investment. Using humans in the evaluation loop is expensive, but we’ve shown that it can reveal new insights,” Jain says.

While the aim of their research was not mitigation, the researchers developed some recommendations.

For instance, to reduce sycophancy, one could design models that better identify relevant details in context and memory. In addition, models could be built to detect mirroring behaviors and flag responses with excessive agreement. Model developers could also give users the ability to moderate personalization in long conversations.

“There are many ways to personalize models without making them overly agreeable. The boundary between personalization and sycophancy isn’t a fine line, but separating personalization from sycophancy is an important area of future work,” Jain says.

“At the end of the day, we need better ways of capturing the dynamics and complexity of what goes on during long conversations with LLMs, and how things can misalign during that long-term process,” Wilson adds.
