In his 1927 paper, “A law of comparative judgment,” the American psychologist L. L. Thurstone proposed that when people choose one option among several alternatives, they’re picking the one that has the highest value to them, even though they can’t assign a specific number to that alternative.
Thurstone was a pioneer of “psychometrics,” a field built upon the premise that mental processes, which we cannot see, can nevertheless be measured and quantified. His 1927 paper laid the groundwork for what are now called random utility models, which offer a mathematical framework for describing human preferences. That information can be relied upon, in turn, to make predictions about various hypothetical situations.
Random utility models (RUMs) are so named because they assess the “utility,” or benefit, that can be obtained from a given alternative, such as deciding which book to read first from the stack of novels you brought back from the library. “These models are inherently random,” explains Gabriele Farina, an assistant professor in MIT’s Department of Electrical Engineering and Computer Science (EECS) and principal investigator at the Laboratory for Information and Decision Systems (LIDS), “because individuals are different. Everyone has their own preferences, and even those preferences can vary from time to time.” For instance, someone who normally picks coffee over tea in the morning, and prefers tea after dinner, may, on occasion, reverse that order entirely.
RUMs, to be sure, are widely used within government and industry in situations of far greater consequence than the selection of a hot (or iced) beverage. The models routinely facilitate predictions about what people will choose to do in so-called counterfactual (“what-if”) scenarios, such as: How will they get to work or school if a major thoroughfare is shut down for construction? What routes and modes of transport will they take? Or, if a city suddenly receives a windfall of $20 million, how should those funds be disbursed to maximize the common good?
Given that RUMs have been with us for nearly 100 years, growing in sophistication over time, one might think that, at this stage, there would be little room for improvement. That, however, is not the case.
A paper presented in April at the International Conference on Learning Representations in Rio de Janeiro, Brazil, uncovered basic facts showing that there’s much more to be gleaned from these models than had traditionally been supposed. The paper was authored by Yeshwanth Cherapanamjeri, a former MIT postdoc now based at Nanyang Technological University in Singapore; Farina, also core faculty in MIT’s Operations Research Center (ORC); Constantinos Daskalakis, the Avanessians Professor of Computer Science at MIT and a member of MIT’s Computer Science and Artificial Intelligence Laboratory; and Sobhan Mohammadpour, an MIT PhD student in computer science based at LIDS and EECS.
The group’s findings stem, in part, from a deficiency in the way RUMs are commonly estimated in practice, which has persisted since the days of Thurstone. The data upon which the models are estimated have been largely drawn from so-called pairwise comparisons: In a choice between items A and B (whether it pertains to movies on Netflix, competing products on Amazon.com, news stories posted on Google, and so forth), which one would you choose? One reason this approach has been so pervasive, explains Daskalakis, is that “assigning a precise numerical rating, such as 4.37, to the benefit you get from a single item is very hard. Whereas comparing two things, and deciding which one you value more highly, is cognitively much easier to do.” But therein lies the rub, he adds. “With this way of assessing people’s preferences, looking at just two things at a time, it’s impossible to find correlations between the various choices.”
The standard way of applying RUMs assumes that the utilities derived from A and B are independent, but they could, in fact, be linked, and that could be important to know. If someone campaigning for elective office finds out that a potential voter favors gun control, for example, there’s a reasonable likelihood that the same person also favors government-sponsored child care. Similarly, a fan of independent movies may also be a fan of foreign movies, but less keen on Hollywood action blockbusters. “If a digital platform is blind to the existence of such correlations, it will not be able to estimate preferences very accurately,” Daskalakis notes. “And if Netflix usually shows you an assortment of movies you don’t care about, you may log out and cancel your subscription.”
The MIT team proved that it’s impossible to get information about correlations from two-way comparisons alone. Correlations can be discerned, however, when large numbers of people rank three alternatives in their order of preference. The same information can be obtained from a combination of best-of-three and best-of-two selections. In practice, Mohammadpour explains, “you’ll get a bunch of people to rank three items. You would then use the method we developed for merging those individual results into one big model that can give us the big picture.”
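A toy simulation can make the point concrete. The sketch below is not the team's algorithm; it is a minimal illustration under stated assumptions: three items A, B, and C whose utilities are Gaussian with zero mean and unit variance, with a hypothetical parameter `rho_ab` setting the correlation between A and B (C is independent of both). The function name `simulate_choices` and all parameters are invented for this example.

```python
import math
import random


def simulate_choices(rho_ab, n_people=100_000, seed=0):
    """Simulate a zero-mean Gaussian random utility model over items A, B, C.

    rho_ab is the assumed correlation between the utilities of A and B;
    C is independent of both. Returns (pairwise win rates, best-of-three shares).
    """
    rng = random.Random(seed)
    pair_wins = {"A>B": 0, "A>C": 0, "B>C": 0}
    top_counts = {"A": 0, "B": 0, "C": 0}

    for _ in range(n_people):
        z1, z2, z3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        # B shares a fraction of A's randomness, giving correlation rho_ab.
        utility = {
            "A": z1,
            "B": rho_ab * z1 + math.sqrt(1 - rho_ab ** 2) * z2,
            "C": z3,
        }
        # Two-way comparisons: which item of each pair is preferred?
        pair_wins["A>B"] += int(utility["A"] > utility["B"])
        pair_wins["A>C"] += int(utility["A"] > utility["C"])
        pair_wins["B>C"] += int(utility["B"] > utility["C"])
        # Best-of-three: which single item has the highest utility?
        top_counts[max(utility, key=utility.get)] += 1

    pairwise = {k: v / n_people for k, v in pair_wins.items()}
    best_of_three = {k: v / n_people for k, v in top_counts.items()}
    return pairwise, best_of_three


pairwise_indep, top_indep = simulate_choices(rho_ab=0.0)
pairwise_corr, top_corr = simulate_choices(rho_ab=0.9)

print("independent:", pairwise_indep, top_indep)
print("correlated: ", pairwise_corr, top_corr)
```

In this setup every two-way comparison comes out close to 50-50 whether or not A and B are correlated, so pairwise data cannot tell the two worlds apart. The best-of-three shares do differ: with independent utilities each item wins about a third of the time, while with `rho_ab=0.9` item C wins roughly 45 percent of the time, because A and B tend to rise and fall together.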
Their research effort, according to Farina, is focused on the computational side of RUMs: devising algorithms that can extract preference information and determining how much data is needed to do so or, equivalently, how many experiments must be run. The good news, he says, is that efficient algorithms are, indeed, possible for this purpose. The requisite number of experiments doesn’t grow exponentially with the number of items in the catalog or database that’s under review.
“This paper provides an important breakthrough,” comments Emma Frejinger, a computer scientist at the University of Montreal. “It mathematically proves why traditional data collection fails and demonstrates that simply asking users for their best-of-three [choices] unlocks the ability to accurately train these powerful models. This finding provides a highly practical roadmap for collecting better data to drive more accurate optimizations.”
“Constructing utility models is going to remain a very active area,” Daskalakis insists. “Just as RUMs have been critical to the online economy since the late 1990s, they are, and will continue to be, critical to the alignment of AI models going forward.” More importantly, he adds, “RUMs play a central role in the commercial viability and usefulness of large language models [LLMs].” During the training period, people are typically asked to rank the various candidate outputs of those LLMs, from which the models can gain a better sense of the kind of text (in terms of tone, style, and content) that’s preferred.
Given that we’re constantly “besieged with an unlimited sea of options in so many different domains,” Daskalakis says, “you cannot possibly ask people to communicate all their personal preferences for all possible scenarios. So what you can do instead is construct a model that predicts what people think about the various possible outcomes. And you have to keep improving and updating your model in an iterative process until, hopefully, you can make good predictions.”

