In today’s hospitals and clinics, a dermatologist may use an artificial intelligence model for classifying skin lesions to assess whether a lesion is likely to develop into cancer or is benign. But if the model is biased toward certain skin tones, it could fail to identify a high-risk patient.
One of the best-known and most persistent challenges that AI research continues to reckon with is bias. Bias is commonly discussed in relation to training data, but model architecture can also contain and amplify bias, negatively influencing model performance in real-world settings. In high-stakes medical scenarios, the very real consequences of poor performance have made bias a quintessential safety issue.
A new paper from researchers at MIT, Worcester Polytechnic Institute, and Google, accepted to the 2026 International Conference on Learning Representations, proposes a novel debiasing approach called “Weighted Rotational DebiasING” (WRING) that can be applied to vision language models (VLMs), like OpenAI’s OpenCLIP.
VLMs are multi-modal models that can understand and interpret different data modalities like video, image, and text simultaneously. While debiasing approaches for VLMs do exist, the most commonly used approach is known as “projection debiasing,” which leads to what has been termed the “Whac-A-Mole dilemma,” an empirical observation that was formally introduced to AI research in 2023.
Projection debiasing is a post-processing approach that removes the undesirable, biased information from model embeddings by “projecting” the subspace out of a representation space of relationships, thereby cutting out the bias. But this approach has its drawbacks.
“When you do this, you inadvertently squish everything around,” says Walter Gerych, the paper’s first author, who conducted this research last year as a postdoc at MIT. “All the other relationships that the model learns change when you do this.”
Gerych, who’s now an assistant professor of computer science at Worcester Polytechnic Institute, is joined on the paper by MIT graduate students Cassandra Parent and Quinn Perian; Google’s Rafiya Javed; and MIT associate professors of electrical engineering Justin Solomon and Marzyeh Ghassemi, who’s an affiliate of the Abdul Latif Jameel Clinic for Machine Learning and Health and the Laboratory for Information and Decision Systems.
While projection debiasing stops the model from acting upon the bias that’s been projected out of the subspace, it can end up amplifying and creating other biases, hence the Whac-A-Mole dilemma. According to Ghassemi, the unintended amplification of model biases is “both a technical and practical challenge. For instance, when debiasing a VLM that retrieves images of clinical staff, if racial bias is removed, it could have the unintended consequence of amplifying gender bias.”
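The geometry behind projection debiasing can be sketched with a toy example. The snippet below is only an illustration under stated assumptions: the three-dimensional vectors, the single `bias_dir` direction, and the helper names are hypothetical, and real VLM embeddings have far more dimensions and may involve a multi-dimensional bias subspace.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(dot(u, u))

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return dot(u, v) / (norm(u) * norm(v))

def project_out(x, bias_dir):
    """Subtract the component of x that lies along bias_dir."""
    scale = dot(x, bias_dir) / dot(bias_dir, bias_dir)
    return [xi - scale * bi for xi, bi in zip(x, bias_dir)]

# Hypothetical toy values, chosen only for illustration.
bias_dir = [1.0, 0.0, 0.0]   # assumed direction encoding the unwanted attribute
emb_a = [0.8, 0.5, 0.1]      # stand-in for one embedding
emb_b = [-0.7, 0.5, 0.4]     # stand-in for another embedding

proj_a = project_out(emb_a, bias_dir)
proj_b = project_out(emb_b, bias_dir)

print("bias component after projection:", dot(proj_a, bias_dir), dot(proj_b, bias_dir))
print("norm of emb_a before/after:", norm(emb_a), norm(proj_a))
print("cosine(emb_a, emb_b) before/after:", cosine(emb_a, emb_b), cosine(proj_a, proj_b))
```

In this toy case, both embeddings lose their component along the assumed bias direction, but they also get shorter and their similarity to each other shifts, which is one way to picture the “squishing” Gerych describes.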
WRING works by moving certain coordinates within the high-dimensional space of a model, those that appear to be responsible for bias, to a different angle, so the model can no longer distinguish between different groups within a certain concept. This changes the representation within a specific space while leaving the model’s other relationships intact. And like projection debiasing, WRING is a post-processing approach, which means it can be applied “on the fly” to a pre-trained VLM.
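To give a rough sense of what changing an angle, rather than projecting, can look like, here is a toy rotation. It is not the authors’ algorithm: the article does not describe how WRING weights or selects directions, so the function `rotate_off_bias`, the single `bias_dir` direction, and the example values are all assumptions made purely for illustration.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(dot(u, u))

def rotate_off_bias(x, bias_dir):
    """Rotate x, within the plane spanned by x and bias_dir, until it is
    orthogonal to bias_dir. The length of x is preserved."""
    b_norm = norm(bias_dir)
    b_unit = [bi / b_norm for bi in bias_dir]
    along = dot(x, b_unit)                         # signed length along the bias direction
    perp = [xi - along * bi for xi, bi in zip(x, b_unit)]
    perp_norm = norm(perp)
    if perp_norm == 0.0:
        raise ValueError("x is parallel to bias_dir; the rotation plane is undefined")
    scale = norm(x) / perp_norm                    # restore the original length
    return [scale * pi for pi in perp]

# Hypothetical toy values, chosen only for illustration.
bias_dir = [1.0, 0.0, 0.0]
emb = [0.8, 0.5, 0.1]

rotated = rotate_off_bias(emb, bias_dir)

print("bias component after rotation:", dot(rotated, bias_dir))
print("norm before/after:", norm(emb), norm(rotated))
```

Unlike a projection, which shortens the vector, this toy rotation keeps the embedding’s length and only changes the direction it points in.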
“People already spent a lot of resources, a lot of money, training these huge models, and we don’t really want to go in and modify something during training because then you have to start from scratch,” Gerych explains. “[WRING is] very efficient. It doesn’t require more training of the model and it’s minimally invasive.”
In their results, the researchers found that WRING significantly reduced bias for a target concept without increasing bias in other areas. But for now, the approach is limited to Contrastive Language-Image Pre-training (CLIP) models, a type of VLM that connects images to language for search or classification.
“Extending this to ChatGPT-style generative language models is the reasonable next step for us,” says Gerych.
This work was supported, in part, by a National Science Foundation CAREER Award, an AI2050 Award Early Career Fellowship, a Sloan Research Fellow Award, a Gordon and Betty Moore Foundation Award, and an MIT-Google Computing Innovation Award.

