Peripheral vision enables humans to see shapes that aren’t directly in our line of sight, albeit with less detail. This ability expands our field of view and can be helpful in many situations, such as detecting a vehicle approaching our car from the side.
Unlike humans, AI doesn’t have peripheral vision. Equipping computer vision models with this ability could help them detect approaching hazards more effectively or predict whether a human driver would notice an oncoming object.
Taking a step in this direction, MIT researchers developed an image dataset that allows them to simulate peripheral vision in machine learning models. They found that training models with this dataset improved the models’ ability to detect objects in the visual periphery, although the models still performed worse than humans.
Their results also revealed that, unlike with humans, neither the size of objects nor the amount of visual clutter in a scene had a strong impact on the AI’s performance.
“There’s something fundamental happening here. We tested so many different models, and even when we train them, they get a little bit better but they are not quite like humans. So, the question is: What is missing in these models?” says Vasha DuTell, a postdoc and co-author of a paper detailing this study.
Answering that question may help researchers build machine learning models that can see the world more like humans do. In addition to improving driver safety, such models could be used to develop displays that are easier for people to view.
Plus, a deeper understanding of peripheral vision in AI models could help researchers better predict human behavior, adds lead author Anne Harrington MEng ’23.
“Modeling peripheral vision, if we can really capture the essence of what is represented in the periphery, can help us understand the features in a visual scene that make our eyes move to collect more information,” she explains.
Their co-authors include Mark Hamilton, an electrical engineering and computer science graduate student; Ayush Tewari, a postdoc; Simon Stent, research manager at the Toyota Research Institute; and senior authors William T. Freeman, the Thomas and Gerd Perkins Professor of Electrical Engineering and Computer Science and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and Ruth Rosenholtz, principal research scientist in the Department of Brain and Cognitive Sciences and a member of CSAIL. The research will be presented at the International Conference on Learning Representations.
“Any time you have a human interacting with a machine — a car, a robot, a user interface — it’s hugely important to understand what the person can see. Peripheral vision plays a critical role in that understanding,” Rosenholtz says.
Simulating peripheral vision
Extend your arm in front of you and put your thumb up — the small area around your thumbnail is seen by your fovea, the small depression in the middle of your retina that provides the sharpest vision. Everything else you can see is in your visual periphery. Your visual cortex represents a scene with less detail and reliability as it moves farther from that sharp point of focus.
Many existing approaches to modeling peripheral vision in AI represent this deteriorating detail by blurring the edges of images, but the information loss that occurs in the optic nerve and visual cortex is far more complex.
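To make that contrast concrete, here is a minimal sketch of the naive baseline described above: a Gaussian blur whose strength grows with distance from a fixation point. This is the simple approach the researchers move beyond, not their method; the function name and parameters are illustrative.

```python
# A minimal sketch of the naive "blur with eccentricity" baseline,
# NOT the researchers' texture tiling approach. Assumes a grayscale
# image in a NumPy array; names and parameters are illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter

def foveated_blur(img: np.ndarray, fix_y: int, fix_x: int,
                  max_sigma: float = 8.0, n_levels: int = 6) -> np.ndarray:
    """Blur each pixel more heavily the farther it sits from fixation."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Normalized distance (eccentricity) of every pixel from fixation.
    ecc = np.hypot(ys - fix_y, xs - fix_x)
    ecc /= ecc.max()
    # Precompute blurred copies at increasing blur strengths.
    levels = [gaussian_filter(img.astype(float), sigma)
              for sigma in np.linspace(0.0, max_sigma, n_levels)]
    # Assign each pixel the blur level matching its eccentricity.
    idx = np.minimum((ecc * n_levels).astype(int), n_levels - 1)
    out = np.zeros_like(img, dtype=float)
    for i, level in enumerate(levels):
        out[idx == i] = level[idx == i]
    return out
```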
For a more accurate approach, the MIT researchers started with a technique used to model peripheral vision in humans. Known as the texture tiling model, this method transforms images to represent a human’s visual information loss.
They modified this model so it could transform images similarly, but in a more flexible way that doesn’t require knowing in advance where the person or AI will point their eyes.
“That let us faithfully model peripheral vision the same way it is being done in human vision research,” says Harrington.
The researchers used this modified technique to generate a huge dataset of transformed images that appear more textural in certain areas, to represent the loss of detail that occurs when a human looks farther into the periphery.
Then they used the dataset to train several computer vision models and compared their performance with that of humans on an object detection task.
“We had to be very clever in how we set up the experiment so we could also test it in the machine learning models. We didn’t want to have to retrain the models on a toy task that they weren’t meant to be doing,” she says.
Peculiar performance
Humans and models were shown pairs of transformed images that were identical, except that one image had a target object located in the periphery. Then, each participant was asked to pick the image with the target object.
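This is a two-alternative forced-choice design. As a rough sketch of how a model’s answers could be scored in such a task, the snippet below assumes a hypothetical score_target_present function standing in for whatever target-presence score a given model produces; it is not the paper’s evaluation code.

```python
# A rough sketch of scoring a model on the two-alternative forced-choice
# task described above; `score_target_present` is a hypothetical stand-in
# for whatever target-presence score a given model produces.
def evaluate_2afc(model, image_pairs, score_target_present):
    """image_pairs: iterable of (image_with_target, image_without_target)."""
    correct = 0
    total = 0
    for img_with, img_without in image_pairs:
        # The model "chooses" whichever image it scores higher for target
        # presence; guessing at random would yield 50 percent accuracy.
        if score_target_present(model, img_with) > score_target_present(model, img_without):
            correct += 1
        total += 1
    return correct / total
```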
“One thing that really surprised us was how good people were at detecting objects in their periphery. We went through at least 10 different sets of images that were just too easy. We kept needing to use smaller and smaller objects,” Harrington adds.
The researchers found that training models from scratch with their dataset led to the greatest performance boosts, improving their ability to detect and recognize objects. Fine-tuning a model with their dataset, a process that involves tweaking a pretrained model so it can perform a new task, resulted in smaller performance gains.
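As a rough illustration of the difference between those two regimes, the PyTorch sketch below builds a classifier either with random weights (training from scratch) or with pretrained weights (fine-tuning). The architecture, dataset, and hyperparameters are placeholders, not the settings used in the paper.

```python
# A placeholder sketch contrasting training from scratch with fine-tuning;
# the architecture and hyperparameters are NOT the paper's settings.
import torch
import torchvision

def make_model(from_scratch: bool, num_classes: int) -> torch.nn.Module:
    # from_scratch=True: random initialization, trained only on the
    # transformed images. from_scratch=False: start from pretrained
    # weights and fine-tune them on the new task.
    weights = None if from_scratch else torchvision.models.ResNet18_Weights.DEFAULT
    model = torchvision.models.resnet18(weights=weights)
    # Swap the classification head for the task's own classes.
    model.fc = torch.nn.Linear(model.fc.in_features, num_classes)
    return model

def train(model: torch.nn.Module, loader, epochs: int = 5, lr: float = 1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:  # batches of transformed images
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()
```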
But in every case, the machines weren’t as good as humans, and they were especially bad at detecting objects in the far periphery. Their performance also didn’t follow the same patterns as humans.
“That might suggest that the models aren’t using context in the same way as humans are to do these detection tasks. The strategy of the models might be different,” Harrington says.
The researchers plan to continue exploring these differences, with a goal of finding a model that can predict human performance in the visual periphery. This could enable AI systems that alert drivers to hazards they may not see, for instance. They also hope to encourage other researchers to conduct additional computer vision studies with their publicly available dataset.
“This work is important because it contributes to our understanding that human vision in the periphery should not be considered just impoverished vision due to limits in the number of photoreceptors we have, but rather, a representation that is optimized for us to perform tasks of real-world consequence,” says Justin Gardner, an associate professor in the Department of Psychology at Stanford University who was not involved with this work. “Moreover, the work shows that neural network models, despite their advances in recent years, are unable to match human performance in this regard, which should lead to more AI research to learn from the neuroscience of human vision. This future research will be aided significantly by the database of images provided by the authors to mimic peripheral human vision.”
This work is supported, in part, by the Toyota Research Institute and the MIT CSAIL METEOR Fellowship.