Penn Engineers have uncovered an unexpected pattern in how neural networks, the systems leading today’s AI revolution, learn, suggesting an answer to one of the most fundamental unanswered questions in AI: why these methods work so well.
Inspired by biological neurons, neural networks are computer programs that take in data and train themselves by repeatedly making small modifications to the weights or parameters that govern their output, much like neurons adjusting their connections to one another. The result is a model that allows the network to make predictions on data it has not seen before. Neural networks are being used today in essentially all fields of science and engineering, from medicine to cosmology, identifying potentially diseased cells and discovering new galaxies.
In a new paper published in the Proceedings of the National Academy of Sciences (PNAS), Pratik Chaudhari, Assistant Professor in Electrical and Systems Engineering (ESE) and core faculty at the General Robotics, Automation, Sensing and Perception (GRASP) Lab, and co-author James Sethna, James Gilbert White Professor of Physical Sciences at Cornell University, show that neural networks, regardless of their design, size or training recipe, follow the same route from ignorance to truth when presented with images to classify.
Jialin Mao, a doctoral student in Applied Mathematics and Computational Science at the University of Pennsylvania School of Arts & Sciences, is the paper’s lead author.
“Suppose the task is to identify pictures of cats and dogs,” says Chaudhari. “You might use the whiskers to classify them, while another person might use the shape of the ears; you would presume that different networks would use the pixels in the images in different ways, and some networks certainly achieve better results than others, but there is a very strong commonality in how they all learn. That is what makes the result so surprising.”
The result not only illuminates the inner workings of neural networks, but points toward the possibility of developing hyper-efficient algorithms that could classify images in a fraction of the time, at a fraction of the cost. Indeed, one of the biggest costs associated with AI is the immense computational power required to develop neural networks. “These results suggest that there may exist new ways to train them,” says Chaudhari.
To illustrate the potential of this new approach, Chaudhari suggests imagining the networks as attempting to chart a course on a map. “Let us imagine two points,” he says. “Ignorance, where the network does not know anything about the correct labels, and Truth, where it can correctly classify all images. Training a network corresponds to charting a path between Ignorance and Truth in probability space, in billions of dimensions. But it turns out that different networks take the same path, and this path is more like three-, four- or five-dimensional.”
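To make those two endpoints concrete, here is a minimal sketch in Python. The numbers are made up for illustration; nothing below comes from the paper itself.

```python
import numpy as np

# A toy problem: N images, C classes (illustrative sizes only).
N, C = 5, 10
true_labels = np.array([3, 1, 4, 1, 5])

# "Ignorance": the network assigns equal probability to every class
# for every image -- it knows nothing about the correct labels.
ignorance = np.full((N, C), 1.0 / C)

# "Truth": all probability mass sits on the correct label of each image.
truth = np.zeros((N, C))
truth[np.arange(N), true_labels] = 1.0

# Training traces a path between these two points in an (N x C)-dimensional
# probability space. Here we fake one by interpolation; real trajectories
# are curved, but, per the paper, effectively low-dimensional.
path = [(1 - t) * ignorance + t * truth for t in np.linspace(0, 1, 20)]
print(path[0][0])   # uniform: [0.1 0.1 ... 0.1]
print(path[-1][0])  # one-hot on the true class
```

With a dataset of millions of images and many classes, each such point has billions of coordinates, which is what makes the low dimensionality of the actual paths so striking.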
In other words, despite the staggering complexity of neural networks, classifying images, one of the foundational tasks for AI systems, requires only a small fraction of that complexity. “This is certainly evidence that the details of the network design, size or training recipes matter less than we think,” says Chaudhari.
To arrive at these insights, Chaudhari and Sethna borrowed tools from information geometry, a field that brings together geometry and statistics. By treating each network as a distribution of probabilities, the researchers were able to make a true apples-to-apples comparison among the networks, revealing their unexpected, underlying similarities. “Because of the peculiarities of high-dimensional spaces, all points are far away from one another,” says Chaudhari. “We developed more sophisticated tools that give us a cleaner picture of the networks’ differences.”
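The article does not name the specific tools, so purely as an illustration, the sketch below uses the Bhattacharyya distance, one standard information-geometric measure, to compare two networks’ predicted probabilities on the same images. All names and numbers here are our own assumptions, not the paper’s procedure.

```python
import numpy as np

def bhattacharyya_distance(p, q):
    """Average Bhattacharyya distance between two networks' predictive
    distributions p and q, each of shape (num_images, num_classes)."""
    overlap = np.sum(np.sqrt(p * q), axis=1)  # per-image coefficient in (0, 1]
    return -np.mean(np.log(np.clip(overlap, 1e-12, None)))

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical predictions from two different networks on the same 100 images.
rng = np.random.default_rng(0)
net_a = softmax(rng.normal(size=(100, 10)))
net_b = softmax(rng.normal(size=(100, 10)))

print(bhattacharyya_distance(net_a, net_a))  # 0.0: identical predictions
print(bhattacharyya_distance(net_a, net_b))  # > 0: the networks disagree
```

Distances of this kind, computed between every pair of training checkpoints, put otherwise very different networks on a common footing.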
Using a wide array of techniques, the team trained hundreds of thousands of networks of many different varieties, including multi-layer perceptrons, convolutional and residual networks, and the transformers at the heart of systems like ChatGPT. “Then, this beautiful picture emerged,” says Chaudhari. “The output probabilities of these networks were neatly clustered together on these thin manifolds in gigantic spaces.” In other words, the paths that represented the networks’ learning aligned with one another, showing that they learned to classify images the same way.
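One way such clustering can be made visible, again as a sketch under our own assumptions rather than the paper’s actual method, is to embed the pairwise distances between checkpoints with classical multidimensional scaling; if the trajectories really lie on a thin manifold, a handful of dimensions captures nearly all of the structure.

```python
import numpy as np

def classical_mds(D, k=10):
    """Classical multidimensional scaling: embed points with pairwise
    distance matrix D into k dimensions; also return the eigenvalue
    spectrum, which shows how much structure each dimension carries."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # Gram matrix of the embedding
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1]
    vals = np.maximum(vals[order], 0.0)
    vecs = vecs[:, order]
    return vecs[:, :k] * np.sqrt(vals[:k]), vals

# Synthetic stand-in for training trajectories: 200 "checkpoints" that
# secretly trace a 3-D curve, linearly embedded in 1,000 dimensions.
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 200)
curve = np.stack([t, np.sin(3 * t), np.cos(5 * t)], axis=1)  # truly 3-D
X = curve @ rng.normal(size=(3, 1000))                       # looks 1,000-D

sq = (X ** 2).sum(axis=1)                                    # pairwise Euclidean
D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0))

embedding, spectrum = classical_mds(D)
print(spectrum[:5] / spectrum.sum())  # almost all weight in the first 3 entries
```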
Chaudhari offers two potential explanations for this surprising phenomenon. First, neural networks are never trained on random assortments of pixels. “Imagine salt and pepper noise,” says Chaudhari. “That is clearly an image, but not a very interesting one; images of actual objects like people and animals are a tiny, tiny subset of the space of all possible images.” Put differently, asking a neural network to classify images that matter to humans is simpler than it seems, because there are many possible images the network never has to consider.
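A quick back-of-the-envelope calculation, ours rather than the paper’s, puts a number on how tiny that subset is:

```python
from math import log10

# Distinct 32x32 color images with 8-bit RGB channels:
# 256 ** (32 * 32 * 3) = 2 ** 24576, roughly 10 ** 7398 possibilities.
bits = 32 * 32 * 3 * 8
print(f"~10^{bits * log10(2):.0f} possible 32x32 color images")

# Even an enormous training set of, say, 10 ** 9 natural images covers a
# vanishingly small slice of that space; almost every other point in it
# looks like structureless salt-and-pepper noise.
```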
Second, the labels neural networks use are somewhat special. Humans group objects into broad categories, like dogs and cats, and do not have separate words for every particular member of every breed of animal. “If the networks had to use all the pixels to make predictions,” says Chaudhari, “then the networks would have figured out many, many different ways” of doing so. But the features that distinguish, say, cats and dogs are themselves low-dimensional. “We believe these networks are finding the same relevant features,” adds Chaudhari, likely by identifying commonalities like ears, eyes, markings and so on.
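As a toy illustration of that idea (our construction, not the paper’s experiment): if two classes in a large pixel space differ only along a couple of hidden feature directions, projecting onto a single one of those directions already classifies far better than chance.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 10_000  # 1,000 "images", 10,000 pixels each

# Two hidden unit-norm feature directions (think "ear shape", "whiskers").
features = rng.normal(size=(2, d))
features /= np.linalg.norm(features, axis=1, keepdims=True)

# Class 0 vs. class 1 differ only along those two directions; every other
# direction in pixel space is pure noise.
labels = rng.integers(0, 2, size=n)
signal = (2 * labels - 1)[:, None] * (features[0] + 0.5 * features[1])
X = signal + rng.normal(size=(n, d))

# A one-dimensional projection onto the first feature direction is enough
# to separate the classes well -- no need for all 10,000 pixels.
z = X @ features[0]
accuracy = ((z > 0).astype(int) == labels).mean()
print(f"accuracy from a single feature direction: {accuracy:.2f}")  # ~0.84
```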
Discovering an algorithm that can consistently find the path needed to train a neural network to classify images using only a handful of inputs remains an unresolved challenge. “This is the billion-dollar question,” says Chaudhari. “Can we train neural networks cheaply? This paper gives evidence that we might be able to. We just don’t know how.”
This study was conducted at the University of Pennsylvania School of Engineering and Applied Science and Cornell University. It was supported by grants from the National Science Foundation, the National Institutes of Health and the Office of Naval Research, an Eric and Wendy Schmidt AI in Science Postdoctoral Fellowship, and cloud computing credits from Amazon Web Services.
Other co-authors include Rahul Ramesh at Penn Engineering; Rubing Yang at the University of Pennsylvania School of Arts & Sciences; Itay Griniasty and Han Kheng Teoh at Cornell University; and Mark K. Transtrum at Brigham Young University.