Guided learning lets “untrainable” neural networks realize their potential | MIT News

Even networks long considered “untrainable” can learn effectively with a little bit of a helping hand. Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have shown that a temporary period of alignment between neural networks, a technique they call guidance, can dramatically improve the performance of architectures previously thought unsuitable for contemporary tasks.

Their findings suggest that many so-called “ineffective” networks may simply begin from poor initializations, and that short-term guidance can move them to a position in parameter space from which learning becomes much easier. 

The team’s guidance method works by encouraging a target network to match the internal representations of a guide network during training. Unlike traditional methods such as knowledge distillation, which focus on mimicking a teacher’s outputs, guidance transfers structural knowledge directly from one network to another. The target learns how the guide organizes information inside each layer, rather than simply copying its behavior. Remarkably, even untrained networks contain architectural biases that can be transferred, while trained guides additionally convey learned patterns. 
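The idea can be sketched in a few lines: during the guided phase, the target’s training objective gains a term penalizing the distance between its hidden-layer activations and the guide’s. The tiny one-layer networks, the plain MSE penalty, and the weighting `lam` below are all illustrative assumptions, not the paper’s exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical one-hidden-layer nets: a trainable "target" and a fixed "guide".
W_target = rng.normal(size=(8, 4))
W_guide = rng.normal(size=(8, 4))

def hidden(x, W):
    """Hidden-layer representation under a tanh nonlinearity."""
    return np.tanh(x @ W)

x = rng.normal(size=(16, 8))        # a batch of 16 inputs
h_target = hidden(x, W_target)      # target's internal representation
h_guide = hidden(x, W_guide)        # guide's internal representation

# Guidance term: penalize dissimilarity between internal representations.
# Plain MSE stands in here for the paper's representational-similarity measure.
guidance_loss = float(np.mean((h_target - h_guide) ** 2))

# During the guided phase, training would minimize
#   task_loss + lam * guidance_loss,
# with lam controlling how strongly the target tracks the guide.
lam = 0.1
```

The key design point, as the article notes, is that the penalty acts on what each layer computes internally, not on the network’s final outputs.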

“We found these results pretty surprising,” says Vighnesh Subramaniam ’23, MEng ’24, MIT Department of Electrical Engineering and Computer Science (EECS) PhD student and CSAIL researcher, who’s a lead author on a paper presenting these findings. “It’s impressive that we could use representational similarity to make these traditionally ‘crappy’ networks actually work.”

Guide-ian angel 

A central question was whether guidance must continue throughout training, or whether its primary effect is to provide a better initialization. To explore this, the researchers ran an experiment with deep fully connected networks (FCNs). Before training on the real task, the network spent a few steps aligning with another network on random noise, like stretching before exercise. The results were striking: Networks that typically overfit immediately remained stable, achieved lower training loss, and avoided the classic performance degradation seen in standard FCNs. This alignment acted like a helpful warmup, showing that even a brief rehearsal can have lasting benefits without the need for constant guidance.
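As a sketch of that warmup idea, the toy loop below nudges a “target” layer toward an untrained guide’s representations using nothing but noise batches, then stops. The architecture, step count, learning rate, and loss are illustrative assumptions; only the overall recipe (brief alignment on noise before real training) comes from the article.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid = 8, 4

W_target = 0.5 * rng.normal(size=(n_in, n_hid))  # the network to be warmed up
W_guide = 0.5 * rng.normal(size=(n_in, n_hid))   # untrained guide, kept frozen

lr, losses = 0.1, []
for step in range(300):
    # The warmup uses random noise, not task data: only representations matter.
    x = rng.normal(size=(32, n_in))
    h_t = np.tanh(x @ W_target)
    h_g = np.tanh(x @ W_guide)
    diff = h_t - h_g
    losses.append(float(np.mean(diff ** 2)))
    # Gradient of the representation-matching MSE w.r.t. the target weights.
    grad = x.T @ (2 * diff * (1 - h_t ** 2)) / diff.size
    W_target -= lr * grad

# After the brief warmup, W_target sits at a friendlier starting point from
# which ordinary task training would then proceed unguided.
```

The alignment loss falls over the warmup steps, leaving the target at a new initialization; in the researchers’ experiments it is this repositioning, rather than continuous supervision, that carries the lasting benefit.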

The study also compared guidance to knowledge distillation, a popular approach in which a student network attempts to mimic a teacher’s outputs. When the teacher network was untrained, distillation failed completely, because the outputs contained no meaningful signal. Guidance, in contrast, still produced strong improvements, since it leverages internal representations rather than final predictions. This result underscores a key insight: Untrained networks already encode helpful architectural biases that can steer other networks toward effective learning.
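One way to build intuition for why an untrained guide can still help: even random weights map inputs to internal representations that preserve meaningful structure, such as which inputs are near each other. The check below is only an illustration of that general fact, not the similarity measure used in the paper.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
x = rng.normal(size=(32, 16))                # 32 random "inputs"
W = rng.normal(size=(16, 64)) / np.sqrt(16)  # an untrained random layer
h = np.tanh(x @ W)                           # its internal representation

def pairwise_dists(a):
    """Euclidean distance between every pair of rows."""
    return np.array([np.linalg.norm(a[i] - a[j])
                     for i, j in combinations(range(len(a)), 2)])

d_in, d_hid = pairwise_dists(x), pairwise_dists(h)
# High correlation: input geometry survives the untrained layer, so its
# representations carry structure a target network can usefully align to.
r = float(np.corrcoef(d_in, d_hid)[0, 1])
```

An untrained network’s *outputs*, by contrast, are unrelated to any task’s labels, which is consistent with distillation from an untrained teacher having no signal to offer.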

Beyond the experimental results, the findings have broad implications for understanding neural network architecture. The researchers suggest that success — or failure — often depends less on task-specific data, and more on the network’s position in parameter space. By aligning with a guide network, it’s possible to separate the contributions of architectural biases from those of learned knowledge. This enables scientists to identify which features of a network’s design support effective learning, and which challenges stem simply from poor initialization.

Guidance also opens new avenues for studying relationships between architectures. By measuring how easily one network can guide another, researchers can probe distances between functional designs and reexamine theories of neural network optimization. Because the method relies on representational similarity, it could reveal previously hidden structure in network design, helping to identify which components contribute most to learning and which don’t.

Salvaging the hopeless

Ultimately, the work shows that so-called “untrainable” networks are not inherently doomed. With guidance, failure modes can be eliminated, overfitting avoided, and previously ineffective architectures brought into line with modern performance standards. The CSAIL team plans to explore which architectural elements are most responsible for these improvements, and how these insights can influence future network design. By revealing the hidden potential of even the most stubborn networks, guidance provides a powerful new tool for understanding — and hopefully shaping — the foundations of machine learning.

“It’s generally assumed that different neural network architectures have particular strengths and weaknesses,” says Leyla Isik, Johns Hopkins University assistant professor of cognitive science, who wasn’t involved in the research. “This exciting research shows that one form of network can inherit the benefits of another architecture, without losing its original capabilities. Remarkably, the authors show this can be done using small, untrained ‘guide’ networks. This paper introduces a novel and concrete way to add different inductive biases into neural networks, which is critical for developing more efficient and human-aligned AI.”

Subramaniam wrote the paper with CSAIL colleagues: Research Scientist Brian Cheung; PhD student David Mayo ’18, MEng ’19; Research Associate Colin Conwell; principal investigators Boris Katz, a CSAIL principal research scientist, and Tomaso Poggio, an MIT professor in brain and cognitive sciences; and former CSAIL research scientist Andrei Barbu. Their work was supported, in part, by the Center for Brains, Minds, and Machines, the National Science Foundation, the MIT CSAIL Machine Learning Applications Initiative, the MIT-IBM Watson AI Lab, the U.S. Defense Advanced Research Projects Agency (DARPA), the U.S. Department of the Air Force Artificial Intelligence Accelerator, and the U.S. Air Force Office of Scientific Research.

Their work was recently presented at the Conference on Neural Information Processing Systems (NeurIPS).
