At the top of many automation wish lists is a particularly time-consuming task: chores.
The moonshot of many roboticists is cooking up the right hardware and software combination so that a machine can learn “generalist” policies (the rules and strategies that guide robot behavior) that work everywhere, under all conditions. Realistically, though, if you have a home robot, you probably don’t care much about it working at your neighbor’s house. With that in mind, researchers at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) decided to try to find a way to easily train robust robot policies for very specific environments.
“We aim for robots to perform exceptionally well under disturbances, distractions, various lighting conditions, and changes in object poses, all within a single environment,” says Marcel Torne Villasevil, MIT CSAIL research assistant in the Improbable AI lab and lead author on a recent paper about the work. “We propose a method to create digital twins on the fly using the latest advances in computer vision. With just their phones, anyone can capture a digital replica of the real world, and the robots can train in a simulated environment much faster than in the real world, thanks to GPU parallelization. Our approach eliminates the need for extensive reward engineering by leveraging a few real-world demonstrations to jump-start the training process.”
Taking your robot home
RialTo, of course, is a bit more complicated than just a simple wave of a phone and (boom!) home bot at your service. It begins by using your device to scan the target environment with tools like NeRFStudio, ARCode, or Polycam. Once the scene is reconstructed, users can upload it to RialTo’s interface to make detailed adjustments, add necessary joints to the robots, and more.
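To make that workflow concrete, here is a minimal Python sketch of the scan-to-simulator stages just described. Everything in it is a hypothetical stand-in: the DigitalTwin and ArticulatedJoint names, the file formats, and the export call are illustrative, not RialTo’s actual interface.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the scan -> reconstruct -> annotate -> export
# workflow described above; these names are illustrative stand-ins only.

@dataclass
class ArticulatedJoint:
    name: str          # e.g., "drawer_slide"
    joint_type: str    # "revolute" (hinge) or "prismatic" (slider)
    parent_link: str   # the fixed part, e.g., the counter
    child_link: str    # the moving part, e.g., the drawer

@dataclass
class DigitalTwin:
    mesh_path: str     # geometry reconstructed from the phone scan
    joints: list[ArticulatedJoint] = field(default_factory=list)

    def add_joint(self, joint: ArticulatedJoint) -> None:
        # Articulation the user adds in the interface step described above.
        self.joints.append(joint)

    def export(self, out_path: str) -> None:
        # Stand-in for exporting the refined scene to a simulator format.
        print(f"exporting {self.mesh_path} with {len(self.joints)} joint(s) -> {out_path}")

# Example: a kitchen scanned with a phone, annotated with one sliding drawer.
scene = DigitalTwin(mesh_path="kitchen_scan.obj")
scene.add_joint(ArticulatedJoint("drawer_slide", "prismatic", "counter", "drawer"))
scene.export("kitchen_twin.usd")
```

The articulation step matters because a raw scan is only geometry; annotating joints is what lets the simulator actually open the drawer or toaster that the policy will later learn to manipulate.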
The refined scene is exported and brought into the simulator. Here, the aim is to develop a policy based on real-world actions and observations, such as one for grabbing a cup on a counter. These real-world demonstrations are replicated in the simulation, providing valuable data for reinforcement learning. “This helps in creating a strong policy that works well in both the simulation and the real world. An enhanced algorithm using reinforcement learning helps guide this process, to ensure the policy is effective when applied outside of the simulator,” says Torne.
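To illustrate how a few demonstrations can jump-start reinforcement learning without reward engineering, here is a minimal, hypothetical Python sketch. It assumes a success-only sparse reward and a replay buffer seeded with replayed demos, and it stubs out the policy rollout and update, so it shows the idea rather than the paper’s actual algorithm.

```python
import random

# Hypothetical sketch: demonstrations seed the replay buffer, and a sparse
# success reward replaces hand-engineered reward shaping. Not RialTo's code.

def sparse_reward(success: bool) -> float:
    # Task success is the only signal; the seeded demos make this tractable.
    return 1.0 if success else 0.0

def train(demos: list[dict], num_iterations: int = 1000) -> list[dict]:
    replay_buffer = list(demos)  # seed the buffer with replayed demonstrations
    for _ in range(num_iterations):
        # Collect one simulated rollout (stubbed here as a coin flip; a real
        # system would roll out the current policy in the digital twin).
        success = random.random() < 0.5
        replay_buffer.append({"success": success, "reward": sparse_reward(success)})
        # A real implementation would update the policy here with an RL
        # algorithm, learning from both demo and rollout transitions.
    return replay_buffer

# One replayed demonstration seeds learning for the cup-grabbing example above.
buffer = train(demos=[{"success": True, "reward": 1.0}])
print(f"transitions collected: {len(buffer)}")
```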
Testing showed that RialTo created strong policies for a wide range of tasks, whether in controlled lab settings or more unpredictable real-world environments, improving 67 percent over imitation learning with the same number of demonstrations. The tasks involved opening a toaster, placing a book on a shelf, putting a plate on a rack, placing a mug on a shelf, opening a drawer, and opening a cupboard. For each task, the researchers tested the system’s performance under three increasing levels of difficulty: randomizing object poses, adding visual distractors, and applying physical disturbances during task execution. When paired with real-world data, the system outperformed traditional imitation-learning methods, especially in situations with lots of visual distractions or physical disruptions.
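As a rough illustration of that evaluation protocol, a task-by-difficulty sweep might look like the following sketch. The task list comes from the paragraph above, while run_episode is a hypothetical stand-in for rolling out the trained policy on the real robot.

```python
from itertools import product

# Hypothetical evaluation sweep: each task is tested under three increasing
# levels of difficulty, as described above. Not the paper's actual harness.

tasks = ["open toaster", "book on shelf", "plate on rack",
         "mug on shelf", "open drawer", "open cupboard"]
difficulty_levels = ["randomized object poses", "visual distractors",
                     "physical disturbances"]

def run_episode(task: str, level: str) -> bool:
    # Stand-in for executing the trained policy under the given condition.
    return True

for task, level in product(tasks, difficulty_levels):
    outcome = "ok" if run_episode(task, level) else "fail"
    print(f"{task:>14} | {level:<26} | {outcome}")
```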
“These experiments show that if we care about being very robust to one particular environment, the best idea is to leverage digital twins instead of trying to obtain robustness with large-scale data collection in diverse environments,” says Pulkit Agrawal, director of Improbable AI Lab, MIT electrical engineering and computer science (EECS) associate professor, MIT CSAIL principal investigator, and senior author on the work.
As far as limitations go, RialTo currently takes three days to be fully trained. To speed this up, the team mentions improving the underlying algorithms and using foundation models. Training in simulation also has its limitations; currently, it’s difficult to do effortless sim-to-real transfer and to simulate deformable objects or liquids.
The next level
So what’s next for RialTo’s journey? Building on previous efforts, the scientists are working on preserving robustness against various disturbances while improving the model’s adaptability to new environments. “Our next endeavor is to extend this approach to using pre-trained models, accelerating the learning process, minimizing human input, and achieving broader generalization capabilities,” says Torne.
“We’re incredibly excited about our ‘on-the-fly’ robot programming concept, where robots can autonomously scan their environment and learn how to solve specific tasks in simulation. While our current method has limitations — such as requiring a few initial demonstrations by a human and significant compute time for training these policies (up to three days) — we see it as a significant step toward achieving ‘on-the-fly’ robot learning and deployment,” says Torne. “This approach moves us closer to a future where robots won’t need a preexisting policy that covers every scenario. Instead, they can rapidly learn new tasks without extensive real-world interaction. In my view, this advancement could expedite the practical application of robotics far sooner than relying solely on a universal, all-encompassing policy.”
“To deploy robots in the real world, researchers have traditionally relied on methods such as imitation learning from expert data, which can be expensive, or reinforcement learning, which can be unsafe,” says Zoey Chen, a computer science PhD student at the University of Washington who wasn’t involved in the paper. “RialTo directly addresses both the safety constraints of real-world RL [robot learning] and the efficient data constraints for data-driven learning methods, with its novel real-to-sim-to-real pipeline. This novel pipeline not only ensures safe and robust training in simulation before real-world deployment, but also significantly improves the efficiency of data collection. RialTo has the potential to significantly scale up robot learning and allows robots to adapt to complex real-world scenarios much more effectively.”
“Simulation has shown impressive capabilities on real robots by providing inexpensive, possibly infinite data for policy learning,” adds Marius Memmel, a computer science PhD student at the University of Washington who wasn’t involved in the work. “However, these methods are limited to a few specific scenarios, and constructing the corresponding simulations is expensive and laborious. RialTo provides an easy-to-use tool to reconstruct real-world environments in minutes instead of hours. Moreover, it makes extensive use of collected demonstrations during policy learning, minimizing the burden on the operator and reducing the sim2real gap. RialTo demonstrates robustness to object poses and disturbances, showing incredible real-world performance without requiring extensive simulator construction and data collection.”
Torne wrote the paper alongside senior authors Abhishek Gupta, assistant professor at the University of Washington, and Agrawal. Four other CSAIL members are also credited: EECS PhD student Anthony Simeonov SM ’22, research assistant Zechu Li, undergraduate student April Chan, and Tao Chen PhD ’24. Improbable AI Lab and WEIRD Lab members also contributed valuable feedback and support in developing this project.
This work was supported, in part, by the Sony Research Award, the U.S. government, and Hyundai Motor Co., with assistance from the WEIRD (Washington Embodied Intelligence and Robotics Development) Lab. The researchers presented their work at the Robotics: Science and Systems (RSS) conference earlier this month.