If someone advises you to “know your limits,” they’re likely suggesting you do things like exercise in moderation. To a robot, though, the motto means learning the constraints, or limitations, of a particular task within its environment, so it can do chores safely and correctly.
As an illustration, imagine asking a robot to clean your kitchen when it doesn’t understand the physics of its surroundings. How can the machine generate a practical multistep plan to ensure the room is spotless? Large language models (LLMs) can get them close, but when a model is only trained on text, it’s likely to miss out on key specifics about the robot’s physical constraints, like how far it can reach or whether there are nearby obstacles to avoid. Stick with LLMs alone, and you’re likely to end up cleaning pasta stains out of your floorboards.
To guide robots in executing these open-ended tasks, researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) used vision models to see what’s near the machine and model its constraints. The team’s strategy involves an LLM sketching out a plan that’s checked in a simulator to make sure it’s safe and realistic. If that sequence of actions is infeasible, the language model will generate a new plan, until it arrives at one that the robot can execute.
This trial-and-error method, which the researchers call “Planning for Robots via Code for Continuous Constraint Satisfaction” (PRoC3S), tests long-horizon plans to ensure they satisfy all constraints, and enables a robot to perform such diverse tasks as writing individual letters, drawing a star, and sorting and placing blocks in different positions. In the future, PRoC3S could help robots complete more intricate chores in dynamic environments like homes, where they could be prompted to do a general chore composed of many steps (like “make me breakfast”).
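The generate-and-test loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper’s implementation: the toy LLM, the toy simulator, and all function names here are assumptions made for the example, standing in for a real language model and physics simulator.

```python
import random

class ToyLLM:
    """Stand-in for a language model: proposes a plan with a random reach distance."""
    def generate_plan(self, task, feedback=None):
        # If the simulator reported a previous plan overreached, respect that limit.
        upper = feedback["max_reach"] if feedback else 2.0
        return {"task": task, "reach": random.uniform(0.0, upper)}

class ToySimulator:
    """Stand-in for the simulation check: this arm can only reach 1.0 m."""
    MAX_REACH = 1.0
    def check(self, plan):
        # Return (feasible?, feedback for the next attempt).
        if plan["reach"] <= self.MAX_REACH:
            return True, None
        return False, {"max_reach": self.MAX_REACH}

def plan_with_retries(llm, simulator, task, max_attempts=20):
    """Draft a plan, verify it in simulation, and retry with feedback until one passes."""
    feedback = None
    for _ in range(max_attempts):
        plan = llm.generate_plan(task, feedback)
        ok, feedback = simulator.check(plan)
        if ok:
            return plan  # safe, executable plan
    return None  # no feasible plan found within the attempt budget

plan = plan_with_retries(ToyLLM(), ToySimulator(), "draw a star")
```

Because the simulator’s feedback tightens the LLM’s next proposal, the loop converges quickly here; the real system similarly feeds constraint violations back into replanning.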
“LLMs and classical robotics systems like task and motion planners can’t execute these kinds of tasks on their own, but together, their synergy makes open-ended problem-solving possible,” says PhD student Nishanth Kumar SM ’24, co-lead author of a new paper about PRoC3S. “We’re creating a simulation on-the-fly of what’s around the robot and trying out many possible action plans. Vision models help us create a very realistic digital world that enables the robot to reason about feasible actions for each step of a long-horizon plan.”
The team’s work was presented this past month in a paper shown at the Conference on Robot Learning (CoRL) in Munich, Germany.
Video: “Teaching a robot its limits for open-ended chores” (MIT CSAIL)
The researchers’ method uses an LLM pre-trained on text from across the internet. Before asking PRoC3S to do a task, the team provided their language model with a sample task (like drawing a square) that’s related to the target one (drawing a star). The sample task includes a description of the activity, a long-horizon plan, and relevant details about the robot’s environment.
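The few-shot prompt described above can be pictured as a worked example plus the new goal, packed into one query. The field names and action strings below are illustrative assumptions, not the paper’s exact prompt format:

```python
# A sample task: description, long-horizon plan, and environment details.
sample_task = {
    "description": "Draw a square on the whiteboard",
    "plan": [
        "move_to(corner_1)",
        "draw_line(corner_1, corner_2)",
        "draw_line(corner_2, corner_3)",
        "draw_line(corner_3, corner_4)",
        "draw_line(corner_4, corner_1)",
    ],
    "environment": {"reachable_radius_m": 1.0, "obstacles": ["mug"]},
}

def build_prompt(sample, goal_description):
    """Combine the worked example with the new goal into a single LLM prompt."""
    return (
        f"Example task: {sample['description']}\n"
        f"Example plan: {sample['plan']}\n"
        f"Environment: {sample['environment']}\n"
        f"New task: {goal_description}\n"
        "Plan:"
    )

prompt = build_prompt(sample_task, "Draw a star")
```

The related example gives the model a template for the kind of plan and environment details it should produce for the new, similar task.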
But how did these plans fare in practice? In simulations, PRoC3S successfully drew stars and letters eight out of 10 times each. It also could stack digital blocks in pyramids and lines, and place items with accuracy, like fruits on a plate. Across each of these digital demos, the CSAIL method completed the requested task more consistently than comparable approaches like “LLM3” and “Code as Policies”.
The CSAIL engineers next brought their approach to the real world. Their method developed and executed plans on a robotic arm, teaching it to place blocks in straight lines. PRoC3S also enabled the machine to place blue and red blocks into matching bowls and move all objects near the center of a table.
Kumar and co-lead author Aidan Curtis SM ’23, who’s also a PhD student working in CSAIL, say these findings indicate how an LLM can develop safer plans that humans can trust to work in practice. The researchers envision a home robot that could be given a more general request (like “bring me some chips”) and reliably determine the specific steps needed to execute it. PRoC3S could help a robot test out plans in a similar digital environment to find a working course of action — and more importantly, bring you a tasty snack.
For future work, the researchers aim to improve results using a more advanced physics simulator and to expand to more elaborate longer-horizon tasks via more scalable data-search techniques. Moreover, they plan to apply PRoC3S to mobile robots such as a quadruped for tasks that include walking and scanning surroundings.
“Using foundation models like ChatGPT to control robot actions can lead to unsafe or incorrect behaviors due to hallucinations,” says The AI Institute researcher Eric Rosen, who isn’t involved in the research. “PRoC3S tackles this issue by leveraging foundation models for high-level task guidance, while employing AI techniques that explicitly reason about the world to ensure verifiably safe and correct actions. This combination of planning-based and data-driven approaches may be key to developing robots capable of understanding and reliably performing a broader range of tasks than is currently possible.”
Kumar and Curtis’ co-authors are also CSAIL affiliates: MIT undergraduate researcher Jing Cao and MIT Department of Electrical Engineering and Computer Science professors Leslie Pack Kaelbling and Tomás Lozano-Pérez. Their work was supported, in part, by the National Science Foundation, the Air Force Office of Scientific Research, the Office of Naval Research, the Army Research Office, MIT Quest for Intelligence, and The AI Institute.