“Robot, make me a chair”

Computer-aided design (CAD) systems are tried-and-true tools used to design many of the physical objects we use every day. But CAD software requires extensive expertise to master, and many tools incorporate such a high level of detail that they don’t lend themselves to brainstorming or rapid prototyping.

In an effort to make design faster and more accessible for non-experts, researchers from MIT and elsewhere developed an AI-driven robotic assembly system that enables people to create physical objects simply by describing them in words.

Their system uses a generative AI model to construct a 3D representation of an object’s geometry based on the user’s prompt. Then, a second generative AI model reasons about the desired object and figures out where different components should go, based on the object’s function and geometry.

The system then constructs the object from a set of prefabricated parts using robotic assembly. It can also iterate on the design based on feedback from the user.
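The team’s code isn’t included in this article, but as a rough, hypothetical illustration, the pipeline described above could be staged roughly like the Python sketch below. Every function here (generate_mesh, assign_components, robot_assemble) is a stub standing in for the generative model, the VLM, and the robot, not the researchers’ implementation.

```python
# Hypothetical sketch of the text-to-assembly pipeline described above.
# Each function is a stub standing in for the real model or robot.

def generate_mesh(prompt: str) -> dict:
    """Stand-in for the generative model that turns a text prompt into a 3D mesh."""
    return {"prompt": prompt, "surfaces": ["seat", "backrest", "legs"]}

def assign_components(mesh: dict, feedback: str | None = None) -> dict:
    """Stand-in for the VLM that decides which surfaces receive panel components."""
    panels = ["seat", "backrest"] if feedback is None else ["backrest"]
    return {"structural": mesh["surfaces"], "panels": panels}

def robot_assemble(layout: dict) -> None:
    """Stand-in for robotic assembly from the two types of prefabricated parts."""
    print(f"Assembling structure; panels on: {layout['panels']}")

def design_and_build(prompt: str, feedback: str | None = None) -> None:
    mesh = generate_mesh(prompt)                # stage 1: text -> 3D geometry
    layout = assign_components(mesh, feedback)  # stage 2: reason about components
    robot_assemble(layout)                      # stage 3: build from premade parts

design_and_build("make me a chair")
design_and_build("make me a chair", feedback="only use panels on the backrest")
```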

The researchers used this end-to-end system to fabricate furniture, including chairs and shelves, from two types of premade components. The components can be disassembled and reassembled at will, reducing the amount of waste generated during the fabrication process.

They evaluated these designs through a user study and found that more than 90 percent of participants preferred the objects made by their AI-driven system, compared with other approaches.

While this work is an initial demonstration, the framework could be especially useful for rapid prototyping of complex objects like aerospace components and architectural elements. In the future, it could be used in homes to fabricate furniture or other objects locally, without the need to have bulky products shipped from a central facility.

“Ultimately, we want to be able to talk to a robot and AI system the same way we talk to one another to make things together. Our system is a first step toward enabling that future,” says lead author Alex Kyaw, a graduate student in the MIT departments of Electrical Engineering and Computer Science (EECS) and Architecture.

Kyaw is joined on the paper by Richa Gupta, an MIT architecture graduate student; Faez Ahmed, associate professor of mechanical engineering; Lawrence Sass, professor and chair of the Computation Group in the Department of Architecture; senior author Randall Davis, an EECS professor and member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); as well as others at Google DeepMind and Autodesk Research. The paper was recently presented at the Conference on Neural Information Processing Systems.

Generating a multicomponent design

While generative AI models are good at generating 3D representations, known as meshes, from text prompts, most don’t produce uniform representations of an object’s geometry that have the component-level details needed for robotic assembly.

Separating these meshes into components is difficult for a model because assigning components depends on the geometry and functionality of the object and its parts.

The researchers tackled these challenges using a vision-language model (VLM), a powerful generative AI model that has been pre-trained to understand images and text. They task the VLM with determining how two types of prefabricated parts, structural components and panel components, should fit together to form an object.

“There are many ways we can put panels on a physical object, but the robot needs to see the geometry and reason over that geometry to make a decision about it. By serving as both the eyes and brain of the robot, the VLM enables the robot to do that,” Kyaw says.

A user prompts the system with text, perhaps by typing “make me a chair,” and gives it an AI-generated image of a chair to start.

Then, the VLM reasons about the chair and determines where panel components go on top of structural components, based on the functionality of many example objects it has seen before. For instance, the model can determine that the seat and backrest must have panels to provide surfaces for someone to sit on and lean against.

It outputs this information as text, such as “seat” or “backrest.” Each surface of the chair is then labeled with numbers, and the information is fed back to the VLM.

Then the VLM chooses the labels that correspond to the geometric parts of the chair that should receive panels on the 3D mesh to complete the design.
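As a loose sketch of that labeling loop, and assuming a generic VLM query interface (the query_vlm function below is a hypothetical placeholder, not the paper’s actual interface), the exchange could look something like this in Python:

```python
# Hypothetical sketch of the numbered-surface exchange with the VLM.

def query_vlm(question: str) -> list[str]:
    """Placeholder for a real vision-language-model call on the rendered design."""
    if "functional" in question:
        return ["seat", "backrest"]   # first pass: which parts need panels
    return ["2", "5"]                 # second pass: matching surface numbers

# Each surface of the generated mesh is given a numeric label before the second query.
numbered_surfaces = {1: "front leg", 2: "seat", 3: "rear leg",
                     4: "side brace", 5: "backrest"}

functional_parts = query_vlm("Which functional parts of the chair need panels?")
chosen_ids = query_vlm(
    f"Which numbered surfaces correspond to {functional_parts}? "
    f"Options: {numbered_surfaces}"
)

panel_surfaces = [numbered_surfaces[int(i)] for i in chosen_ids]
print(panel_surfaces)  # ['seat', 'backrest'] -> these faces get panels on the mesh
```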

Human-AI co-design

The user stays in the loop throughout this process and can refine the design by giving the model a new prompt, such as “only use panels on the backrest, not the seat.”

“The design space is very big, so we narrow it down through user feedback. We believe this is the best way to do it because people have different preferences, and building an idealized model for everyone would be impossible,” Kyaw says.

“The human-in-the-loop process allows the users to steer the AI-generated designs and have a sense of ownership in the end result,” adds Gupta.

Once the 3D mesh is finalized, a robotic assembly system builds the object using prefabricated parts. These reusable parts can be disassembled and reassembled into different configurations.

The researchers compared the results of their method with an algorithm that places panels on all upward-facing horizontal surfaces, and an algorithm that places panels randomly. In a user study, more than 90 percent of people preferred the designs made by their system.
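For context on the first baseline, one common way to implement an “upward-facing horizontal surfaces” heuristic is to threshold each face normal against the vertical axis. The numpy snippet below is a generic illustration under that assumption, not the authors’ baseline code.

```python
import numpy as np

def upward_facing_faces(vertices: np.ndarray, faces: np.ndarray,
                        cos_threshold: float = 0.95) -> np.ndarray:
    """Return indices of triangle faces whose normal points nearly straight up.

    Assumes consistent counter-clockwise winding so normals point outward.
    """
    vertices = np.asarray(vertices, dtype=float)
    v0, v1, v2 = (vertices[faces[:, i]] for i in range(3))
    normals = np.cross(v1 - v0, v2 - v0)                        # face normals
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)   # unit length
    return np.where(normals[:, 2] > cos_threshold)[0]           # dot with +z axis

# Tiny example: a flat, upward-facing triangle and a vertical one.
verts = np.array([[0, 0, 1], [1, 0, 1], [0, 1, 1],    # horizontal (seat-like)
                  [0, 0, 0], [1, 0, 0], [1, 0, 1]])   # vertical side
tris = np.array([[0, 1, 2], [3, 4, 5]])
print(upward_facing_faces(verts, tris))  # -> [0], only the horizontal face
```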

They also asked the VLM to explain why it chose to place panels in those areas.

“We learned that the vision-language model is able to understand some of the functional aspects of a chair, like leaning and sitting, and to know why it’s placing panels on the seat and backrest. It isn’t just randomly spitting out these assignments,” Kyaw says.

In the future, the researchers want to enhance their system to handle more complex and nuanced user prompts, such as a table made of glass and metal. In addition, they want to incorporate additional prefabricated components, such as gears, hinges, or other moving parts, so objects can have more functionality.

“Our hope is to drastically lower the barrier of access to design tools. We have shown that we can use generative AI and robotics to turn ideas into physical objects in a fast, accessible, and sustainable manner,” says Davis.
