Google DeepMind Is Training Robots With Video and Language Models


In 2024, Google DeepMind’s robotics researchers are among many teams exploring how generative AI and large foundation models can be applied to robotics, in areas such as robot learning and product design. There is a great deal of anticipation about what these models could mean for training robots.

Today, the team is highlighting research aimed at giving robots a better understanding of what humans expect from them. Instead of just repeating the same task over and over, robots need to be able to recognize and react to changes in their environment or mission parameters. This kind of adaptability would allow robots to be used in more dynamic settings, such as a factory, a hospital, or even a home.

The DeepMind team designed AutoRT to harness large foundation models for several different ends. For example, the system uses a Visual Language Model (VLM) for improved situational awareness. AutoRT also lets a fleet of robots work in tandem, using their cameras to map out the environment and identify objects.

The robots then carry out tasks suggested by a large language model (LLM). LLMs are widely seen as key to letting robots understand more natural language commands, reducing the need to hard-code skills. AutoRT has been tested extensively over the past seven months and can orchestrate up to 20 robots at once, with 52 different devices used in total.
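To make the division of labor concrete, here is a minimal Python sketch of one round of the loop described above: a scene description, a list of suggested tasks, and one task handed to each robot. It is an illustration only, not DeepMind's code. The names `Robot`, `describe_scene`, `propose_tasks`, and `run_round` are hypothetical, and the two model calls are replaced with trivial string functions so the example runs on its own.

```python
from dataclasses import dataclass, field

# Cap on robots handled per round. The value 20 comes from the article;
# using it as a hard limit is an assumption made for this sketch.
MAX_ACTIVE_ROBOTS = 20

SCENE_PREFIX = "I see "


@dataclass
class Robot:
    name: str
    assigned: list = field(default_factory=list)  # tasks handed to this robot


def describe_scene(camera_objects):
    """Hypothetical stand-in for a VLM: describe what the camera detected."""
    return SCENE_PREFIX + ", ".join(sorted(camera_objects))


def propose_tasks(scene_description):
    """Hypothetical stand-in for an LLM: suggest one task per described object."""
    objects = scene_description[len(SCENE_PREFIX):].split(", ")
    return [f"pick up the {obj}" for obj in objects if obj]


def run_round(robots, observations):
    """Assign each active robot the first task proposed for its own view."""
    assignments = {}
    for robot in robots[:MAX_ACTIVE_ROBOTS]:
        scene = describe_scene(observations.get(robot.name, []))
        tasks = propose_tasks(scene)
        if tasks:  # a robot that sees nothing gets no task this round
            robot.assigned.append(tasks[0])
            assignments[robot.name] = tasks[0]
    return assignments


if __name__ == "__main__":
    fleet = [Robot("robot-1"), Robot("robot-2")]
    views = {"robot-1": ["sponge", "cup"], "robot-2": []}
    print(run_round(fleet, views))
```

In a real system the two stand-in functions would be calls to actual models, and the assigned task would go to a control policy rather than being recorded in a list.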

So far, DeepMind has collected roughly 77,000 trials covering more than 6,000 tasks. The team has also developed RT-Trajectory, which uses video input to teach robots. Many teams are using YouTube videos to train robots at scale, but RT-Trajectory overlays a two-dimensional sketch of the arm in action on the video.

The trajectories, represented as RGB images, give the model practical visual cues as it learns robot-control policies. DeepMind reports that RT-Trajectory more than doubled the success rate of RT-2, achieving 63% success across 41 tasks compared with 29%. The team emphasizes that RT-Trajectory takes advantage of abundant robot-motion data that currently goes unused.
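As a rough illustration of what overlaying a two-dimensional trajectory on an RGB image can look like, the Python sketch below draws a line through a list of waypoints onto a frame. It is not RT-Trajectory's actual pipeline. The frame format (a list of rows of RGB tuples), the function names, and the choice of red for the path are all assumptions made for the example.

```python
def blank_frame(width, height, color=(0, 0, 0)):
    """Create a frame as a list of rows, each a list of (r, g, b) tuples."""
    return [[color for _ in range(width)] for _ in range(height)]


def interpolate(p0, p1):
    """Return integer pixel coordinates along the segment from p0 to p1."""
    (x0, y0), (x1, y1) = p0, p1
    steps = max(abs(x1 - x0), abs(y1 - y0), 1)
    return [
        (round(x0 + (x1 - x0) * t / steps), round(y0 + (y1 - y0) * t / steps))
        for t in range(steps + 1)
    ]


def overlay_trajectory(frame, waypoints, color=(255, 0, 0)):
    """Return a copy of the frame with the waypoint path drawn on top.

    Waypoints are (x, y) pixel positions; at least two are needed to draw
    anything. Pixels falling outside the frame are skipped.
    """
    out = [row[:] for row in frame]  # copy so the input frame is unchanged
    height = len(out)
    width = len(out[0]) if out else 0
    for p0, p1 in zip(waypoints, waypoints[1:]):
        for x, y in interpolate(p0, p1):
            if 0 <= x < width and 0 <= y < height:
                out[y][x] = color
    return out


if __name__ == "__main__":
    frame = blank_frame(5, 5)
    sketched = overlay_trajectory(frame, [(0, 0), (4, 4)])
    for row in sketched:
        print("".join("#" if px == (255, 0, 0) else "." for px in row))
```

The point of the sketch is that the path ends up in the same pixel grid as the image, so a model that takes images as input can read the frame and the intended motion together.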

RT-Trajectory is another step toward robots that can move efficiently and accurately in new situations, while also unlocking knowledge from existing datasets.