Nvidia's robot hand in simulation.
The field of robotics, a classic application of artificial intelligence, has recently been energized by the newly fashionable technology of generative AI, programs such as the large language models from OpenAI that can interact through natural-language statements.
For example, Google's DeepMind unit this year unveiled RT-2, a large language model that can be presented with an image and a command, and then spit out both a plan of action and the coordinates necessary to complete the command.
But there is a threshold that generative programs cannot cross: They can handle "high-level" tasks such as planning the route for a robot to a destination, but they cannot handle "low-level" tasks, such as manipulating the joints of a robot for fine motor control.
New work from Nvidia published this month suggests language models may be closer to crossing that divide. A program called Eureka uses language models to set goals that in turn can be used to direct robots at a low level, including inducing them to perform fine-motor tasks such as robot hands manipulating objects.
The Eureka program is likely just the first of many efforts that will be needed to cross the divide, because Eureka operates inside a computer simulation of robotics; it doesn't yet control a physical robot in the real world.
"Harnessing [large language models] to learn complex low-level manipulation tasks, such as dexterous pen spinning, remains an open problem," write lead author Yecheng Jason Ma and colleagues at Nvidia, the University of Pennsylvania, Caltech, and the University of Texas at Austin, in the paper "Eureka: Human-level, reward design via coding large language, models," posted on the arXiv pre-print server this month.
There is also a companion blog post from Nvidia.
Ma and team's observation agrees with the view of long-time researchers in robotics. According to Sergey Levine, associate professor in the electrical engineering department at the University of California at Berkeley, language models are not a great choice for "the last inch, the part that has to do with the robot actually physically touching things in the world" because such a task "is mostly bereft of semantics."
"It might be possible to fine-tune a language model to also predict grasps, but it's not clear whether that's actually going to help, because, well, what does language tell you about where to place your fingers on the object?" Levine told . "Maybe it tells you a little bit, but perhaps not so much as to actually make a difference."
The Eureka paper tackles the problem indirectly. Instead of making the language model tell the robot simulation what to do, the model is used to craft "rewards," functions that score the robot's behavior and define the goals toward which it should strive. Rewards are a well-established method in what is called reinforcement learning, the form of machine learning that Berkeley's Levine and other roboticists rely on for robot training.
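To make "reward" concrete: in reinforcement learning, a reward is typically a small function that scores the robot's behavior at every simulation step. Below is a minimal, hypothetical sketch of what such a function might look like for the drawer-opening task discussed later in the paper; the state variables, weights, and thresholds are assumptions for illustration, not code from Eureka.

```python
import numpy as np

def drawer_open_reward(drawer_pos: float, hand_to_handle_dist: float) -> float:
    """Score one simulation step of a drawer-opening task.

    drawer_pos: how far the drawer has been pulled out, in meters.
    hand_to_handle_dist: distance from gripper to handle, in meters.
    Both variables and all weights are illustrative assumptions.
    """
    # Reward reaching the handle: closer is better.
    reach_term = 1.0 / (1.0 + 5.0 * hand_to_handle_dist)
    # Reward pulling the drawer open, capped once it is fully out at 0.4 m.
    open_term = float(np.clip(drawer_pos / 0.4, 0.0, 1.0))
    return reach_term + 2.0 * open_term

# The RL algorithm receives this scalar at every step and adjusts the
# policy to maximize the total over an episode.
print(drawer_open_reward(drawer_pos=0.1, hand_to_handle_dist=0.02))
```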
The hypothesis of Ma and team is that a large language model can do a better job of crafting those rewards for reinforcement learning than a human AI programmer.
In a process known as reward "evolution," the programmer writes out as a prompt for GPT-4 all the details of the problem -- the data about the robotic simulation, such as the environmental constraints on what a robot can do -- along with the rewards that have already been tried, and asks GPT-4 to improve on them. GPT-4 then devises new candidate rewards, which are tested in the simulation, and the results are fed back to the model for the next round of improvement.
Evolution is what the program is named for: "Evolution-driven Universal REward Kit for Agents," or Eureka.
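Conceptually, the loop looks something like the sketch below: the language model proposes candidate reward functions, each is scored by training a policy against it in simulation, and a summary of the results is fed back into the next prompt. This is a simplified, hypothetical rendering, not Nvidia's actual implementation; the llm_complete and train_and_score callables are placeholders a caller would have to supply.

```python
from typing import Callable

def evolve_rewards(
    env_source: str,
    task_desc: str,
    llm_complete: Callable[[str], str],       # e.g., a wrapper around a GPT-4 API call
    train_and_score: Callable[[str], float],  # trains an RL policy in simulation, returns a score
    generations: int = 5,
    samples_per_generation: int = 4,
) -> str:
    """Return the best reward-function source code found by the loop.

    The two callables are hypothetical stand-ins supplied by the caller;
    they are not part of Nvidia's released code.
    """
    best_code, best_score = "", float("-inf")
    feedback = "No rewards have been tried yet."
    for _ in range(generations):
        for _ in range(samples_per_generation):
            prompt = (
                f"Simulation environment source:\n{env_source}\n\n"
                f"Task description: {task_desc}\n\n"
                f"Feedback on previous attempts: {feedback}\n\n"
                "Write an improved Python reward function for this task."
            )
            candidate = llm_complete(prompt)    # the language model proposes a reward
            score = train_and_score(candidate)  # the simulator measures how well it trains
            if score > best_score:
                best_code, best_score = candidate, score
        # Summarize results so the next generation can improve on them.
        feedback = f"Best training score so far: {best_score:.3f}"
    return best_code
```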
The outline of how Eureka works: Take in the human programmer's basic design for the robot sim, then craft many candidate rewards and try them out in an iterative fashion.
Ma and team put their invention through its paces on a wide range of simulated tasks, such as making a robot arm open a drawer. Eureka, they relate, "achieves human-level performance on reward design across a diverse suite of 29 open-sourced RL environments that include 10 distinct robot morphologies, including quadruped, quadcopter, biped, manipulator, as well as several dexterous hands."
A gaggle of robot sim tasks for which the Eureka program crafted rewards.
Nvidia"Without any task-specific prompting or reward templates, Eureka autonomously generates rewards that outperform expert human rewards on 83% of the tasks and realizes an average normalized improvement of 52%," they report.
One of the more striking examples of what they've achieved is to get a simulated robot hand to twirl a pen as would a bored student in class. "We consider pen spinning, in which a five-finger hand needs to rapidly rotate a pen in pre-defined spinning configurations for as many cycles as possible," they write. To do so, they combine Eureka with a machine learning approach developed some years ago called "curriculum learning," in which a task is broken down into bite-sized chunks.
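Curriculum learning itself is straightforward to sketch: rather than training on the hardest goal from the start, the policy is trained on a sequence of progressively harder sub-goals, with each stage warm-starting from the policy learned in the previous one. The stage definitions and the train_policy callable below are illustrative assumptions, not the pipeline described in the paper.

```python
from typing import Any, Callable, Optional

def curriculum_train(
    train_policy: Callable[[dict, Optional[Any]], Any],  # hypothetical trainer supplied by the caller
    stages: list[dict],
) -> Any:
    """Train through a curriculum: each stage warm-starts from the previous policy."""
    policy = None
    for stage in stages:
        # Reuse the easier stage's policy as the starting point for the harder one.
        policy = train_policy(stage, policy)
    return policy

# Illustrative stages for a pen-spinning curriculum (assumptions, not the paper's).
pen_spinning_stages = [
    {"goal": "hold the pen stably in the hand"},
    {"goal": "reorient the pen to random target poses"},
    {"goal": "rotate the pen through the full spinning sequence"},
]
```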
"We demonstrate for the first time rapid pen spinning maneuvers on a simulated anthropomorphic Shadow Hand," they relate.
The authors also make a surprising discovery: If they combine the improved rewards from Eureka with human-written rewards, the combination performs better on tests than either the human or the Eureka rewards alone. They surmise the reason is that humans bring one piece of the puzzle that the Eureka program lacks, namely, knowledge of which aspects of the robot's state actually matter.
"Human designers are generally knowledgeable about relevant state variables but are less proficient at designing rewards using them," they write. "This makes intuitive sense as identifying relevant state variables that should be included in the reward function involves mostly common sense reasoning, but reward design requires specialized knowledge and experience in RL."
That points toward a possible human-AI partnership akin to GitHub Copilot and other assistant programs: "Together, these results demonstrate Eureka's reward assistant capability, perfectly complementing human designers' knowledge about useful state variables and making up for their less proficiency on how to design rewards using them."