QUICK LEARNERS: New system allows robots to drive by watching humans


Spiro Papuckoski
Dec 27, 2020 • 2 minute read
A 3D illustration of robotic hands on a steering wheel while driving an autonomous car. Photo by Iaremenko / iStock / Getty Images
Imagine robots that can learn how to do everyday human tasks, no matter how complex, just by watching you perform them a few times.
That future may not be far away.
Researchers at the University of Southern California have designed a system that teaches robots complicated tasks from a very small number of human demonstrations.
The researchers presented a paper, Learning from Demonstrations Using Signal Temporal Logic, at the Conference on Robot Learning in November.
Signal temporal logic is a mathematical language for precisely specifying how a system's behaviour should unfold over time, which lets robots evaluate current and future outcomes.
This new method allows robots to learn from only a handful of demonstrations. Current state-of-the-art procedures require people to repeat a task at least 100 times for a robot to perfect it.
The system allows robots to learn the same way people learn — watch a person do a task and then perform it repeatedly.
“Many machine learning and reinforcement learning systems require large amounts of data and hundreds of demonstrations — you need a human to demonstrate over and over again, which is not feasible,” said lead author Aniruddh Puranic, a Ph.D. student in computer science at the USC Viterbi School of Engineering.
“Also, most people don’t have programming knowledge to explicitly state what the robot needs to do, and a human cannot possibly demonstrate everything that a robot needs to know. What if the robot encounters something it hasn’t seen before? This is a key challenge.”
Learning from only a few demonstrations raises real-world concerns: the robot may pick up unsafe patterns or unwanted actions.
To limit these issues, the researchers employ signal temporal logic so that robots can evaluate the quality of each human demonstration.
For example, if a driver rolls through a stop sign without fully stopping, the robot would rank that demonstration lower than one from a good driver who stops at every stop sign.
But if a driver brakes quickly to avoid a collision, the robot would learn from that smart behaviour.
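To make the stop-sign example concrete, here is a minimal sketch of how a demonstration could be scored against a temporal rule like "whenever the car is near a stop sign, its speed should eventually drop to zero." This is an illustrative toy, not the authors' implementation; the function name, the rule, and the sample traces are all invented for the example.

```python
def robustness_stop_rule(trace, near_threshold=5.0):
    """Score how robustly a driving trace satisfies the rule
    'near a stop sign implies speed eventually reaches zero'.
    trace: list of (distance_to_stop_sign, speed) samples over time.
    Returns the worst-case margin: 0.0 means a full stop was always
    reached; a negative score means the driver rolled through."""
    worst = float("inf")
    for t, (dist, _) in enumerate(trace):
        if dist <= near_threshold:  # antecedent: car is near a stop sign
            # lowest speed reached from this moment onward
            min_future_speed = min(s for _, s in trace[t:])
            # margin by which "eventually stopped" holds (negated speed)
            worst = min(worst, -min_future_speed)
    return worst

# Two toy demonstrations: a full stop and a rolling stop.
full_stop = [(10, 8.0), (4, 3.0), (2, 0.0), (8, 5.0)]
rolling_stop = [(10, 8.0), (4, 4.0), (2, 2.5), (8, 5.0)]

scores = {"full_stop": robustness_stop_rule(full_stop),
          "rolling_stop": robustness_stop_rule(rolling_stop)}
# The full stop scores higher, so that demonstration would be
# weighted more heavily when the robot learns.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Ranking demonstrations by a margin like this, rather than treating them all as equally good, is what lets the system discount sloppy behaviour while still learning from it.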
“If we want robots to be good teammates and help people, first they need to learn and adapt to human preference very efficiently,” said the paper’s co-author Stefanos Nikolaidis.
The researchers have yet to try out this method on real robots. The system has only been tested in a video game simulator, but could be expanded in the future to include driving simulations.