Training a Robot using Positive and Negative Reinforcement
In this project we developed a system that allows ordinary, non-technical users to train a humanoid robot to perform a variety of interactive social behaviors through a process called “interactive shaping.” Behaviors learned during the project include following, maintaining conversational distance, and looking away.
The training system uses the TAMER framework, developed by Brad Knox during his PhD at UT Austin. Under this framework, a human operator gives the agent positive and negative feedback in real time. From the frequency of that feedback and the current state of the world, the agent gradually learns which behaviors it should execute. In this example, the robot, Nexi, uses distance and angle measurements from a special marker, combined with the human feedback, to learn whether to turn left, turn right, go forward, or stay still. By learning when to execute each action, Nexi can gradually build up a representation of more complex social behaviors.
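The learning loop described above can be illustrated with a minimal sketch. It is not the code used on Nexi. It assumes a linear model of human feedback per action, a hypothetical feature vector built from the marker's distance and angle, and an arbitrary learning rate. It also leaves out the handling of delayed feedback, which a real system has to deal with because a trainer's response lags behind the robot's action. The names `TamerSketch` and `make_features` are invented for illustration.

```python
import numpy as np

# The four actions named in the text; the ordering is an arbitrary choice.
ACTIONS = ["turn_left", "turn_right", "forward", "stay"]


def make_features(distance_m, angle_rad):
    """Hypothetical state features: a bias term plus marker distance and angle."""
    return np.array([1.0, distance_m, angle_rad])


class TamerSketch:
    """Learns to predict human feedback for each action, then acts greedily."""

    def __init__(self, n_features, learning_rate=0.1):
        # One weight vector per action; every prediction starts at zero.
        self.weights = np.zeros((len(ACTIONS), n_features))
        # Assumed value; it needs to be small relative to the feature scale.
        self.learning_rate = learning_rate

    def predict(self, features):
        """Predicted human feedback for every action in the given state."""
        return self.weights @ features

    def choose_action(self, features):
        """Pick the action expected to earn the most positive feedback."""
        return int(np.argmax(self.predict(features)))

    def update(self, features, action, feedback):
        """Move the prediction for `action` toward the observed feedback."""
        error = feedback - self.weights[action] @ features
        self.weights[action] += self.learning_rate * error * features


# Usage: a simulated trainer rewards moving forward when the marker is far
# away and straight ahead, and punishes standing still in the same state.
agent = TamerSketch(n_features=3)
far_ahead = make_features(distance_m=2.0, angle_rad=0.0)
for _ in range(20):
    agent.update(far_ahead, ACTIONS.index("forward"), feedback=+1.0)
    agent.update(far_ahead, ACTIONS.index("stay"), feedback=-1.0)

print(ACTIONS[agent.choose_action(far_ahead)])
```

After these updates the agent prefers to go forward in that state. Because the agent simply chooses whichever action it predicts the trainer will approve of most, the trainer's feedback alone shapes the behavior.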
- Knox, W. Bradley, Peter Stone, and Cynthia Breazeal. “Training a robot via human feedback: A case study.” In Proceedings of the International Conference on Social Robotics (ICSR), October 2013. *Best Paper Award*