Sophie’s Kitchen is an interactive video game that takes place in a virtual house. The goal is to help the virtual agent (Sophie) bake a cake by guiding her through reward. Users can connect online to play with Sophie in a training session. Sophie can turn left and right, and can pick up, put down, and use items in the kitchen. A slider on the left allows the human to send “rewards” (a scalar in [-1, 1]) to Sophie. Traditional Reinforcement Learning algorithms make certain assumptions about the nature and meaning of the reward signal, but do those assumptions hold when the reward comes from a human?
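To make the assumption in question concrete, here is a minimal sketch of how a traditional RL agent might consume the slider value. It assumes a tabular Q-learning agent, which this description does not confirm; the state names, the action labels, and the `alpha` and `gamma` values are hypothetical choices for illustration only.

```python
from collections import defaultdict


def clamp_reward(slider_value: float) -> float:
    """Clip a slider reading to the [-1, 1] range the interface allows."""
    return max(-1.0, min(1.0, slider_value))


class TabularQAgent:
    """Assumed learner: plain tabular Q-learning (not confirmed by the text)."""

    def __init__(self, actions, alpha=0.3, gamma=0.75):
        self.actions = list(actions)
        self.alpha = alpha            # learning rate (illustrative value)
        self.gamma = gamma            # discount factor (illustrative value)
        self.q = defaultdict(float)   # (state, action) -> estimated value

    def update(self, state, action, human_reward, next_state):
        # The update credits the reward to the action just taken.
        # That is the "reward is feedback on the past" assumption
        # the study puts to the test with human teachers.
        r = clamp_reward(human_reward)
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = r + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error
        return self.q[(state, action)]

    def greedy_action(self, state):
        return max(self.actions, key=lambda a: self.q[(state, a)])


# Hypothetical labels for the actions listed in the description.
agent = TabularQAgent(["left", "right", "pick_up", "put_down", "use"])
agent.update("facing_bowl", "pick_up", 0.8, "holding_bowl")
print(agent.greedy_action("facing_bowl"))
```

In this sketch every slider value is read as a judgment of the preceding action, so a reward the human meant as a hint about what to do next would be credited to the wrong action.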
Eighteen participants came to the lab to play with Sophie. Participants were told that they could not directly communicate with Sophie, but could instead give her periodic feedback about how she was doing. All but one participant successfully taught Sophie the task, but the specific strategies they used greatly influenced how long it took. Major insights into how humans tried to teach Sophie can be found in the next section.
Unlike traditional RL reward functions, we found that humans seem to generate reward as both feedback (i.e., for past actions) and guidance (to encourage future actions that seem likely to happen). Humans also exhibit positive bias in rewards: aggregated across all situations, they are more likely to give positive than negative feedback to the agent. Finally, we found that as humans refine their mental models of the agent, their reward strategy shifts accordingly. Thus, to elicit the best possible training from the human, an agent should make its mental model transparent (e.g., by using gaze behavior to indicate future intended actions). Overall, we were able to show that an agent designed to take advantage of these human predispositions learns more quickly and effectively. Sophie’s Kitchen set the stage for future research into Socially Guided Machine Learning, including investigations of multiple channels for specialized feedback.
- Thomaz, Andrea L., and Cynthia Breazeal. “Teachable robots: Understanding human teaching behavior to build more effective robot learners.” Artificial Intelligence 172.6 (2008): 716-737.
- Thomaz, Andrea Lockerd, and Cynthia Breazeal. “Asymmetric interpretations of positive and negative human feedback for a social learning agent.” The 16th IEEE International Symposium on Robot and Human interactive Communication (RO-MAN 2007), IEEE.
- Thomaz, Andrea Lockerd, and Cynthia Breazeal. “Reinforcement learning with human teachers: Evidence of feedback and guidance with implications for learning performance.” AAAI. Vol. 6. 2006.
- Thomaz, Andrea L., and Cynthia Breazeal. “Transparency and Socially Guided Machine Learning.” Proceedings of the 5th International Conference on Developmental Learning (ICDL ’06), 2006.