Reward instances (+1 or -1 in value) are visualized in the bottom-left box labeled “Incoming Reward”. The input features for the TAMER-learned reward model are the action and the distance and angle to the Vicon-tracked marker. The learned predictive model of human reward is shown in the three squares at the bottom right, from a bird's-eye perspective of the robotic agent and the marker. The robot Nexi is facing upward, and the marker is shown as a white triangle.
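To make the description of the reward model more concrete, here is a minimal sketch of a model that predicts human reward from the action plus the distance and angle to the marker. It is illustrative only: the linear form, the sin/cos angle encoding, the learning rate, and the action names in `ACTIONS` are all assumptions and are not taken from the system shown in the videos.

```python
import math
from dataclasses import dataclass, field

# Hypothetical action set; the robot's real actions are not listed in the text.
ACTIONS = ("forward", "backward", "turn_left", "turn_right", "stay")


def features(distance, angle):
    """State features: a bias term, the distance, and the angle as sin/cos."""
    return [1.0, distance, math.sin(angle), math.cos(angle)]


@dataclass
class RewardModel:
    """One linear model per action, trained on +1 / -1 human reward (assumed form)."""
    learning_rate: float = 0.1
    weights: dict = field(default_factory=lambda: {a: [0.0] * 4 for a in ACTIONS})

    def predict(self, action, distance, angle):
        # Predicted human reward for taking `action` in the given state.
        w = self.weights[action]
        return sum(wi * xi for wi, xi in zip(w, features(distance, angle)))

    def update(self, action, distance, angle, reward):
        # One gradient step on squared error toward the observed reward.
        x = features(distance, angle)
        error = reward - self.predict(action, distance, angle)
        w = self.weights[action]
        for i, xi in enumerate(x):
            w[i] += self.learning_rate * error * xi

    def best_action(self, distance, angle):
        # Greedy choice: the action with the highest predicted human reward.
        return max(ACTIONS, key=lambda a: self.predict(a, distance, angle))


model = RewardModel()
for _ in range(50):
    # Trainer rewards approaching a marker 2 m straight ahead, punishes retreating.
    model.update("forward", 2.0, 0.0, +1)
    model.update("backward", 2.0, 0.0, -1)
chosen = model.best_action(2.0, 0.0)
```

In this sketch the agent simply picks the action with the highest predicted reward for the current distance and angle, which is the kind of per-action prediction that the three squares in the video visualize over the space around the robot.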


Many thanks to Stefan Grabowski and Paula Aguilera for their help in editing the videos above.


Below are less information-rich videos of training sessions, covering each of the five behaviors. They are presented in chronological order.
Go To (success)
Magnetic Control (first failure)
Magnetic Control (second failure)
Magnetic Control (third failure)
Magnetic Control (fourth attempt, stopped early for debugging)
Keep Conversational Distance (success)
Look Away (success)
Magnetic Control (success)
Toy Tantrum (success)