‘Until it bores me’: Learning Progress Maximization as the Reward Mechanism to solve the Exploration-Exploitation Dilemma in Infants

Research output: Contribution to conference - Without ISBN/ISSN, Poster, peer-review

Publication date: 24/06/2021
Original language: English
Event: Development in Motion Conference 2021: Presented by the Marie Curie MOTION network - Online
Duration: 22/06/2021 - 24/06/2021


Conference: Development in Motion Conference 2021
Abbreviated title: DevMoCon


Driven by intrinsic curiosity, infants explore the world in order to learn about it. However, the mechanisms underlying such exploratory behavior are largely unknown. We propose a new theory in which active learners explore randomly until they encounter a familiar entity (e.g. a second stimulus from a previously encountered category), because at that point learning progress is suddenly maximized. Such a category will then be exploited for as long as the learning progress stays above an individually varying ‘boredom threshold’. Above this threshold, learning is rewarding and positively reinforces exploitation. Below this threshold, the learning progress is too small to be rewarding, and the learner returns to random exploration. The threshold itself can be lowered through inhibition, allowing sustained attention despite smaller learning progress.
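The proposed mechanism can be illustrated with a minimal simulation. The sketch below is an illustration under stated assumptions, not the authors' model: the function names, the two category labels, the geometric `decay` of learning progress, and the parameter values are all hypothetical. It assumes that the first exemplar of a category yields no learning progress, that progress peaks at the second exemplar, and that it shrinks with every further exemplar.

```python
import random


def learning_progress(n_seen, decay=0.7):
    """Assumed learning progress for the n-th exemplar of one category.

    Hypothetical form: zero for the first exemplar (nothing familiar yet),
    maximal at the second, then shrinking geometrically.
    """
    if n_seen < 2:
        return 0.0
    return decay ** (n_seen - 2)


def simulate(n_trials=30, boredom_threshold=0.3, decay=0.7, seed=0):
    """Return the sequence of categories chosen by a simulated learner."""
    rng = random.Random(seed)
    seen = {"A": 0, "B": 0}   # exemplars seen so far, per category
    current = None            # category being exploited; None while exploring
    choices = []
    for _ in range(n_trials):
        # Explore randomly unless a category is currently being exploited.
        category = rng.choice(["A", "B"]) if current is None else current
        seen[category] += 1
        choices.append(category)
        # Keep exploiting only while progress exceeds the boredom threshold.
        progress = learning_progress(seen[category], decay)
        current = category if progress > boredom_threshold else None
    return choices


if __name__ == "__main__":
    print("".join(simulate(boredom_threshold=0.3)))
    print("".join(simulate(boredom_threshold=0.05)))
```

In this sketch, lowering `boredom_threshold` tends to lengthen the first exploitation run on each category, which mirrors the proposed effect of inhibition.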
Here, we will first test this theory in a gaze-contingent eye-tracking task: 10-month-old infants will be introduced to two novel stimulus categories with 30 exemplars each (Fribbles, TarrLab). Two identical “houses” will be presented on a computer screen, and a new exemplar from one of the categories will be revealed whenever the infant fixates on the corresponding house. This design will enable us to distinguish between exploration (switching from one category to the other) and exploitation (consecutively triggering exemplars from the same category). In follow-on studies, we will test older children as well as adults, who will be able to trigger exemplar presentations via key presses. Across age groups, we will measure the number, speed, and sequence of trigger events, as well as the switches between categories.
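As a rough illustration of how exploration and exploitation could be read off a recorded trigger sequence, the following sketch counts category switches and the lengths of consecutive same-category runs. The function names and the example sequence are hypothetical and are not taken from the study.

```python
from itertools import groupby


def count_switches(sequence):
    """Number of times consecutive triggers belong to different categories."""
    return sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)


def run_lengths(sequence):
    """Collapse a trigger sequence into (category, run length) pairs."""
    return [(category, sum(1 for _ in group))
            for category, group in groupby(sequence)]


if __name__ == "__main__":
    triggers = list("AABBBAB")  # made-up example sequence
    print(count_switches(triggers))
    print(run_lengths(triggers))
```

Runs longer than one trigger correspond to exploitation in the sense used above, and each switch marks an exploratory move to the other category.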
We hypothesize that once a category has been triggered twice, it is more likely to be triggered again: the first two triggers establish familiarity and allow for learning, which is rewarding and reinforces further exploitation. Although the length of ‘exploitation runs’ may differ between participants (reflecting varying boredom thresholds), constant switching between categories is unlikely because it prevents learning from being maximized.
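One way to quantify this hypothesis is sketched below. It assumes that "triggered twice" means two consecutive triggers of the same category, which is one possible reading, and it uses a hypothetical function name and made-up data. The function estimates how often two consecutive same-category triggers are followed by a third.

```python
def repeat_rate_after_two(sequence):
    """Proportion of same-category pairs that are followed by a third
    trigger of that category. Returns None if no such pair occurs
    before the final trigger."""
    hits = 0
    total = 0
    for a, b, c in zip(sequence, sequence[1:], sequence[2:]):
        if a == b:            # two consecutive triggers of one category
            total += 1
            if c == a:        # the category was triggered again
                hits += 1
    return hits / total if total else None


if __name__ == "__main__":
    print(repeat_rate_after_two(list("AAABBA")))  # made-up example sequence
```

Under the hypothesis, this rate should exceed the rate expected from random choice between the two categories, with the exact comparison left to the study's analysis plan.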