‘Until it bores me’: Learning Progress Maximization as the Reward Mechanism to solve the Exploration-Exploitation Dilemma in Infants

Psychology

Electronic data

Altmann, Bazhydai, Westermann (2021) DevMoCon Poster
Final published version, 1.86 MB, image/png

Research output: Contribution to conference - Without ISBN/ISSN › Poster › peer-review

Published

Publication date	24/06/2021
Original language	English
Event	Development in Motion Conference 2021: Presented by the Marie Curie MOTION network - Online. Duration: 22/06/2021 → 24/06/2021. https://www.devmocon2021.com/

Conference

Conference	Development in Motion Conference 2021
Abbreviated title	DevMoCon
Period	22/06/21 → 24/06/21
Internet address	https://www.devmocon2021.com/

Abstract

Infants explore the world to learn about it, driven by intrinsically motivated curiosity. However, the mechanisms underlying such exploratory behavior are largely unknown. We propose a new theory in which active learners explore randomly until they encounter a familiar entity (e.g. a second stimulus from a previously encountered category), because at that point learning is suddenly maximized. The learner then exploits that category for as long as learning progress stays above an individually varying ‘boredom threshold’. Above this threshold, learning is rewarding and positively reinforces exploitation. Below it, learning progress is too small to be rewarding, and the learner returns to random exploration. The threshold itself can be lowered through inhibition, allowing sustained attention despite smaller learning progress.
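The explore-exploit rule described above can be illustrated with a minimal simulation sketch. Everything numeric here is an assumption for illustration and not part of the proposed theory: the exponential error decay used as a stand-in for learning progress, the `decay` and `threshold` values, and the function names `learning_progress` and `simulate`.

```python
import random


def learning_progress(n_seen, decay=0.7):
    """Hypothetical learning progress from the n-th exposure to a category.

    Assumption: prediction error after n exposures is decay**n. The first
    exposure yields no progress (nothing to compare against); from the
    second exposure on, progress is the drop in error since the last one.
    """
    if n_seen < 2:
        return 0.0
    return decay ** (n_seen - 1) - decay ** n_seen


def simulate(n_trials=30, threshold=0.05, decay=0.7, seed=0):
    """Return the sequence of categories a simulated learner triggers."""
    rng = random.Random(seed)
    seen = {"A": 0, "B": 0}   # exposures per category so far
    current = None            # category being exploited, if any
    choices = []
    for _ in range(n_trials):
        # Exploit the current category, otherwise explore at random.
        category = current if current is not None else rng.choice(["A", "B"])
        seen[category] += 1
        choices.append(category)
        progress = learning_progress(seen[category], decay)
        # Keep exploiting only while progress exceeds the boredom threshold.
        current = category if progress > threshold else None
    return choices


if __name__ == "__main__":
    print("".join(simulate()))
```

In this sketch, lowering `threshold` plays the role of inhibition: the simulated learner stays with a category for longer even though each further exemplar yields less progress.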
Here, we will first test this theory in a gaze-contingent eye-tracking task: 10-month-old infants will be introduced to two novel stimulus categories with 30 exemplars each (Fribbles, TarrLab). Two identical “houses” will be presented on a computer screen, and a new exemplar from either category will be revealed when the infant fixates on the corresponding house. This design will enable us to distinguish between exploration – switching from one category to the other – and exploitation – consecutively triggering exemplars from the same category. In follow-on studies, we will test older children as well as adults, who will trigger exemplar presentations via key presses. Across age groups, we will measure the number, speed, and sequence of trigger events, as well as the switches between categories.
We hypothesize that once a category has been triggered twice, it is more likely to be triggered again: the first two triggers establish familiarity and allow for learning, which is rewarding and reinforces further exploitation. While the length of ‘exploitation runs’ may differ between participants (reflecting varying boredom thresholds), constant switching between categories is unlikely because it prevents learning from being maximized.
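As a rough sketch of how the sequence-based measures could be computed from a recorded series of triggers, the following helpers derive exploitation-run lengths, the number of category switches, and how often two consecutive triggers of one category are followed by a third. The function names and the example sequence are hypothetical, and reading "triggered twice" as two consecutive triggers is an assumption, not something the abstract specifies.

```python
from itertools import groupby


def run_lengths(sequence):
    """Lengths of consecutive same-category runs ('exploitation runs')."""
    return [len(list(group)) for _, group in groupby(sequence)]


def count_switches(sequence):
    """Number of times the triggered category changes between trials."""
    return sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev != cur)


def repeat_rate_after_two(sequence):
    """Share of same-category pairs that are followed by the same category.

    Returns None when the sequence contains no such pair with a following
    trigger.
    """
    followed = [
        c == b
        for a, b, c in zip(sequence, sequence[1:], sequence[2:])
        if a == b
    ]
    return sum(followed) / len(followed) if followed else None


if __name__ == "__main__":
    triggers = list("AABBBBABBB")  # made-up example, not real data
    print(run_lengths(triggers))
    print(count_switches(triggers))
    print(repeat_rate_after_two(triggers))
```

Under the hypothesis, the repeat rate after two triggers would be expected to exceed what random switching between two categories produces, and runs of length one would be comparatively rare.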
