Recording

Speaker

Paul Masset

Bio

Paul Masset is an Assistant Professor in the Department of Psychology at McGill University working at the intersection of neuroscience, AI and cognitive science. The focus of his research group is to understand how the structure of neural circuits endows the brain with efficient distributed computations underlying cognition and how we can leverage these principles to design more efficient learning algorithms. Prior to joining McGill, he was a Postdoctoral Fellow at Harvard University. He obtained his PhD at Cold Spring Harbor Laboratory, his Masters in Cognitive Science at the Γ‰cole des hautes Γ©tudes en sciences sociales (EHESS) and his M.Eng/B.A. in Information and Computer Engineering at the University of Cambridge.

Abstract

To thrive in complex environments, animals and artificial agents must learn to act adaptively to maximize fitness and rewards. Such adaptive behavior can be learned through reinforcement learning, a class of algorithms that has been successful at training artificial agents and at characterizing the firing of dopamine neurons in the midbrain. In classical reinforcement learning, agents discount future rewards exponentially according to a single timescale, controlled by the discount factor. Here, we explore the presence of multiple timescales in biological reinforcement learning. We first show that reinforcement learning agents operating at a multitude of timescales possess distinct computational benefits. Next, we report that dopamine neurons in mice performing two behavioral tasks encode reward prediction error with a diversity of discount time constants. Our model explains the heterogeneity of temporal discounting in both cue-evoked transient responses and slower-timescale fluctuations known as dopamine ramps. Crucially, the measured discount factor of individual neurons is correlated across the two tasks, suggesting that it is a cell-specific property. Together, our results provide a new paradigm for understanding functional heterogeneity in dopamine neurons, a mechanistic basis for the empirical observation that humans and animals use non-exponential discounts in many situations, and open new avenues for the design of more efficient reinforcement learning algorithms.
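The contrast between a single discount factor and a population of them can be sketched in a few lines. This is an illustrative example, not the speaker's model: it shows that while one exponential discount gamma**t decays at a single rate, averaging exponential discounts over a range of gammas (here a hypothetical uniform range) yields a heavier-tailed, non-exponential curve, qualitatively similar to the hyperbolic discounting often observed in humans and animals.

```python
import numpy as np

delays = np.arange(0, 50)  # time steps into the future

# Classical RL: a single timescale, discount d(t) = gamma ** t
gamma = 0.9
single = gamma ** delays

# Multi-timescale sketch: each "neuron" discounts with its own gamma
# (the uniform range below is an assumption for illustration)
gammas = np.linspace(0.5, 0.99, 50)
population = np.mean([g ** delays for g in gammas], axis=0)

# Hyperbolic discounting for comparison: d(t) = 1 / (1 + k * t)
k = 0.25
hyperbolic = 1.0 / (1.0 + k * delays)
```

At long delays the population average decays far more slowly than any single mid-range exponential, which is one way a pool of cell-specific discount factors could underlie the non-exponential discounting described in the abstract.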