Task 001/002 - Neuro-inspired Algorithms and Theory

Event Date: February 11, 2021
Time: 11:00 am (ET) / 8:00 am (PT)
Priority: No
School or Program: Electrical and Computer Engineering
College Calendar: Show
Tiago Marques, Massachusetts Institute of Technology
What does the primary visual cortex tell us about object recognition?
Abstract
Object recognition relies on the complex visual representations in cortical areas at the top of the primate ventral stream hierarchy. While these representations are thought to be derived from low-level stages of visual processing, this has not yet been shown. Relating specific low-level properties to object recognition behavior is critical for modeling the primate ventral stream. In this talk, I describe the results of two projects exploring the contributions of primary visual cortex (V1) processing to object recognition using artificial neural networks (ANNs).

In the first project, we evaluated hundreds of V1 models based on task-optimized ANNs on how well their single neurons approximate those in macaque V1. We found that, for some ANNs, single neurons in intermediate layers are indeed similar to their biological counterparts, and that the distributions of their response properties approximately match those in V1. Furthermore, we observed that ANNs whose intermediate layers better matched macaque V1 were also more aligned with human behavior, suggesting that object recognition is in fact derived from low-level functional properties.

Motivated by these results, we studied how a model's ability to predict neuronal responses in V1 relates to its robustness to image perturbations. Despite their high performance in object recognition tasks, ANNs can be fooled by imperceptibly small, explicitly crafted perturbations, and solving this issue remains a major challenge in computer vision. We observed that ANNs that better predicted V1 neuronal activity were also more robust to adversarial attacks. Inspired by this observation, we developed VOneNets, a new class of hybrid ANN vision models. Each VOneNet contains a fixed-weight neural network front-end that simulates primate V1, called the VOneBlock, followed by a neural network back-end adapted from current ANN vision models.
After training, VOneNets retained high ImageNet performance, but were substantially more robust, outperforming the base ANNs and state-of-the-art methods on a conglomerate benchmark of perturbations. While current ANN architectures are arguably brain-inspired, these results demonstrate that more precisely mimicking just one stage of the primate visual system leads to new gains in ImageNet-level computer vision applications and can potentially improve current models of primate object recognition behavior.
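The V1-simulating front-end described above can be pictured as a fixed (untrained) Gabor filter bank followed by classical simple- and complex-cell nonlinearities. The following is a minimal NumPy sketch of that idea, not the authors' implementation: all parameter values and function names are illustrative, and the VOneBlock's neuronal stochasticity stage is omitted for brevity.

```python
import numpy as np

def gabor_kernel(size, theta, freq, sigma, phase=0.0):
    """A fixed Gabor filter: Gaussian envelope times a sinusoidal carrier,
    the classical model of a V1 simple-cell receptive field."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    xr = xx * np.cos(theta) + yy * np.sin(theta)
    envelope = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * freq * xr + phase)

def conv2d_valid(img, k):
    """Naive 2-D 'valid' cross-correlation (no external dependencies)."""
    s = k.shape[0]
    out = np.empty((img.shape[0] - s + 1, img.shape[1] - s + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + s, j:j + s] * k)
    return out

def v1_like_frontend(img, n_orient=4, ksize=9, freq=0.25, sigma=2.0):
    """Fixed-weight 'V1' front-end sketch: Gabor filtering at several
    orientations, half-wave rectification (simple cells), and
    quadrature-pair energy (complex cells). Weights are never trained."""
    simple, complex_ = [], []
    for k in range(n_orient):
        theta = k * np.pi / n_orient
        even = conv2d_valid(img, gabor_kernel(ksize, theta, freq, sigma, phase=0.0))
        odd = conv2d_valid(img, gabor_kernel(ksize, theta, freq, sigma, phase=np.pi / 2))
        simple.append(np.maximum(even, 0.0))        # simple cells: rectified response
        complex_.append(np.sqrt(even**2 + odd**2))  # complex cells: phase-invariant energy
    return np.stack(simple + complex_)              # (2 * n_orient, H', W') feature maps
```

In a VOneNet-style model, feature maps like these would be passed to a trainable ANN back-end (e.g. a ResNet with its first block replaced), so that only the back-end weights are updated during training.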
 
Bio
How does hierarchical processing in neuronal networks in the brain give rise to sensory perception, and can we use this understanding to develop more human-like computer vision algorithms? Answering these questions has been the focus of Tiago Marques' research in recent years. He first encountered the problem of visual perception during his PhD at Champalimaud Research, where he studied visual cortical processing in the mouse. Under the supervision of Leopoldo Petreanu, Tiago developed a head-fixed motion discrimination task for mice and established a causal link between activity in the primary visual cortex (V1) and motion perception. Following that project, he studied the functional organization of cortical feedback and showed that feedback inputs in V1 relay contextual information to matching retinotopic regions in a highly organized manner.

In 2019, Tiago joined the lab of Prof. James DiCarlo at MIT to continue his training, where he is currently a PhRMA Foundation Postdoctoral Fellow. His current research uses artificial neural networks (ANNs) to study primate object recognition behavior. He has continued to focus on early visual processing and has implemented a set of novel benchmarks to evaluate how well different ANNs match primate V1 at the single-neuron level. More recently, he has begun developing new computer vision models, constrained by neurobiological data, that are more robust to image perturbations.