New approaches to 3D vision
Scientific discussion meeting organised by Professor Michael Morgan FRS, Dr Paul Linton, Professor Jenny Read, Dr Dhanraj Vishwanath, Professor Sarah Creem-Regehr and Professor Fulvio Domini.
Leading approaches to computer vision (SLAM: simultaneous localization and mapping), animal navigation (cognitive maps), and human vision (optimal cue integration) start from the assumption that the aim of 3D vision is to produce a metric reconstruction of the environment. Recent advances in machine learning, single-cell recording in animals, virtual reality, and visuomotor control all challenge this assumption. The purpose of this meeting was to bring these different disciplines together to formulate an alternative approach to 3D vision.
The schedule of talks, speaker biographies and abstracts are available below. An accompanying journal issue has been published in Philosophical Transactions of the Royal Society B.
Attending the event
This meeting has taken place. Watch the recordings on our YouTube channel.
Enquiries: please contact the Scientific Programmes team
Schedule
Chair
Dr Andrew Fitzgibbon FREng, Microsoft, UK
Dr Andrew Fitzgibbon has been closely involved in the delivery of three groundbreaking computer vision systems over two decades. In 2000, he was computer vision lead on the Emmy-award-winning 3D camera tracker "Boujou"; in 2009 he introduced large-scale synthetic training data to Kinect for Xbox 360, and in 2019 was science lead on the team that shipped fully articulated hand tracking on HoloLens 2. His passion is bringing the power of mathematics to the crucible of real-world engineering. He has numerous research awards, including twelve "best paper" or "test of time" prizes at leading conferences, and is a Fellow of the UK’s Royal Academy of Engineering.
15:00 - 15:05 | Introduction
15:05 - 15:30 | Neural priors, neural encoders and neural renderers
Scene representation – the process of converting visual sensory data into useful descriptions – is a requirement for intelligent behaviour. Scene representation can be achieved with three components: a prior (which scenes are likely?), an encoder (which scenes correspond to this image?), and a renderer (which images correspond to this scene?). This talk will describe how neural priors, encoders and renderers can be trained without any human-provided labels, and show how this unlocks new capabilities eg in protein structure understanding. Dr SM Ali Eslami, DeepMind, UK
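To make the three components above concrete, here is a minimal, self-contained sketch of a scene-representation model with a prior over scene codes, an encoder from images to scene codes, and a renderer back to images, trained by reconstruction without any human-provided labels. The architecture, dimensions and loss weights are illustrative assumptions, not the models described in the talk.

```python
# Illustrative sketch only: a three-component scene-representation model
# (prior, encoder, renderer), loosely in the spirit of the abstract above.
# All class names, architectures and dimensions are assumptions, not the
# speaker's actual models.
import torch
import torch.nn as nn

SCENE_DIM, IMAGE_DIM = 64, 32 * 32

class Encoder(nn.Module):
    """Which scenes correspond to this image?  image -> scene code."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(IMAGE_DIM, 256), nn.ReLU(),
                                 nn.Linear(256, SCENE_DIM))
    def forward(self, image):
        return self.net(image)

class Renderer(nn.Module):
    """Which images correspond to this scene?  scene code -> image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(SCENE_DIM, 256), nn.ReLU(),
                                 nn.Linear(256, IMAGE_DIM))
    def forward(self, scene):
        return self.net(scene)

def prior_loss(scene):
    """Which scenes are likely?  Here: a standard-normal prior on the code."""
    return 0.5 * (scene ** 2).mean()

# Training needs no human labels: reconstruct the image through the bottleneck
# while keeping the inferred scene code close to the prior (autoencoder-style).
encoder, renderer = Encoder(), Renderer()
opt = torch.optim.Adam(list(encoder.parameters()) + list(renderer.parameters()), lr=1e-3)
images = torch.rand(16, IMAGE_DIM)          # stand-in for unlabelled images
scene = encoder(images)
loss = ((renderer(scene) - images) ** 2).mean() + 0.1 * prior_loss(scene)
opt.zero_grad(); loss.backward(); opt.step()
```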
Dr SM Ali Eslami, DeepMind, UK
SM Ali Eslami is a staff research scientist at DeepMind working on problems related to artificial intelligence. Prior to that, he was a post-doctoral researcher at Microsoft Research in Cambridge. He did his PhD in the School of Informatics at the University of Edinburgh, during which he was also a visiting researcher in the Visual Geometry Group at the University of Oxford. His research is focused on figuring out how we can get computers to learn with less human supervision.
15:30 - 16:00 | Multi-scale predictive representations and human-like RL
Many artificial agents pass benchmarks for solving specific tasks, but what would make their behaviour seem more human-like to humans? In previous research Dr Momennejad used reinforcement learning (RL) to study how humans learn multi-scale predictive representations for memory, planning, and navigation. Based on this work, she will first present behavioural, fMRI, and electrophysiology evidence that hippocampal and prefrontal hierarchies learn multi-scale predictive representations, and update them via offline replay. She will then present recent work in which the group assesses the human-likeness of algorithms that all solve a navigation task. To this end, the group designed a Turing test to assess and compare the human-likeness of agents navigating an Xbox game ("Bleeding Edge"). Together, representation and replay models enhance our understanding of how the brain's algorithms underlie behaviour in health and pathology. In turn, designing AI algorithms inspired by models that are neurally and behaviourally plausible can advance the state of the art of human-like RL. Dr Ida Momennejad, Microsoft Research NYC, USA
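One common formalisation of predictive representations at multiple temporal scales is the successor representation; the abstract does not name a specific model, so the sketch below should be read as an illustrative assumption rather than the method used in the talk.

```python
# Illustrative sketch only: the successor representation (SR) is one common
# formalisation of "multi-scale predictive representations"; the talk may use
# a different model.  For a transition matrix T over states, the SR is
# M = (I - gamma*T)^(-1): entry M[s, s'] is the expected discounted number of
# future visits to s' starting from s.  Different gammas give different scales.
import numpy as np

def successor_representation(T, gamma):
    n = T.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * T)

# A toy 4-state ring environment with a random-walk policy.
T = np.array([[0.0, 0.5, 0.0, 0.5],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.5, 0.0, 0.5, 0.0]])

# Small gamma -> short-horizon (fine-scale) predictions;
# large gamma -> long-horizon (coarse-scale) predictions.
for gamma in (0.5, 0.9, 0.99):
    M = successor_representation(T, gamma)
    print(f"gamma={gamma}: predicted visits from state 0 -> {np.round(M[0], 2)}")
```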
Dr Ida Momennejad, Microsoft Research NYC, USA
Dr Ida Momennejad is a Principal Researcher in Reinforcement Learning at Microsoft Research NYC. She studies how we build models of the world and use them in memory, exploration and planning. To do this, she builds and tests neurally plausible algorithms for learning the structure of the environment. Her approach spans single and multi-agent settings, combining reinforcement learning, network science and machine learning with behavioural experiments, fMRI and electrophysiology.
16:00 - 16:20 | Discussion
Chair
Dr Andrew Fitzgibbon FREng, Microsoft, UK
16:40 - 17:10 | Generalization in data-driven control
Current machine learning methods are primarily deployed for tackling prediction problems, which are almost always cast as supervised learning tasks. Despite decades of advances in reinforcement learning and learning-based control, the applicability of these methods to domains that require open-world generalization – autonomous driving, robotics, aerospace, and other applications – remains challenging. Realistic environments require effective generalization, and effective generalization requires training on large and diverse datasets that are representative of the likely test-time scenarios. Dr Levine will discuss why this poses a particular challenge for learning-based control, and present some recent research directions that aim to address this challenge. He will discuss how offline reinforcement learning algorithms can make it possible for learning-based control systems to utilize large and diverse real-world datasets, how the use of diverse data can enable robotic systems to navigate real-world environments, and how multi-task and contextual policies can enable broad generalization to a range of user-specified goals. Dr Sergey Levine, UC Berkeley and Google, USA
Dr Sergey Levine, UC Berkeley and Google, USA
Sergey Levine received a BS and MS in Computer Science from Stanford University in 2009, and a PhD in Computer Science from Stanford University in 2014. He joined the faculty of the Department of Electrical Engineering and Computer Sciences at UC Berkeley in fall 2016. His work focuses on machine learning for decision making and control, with an emphasis on deep learning and reinforcement learning algorithms. Applications of his work include autonomous robots and vehicles, as well as computer vision and graphics. His research includes developing algorithms for end-to-end training of deep neural network policies that combine perception and control, scalable algorithms for inverse reinforcement learning, deep reinforcement learning algorithms, and more.
17:10 - 17:40 | Understanding 3D vision as a policy network
A 'policy network' is a term used in reinforcement learning to describe the set of actions that are generated in different states, where the 'state' reflects both the current sensory stimulus and the goal. This is not the typical foundation for describing 3D vision, which in computer vision is based on reconstruction (eg Simultaneous Localisation And Mapping), while in neuroscience the predominant hypothesis has assumed that there are 3D transformations between retino-centric, ego-centric and world-based reference frames. Theoretical and experimental evidence in support of this neural hypothesis is lacking. Professor Glennerster will describe instead an approach that avoids 3D coordinate frames. A policy network for saccades (pure rotations of the camera/eye around the optic array) is a starting point for understanding (i) an ego-centric representation of visual direction, distance, slant and depth relief (what Marr hoped to achieve with his 2½-D sketch) and (ii) a hierarchical, compositional representation for navigation. We have known for a long time how the brain can implement a policy network, so if we could describe 3D vision in terms of a policy network (where the actions are either saccades or head translations), we would have moved closer to a neurally plausible model of 3D vision. Professor Andrew Glennerster, University of Reading, UK
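A minimal sketch of what a policy network in this sense might look like: a direct mapping from a state (current retinal input plus goal) to an action (here, a discrete saccade), with no 3D coordinate frame or reconstruction anywhere in the computation. The state encoding, network size and action set are assumptions for illustration.

```python
# Illustrative sketch only: a policy network in the sense used above, mapping a
# state (current sensory input + goal) directly to an action (here, a saccade),
# with no explicit 3D reconstruction.  Sizes and encodings are assumptions.
import numpy as np

rng = np.random.default_rng(0)
N_SENSORY, N_GOAL, N_ACTIONS = 100, 10, 9   # e.g. 9 candidate saccade directions

# A one-layer softmax policy: state -> probability over discrete saccades.
W = rng.normal(scale=0.1, size=(N_ACTIONS, N_SENSORY + N_GOAL))

def policy(sensory, goal):
    state = np.concatenate([sensory, goal])          # state = stimulus + goal
    logits = W @ state
    p = np.exp(logits - logits.max())
    return p / p.sum()

def select_saccade(sensory, goal):
    """Sample an eye movement (pure rotation) from the policy."""
    return rng.choice(N_ACTIONS, p=policy(sensory, goal))

sensory = rng.random(N_SENSORY)   # stand-in for a retino-centric feature vector
goal = np.eye(N_GOAL)[3]          # stand-in for a one-hot task/goal signal
print("chosen saccade:", select_saccade(sensory, goal))
```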
Professor Andrew Glennerster, University of Reading, UK
Andrew Glennerster trained as a doctor before working for Michael Morgan as a research assistant. He did his DPhil at Oxford with Brian Rogers on human binocular stereopsis (1989–1993), then held a Medical Research Council Career Development Award (1994–1998) with Andrew Parker in Physiology at Oxford and Suzanne McKee at the Smith-Kettlewell Eye Research Institute in San Francisco. A Royal Society University Research Fellowship (1999–2007) enabled him, with Andrew Parker, to set up one of the earliest virtual reality labs to investigate 3D perception in moving observers. Glennerster is now a Professor of Visual Neuroscience at the University of Reading. Since 2015, Glennerster has been part of a MURI/EPSRC/Dstl consortium of machine learning experts and neuroscientists; the role of the neuroscientists is to point out how machine learning could be adapted to become 'more human-like'.
17:40 - 18:00 | Discussion
Chair
Professor Matteo Carandini, University College London, UK
Matteo Carandini is the GlaxoSmithKline / Fight for Sight Professor of Visual Neuroscience at University College London (UCL), where he co-directs a laboratory together with Kenneth Harris (www.ucl.ac.uk/cortexlab). Carandini holds a Laurea in Mathematics from the University of Rome (1990) and a PhD in Neural Science from New York University (1996). Before joining UCL in 2007, he ran laboratories in Zurich (1998–2002) and San Francisco (2002–2008). Carandini's research focuses on the computations performed by large populations of neurons in the mouse brain, the circuits that support these computations, and the role of these computations in guiding behaviour. He is a leader of the Neuropixels consortium, which develops next-generation probes to record from large populations of neurons (www.ucl.ac.uk/neuropixels). He is a founding member of the International Brain Laboratory, an open-science approach where 21 laboratories joined forces to understand the neural basis of decision-making in the whole mouse brain (www.internationalbrainlab.org).
15:00 - 15:30 | Stupid stereoscopic algorithms that still work
Stereopsis has traditionally been considered a complex visual ability, restricted to large-brained animals. The discovery in the 1980s that insects, too, have stereopsis therefore challenged theories of stereopsis. How can such simple brains see in 3D? One answer is simply that insect stereopsis is much lower-resolution, and probably does not produce even a coarse depth map across the visual field. Rather, it may aim to produce simple behaviour, such as orienting towards the closer of two objects or triggering a strike when prey comes within range. Scientific thinking about stereopsis has been unduly anthropomorphic, for example assuming that stereopsis must require binocular fusion or a solution of the stereo correspondence problem. In fact, useful behaviour can be produced with very basic stereoscopic algorithms which make no attempt to achieve fusion or correspondence. This may explain why some aspects of insect stereopsis seem poorly designed from an engineering point of view: for example, paying no attention to whether interocular contrast or velocities match. Such 'stupid' algorithms demonstrably work well enough in practice for their species, and may prove useful in particular autonomous applications. Professor Jenny Read, Newcastle University, UK
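The sketch below illustrates the spirit of such a 'stupid' algorithm: compare left- and right-eye image patches over a handful of candidate shifts and trigger a strike when the best match implies a close target, with no fusion, no feature-by-feature correspondence and no depth map. The window size, disparity range and threshold are invented for illustration and are not taken from the insect work.

```python
# Illustrative sketch only: a deliberately simple stereo "strike trigger" in the
# spirit of the abstract; no fusion, no explicit correspondence of individual
# features.  Window size, disparities and threshold are invented.
import numpy as np

def best_disparity(left, right, disparities):
    """Pick the horizontal shift that best aligns the two eyes' views of a
    central window, by simple correlation of pixel intensities."""
    h, w = left.shape
    win = left[h // 2 - 5:h // 2 + 5, w // 2 - 5:w // 2 + 5].ravel()
    scores = []
    for d in disparities:
        patch = right[h // 2 - 5:h // 2 + 5, w // 2 - 5 - d:w // 2 + 5 - d].ravel()
        scores.append(float(np.dot(win - win.mean(), patch - patch.mean())))
    return disparities[int(np.argmax(scores))]

def should_strike(left, right, strike_disparity=6):
    """Large disparity ~ close target: trigger the strike when the best-matching
    shift exceeds a fixed threshold.  No depth map is ever built."""
    return best_disparity(left, right, disparities=range(0, 10)) >= strike_disparity

# Toy stimulus: a bright blob shifted between the eyes by 7 pixels (close prey).
left = np.zeros((40, 40)); left[18:22, 18:22] = 1.0
right = np.roll(left, -7, axis=1)
print("strike!" if should_strike(left, right) else "wait")
```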
Professor Jenny Read, Newcastle University, UK
Jenny Read is Professor of Vision Science at Newcastle University’s Biosciences Institute. She took an undergraduate degree in physics (1994), a doctorate in theoretical physics (1997) and a Masters in neuroscience (1999) at Oxford University, UK. From 1997–2001 she was a Wellcome Training Fellow in Mathematical Biology at Oxford University, then from 2001–2005 a postdoctoral fellow at the US National Eye Institute in Bethesda, Maryland. She returned to the UK in 2005 with a Royal Society University Research Fellowship. Her lab works on many aspects of visual perception, especially binocular and stereoscopic vision. Current projects include modelling how visual cortex encodes binocular information, developing a new stereo vision test for children, and uncovering how insects see in stereoscopic 3D. More information and all publications are available at http://www.jennyreadresearch.com.
15:30 - 16:00 | Visual processing in the brain during navigation
Much of our everyday visual experience is based on our movements through the world, when we navigate between different places – from within a room, to between cities. Is visual function the same during navigation? Dr Saleem will present work in which, using a virtual reality environment and presenting identical visual stimuli in different locations, they asked whether spatial position modulates activity in the visual system. Activity in the primary visual cortex (V1) was found to be strongly modulated by spatial position, and this modulation persisted across higher visual areas in the cortex. This modulation was not present in inputs to visual cortex from the lateral geniculate nucleus. Furthermore, the spatial modulation of visual responses was stronger when animals actively navigated, rather than passively viewed the environment. These results suggest that the spatial modulation of visual information arises in V1 with active navigation. The Saleem Lab has also been investigating feedback inputs to V1, and visual responses to optic flow stimuli. They have also developed an open-source software paradigm, BonVision, that can present both 2D and 3D stimuli in a common framework, while maintaining the precision and replicability of standardised visual experiments. Dr Aman Saleem, University College London, UK
Dr Aman Saleem, University College London, UK
Dr Aman Saleem is a Sir Henry Dale Fellow at the UCL Department of Experimental Psychology, specialising in systems neuroscience. Aman started his training with an undergraduate degree in Engineering from the Indian Institute of Technology, Bombay. He then transitioned into neuroscience with a PhD in computational neuroscience with Dr Simon Schultz at Imperial College London, studying information processing in the visual systems of flies and rodents. As a postdoc with Professor Matteo Carandini and Professor Kenneth Harris, he studied how the visual system functions during locomotion and navigation, discovering how non-visual information is encoded by visual areas of the brain. He started his own lab as a Sir Henry Dale Fellow at the UCL Department of Experimental Psychology in 2017. The lab's main focus is to understand how the brain uses visual information to perceive one's location in the world and navigate.
16:00 - 16:20 | Discussion
Chair
Professor Matteo Carandini, University College London, UK
16:40 - 17:10 | The cognitive map of 3D space: not as metric as we thought?
The mammalian representation of navigable space (space that an animal moves itself through) is supported by a network of brain regions, centred on the hippocampus, that transform raw sensory signals into an internal map-like representation that can be used in navigation. It has long been thought that this map is metric, because its central units, the place cells, respond parametrically to metric changes in the environment such as stretching. This view was consolidated by the discovery of grid cells, which have evenly spaced firing fields that reveal metric computations such as speed, direction and distance. However, how these neurons behave in complex 3D space suggests that the map is not absolutely metric but is rather only loosely so, being tailored to the environment structure and/or shaped by its movement affordances. This accords with studies showing that humans seem to use a less metric and more topological internal map when performing spatial judgements. The emerging picture is one of a hierarchical processing system with highly metric processing of near space but progressively more topological maps at larger scales. This may be a way of saving processing resources, and could reflect a more general organisational principle of complex cognition. Professor Kate Jeffery, University College London, UK
Professor Kate Jeffery, University College London, UK
Kate Jeffery is a neuroscientist based at University College London (UCL). Her research focuses on how the brain represents complex navigable space (space that can be moved through), and she does this by recording single neurons in the brains of rodents as they explore structured spaces of various types. She is particularly interested in how 3D space is mapped, and also in the sense of direction and the environmental factors that support or confuse it. At UCL she heads the Institute of Behavioural Neuroscience in the Division of Psychology and Language Sciences, and is Vice Dean (Research) for the Faculty of Brain Sciences. She is also co-director of the electrophysiology company Axona Ltd, which makes high-density recording systems for behavioural neuroscientists, and is a Fellow of the Royal Society of Biology and Fellow of the Royal Institute of Navigation, where she chairs the Cognitive Navigation Special Interest group.
17:10 - 17:40 | Locally ordered representation of 3D space in the entorhinal cortex
As animals navigate on a two-dimensional surface, neurons in the medial entorhinal cortex (MEC) known as grid cells are activated when the animal passes through multiple locations (firing fields) arranged in a hexagonal lattice tiling the locomotion surface. However, although our world is three-dimensional (3D), it is unclear how the MEC represents 3D space. The group recorded from MEC cells in freely flying bats and identified several classes of spatial neurons, including 3D border cells, 3D head-direction cells, and neurons with multiple 3D firing fields. Many of these multifield neurons were 3D grid cells, whose neighbouring fields were separated by a characteristic distance – forming a local order – but lacked global lattice arrangement of the fields. Thus, whereas 2D grid cells form a global lattice – characterized by both local and global order – 3D grid cells exhibited only local order, creating a locally ordered metric for space. The group modelled grid cells as emerging from pairwise interactions between fields, which yielded a hexagonal lattice in 2D and local order in 3D, describing both 2D and 3D grid cells using one unifying model. Together, these data and model illuminate fundamental differences and similarities between neural codes for 3D and 2D space in the mammalian brain. Gily Ginosar, Weizmann Institute of Science, Israel
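The flavour of a pairwise-interaction model can be conveyed with a toy simulation: treat field centres as points that are pushed towards a preferred separation from their near neighbours, and minimise the resulting energy. This produces a characteristic nearest-neighbour distance (local order) without any global lattice. The energy function and parameters below are assumptions, not the published model.

```python
# Illustrative sketch only: fields as interacting points with a preferred
# pairwise separation.  Minimising the pairwise energy produces a characteristic
# nearest-neighbour distance (local order) without imposing a global lattice.
# The interaction rule and parameters are assumptions, not the published model.
import numpy as np

rng = np.random.default_rng(1)
N_FIELDS, PREFERRED_DIST, BOX = 30, 1.0, 4.0

fields = rng.uniform(0, BOX, size=(N_FIELDS, 3))     # 3D field centres

def pairwise_forces(x):
    """Each nearby pair is pushed toward the preferred separation (spring-like)."""
    diff = x[:, None, :] - x[None, :, :]                     # (N, N, 3)
    dist = np.linalg.norm(diff, axis=-1) + np.eye(len(x))    # avoid divide-by-zero
    # force magnitude ~ (preferred - actual distance); only nearby pairs interact
    mag = np.where(dist < 2 * PREFERRED_DIST, PREFERRED_DIST - dist, 0.0)
    np.fill_diagonal(mag, 0.0)
    return (mag[..., None] * diff / dist[..., None]).sum(axis=1)

for _ in range(500):                                   # simple gradient descent
    fields = np.clip(fields + 0.01 * pairwise_forces(fields), 0, BOX)

nn_dists = []
for i in range(N_FIELDS):
    d = np.linalg.norm(fields - fields[i], axis=1)
    nn_dists.append(np.sort(d)[1])                     # nearest-neighbour distance
print(f"mean nearest-neighbour distance ~ {np.mean(nn_dists):.2f} "
      f"(characteristic local scale, no global lattice)")
```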
Gily Ginosar, Weizmann Institute of Science, Israel
Gily pursues her PhD at the Weizmann Institute of Science in Israel, advised by Professor Nachum Ulanovsky. She is interested in mammalian navigation, and her work revolves around how the brain represents and perceives 3D space in both bats and humans. Her fascination with 3D navigation began during her Air Force service, where she trained aircrew to navigate during flight. She received her BSc in Physics and Cognitive Sciences from the Hebrew University in Jerusalem and her MSc in Brain Sciences from the Weizmann Institute.
17:40 - 18:00 | Discussion
Chair
Dr Mar Gonzalez-Franco, Microsoft Research, USA
Mar is a Principal Researcher in the Extended Perception, Interaction and Cognition – or EPIC – group at Microsoft Research, where she explores human behaviour and perception to build better technologies in the wild, with a particular focus on spatial computing, avatars and haptics. In addition to her prolific scientific output, her work has also transferred to products used daily around the world, like HoloLens, Microsoft Soundscape and Together mode in Microsoft Teams. Mar holds a PhD in Immersive Virtual Reality and Clinical Psychology, was a visiting researcher at MIT, and did her postdoctoral studies at University College London.
15:00 - 15:30 | Tripartite encoding of visual 3D space
A major challenge for prevailing models of human 3D vision is their inability to provide a satisfactory account of important aspects of our subjective awareness of 3D visual space. Reviewing phenomenological observations, empirical data, evolutionary logic and neurophysiological evidence, this presentation argues that human conscious awareness of visual space is underwritten by three separate spatial encodings that are optimized for specific regions of visual space: (1) encoding of unscaled 3D object shape and layout (relative depth); (2) encoding of scaled intra- and inter-object distances (scaled depth) for near space; (3) egocentric encoding of distances for ambulatory space. This account of separate and neurophysiologically distinct spatial encodings can explain a number of important observations in the subjective awareness of 3D space, such as the paradoxical human capacity to perceive 3-dimensionality in 2-dimensional pictorial images, the unique subjective impression of object tangibility, negative space and object realness associated with binocular stereopsis, and the capacity to be subjectively aware of distances beyond the peri-personal space even in the absence of binocular vision. This account provides a basis to better understand the conditions that underlie the subjective feeling of visual spatial immersion and presence. Dr Dhanraj Vishwanath, University of St Andrews, UK
Dr Dhanraj Vishwanath, University of St Andrews, UK
Dr Dhanraj Vishwanath is Senior Lecturer in Perception at the School of Psychology and Neuroscience at the University of St Andrews. His research interests are in 3D vision, visual aesthetics, eye movements and attention, with a special focus on phenomenological and philosophical issues. With his collaborators, he has made empirical and theoretical contributions in pictorial space perception, the role of blur in depth perception, the phenomenology of stereopsis, as well as spatial localization in eye movements and attention. In addition to his current work on 3D perception, he is working on a theoretical account of the psychology of visual art and aesthetics. He received his PhD from Rutgers University, New Brunswick, followed by postdoctoral work at UC Berkeley.
15:30 - 16:00 | New approaches to visual scale and visual shape
Human 3D vision is thought to triangulate the size, distance, direction, and 3D shape of objects using vision from the two eyes. But all four of these capacities rely on the visual system knowing where the eyes are pointing. Dr Linton's experimental work on size and distance challenges this account, suggesting a purely retinal account of visual size and distance, and likely direction and 3D shape. This requires new accounts of visual scale and visual shape. For visual scale, he argues that observers rely on natural scene statistics to associate accentuated stereo depth (largely from horizontal disparities) with closer distances. This implies that depth / shape is resolved before size and distance. For visual shape, he argues that depth / shape from the two eyes is a solution to a different problem (rivalry eradication between two retinal images treated as if they are from the same viewpoint), rather than the visual system attempting to infer scene geometry (by treating the two retinal images as two different views of the same scene from different viewpoints). Dr Linton also draws upon his book, which questions whether other depth cues (perspective, shading, motion) really have any influence on this process. Dr Paul Linton, City, University of London, UK
Dr Paul Linton, City, University of London, UK
Paul Linton is a Research Fellow in 3D Vision at the Centre for Applied Vision Research, City, University of London. He was previously a Stipendiary Lecturer at the University of Oxford, and a Teaching Fellow at University College London. He was also a member of the DeepFocus team at Facebook Reality Labs, where he used vision science to inform the development of virtual and augmented reality technology. He is the author of the book The Perception and Cognition of Visual Space (Palgrave, 2017), which challenges contemporary accounts of depth cue integration. His experimental research shows that humans are unable to triangulate the size and distance of objects, with implications for visual scale, binocular disparity processing, multisensory integration, and object interaction. His recent theoretical work considers the extent of cognitive processing in V1. For further details, please visit: https://linton.vision.
16:00 - 16:20 | Discussion
Chair
Dr Mar Gonzalez-Franco, Microsoft Research, USA
16:40 - 17:10 | Perception and action in Virtual and Augmented Reality
Virtual and Augmented Reality (VR and AR) methods provide both opportunities and challenges for research and applications involving space perception. The opportunities result from the ability to immerse a user in a realistic environment in which they can interact, while at the same time having the ability to control and manipulate environmental and body-based cues in ways that are difficult or impossible to do in the real world. The challenge comes from the notion that virtual environments will be most useful if they achieve high perceptual fidelity – that observers will perceive and act in the mediated environment as they would in the real world. A pervasive finding across early research on space perception in virtual environments is that absolute distance is underestimated as compared to the real world. Using the challenge of underestimation of scale as a starting point, this talk presents new measures (perceived affordances) and methods of feedback (body-based cues), as well as advances in technologies (mixed reality) and cues (shadows), that contribute to a broader understanding of perceptual fidelity across the continuum of mediated environments. Professor Sarah Creem-Regehr, University of Utah, USA
Professor Sarah Creem-Regehr, University of Utah, USA
Sarah Creem-Regehr is a Professor in the Psychology Department at the University of Utah. She also holds faculty appointments in the School of Computing and the Neuroscience program at the University of Utah. She received her PhD in Psychology from the University of Virginia. Her research examines how humans perceive, learn, and navigate spaces in natural, virtual, and visually impoverished environments. Her research takes an interdisciplinary approach, combining the study of space perception and spatial cognition with applications in visualization and virtual environments. She co-authored the book Visual Perception from a Computer Graphics Perspective and was previously Associate Editor for Psychonomic Bulletin & Review and Journal of Experimental Psychology: Human Perception and Performance. She is currently Associate Editor for Quarterly Journal of Experimental Psychology and she will become Editor-in-Chief of Cognitive Research: Principles and Implications in January 2022.
17:10 - 17:40 | Engineering challenges for realistic displays
How can a display appear indistinguishable from reality? Dr Lanman describes how to pass this 'visual Turing test' using AR/VR headsets, emphasizing the joint design of optics, display components, rendering algorithms, and sensing elements. Specifically, this presentation will focus on the engineering challenges for advancing along four axes: resolution, accommodation, distortion correction, and dynamic range. Dr Douglas Lanman, Facebook Reality Labs, USA
Dr Douglas Lanman, Facebook Reality Labs, USA
Douglas Lanman is the Director of Display Systems Research at Facebook Reality Labs, where he leads investigations into advanced display and imaging technologies for augmented and virtual reality. His prior research has focused on head-mounted displays, glasses-free 3D displays, light-field cameras, and active illumination for 3D reconstruction and interaction. He received a BS in Applied Physics with honors from Caltech in 2002, and his MS and PhD in Electrical Engineering from Brown University in 2006 and 2010, respectively. He was a Senior Research Scientist at NVIDIA Research from 2012 to 2014, a Postdoctoral Associate at the MIT Media Lab from 2010 to 2012, and an Assistant Research Staff Member at MIT Lincoln Laboratory from 2002 to 2005.
17:40 - 18:00 | Discussion
Chair
Professor Jody Culham, Western University, Canada
Jody Culham is a Professor of Psychology and member of the Brain and Mind Institute and Neuroscience Program at Western University (also known as The University of Western Ontario) in London, Canada. Her research uses behavioural approaches and functional neuroimaging to investigate how vision is used to perceive the world and to guide actions such as grasping and reaching. An emerging theme of her lab is Immersive Neuroscience, which examines how new technologies such as virtual reality and optical neuroimaging can be used to study behaviour and brain function in natural contexts and compelling simulations.
15:00 - 15:30 | A novel non-probabilistic model of 3D cue integration explains both perception and action
It will be argued that perceptual and action tasks that require the encoding of 3D information are both based on the same set of computations. These are described by a computational theory of 3D cue integration, which constitutes a novel theoretical framework to study 3D vision in humans. The proposed computational theory differs from the current mainstream approaches to the problem in two fundamental ways. First, it assumes that 3D mechanisms are deterministic processes that map a given visual stimulus to a unique 3D representation. In contrast, the currently held view of perception as Bayesian inference postulates a probabilistic nature of 3D representation. Second, the proposed theory posits that 3D processing is heuristic, finding correct solutions to the problem only in ideal viewing conditions and not as a general goal of visual computations. The deterministic and heuristic nature of these computations is therefore inconsistent with Bayesian approaches that model brain mechanisms as processes that derive the most accurate and precise representation of 3D structures. Instead, this theory predicts systematic biases in depth estimates that identically affect perceptual judgements and goal-directed actions. Professor Fulvio Domini, Brown University, USA
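To make the contrast in the abstract explicit, the sketch below compares a probabilistic, reliability-weighted (Bayesian) cue combination with a fixed deterministic mapping from cue signals to a single depth estimate. The particular deterministic rule shown (a vector-magnitude combination of normalised cue signals) is an illustrative assumption, not necessarily the speaker's model.

```python
# Illustrative sketch only, contrasting the two framings in the abstract:
# (a) Bayesian, reliability-weighted cue combination (probabilistic view) and
# (b) a fixed deterministic mapping from cue signals to a single depth estimate.
# The deterministic rule below is an assumption for illustration only.
import numpy as np

def bayesian_combination(estimates, variances):
    """MAP/MLE estimate: weight each cue by its inverse variance (reliability)."""
    w = 1.0 / np.asarray(variances)
    return float(np.sum(w * np.asarray(estimates)) / np.sum(w))

def deterministic_combination(signals):
    """One deterministic rule: combine normalised cue signals by their vector
    magnitude.  Same stimulus -> same estimate, no probabilistic inference."""
    return float(np.sqrt(np.sum(np.asarray(signals) ** 2)))

# Two depth cues (e.g. disparity- and motion-based signals), arbitrary units.
stereo_signal, motion_signal = 0.8, 0.3

print("Bayesian estimate:     ", bayesian_combination([0.8, 0.3], [0.1, 0.4]))
print("Deterministic estimate:", deterministic_combination([stereo_signal, motion_signal]))
# In ideal viewing conditions a deterministic rule can match the true depth;
# otherwise it produces systematic biases that, on this account, should affect
# perceptual judgements and goal-directed actions identically.
```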
Professor Fulvio Domini, Brown University, USA
Fulvio Domini completed his Masters in Electrical Engineering and PhD in Experimental Psychology at the University of Trieste, Italy. He joined the faculty at Brown University, USA, in 1999, where he is currently Professor at the Department of Cognitive, Linguistic and Psychological Sciences. His research team investigates how the human visual system processes 3D visual information to allow successful interactions with the environment. His approach is to combine computational methods and behavioral studies to understand which visual features establish the mapping between vision and action. Contrary to the commonly held assumption that perception and action stem from separate visual mechanisms, he takes a fundamentally different view, proposing that perception and action form a coordinated system: perception informs action about the state of the world and, in turn, action shapes perception by signalling when it is faulty.
15:30 - 16:00 | Dissociations between perception and action in size-distance scaling
One of the most puzzling abilities of the human brain is size constancy: an object is perceived as having the same size even though its image on the retina varies continuously with viewing distance. An accurate representation of size is critical not only for perceptual recognition, but also for goal-directed actions, such as grasping. In fact, to successfully grasp an object, our grip aperture needs to be scaled to the true size of the object irrespective of viewing distance, a scaling operation that can be referred to as grip constancy. In this talk, Dr Sperandio will present findings from studies on both healthy volunteers and a neurological patient with large bilateral lesions that include V1 and most of the occipital cortex. By measuring perceptual judgments and grasp kinematics in response to conditions in which the image on the retina was either different (for example, by placing an object of a given physical size near and far from the observers) or constant (for example, by placing a small object near and a big object far) in size, she will provide evidence that the neural mechanisms underlying size constancy for perception and action are dissociable and rely upon distinct representations of size. Dr Irene Sperandio, University of Trento, Italy
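The geometry underlying size constancy is simple: retinal (angular) size shrinks with distance, so recovering physical size, or setting a constant grip aperture, requires scaling retinal size by an estimate of distance. A small worked example with invented numbers:

```python
# Small worked example of the geometry behind size constancy: retinal (angular)
# size shrinks with distance, so constant perceived size (or constant grip
# aperture) requires scaling retinal size by estimated distance.  Numbers are
# invented for illustration.
import math

def retinal_angle_deg(object_size_cm, distance_cm):
    """Visual angle subtended by an object of a given physical size."""
    return math.degrees(2 * math.atan(object_size_cm / (2 * distance_cm)))

def inferred_size_cm(angle_deg, estimated_distance_cm):
    """Invert the relation: physical size recovered from angle + distance."""
    return 2 * estimated_distance_cm * math.tan(math.radians(angle_deg) / 2)

obj = 5.0                                   # a 5 cm object
for d in (30.0, 60.0, 120.0):               # near, mid, far viewing distances
    angle = retinal_angle_deg(obj, d)
    print(f"at {d:.0f} cm: retinal angle {angle:.2f} deg, "
          f"recovered size {inferred_size_cm(angle, d):.1f} cm")

# If the distance estimate is wrong (e.g. 60 cm assumed instead of 120 cm), the
# recovered size is wrong by the same factor, which is one way perception and
# grasping could dissociate if they rely on different distance estimates.
print("misjudged distance:", round(inferred_size_cm(retinal_angle_deg(obj, 120.0), 60.0), 1), "cm")
```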
Dr Irene Sperandio, University of Trento, Italy
Irene Sperandio graduated magna cum laude from the University of Padova and received her doctoral degree in Psychology from the University of Verona (Department of Neurological and Vision Sciences), examining size constancy and visual illusions. After her PhD, she spent three years working as a post-doctoral fellow under the supervision of Professor Melvyn Goodale at the Brain and Mind Institute at the University of Western Ontario, Canada, where she combined psychophysics and fMRI to investigate the neural correlates of visual perception and expanded her research to sensory-motor control using visual psychophysics, eye movement and kinematic recordings. In December 2012, she was appointed to her first faculty position as a lecturer in the School of Psychology at the University of East Anglia, UK, where she stayed for seven years. In April 2020, she joined the Department of Psychology and Cognitive Science at the University of Trento, Italy as an Assistant Professor.
16:00 - 16:10 | Discussion
Chair
Professor Jody Culham, Western University, Canada
16:20 - 16:50 | Do you hear what I see? How do early blind individuals experience object motion?
Perceiving object motion is fundamentally multisensory, yet little is known about similarities and differences in motion computations across different senses. Insight can be provided by examining auditory motion processing in early blind individuals. Early blindness leads to ‘recruitment’ of the ‘visual’ motion area hMT+ for auditory motion processing. Meanwhile, the planum temporale, associated with auditory motion in sighted individuals, shows reduced selectivity for auditory motion, suggesting competition between cortical areas for functional role. According to the metamodal hypothesis of cross-modal plasticity developed by Pascual-Leone, the recruitment of hMT+ is driven by it being a metamodal structure containing “operators that execute a given function or computation regardless of sensory input modality”. According to the metamodal hypothesis, the computations underlying auditory motion processing in early blind individuals should be analogous to visual motion processing in sighted individuals – relying on non-separable spatiotemporal filters. Inconsistent with the metamodal hypothesis, auditory motion filters, in both blind and sighted subjects, are separable in space and time. The computations underlying auditory motion processing in early blind individuals are not qualitatively altered; instead, the recruitment of hMT+ to extract motion information from auditory input includes significant modification of its normal computational operations. Professor Ione Fine, University of Washington, USA
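The distinction the abstract turns on is between spatiotemporal filters that are separable, factorising into a spatial profile multiplied by a temporal profile and hence having no intrinsic preferred direction, and non-separable filters that are oriented in space-time and therefore direction-selective, as in standard models of visual motion. A minimal sketch with illustrative parameters:

```python
# Minimal sketch of the separable / non-separable distinction in the abstract:
# a separable filter factorises as f(x, t) = g(x) * h(t) and has no intrinsic
# preferred direction; a non-separable filter is oriented in space-time and is
# therefore direction-selective (as in standard visual motion-energy models).
# Parameters are illustrative.
import numpy as np

x = np.linspace(-2, 2, 41)            # space (deg)
t = np.linspace(0, 1, 41)             # time (s)
X, T = np.meshgrid(x, t)

def gaussian(u, sigma):
    return np.exp(-u**2 / (2 * sigma**2))

# Separable: product of a purely spatial and a purely temporal profile.
separable = gaussian(X, 0.5) * gaussian(T - 0.5, 0.15)

# Non-separable: the spatial profile drifts over time at velocity v,
# giving a filter oriented in space-time (direction-selective).
v = 2.0                                # deg per s
non_separable = gaussian(X - v * (T - 0.5), 0.5) * gaussian(T - 0.5, 0.15)

def response(filt, velocity):
    """Response to a small dot sweeping across space at a given velocity."""
    stim = gaussian(X - velocity * (T - 0.5), 0.1)
    return float((filt * stim).sum())

for name, filt in (("separable", separable), ("non-separable", non_separable)):
    rightward, leftward = response(filt, 2.0), response(filt, -2.0)
    print(f"{name:14s} rightward={rightward:7.2f}  leftward={leftward:7.2f}")
# The separable filter responds equally to both directions; the non-separable
# filter responds much more strongly to motion in its preferred direction.
```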
Professor Ione Fine, University of Washington, USA
Dr Fine received her undergraduate degree from Oxford University, her PhD from the University of Rochester, and carried out postdoctoral research at the University of California, San Diego. After a brief period in the Ophthalmology Department at the University of Southern California and working for Second Sight Medical Products, she moved to the University of Washington, Seattle, where she is now a Full Professor and co-Director of the UW Center of Human Neuroscience. Her research examines perceptual learning and plasticity with an emphasis on the effects of visual deprivation. Current work includes taking a computational approach towards understanding visual performance in patients with sight recovery technologies, and examining the cortical and behavioral effects of long-term visual deprivation. She is a Fellow of the Optical Society of America.
16:50 - 17:20 | The role of binocular vision in the development of visuomotor control and performance of fine motor skills
The ability to perform accurate, precise and temporally coordinated goal-directed actions is fundamentally important to activities of daily life, as well as skilled occupational and recreational performance. Vision provides a key sensory input for the normal development of visuomotor skills. Normal visual development is disrupted by amblyopia, a neurodevelopmental disorder characterized by impaired visual acuity in one eye and reduced binocularity, which affects 2–4% of children and adults. This presentation will discuss a growing body of research which demonstrates that binocular vision provides an important input for optimal development of the visuomotor system, specifically visually guided upper limb movements such as reaching and grasping. Research shows that decorrelated binocular experience is associated with both deficits and compensatory adaptations in visuomotor control. Parallel studies with typically developing children and visually normal adults provide converging evidence supporting the contribution of stereopsis to the control of grasping. Overall, this research advances our understanding about the role of binocular vision in the development and performance of visuomotor skills, which is the first step towards developing assessment tools and targeted rehabilitations for children with neurodevelopmental disorders at risk of poor visuomotor outcomes. Dr Ewa Niechwiej-Szwedo, University of Waterloo, Canada
Dr Ewa Niechwiej-Szwedo, University of Waterloo, Canada
Dr Ewa Niechwiej-Szwedo is an Associate Professor at the Department of Kinesiology and Health Sciences at the University of Waterloo, Canada. Her research is focused on two areas: 1) investigating the neuroplastic adaptation of oculomotor and upper limb movement control in individuals with abnormal binocular vision due to amblyopia and strabismus; 2) mapping out the typical and atypical maturation trajectory of visuomotor skills in children. This research provides novel insights about the capacity of the sensorimotor system to adapt when normal visual experience is disrupted during childhood. The ultimate goal of this research program is to inform the development of assessment tools and targeted rehabilitation regimens for children with amblyopia or strabismus who might be at risk of poor visuomotor outcomes.
17:20 - 17:30 | Discussion
Chair
Professor Michael Morgan FRS, City, University of London, UK
Professor Michael Morgan is an Experimental Psychologist whose main interest is in Visual Perception. He graduated in Natural Sciences from the University of Cambridge in 1964 and has held teaching and research positions in the Universities of Cambridge, Durham, UCL, Edinburgh (Darwin Professorial Fellow) and most recently, City, University of London. His main publications have been in the areas of Spatial Vision, Motion Perception and Eye Movements. His contributions to 3D vision include investigations of the role of interocular spatiotemporal phase differences. He is the author of two books on Vision: Molyneux’s Question (1977) and The Space Between Our Ears (2003).
17:40 - 18:30 | Panel discussion
The Chairs (Dr Andrew Fitzgibbon, Professor Matteo Carandini, Dr Mar Gonzalez-Franco, and Professor Jody Culham) discuss future directions for 3D vision in an interactive question and answer session with the audience.