This page is archived

Links to external sources may no longer work as intended. The content may not represent the latest thinking in this area or the Society’s current position on the topic.

Understanding images in biological and computer vision

19 - 20 February 2018 09:00 - 17:00

Scientific discussion meeting organised by Dr Andrew Schofield, Professor Aleš Leonardis, Professor Marina Bloj, Professor Iain D Gilchrist and Dr Nicola Bellotto.

Vision appears easy for biological systems but replicating such performance in artificial systems is challenging. Nonetheless we are now seeing artificial vision deployed in robots, cars, mobile and wearable technologies. Such systems need to interpret the world and act upon it much as humans do. This multi-disciplinary meeting discussed recent advances at the junction of biological and computer vision.

Attending the event

This meeting has taken place.

Recorded audio of the presentations is now available on this page. Meeting papers will be published in a future issue of Interface Focus.

Enquiries: Contact the Scientific Programmes team.

Schedule

Chair

Professor Iain D Gilchrist, University of Bristol, UK

09:45 - 10:00 Welcome by the Royal Society and Dr Andrew Schofield
10:00 - 10:30 Insect vision for robot navigation

Many insects have excellent navigational skills, covering distances, conditions and terrains that are still a challenge for robotics. The primary sense they use is vision, both to obtain self-motion information for path integration, and to establish visual memories of their surroundings to guide homing and route following. Insect vision is relatively low in resolution, but exploits a combination of sensory tuning and behavioural strategies to solve complex problems. For example, by filtering for ultraviolet light in an omnidirectional view, segmentation of the shape of the horizon between sky and ground becomes both simple and highly consistent. These shapes can then be used for recognition of location, even under different weather conditions or variations in pitch and tilt. Insect brains are also relatively small and low powered, yet able to produce efficient and effective solutions to complex problems. Through computational modelling, it is becoming possible to link the insights from field experiments to neural data, and thus test hypotheses regarding these brain processing algorithms. For example, the circuitry of the insect mushroom body neuropil has been shown to be sufficient to support memory of hundreds of images, allowing rapid assessment of familiarity which can be used to guide the animal along previously experienced routes.
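
The skyline idea above can be illustrated with a short sketch. The following Python fragment (an illustration only, using a synthetic panoramic skyline profile and a sum-of-squared-differences cost, not the models discussed in the talk) rotates a stored horizon-height profile against the currently viewed one and reports the best-matching rotation, the same principle that lets a UV-segmented horizon shape act as a signature of place.

```python
import numpy as np

def skyline_match(stored, current):
    """Compare two panoramic skyline-height profiles (one value per azimuth bin).

    Returns the rotation (in bins) that best aligns the current view with the
    stored one, and the sum-of-squared-differences cost at that rotation: a low
    cost suggests the agent is near the remembered location.
    """
    n = len(stored)
    costs = [np.sum((np.roll(current, -shift) - stored) ** 2) for shift in range(n)]
    best = int(np.argmin(costs))
    return best, costs[best]

# Toy example: the current view is the stored skyline rotated by 40 bins, plus noise.
rng = np.random.default_rng(0)
azimuth = np.linspace(0, 2 * np.pi, 360, endpoint=False)
stored = 10 + 5 * np.sin(azimuth) + 2 * np.sin(3 * azimuth)   # horizon elevation per bin
current = np.roll(stored, 40) + rng.normal(0.0, 0.2, 360)

offset, cost = skyline_match(stored, current)
print(f"recovered rotation: {offset} bins (true 40), cost: {cost:.1f}")
```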

Professor Barbara Webb, University of Edinburgh, UK

10:30 - 11:00 Vision for action selection

In the production of visually guided, goal-directed behaviour, two types of action may be distinguished. Sampling actions involve shifting gaze in order to acquire information from the visual environment. Manipulative actions involve using effectors in order to alter the state of the environment. Both types of action are coordinated in a cycle in which noisy information from the environment is used to infer the state of that environment. Given this estimated state, an appropriate course of action has to be selected. Decision theoretic models have been used to account for the selection between possible competing actions within each class. A common feature of these models is that noisy sensory evidence in favour of different action choices is accumulated over time until a decision threshold is reached. The utility of these models will be evaluated for both types of action. Specifically, the following open questions will be addressed. Is saccade target selection controlled by a "race to threshold" between competing motor programmes in the presence of a foveal load? To what extent is the temporal trigger signal controlled by the foveal processing demands and the selection of the next target? Do the dynamics of real world manipulative actions (reaching and grasping) reflect the underlying decision process? What is the linking function between the temporally evolving decision variable and the different components of manipulative actions?
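
As a rough illustration of the accumulate-to-threshold models mentioned above, the sketch below simulates a simple race between independent noisy accumulators; all parameters (drift rates, threshold, noise) are illustrative assumptions rather than values from the talk.

```python
import numpy as np

def race_to_threshold(drifts, threshold=1.0, noise_sd=0.1, dt=0.001, max_time=2.0, seed=0):
    """Simulate independent accumulators racing to a common bound.

    Each accumulator integrates its drift plus Gaussian noise; the first to reach
    the threshold determines the chosen action and the decision time.
    """
    rng = np.random.default_rng(seed)
    drifts = np.asarray(drifts, dtype=float)
    x = np.zeros_like(drifts)
    t = 0.0
    while t < max_time:
        x += drifts * dt + rng.normal(0.0, noise_sd * np.sqrt(dt), size=x.shape)
        t += dt
        winners = np.flatnonzero(x >= threshold)
        if winners.size:
            return int(winners[np.argmax(x[winners])]), t
    return None, t  # no decision within the deadline

choice, rt = race_to_threshold(drifts=[1.2, 0.8])   # e.g. two competing saccade targets
print(f"chosen target: {choice}, decision time: {rt:.3f} s")
```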

Dr Casimir Ludwig, University of Bristol, UK

11:00 - 11:15 Discussion
11:15 - 11:45 Coffee
11:45 - 12:15 Vision in the context of natural behaviour

Investigation of vision in the context of ongoing behaviour has contributed a number of insights by highlighting the importance of behavioural goals, and focusing attention on how vision and action play out in time. In this context, humans make continuous sequences of sensory-motor decisions to satisfy current goals, and the role of vision is to provide the relevant information for making good decisions in order to achieve those goals. Professor Hayhoe will review the factors that control gaze in natural behaviour, including evidence for the role of the task, which defines the immediate goals, the rewards and costs associated with those goals, uncertainty about the state of the world, and prior knowledge. Visual computations are often highly task-specific, and evaluation of task-relevant state is a central factor necessary for optimal action choices. This governs a very large proportion of gaze changes, which reveal the information sampling strategies of the human visual system. When reliable information is present in memory, the need for sensory updates is reduced, and humans can rely instead on memory estimates, depending on their precision, and combine sensory and memory data according to Bayesian principles. It is suggested that visual memory representations are critically important not only for choosing, initiating and guiding actions, but also for predicting their consequences, and separating the visual effects of self-generated movements from external changes.
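
The precision-weighted combination of sensory and memory estimates described above can be written down directly for Gaussian estimates. The sketch below is a minimal illustration with made-up numbers, not an analysis from the talk.

```python
def combine_gaussian(mu_sensory, var_sensory, mu_memory, var_memory):
    """Precision-weighted (Bayesian) fusion of two Gaussian estimates of the same quantity.

    Each estimate is weighted by its precision (1/variance); the fused variance is
    always smaller than either input variance.
    """
    w_s = 1.0 / var_sensory
    w_m = 1.0 / var_memory
    mu = (w_s * mu_sensory + w_m * mu_memory) / (w_s + w_m)
    var = 1.0 / (w_s + w_m)
    return mu, var

# Illustrative: a precise memory estimate pulls the fused estimate towards itself,
# reducing the need for a fresh sensory sample (i.e. a new fixation).
print(combine_gaussian(mu_sensory=2.0, var_sensory=1.0, mu_memory=1.0, var_memory=0.25))
```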

Professor Mary Hayhoe, University of Texas, USA

12:15 - 12:45 Visual perception in legged and dynamic robots

Robotics captures the attention like few other fields of research; however, to move beyond controlled laboratory settings, dynamic robots need flexible, redundant and trustworthy sensing that pairs with their control systems. In this talk Dr Fallon will discuss the state of research in perception for dynamic robots: specifically, robots which move quickly and rely on visual understanding to make their way in the world, namely walking and flying robots. Dr Fallon will outline the development of state estimation, mapping and navigation algorithms for humanoid and quadruped robots and describe how they have been demonstrated in real fielded systems. The talk will also overview current limitations, both computational and sensory, and describe some prototype sensing systems which are biologically inspired. The first topic will cover the adaptation of accurate registration methods to the Boston Dynamics Atlas and NASA Valkyrie robots, where the challenge of localisation over long baselines and with low sensory overlap was explored. Secondly, Dr Fallon will explore how these methods can be fused with vision for a dynamic quadruped trotting and crawling in challenging lighting conditions. Lastly, he will present ongoing research in probabilistically fusing proprioceptive state estimation with dense visual mapping to allow a humanoid robot to build a rich dense map while overcoming dynamics, moving objects and challenging lighting conditions.
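
As a hedged illustration of the proprioceptive-plus-visual fusion mentioned above, the sketch below runs a one-dimensional Kalman filter that predicts position from leg odometry and corrects it with a visual measurement; the state, noise values and measurements are all assumptions for illustration and do not reproduce the estimators used on the robots discussed.

```python
def kalman_step(x, P, u, z, q=0.01, r=0.25):
    """One predict/update cycle of a 1-D Kalman filter.

    x, P : previous position estimate and its variance
    u    : displacement predicted from proprioception (leg odometry / kinematics)
    z    : position measured by the visual system
    q, r : process and measurement noise variances (illustrative values)
    """
    # Predict with proprioception, then correct with the visual measurement.
    x_pred, P_pred = x + u, P + q
    K = P_pred / (P_pred + r)          # Kalman gain: trust vision more when P_pred >> r
    return x_pred + K * (z - x_pred), (1 - K) * P_pred

x, P = 0.0, 1.0
for u, z in [(0.10, 0.12), (0.10, 0.21), (0.10, 0.33)]:
    x, P = kalman_step(x, P, u, z)
print(f"fused position: {x:.3f} m (variance {P:.3f})")
```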

Dr Maurice Fallon, University of Oxford, UK

12:45 - 13:00 Discussion
13:00 - 14:00 Lunch

Chair

Professor Marina Bloj, University of Bradford, UK

14:00 - 14:30 Understanding interactions between object colour, form, and light in human vision

Without light, there would be no colour. But, in human vision, colour is not simply determined by light. Rather, colour is the result of complex interactions between light, surfaces, eyes and brains. Embedded in these are the neural mechanisms of colour constancy, which stabilise object colours under changes in the illumination spectrum. Yet, colour constancy is not perfect, and predicting the colour appearance of a particular object for a particular individual, when viewed under a particular illumination, is not simple. Yellow bananas may remain ever yellow even under fluorescent lamps, but blue dresses may turn white under ambiguous lights. Various factors affect colour appearance: the shape of the object, whether it is 2D or 3D, of a recognisable form or not; its other surface properties, whether it is glossy or matte, textured or uniform; and individual variations in visual processing. The shape of the illumination spectrum also affects colour appearance: metamerism in contemporary lighting provides a new challenge to colour constancy. How to measure colour – a subjective experience – most reliably is another challenge for vision scientists. This talk will describe psychophysical experiments and theory addressing these considerations in our understanding of object colour perception by humans.

Professor Anya Hurlbert, Newcastle University, UK

14:30 - 15:00 Critical contours link generic image flows to salient surface organisation

Shape inferences from images, or line drawings, are classical ill-posed inverse problems. Computational researchers mainly seek 'priors' for regularisation, e.g. regarding the light source, or scene restrictions for training neural networks, such as indoor rooms. While of technical interest, such solutions differ in two fundamental ways from human perception: (i) our inferences are largely robust across lighting and scene variations; and (ii) different individuals perceive the same shape from a given image only qualitatively, not quantitatively. Importantly, we know from psychophysics that similarities across individuals concentrate near certain configurations, such as ridges and boundaries, and it is these configurations that are often represented in line drawings. Professor Zucker will introduce a method for inferring qualitative 3D shape from shading that is consistent with these observations. For a given shape, certain shading patches become equivalent to “line drawings” in a well-defined shading-to-contour limit. Under this limit, and invariantly, the contours partition the surface into meaningful parts using the Morse-Smale complex. Critical contours are the (perceptually) stable parts of this complex and are invariant over a wide class of rendering models. The result provides a topological organisation of the surface into 'bumps' and 'dents' from the underlying shading geometry, and yields an invariant linking image gradient flows to surface organisation.
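
A small piece of this machinery can be illustrated directly: locating the critical points of a height field, the 'bumps' and 'dents' referred to above. The sketch below is a deliberate simplification on a synthetic surface; it finds only strict local maxima and minima and omits saddles, gradient flows and the full Morse-Smale complex, so it is not the method of the talk.

```python
import numpy as np

def bumps_and_dents(z):
    """Locate strict local maxima ('bumps') and minima ('dents') of a height field.

    These critical points are the starting ingredients of a Morse-style
    decomposition of a surface; the full Morse-Smale complex additionally tracks
    saddles and the gradient flows connecting them.
    """
    core = z[1:-1, 1:-1]
    neighbours = np.stack([z[1 + di:z.shape[0] - 1 + di, 1 + dj:z.shape[1] - 1 + dj]
                           for di in (-1, 0, 1) for dj in (-1, 0, 1)
                           if (di, dj) != (0, 0)])
    is_max = np.all(core > neighbours, axis=0)
    is_min = np.all(core < neighbours, axis=0)
    return np.argwhere(is_max) + 1, np.argwhere(is_min) + 1

# Synthetic shaded surface with one bump and one dent.
y, x = np.mgrid[-2:2:81j, -2:2:81j]
z = np.exp(-((x - 0.8) ** 2 + y ** 2)) - np.exp(-((x + 0.8) ** 2 + y ** 2))
bumps, dents = bumps_and_dents(z)
print("bumps at grid indices:", bumps.tolist(), "dents at:", dents.tolist())
```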

Professor Steven Zucker, Yale University, USA

15:00 - 15:15 Discussion
15:15 - 15:45 Tea
15:45 - 16:15 Colour and illumination in computer vision

In computer vision, illumination is considered to be a problem that needs to be ‘solved’. The colour bias due to illumination is removed to support colour-based image recognition and stable tracking (in and out of shadows), amongst other tasks. In this talk Professor Finlayson will review historical and current algorithms for illumination estimation. In the classical approach, the illuminant colour is estimated by an ever more sophisticated analysis of simple image summary statistics. More recently, the full power, and much higher complexity, of deep learning has been deployed (where, effectively, the definition of the image statistics of interest is found as part of the overall optimisation). Professor Finlayson will challenge the orthodoxy of deep learning, i.e. that it is the obvious solution to illuminant estimation. Instead he will propose that the estimates made by simple algorithms are biased, and that this bias can be corrected to deliver leading performance. The key new observation in this method (bias correction has been tried before, with limited success) is that the bias must be corrected in an exposure-invariant way.
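
The 'simple image summary statistics' referred to above include the classical grey-world and max-RGB (white-patch) estimators. The sketch below illustrates that family on a synthetic linear-RGB image and applies a von Kries style channel-wise correction; it does not implement the exposure-invariant bias correction proposed in the talk.

```python
import numpy as np

def estimate_illuminant(img, method="grey_world"):
    """Estimate the illuminant colour of a linear RGB image from simple statistics.

    'grey_world' assumes the average scene reflectance is achromatic, so the mean
    RGB is proportional to the illuminant; 'max_rgb' (white patch) assumes the
    brightest response in each channel comes from a white surface.
    """
    if method == "grey_world":
        est = img.reshape(-1, 3).mean(axis=0)
    elif method == "max_rgb":
        est = img.reshape(-1, 3).max(axis=0)
    else:
        raise ValueError(method)
    return est / np.linalg.norm(est)        # only the chromaticity (direction) matters

def correct(img, illuminant):
    """Divide out the estimated illuminant channel-wise (a von Kries style correction)."""
    balanced = img / (illuminant * np.sqrt(3))   # grey illuminant maps to (1, 1, 1)
    return np.clip(balanced, 0.0, 1.0)

# Toy scene: random reflectances rendered under a warm (reddish) light.
rng = np.random.default_rng(1)
reflectances = rng.uniform(0.0, 1.0, size=(64, 64, 3))
light = np.array([1.0, 0.8, 0.6])
img = reflectances * light
print("estimated illuminant direction:", estimate_illuminant(img, "grey_world"))
```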

Professor Graham Finlayson, University of East Anglia, UK

16:15 - 16:45 Resolution of visual ambiguity: interactions in the perception of colour, material and illumination

A key goal of biological image understanding is to extract useful perceptual representations of the external world from the images that are formed on the retina. This is challenging for a variety of reasons, one of which is that multiple configurations of the external world can produce the same retinal image, making the image-understanding problem fundamentally under-constrained. Nonetheless, biological systems are able to combine constraints provided by the retinal image with those provided by statistical regularities in the natural environment to produce representations that are well correlated with physical properties of the external world. One example of this is provided by our ability to perceive object colour and material properties. These percepts are correlated with an object's spectral and geometric surface reflectance, respectively. But the retinal image formed of an object depends not only on object surface reflectance, but also on the spatial and geometric properties of the illumination. This dependence in turn leads to ambiguity that perceptual processing must resolve. To understand how such processing works, it is necessary to measure how perception is used to identify object colour and material, and how such identification depends on object surface reflectance (both spatial and spectral) as well as on object-extrinsic factors such as the illumination. Classically, such measurements have been made using indirect techniques, such as matching by adjustment or naming. This talk will introduce a novel measurement method that uses object selection directly, together with a model of underlying perceptual representation, to study the stability of object colour across changes in illumination, as well as how object colour and material trade off in identification.

Professor David H Brainard, University of Pennsylvania, USA

16:45 - 17:00 Discussion

Chair

Dr Nicola Bellotto, University of Lincoln, UK

Chair

Professor Aleš Leonardis, University of Birmingham, UK

09:30 - 10:00 Neural mechanisms underlying the development of face recognition

How do brain mechanisms develop from childhood to adulthood to support better face recognition? There is extensive debate over whether brain development is due to pruning of excess neurons, synapses, and connections, leading to reduction of responses to irrelevant stimuli, or whether development is associated with growth of dendritic arbors, synapses, and myelination, leading to increased responses and selectivity to relevant stimuli. Dr Grill-Spector’s research addresses this central debate using cutting-edge multimodal imaging in children (ages 5-12) and adults. In her talk, Dr Grill-Spector will present compelling empirical evidence supporting the growth hypothesis. Anatomically, her recent research has discovered developmental increases in macromolecular tissue volume that are correlated with specific increases in functional selectivity to faces, as well as with improvements in face recognition. Functionally, her results reveal that across childhood development, face-selective regions not only increase in size and selectivity to faces, but also show increases in their neural sensitivity to face identity, in turn improving perceptual discriminability among faces. Finally, Dr Grill-Spector will show how visual experience during development, such as looking behaviour, may play a role in sculpting population receptive fields in face-selective regions. Together, these data suggest that both anatomical and functional development play a role in the development of face recognition ability. These results are important as they propose a new model by which emergent brain function and behaviour during childhood result from cortical tissue growth rather than from pruning.

Dr Kalanit Grill-Spector, Stanford University, USA

10:00 - 10:30 Recognition in computer vision

Recent advances in visual recognition in computer vision owe much to the adoption of deep neural networks trained on large image datasets that have been annotated by human observers. Professor Malik will review state-of-the-art techniques on problems such as object detection, instance segmentation, action recognition and human pose and shape estimation. These techniques have also advanced methods for 3D shape recovery from single images. Professor Malik will also point to areas for future work and ways in which computer vision still falls short of biological vision.

Professor Jitendra Malik, University of California, USA

10:30 - 10:45 Discussion
10:45 - 11:15 Coffee
11:15 - 11:45 Frequency-resolved correlates of visual object recognition revealed by deep convolutional neural networks

Previous work demonstrated a direct correspondence between the hierarchy of the human visual areas and the layers of deep convolutional neural networks (DCNNs) trained on visual object recognition. This talk will present how DCNNs can be used to investigate which frequency bands carry feature transformations of increasing complexity along the ventral visual pathway. Capitalising on intracranial depth recordings from 100 patients and 11,293 electrodes, the alignment between the DCNN and signals at different frequency bands in different time windows was assessed. Activity in the low and high gamma bands was found to be aligned with the increasing complexity of visual feature representations in the DCNN. These findings show that activity in the gamma band is not only a correlate of object recognition, but carries increasingly complex features along the ventral visual pathway. The results demonstrate the potential that modern artificial intelligence algorithms have in advancing our understanding of the brain. Finally, the talk will discuss how far animal and current artificial systems of perception and cognition can be compared.
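
One common way to assess this kind of model-brain alignment is representational similarity analysis: build dissimilarity matrices for a DCNN layer and for band-limited neural responses over the same stimuli, then correlate them. The sketch below illustrates that idea on synthetic arrays; it is not the pipeline used in the study, and the data shapes and variable names are assumptions.

```python
import numpy as np
from scipy.stats import spearmanr
from scipy.spatial.distance import pdist

def rdm_alignment(layer_activations, band_responses):
    """Correlate the representational geometry of a DCNN layer with neural responses.

    Both inputs are (n_stimuli, n_features) arrays for the same stimuli. Each is
    turned into a representational dissimilarity matrix (pairwise correlation
    distance), and the two RDMs are compared with Spearman correlation.
    """
    rdm_model = pdist(layer_activations, metric="correlation")
    rdm_neural = pdist(band_responses, metric="correlation")
    rho, _ = spearmanr(rdm_model, rdm_neural)
    return rho

# Synthetic stand-ins: 50 stimuli, a 128-unit DCNN layer, 32 electrodes' gamma power.
rng = np.random.default_rng(2)
layer = rng.normal(size=(50, 128))
gamma_power = layer[:, :32] + rng.normal(scale=0.5, size=(50, 32))  # partially shared structure
print(f"model-gamma RDM alignment: {rdm_alignment(layer, gamma_power):.2f}")
```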

Dr Raul Vicente, University of Tartu, Estonia

11:45 - 12:15 Image understanding beyond object recognition

Computational models of vision have advanced rapidly in recent years, rivalling human-level performance in some areas. Much of the progress to date has focused on analysing the visual scene at the object level – the recognition and localisation of objects in the scene. Human understanding of images is richer and deeper, reaching both 'below' the object level, such as identifying and localising objects' parts and sub-parts, and 'above' the object level, such as identifying object relations, and agents with their actions and interactions. In both cases, understanding depends on recovering meaningful structures in the image, their components, properties, and inter-relations. The talk will describe new directions, based on human and computer vision studies, towards human-like image interpretation beyond the reach of current schemes, both below the object level (based on the perception of so-called 'minimal images') and at the level of meaningful configurations of objects, agents and their interactions. In both cases the interpretation process depends on combining 'bottom-up' processing, proceeding from the image to high-level cognitive levels, with 'top-down' processing, proceeding from cognitive levels to lower-level image analysis.

Professor Shimon Ullman, Weizmann Institute of Science, Israel

12:15 - 12:30 Discussion
12:30 - 13:30 Lunch

Chair

Dr Andrew Schofield, University of Birmingham, UK

13:30 - 14:00 What are the computations underlying primate versus machine vision?

Primates excel at object recognition: for decades, the speed and accuracy of their visual system have remained unmatched by computer algorithms. But recent advances in Deep Convolutional Networks (DCNs) have led to vision systems that are starting to rival human decisions. A growing body of work also suggests that this recent surge in accuracy is accompanied by a concomitant improvement in our ability to account for neural data in higher areas of the primate visual cortex. Overall, DCNs have become de facto computational models of visual recognition. This talk will review recent work by the Serre lab bringing into relief the limitations of DCNs as computational models of primate vision. Results will be presented showing that visual features learned by DCNs from large-scale object recognition databases differ markedly from those used by human observers during visual recognition. Evidence will be presented suggesting that the depth of visual processing achieved by modern DCN architectures is greater than that achieved by human observers. Finally, it will be shown that DCNs are limited in their ability to solve seemingly simple visual reasoning problems involving similarity and spatial relation judgments, suggesting the need for additional neural computations beyond those implemented in modern visual architectures.

Dr Thomas Serre, Brown University, USA

14:00 - 14:30 Learning about shape

Vision is naturally concerned with shape. If we could recover a stable and compact representation of object shape from images, we would hope it might aid with numerous vision tasks. Just the silhouette of an object is a strong cue to its identity, and the silhouette is generated by its 3D shape. In computer vision, many representations have been explored: collections of points, “simple” shapes like ellipsoids or polyhedra, algebraic surfaces and other implicit surfaces, generalised cylinders and ribbons, and piecewise (rational) polynomial representations like NURBS and subdivision surfaces. Many of these can be embedded more or less straightforwardly into probabilistic shape spaces, and recovery (a.k.a. “learning”) of one such space is the goal of the experimental part of this talk. When recovering shape from measurements, there is at first sight a natural hierarchy of stability: straight lines can represent very little but may be robustly recovered from data, then come conic sections, splines with fixed knots, and general piecewise representations. I will show, however, that one can pass almost immediately to piecewise representations without loss of robustness. In particular, I shall show how a popular representation in computer graphics—subdivision curves and surfaces—may readily be fit to a variety of image data using the technique for ellipse fitting introduced by Gander, Golub, and Strebel in 1994. I show how we can address the previously difficult problem of recovering 3D shape from multiple silhouettes, and the considerably harder problem which arises when the silhouettes are not from the same object instance, but from members of an object class, for example 30 images of different dolphins each in different poses.
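
The style of fitting referred to above can be illustrated with its simplest instance: an algebraic least-squares conic fit obtained from the SVD of a design matrix of monomials, one of the approaches discussed in the Gander, Golub and Strebel line of work. The sketch below shows only that step, with a synthetic ellipse; the subdivision-curve and surface fitting described in the talk builds on, but goes well beyond, this.

```python
import numpy as np

def fit_conic(points):
    """Algebraic least-squares fit of a conic a x^2 + b xy + c y^2 + d x + e y + f = 0.

    Builds the design matrix of monomials and returns the coefficient vector that
    minimises the algebraic residual subject to unit norm, i.e. the right singular
    vector associated with the smallest singular value.
    """
    x, y = points[:, 0], points[:, 1]
    D = np.column_stack([x**2, x * y, y**2, x, y, np.ones_like(x)])
    _, _, vt = np.linalg.svd(D)
    return vt[-1]                      # (a, b, c, d, e, f), defined up to scale

# Noisy samples from the ellipse (x/2)^2 + y^2 = 1.
rng = np.random.default_rng(3)
t = np.linspace(0, 2 * np.pi, 60)
pts = np.column_stack([2 * np.cos(t), np.sin(t)]) + rng.normal(scale=0.01, size=(60, 2))
a, b, c, d, e, f = fit_conic(pts)
print("recovered coefficient ratio a/c (expect ~0.25):", a / c)
```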

Dr Andrew Fitzgibbon FREng, Microsoft, UK

14:30 - 14:45 Discussion
14:45 - 15:15 Tea
15:15 - 15:45 The emergence of polychronization and feature binding in a spiking neural network model of the primate ventral visual system

Simulations of a biologically realistic ‘spiking’ neural network model of the primate ventral visual pathway are presented, in which the timings of action potentials are explicitly emulated. It is shown how the higher layers of the network develop regularly repeating spatiotemporal patterns of spiking activity that represent the visual objects on which the network is trained. This phenomenon is known as polychronization. In particular, embedded within the subpopulations of neurons with regularly repeating spike trains, called polychronous neuronal groups, are neurons that represent the hierarchical binding relations between lower level and higher level visual features. Such neurons are termed binding neurons. In the simulations, binding neurons learn to represent the binding relations between visual features across the entire visual field and at every spatial scale. In this way, the emergence of polychronization begins to provide a plausible way forward to solving the classic binding problem in visual neuroscience, which concerns how the visual system represents the (hierarchical) relations between visual features within a scene. Such binding information is necessary for the visual brain to be able to make semantic sense of its visuospatial world, and may be needed for the future development of artificial general intelligence (AGI) and machine consciousness (MC). Evidence is also provided for the upward projection of visuospatial information at every spatial scale to the higher layers of the network, where it is available for readout by later behavioural systems. This upward projection of spatial information into the higher layers is referred to as the "holographic principle".
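
The delay-dependence at the heart of polychronization can be shown in a few lines. In this toy illustration (with assumed parameters, not the network from the talk), three presynaptic spikes arrive coincidentally at a downstream 'binding' neuron only when they are emitted in the particular temporal order that matches the fixed axonal conduction delays.

```python
def arrives_together(spike_times, delays, window=1.0):
    """Return True if all spikes arrive at the downstream neuron within `window` ms.

    Arrival time = presynaptic spike time + axonal conduction delay; a downstream
    neuron that requires coincident input only fires for the matching
    spatiotemporal (polychronous) pattern.
    """
    arrivals = [t + d for t, d in zip(spike_times, delays)]
    return max(arrivals) - min(arrivals) <= window

delays = [5.0, 3.0, 1.0]                           # ms, fixed delays to the downstream neuron
print(arrives_together([0.0, 2.0, 4.0], delays))   # matching pattern: all arrive at 5 ms -> True
print(arrives_together([4.0, 2.0, 0.0], delays))   # reversed pattern -> False
```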

Dr Simon Stringer, University of Oxford, UK

15:45 - 16:15 SpiNNaker: spiking neural networks for computer vision

The SpiNNaker (Spiking Neural Network Architecture) machine is a many-core digital computer optimised for the modelling of large-scale systems of spiking neurons in biological real time. The current machine incorporates half a million ARM processor cores and is capable of supporting models with up to a hundred million neurons and a hundred billion synapses, or about the network scale of a mouse brain, though using greatly simplified neuron models compared with the formidable complexity of the biological neuron cell. One use of the system is to model the biological vision pathway at various levels of detail; another is to build spiking analogues of the Convolutional Neural Networks used for image classification in machine learning and AI applications. In SpiNNaker the equations describing the behaviours of neurons and synapses are defined in software, offering flexible support for novel dynamics and plasticity rules, including structural plasticity. This work is at an early stage, but results are already beginning to emerge that suggest possible mechanisms whereby biological vision systems may learn the statistics of their inputs without supervision and with resilience to noise, pointing the way to engineered vision systems with similar on-line learning capabilities.
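
As an illustration of what 'neuron equations defined in software' can look like, the sketch below integrates a standard leaky integrate-and-fire model in Python. SpiNNaker itself executes such models in C on its ARM cores (and is commonly driven through the PyNN interface), so this is a stand-in with illustrative parameters rather than SpiNNaker code.

```python
import numpy as np

def simulate_lif(input_current, dt=1.0, tau_m=20.0, v_rest=-65.0, v_reset=-65.0,
                 v_thresh=-50.0, r_m=10.0):
    """Leaky integrate-and-fire neuron: dV/dt = (v_rest - V + R*I) / tau_m.

    Integrates the membrane equation with forward Euler; emits a spike and resets
    whenever the membrane potential crosses threshold. Times are in ms, current in nA.
    """
    v = v_rest
    spikes = []
    for step, i_in in enumerate(input_current):
        v += dt * (v_rest - v + r_m * i_in) / tau_m
        if v >= v_thresh:
            spikes.append(step * dt)
            v = v_reset
    return spikes

# A constant 2 nA input for 100 ms produces a regular spike train.
print(simulate_lif(np.full(100, 2.0)))
```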

Professor Steve Furber CBE FREng FRS, University of Manchester, UK

16:15 - 16:45 Panel discussion