This page is archived

Links to external sources may no longer work as intended. The content may not represent the latest thinking in this area or the Society’s current position on the topic.

Attention to sound

14 - 15 November 2018 09:30 - 17:30

Theo Murphy international scientific meeting organised by Dr Alain de Cheveigné, Professor Maria Chait and Dr Malcolm Slaney.

Some sounds are safe to ignore; others require attention. New paradigms and analysis techniques are emerging that enhance our understanding of how the auditory brain makes this choice, and pave the way for novel applications such as the cognitive control of a hearing aid. The meeting brought together neuroscientists, experts in brain signal encoding, and people involved in developing and marketing devices.

The schedule of talks, speaker abstracts and biographies are below. Recorded audio of the presentations is also available below.

Attendance of invited discussants was supported by the H2020 project COCOHA.

Confirmed invited discussants include:

  • Dr Aurélie Bidet-Caulet, Lyon Neuroscience Research Centre, France
  • Dr Jennifer Bizley, University College London, UK
  • Dr Gregory Ciccarelli, Massachusetts Institute of Technology Lincoln Laboratory, USA
  • Professor Maarten De Vos, University of Oxford, UK
  • Professor Fred Dick, Birkbeck, University of London and University College London, UK
  • Professor Tom Francart, KU Leuven, Belgium
  • Dr Jens Hjortkjær, Technical University of Denmark, Denmark
  • Dr Christophe Micheyl, Starkey France, Lyon Neuroscience Research Center and Ecole Normale Supérieure, France
  • Professor Lucas Parra, The City College of New York, USA
  • Dr Tobias Reichenbach, Imperial College London, UK

Attending this event

This event has taken place.

Enquiries: contact the Scientific Programmes team

Organisers

  • Dr Alain de Cheveigné, Ecole normale supérieure, CNRS, France and University College London, UK

    Alain de Cheveigné was trained in Mathematics and Physics at Paris 6 University, and is currently a Senior Scientist with the CNRS, affiliated with the Ecole normale supérieure in Paris, and University College London. He is active in speech and hearing science, with an interest in pitch perception, sound segregation, audio signal processing, and brain data analysis. He currently coordinates the European H2020 project Cognitive Control of a Hearing Aid (COCOHA).

  • Professor Maria Chait, University College London, UK

    Maria Chait is a Professor of auditory cognitive neuroscience at the Ear Institute, University College London. Professor Chait moved to UCL in 2007 as a Marie Curie research fellow, following a short postdoc at the Ecole normale supérieure, Paris. Professor Chait's PhD research (2006) was conducted in the Neuroscience and Cognitive Science program at the University of Maryland, College Park, USA, under the supervision of Jonathan Simon and David Poeppel. Her undergraduate background is in Computer Science, Economics, and East Asian Studies.

  • Dr Malcolm Slaney, Google AI Machine Hearing, USA

    Dr Malcolm Slaney is a research scientist in the AI Machine Hearing Group at Google. He is an Adjunct Professor at Stanford CCRMA, where he has led the Hearing Seminar for more than 20 years, and an Affiliate Faculty member in the Electrical Engineering Department at the University of Washington. He has served as an Associate Editor of IEEE Transactions on Audio, Speech and Signal Processing and of IEEE Multimedia Magazine. He has given successful tutorials at ICASSP 1996 and 2009 on 'Applications of Psychoacoustics to Signal Processing', at SIGIR and ICASSP on 'Multimedia Information Retrieval', and at ACM Multimedia 2010 on 'Web-Scale Multimedia Data'. He is a coauthor, with A C Kak, of the IEEE book 'Principles of Computerized Tomographic Imaging', which was republished by SIAM in their 'Classics in Applied Mathematics' series. He is coeditor, with Steven Greenberg, of the book Computational Models of Auditory Function. Before joining Google, Dr Slaney worked at Bell Laboratories, Schlumberger Palo Alto Research, Apple Computer, Interval Research, IBM's Almaden Research Center, Yahoo! Research, and Microsoft Research. For many years he has led the auditory group at the Telluride Neuromorphic (Cognition) Workshop. Dr Slaney's recent work is on understanding attention and general audio perception. He is a Senior Member of the ACM and a Fellow of the IEEE.

Schedule

Chair

Dr Gregory Ciccarelli, Massachusetts Institute of Technology Lincoln Laboratory, USA

09:30 - 09:50 Introduction
09:50 - 10:10 Towards intention controlled hearing aids: experiences from eye-controlled hearing aids

A hearing impairment reduces the ability to segregate acoustic sources. This causes problems in switching between and following speech streams in complex scenes with multiple talkers. Current hearing aid beamforming technologies rely on a listener's ability to point the head towards a source of interest. However, this is very difficult in a conversation with spatially separated talkers where rapid switches between talkers take place. In this talk Professor Lunner will show that eye-gaze position signals can be picked up electrically in the ear canal through electrooculography, and that these signals can be used for fast, intentional eye-gaze control towards the source of interest in a complex listening scene such as a restaurant. Experiments in which eye-gaze signals are combined with motion sensors and beamformers show that substantial benefits, in the form of improved speech intelligibility, are possible for hearing-impaired listeners. Results also indicate that eye control combined with head movements is faster and more precise than head movements alone. The presentation will include several videos to show the use cases.
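
The control idea can be illustrated with a minimal sketch: assume a simple linear calibration from in-ear EOG voltage to horizontal gaze angle and a fixed grid of beamformer look directions. None of the gains, filter settings or beam angles below come from the talk; they are placeholders for illustration.

```python
# Hypothetical sketch of the control idea only (not Eriksholm's implementation):
# a calibrated horizontal EOG trace from in-ear electrodes is smoothed and
# mapped to an approximate gaze angle, which then selects the nearest
# beamformer look direction.
import numpy as np

def eog_to_gaze_deg(eog_uv, fs, gain_deg_per_uv=0.5, offset_uv=0.0, tau_s=0.1):
    """Exponentially smooth the EOG signal (microvolts), then apply a linear
    calibration to estimate horizontal gaze angle in degrees."""
    alpha = 1.0 - np.exp(-1.0 / (tau_s * fs))
    smoothed = np.empty(len(eog_uv))
    acc = float(eog_uv[0])
    for i, v in enumerate(eog_uv):
        acc += alpha * (float(v) - acc)
        smoothed[i] = acc
    return (smoothed - offset_uv) * gain_deg_per_uv

def steer_beam(gaze_deg, beam_angles_deg=(-60.0, -30.0, 0.0, 30.0, 60.0)):
    """Pick the beamformer look direction closest to the latest gaze estimate."""
    beams = np.asarray(beam_angles_deg)
    return float(beams[np.argmin(np.abs(beams - gaze_deg[-1]))])

# Example: a simulated saccade from straight ahead towards a talker at ~30 degrees.
fs = 250
eog = np.concatenate([np.zeros(fs), 60.0 * np.ones(fs)])  # 60 microvolt step
gaze = eog_to_gaze_deg(eog, fs)
print(steer_beam(gaze))  # selects the 30-degree beam
```

A real system would additionally need artefact rejection, drift compensation and fusion with the head-motion sensors discussed in the talk.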

Professor Thomas Lunner, Eriksholm Research Centre, Denmark and Linköping University, Sweden

10:10 - 10:30 Discussion
10:30 - 11:00 Coffee
11:00 - 11:20 Path to product: cost, benefits and risks

For the hearing-impaired listener, current hearing aid technology largely fails to solve the 'cocktail party problem'. In dynamic, conversational turn-taking, the intent of the listener determines the target talker. Using EEG, the measured focus of attention can be used as a proxy for intent. Other potential applications of EEG include measurement of listening effort and speech comprehension, and automated hearing aid fitting.

A primary challenge for the productisation of EEG-based systems is the development of an appropriate electrophysiological front-end. The requirements include: (1) sensor arrays that allow a commercially acceptable industrial design; (2) electrodes that do not require conductive paste yet overcome the problems of electrical noise and movement artefact; and (3) platform constraints such as limited power, ultra-low-current implementation and A2D (analogue-to-digital) architectures.

On-line EEG analysis is computationally expensive and requires mid-tier or cloud-based platforms, which are only appropriate for applications that are not time sensitive (that is, that can tolerate latencies well above 10 milliseconds). The analysis time window needs to be short, yet current Bluetooth communication protocols introduce hundreds of milliseconds of latency, and cloud computation adds further delays that depend on the availability and quality of the cellular network.

Such implementation challenges are not trivial and will require a concerted investment of resources. This approach, however, has the potential to solve the principal and most refractory failing of current hearing aid technology, as well as to enable the development of adaptive devices that much better fit individual needs.

Dr Simon Carlile, Starkey Hearing Technologies, USA

11:20 - 11:40 Discussion
11:40 - 12:00 The need for auditory attention

Understanding attention is key to many auditory tasks. In this talk Dr Slaney will summarise several aspects of attention that have been used to better understand how humans use attention in their daily lives. This work extends from top-down and bottom-up models of attention that are useful for solving the cocktail party problem, to the use of eye-gaze and face-pose information to better understand speech in human-machine and human-human-machine interactions. The common thread throughout all this work is the use of implicit signals such as auditory saliency, face pose and eye gaze as part of a speech-processing system. Dr Slaney will show algorithms and results from speech recognition, speech understanding, addressee detection, and selecting the desired speech from a complicated auditory environment. All of this is grounded in models of auditory attention and saliency.

Dr Malcolm Slaney, Google AI Machine Hearing, USA

12:00 - 12:20 Discussion
12:20 - 12:40 General discussion

Chair

Professor Torsten Dau, Technical University of Denmark, Denmark

14:00 - 14:20 The transformation from auditory to linguistic representations across auditory cortex is rapid and attention-dependent

Professor Simon shows that magnetoencephalography (MEG) responses to continuous speech can be used to directly study lexical as well as acoustic processing. Source-localised MEG responses to passages from narrated stories were modelled as linear responses to multiple simultaneous predictor variables, reflecting both acoustic and linguistic properties of the stimuli. Lexical variables were modelled as an impulse at each phoneme, with values based on the phoneme cohort model, including cohort size, phoneme surprisal and cohort entropy.
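
As a concrete illustration of what such cohort-based predictors involve, here is a minimal sketch (not the study's actual pipeline) of phoneme surprisal, cohort entropy and an impulse-train regressor; the toy word frequencies, function names and sampling rate are assumptions.

```python
# Illustrative sketch of cohort-model lexical predictors for a linear
# (TRF-style) encoding analysis. Quantities follow standard definitions;
# everything else is a placeholder.
import numpy as np

def phoneme_surprisal(cohort_freqs_before, cohort_freqs_after):
    """Surprisal (bits) of the current phoneme: -log2 of the probability mass
    of the words still consistent with the input after this phoneme, relative
    to the mass of the cohort before it."""
    return -np.log2(np.sum(cohort_freqs_after) / np.sum(cohort_freqs_before))

def cohort_entropy(cohort_freqs):
    """Entropy (bits) of the lexical cohort remaining after the current phoneme."""
    p = np.asarray(cohort_freqs, float)
    p = p / p.sum()
    return float(-np.sum(p * np.log2(p)))

def impulse_predictor(onsets_s, values, fs, n_samples):
    """A regressor that is zero except for an impulse at each phoneme onset,
    scaled by that phoneme's cohort-model value (surprisal, entropy, or size)."""
    x = np.zeros(n_samples)
    idx = np.round(np.asarray(onsets_s) * fs).astype(int)
    idx = np.clip(idx, 0, n_samples - 1)   # guard against rounding past the end
    np.add.at(x, idx, values)
    return x

# Toy example: hearing the next phoneme shrinks the cohort from four words to two.
before, after = [300.0, 120.0, 80.0, 20.0], [120.0, 20.0]
print(phoneme_surprisal(before, after), cohort_entropy(after))
```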

Results indicate significant left-lateralised effects of phoneme surprisal and cohort entropy. The response to phoneme surprisal, peaking at ~115 ms, arose from auditory cortex, whereas the response reflecting cohort entropy, peaking at ~125 ms, was more ventral, covering the superior temporal sulcus. These short latencies suggest that acoustic information is rapidly used to constrain the word currently being heard. The differences in localisation and timing are consistent with two stages of lexical processing, with phoneme surprisal being a local measure of how informative each phoneme is, and cohort entropy reflecting the state of lexical activation via lexical competition. An additional left-lateralised response to word onsets peaked at ~105 ms.

The effect of selective attention was also investigated using a two speaker mixture, one attended and one ignored. Responses reflect the acoustic properties of both speakers, but reflect lexical processing only for the attended speech. While previous research has shown that responses to semantic properties of words in unattended speech are suppressed, these results indicate that even processing of word forms is restricted to attended speech.

Professor Jonathan Simon, University of Maryland, USA

14:20 - 14:40 Discussion
14:40 - 15:00 Bottom-up auditory attention using complex soundscapes

Recent explorations of task-driven (top-down) attention in the auditory modality draw a picture of a dynamic system in which attentional feedback modulates sensory encoding of sounds in the brain to facilitate detection of events of interest and, ultimately, perception, especially in complex soundscapes. Complementing these processes are mechanisms of bottom-up attention that are dictated by the acoustic salience of the scene itself but still engage a form of attentional feedback. Studies of auditory salience have often relied on simplified or well-controlled auditory scenes to shed light on the acoustic attributes that drive the salience of sound events. Unfortunately, the use of constrained stimuli, together with a lack of well-established benchmarks of salience judgements, hampers the development of comprehensive theories of bottom-up auditory attention. Here, Professor Elhilali will explore auditory salience in complex and natural scenes. She will discuss insights from behavioural, neural and computational explorations of bottom-up attention and their implications for our current understanding of auditory attention in the brain.

Professor Mounya Elhilali, Johns Hopkins University, USA

15:00 - 15:20 Discussion
15:20 - 16:00 Coffee
16:00 - 16:20 Speaker-independent auditory attention decoding without access to clean speech sources

Speech perception in crowded acoustic environments is particularly challenging for hearing-impaired listeners. Assistive hearing devices can suppress background noises that are sufficiently different from speech; however, they cannot attenuate interfering speakers without knowing which speaker the listener is focusing on. One possible solution is auditory attention decoding, in which the brainwaves of listeners are compared with the sound sources in an acoustic scene to determine the attended source, which can then be amplified to facilitate hearing. In this talk, Professor Mesgarani addresses a major obstacle to actualising this system: the lack of access to clean sound sources in realistic situations where only mixed audio is available. He proposes a novel speech separation algorithm to automatically separate speakers in mixed audio without any need for prior training on the speakers. The separated speakers are compared with evoked neural responses in the auditory cortex of the listener to determine and amplify the attended speaker. The results show that auditory attention decoding with automatically separated speakers is as accurate and fast as decoding with clean speech sounds. Moreover, Professor Mesgarani demonstrates that the proposed method significantly improves both the subjective and objective quality of the attended speaker. By combining the latest advances in speech processing technologies and brain-computer interfaces, this study addresses a major obstacle to the actualisation of auditory attention decoding that can assist individuals with hearing impairment and reduce listening effort for normal-hearing subjects in adverse acoustic environments.
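
The decoding step of such a system can be sketched as follows, assuming a pre-trained linear backward model and envelopes of the automatically separated sources; the lagging scheme and variable names are illustrative, not Professor Mesgarani's implementation.

```python
# Minimal sketch of the attention-decoding step only (the speech-separation
# network is not shown): a linear backward model reconstructs the attended
# speech envelope from neural recordings, the reconstruction is correlated
# with each separated source's envelope, and the best match is taken as the
# attended speaker.
import numpy as np

def lagged(neural, n_lags):
    """Stack time-lagged copies of the neural data (time x channels) so the
    decoder can integrate over a short window of past samples."""
    T, C = neural.shape
    Z = np.zeros((T, C * n_lags))
    for lag in range(n_lags):
        Z[lag:, lag * C:(lag + 1) * C] = neural[:T - lag]
    return Z

def decode_attended(neural, decoder, source_envelopes, n_lags=32):
    """Return the index of the separated source whose envelope correlates best
    with the envelope reconstructed from the neural data, plus all scores."""
    recon = lagged(neural, n_lags) @ decoder  # (time,) envelope reconstruction
    scores = [float(np.corrcoef(recon, env)[0, 1]) for env in source_envelopes]
    return int(np.argmax(scores)), scores
```

The index returned by the decoder identifies the separated stream to be amplified relative to the others before the mixture is presented to the listener.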

Professor Nima Mesgarani, Columbia University, USA

16:20 - 16:40 Discussion
16:40 - 17:00 On the encoding and decoding of natural auditory stimulus processing using EEG

Over the past few years there has been a surge in efforts to model neurophysiological responses to natural sounds. This has included a variety of methods for decoding brain signals to say something about how a person is engaging with and perceiving the auditory world. In this talk Professor Lalor will discuss recent efforts to improve these decoding approaches and broaden their utility. In particular, he will focus on three related factors: 1) how we represent the sound stimulus, 2) which features of the data we focus on, and 3) how we model the relationship between stimulus and response. Professor Lalor will present data from several recent studies in which he has used different stimulus representations, different EEG features and different modelling approaches in an attempt to arrive at more useful decoding models and more interpretable encoding models of brain responses to sound.
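
As background for the three factors listed above, here is a generic sketch of the kind of regularised linear encoding (TRF) model on which such analyses typically rest; the ridge solver and parameter values are assumptions for illustration rather than the specific methods discussed in the talk.

```python
# Generic sketch: a lagged stimulus representation (envelope, spectrogram
# bands, phoneme features, ...) is mapped to each EEG channel by ridge
# regression.
import numpy as np

def design_matrix(stim, n_lags):
    """stim: (time, features). Returns (time, features * n_lags) of lagged copies."""
    T, F = stim.shape
    X = np.zeros((T, F * n_lags))
    for lag in range(n_lags):
        X[lag:, lag * F:(lag + 1) * F] = stim[:T - lag]
    return X

def fit_trf(stim, eeg, n_lags=64, ridge=1e3):
    """Estimate TRF weights mapping the chosen stimulus representation to EEG.
    eeg: (time, channels). Returns weights of shape (features * n_lags, channels)."""
    X = design_matrix(stim, n_lags)
    XtX = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ eeg)

def predict_eeg(stim, weights, n_lags=64):
    """Predict EEG from the stimulus representation using the fitted TRF."""
    return design_matrix(stim, n_lags) @ weights
```

Swapping the `stim` argument between an envelope, a spectrogram and a phoneme-feature matrix corresponds to factor 1 above, while the choice of EEG features and of the fitting procedure correspond to factors 2 and 3.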

Professor Ed Lalor, University of Rochester, USA and Trinity College Dublin, Ireland

17:00 - 17:20 Discussion
17:20 - 17:40 General discussion

Chair

Professor Adrian KC Lee, University of Washington, USA

08:30 - 08:50 Facilitation and inhibition in visual selective attention

Visual selective attention is thought to facilitate performance through both enhancement and inhibition of sensory processing of goal-relevant and irrelevant (or distracting) information. While much insight has been gained over the past few decades into the neural mechanisms underlying the facilitatory effects of attention, much less is known about inhibitory mechanisms in visual attention. In particular, it is still unclear whether target facilitation and distractor inhibition are simply different sides of the same coin or whether they are controlled by distinct neural mechanisms. Moreover, recent work indicates that suppression of visual distractors only emerges when information about the distractor can be derived directly from experience, consistent with a predictive coding model of expectation suppression. This also raises the question of how visual attention and expectation interact to bias information processing. In this talk, Professor Slagter will discuss recent findings from several behavioural and EEG studies that examined how expectations about upcoming target or distractor locations and/or features influence the facilitatory and inhibitory effects of attention on visual information processing and representation, using ERPs, multivariate decoding analyses, and inverted encoding models. Collectively, these findings confirm an important role for alpha oscillatory activity in top-down biasing of visual attention to, and sharpening of representations of, target locations. Yet they also show that target facilitation and distractor suppression are differentially influenced by expectation, and rely at least in part on different neural mechanisms, with distractor suppression selectively occurring after stimulus presentation. This latter finding raises the question of whether voluntary preparatory inhibition is possible at all.

Professor Heleen Slagter, University of Amsterdam, The Netherlands

08:50 - 09:10 Discussion
09:10 - 09:30 Rhythmic structures in visual attention: behavioural and neural evidence

In a crowded visual scene, attention must be efficiently and flexibly distributed over time and space to accommodate different task contexts. In this talk, Professor Luo will present several studies from her lab investigating the temporal structure of visual attention. First, using a time-resolved behavioural measurement, the group demonstrates that attentional behavioural performance contains temporal fluctuations (theta-band, alpha-band, etc), supporting the idea that neuronal oscillatory profiles might be directly revealed at the behavioural level. These behavioural oscillations display a temporally alternating relationship between locations, suggesting that attention samples multiple items in a time-based, rhythmic manner. Second, by employing EEG recordings in combination with a temporal response function (TRF) approach, the group extracted object-specific neuronal impulse responses during multi-object selective attention. The results show that attention rhythmically switches among visual objects every ~200 ms, and that the spatiotemporal sampling profile adaptively changes in various task contexts. Finally, using MEG recordings in combination with a decoding approach, the group demonstrates that attention fluctuates between attended orientation features in a theta-band rhythm, suggesting that feature-based attention is mediated by rhythmic sampling similar to that for spatial attention. In summary, attention is not stationary but dynamically samples multiple visual objects in a periodic or serial-like way. This work advocates a central role for temporal organisation in attention, which flexibly and efficiently allocates resources in the time dimension.
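
As a toy illustration of how rhythmic structure in a time-resolved behavioural measurement might be quantified, the sketch below detrends an accuracy-versus-delay curve and inspects its amplitude spectrum for theta/alpha-band peaks; the sampling grid, detrending order and windowing are assumptions, not the lab's exact analysis.

```python
# Toy sketch: spectral analysis of a behavioural accuracy time course sampled
# at different cue-to-target delays.
import numpy as np

def behavioural_spectrum(accuracy, delays_s):
    """accuracy: hit rate at each (evenly spaced) cue-to-target delay.
    Returns (frequencies in Hz, amplitude spectrum) of the detrended curve."""
    acc = np.asarray(accuracy, float)
    delays_s = np.asarray(delays_s, float)
    trend = np.polyval(np.polyfit(delays_s, acc, 2), delays_s)  # slow trend
    detrended = (acc - trend) * np.hanning(len(acc))            # taper the edges
    dt = delays_s[1] - delays_s[0]
    freqs = np.fft.rfftfreq(len(acc), dt)
    return freqs, np.abs(np.fft.rfft(detrended))

# Example: 60 delays sampled every 20 ms with a weak ~5 Hz (theta) modulation.
delays = np.arange(60) * 0.02
acc = 0.75 + 0.05 * np.sin(2 * np.pi * 5 * delays) + 0.01 * np.random.randn(60)
freqs, spec = behavioural_spectrum(acc, delays)
print(freqs[np.argmax(spec[1:]) + 1])  # peak frequency, expected near 5 Hz
```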

Professor Huan Luo, Peking University, China

09:30 - 09:50 Discussion
09:50 - 10:20 Coffee
10:20 - 10:40 Attention across sound and vision: effects of perceptual load

Load Theory of attention and cognitive control offers a hybrid model that combines capacity limits in perception with automaticity of processing. The model proposes that perception has limited capacity but proceeds automatically and involuntarily, in parallel, on all stimuli within capacity: relevant as well as irrelevant. Much evidence has accumulated to support load theory in vision research. However, the cross-modal effects of perceptual load across the senses are less clear. In her talk, Professor Lavie will present recent work on the effects of visual perceptual load on auditory perception and the related neural activity, as assessed with magnetoencephalography. The results showed that the level of unattended auditory perception and the related neural signal critically depends on the level of perceptual processing load in the visual attention task. Task conditions of high perceptual load, which take up all capacity with attended-task processing, lead to reduced processing of unattended stimuli. In contrast, in conditions of low perceptual load that leave spare capacity, ignored task-irrelevant stimuli are nevertheless perceived and elicit a neural response. These findings demonstrate the value of understanding the role of attention in auditory processing within the framework of Load Theory.

Professor Nilli Lavie FBA, University College London, UK

10:40 - 11:00 Discussion
11:00 - 11:20 Networks controlling attention in vision (and audition)

Neuroimaging with fMRI shows that there are distinct networks biased towards the processing of visual and auditory information. These networks include inter-digitated areas in frontal cortex as well as corresponding primary and secondary sensory regions. Professor Shinn-Cunningham sees these distinct frontal regions consistently in individual subjects across multiple studies spanning years; however, the inter-digitated structural organisation of the 'visual' and 'auditory' regions in frontal cortex is not apparent using standard methods for co-registering and averaging fMRI results across subjects. Although the networks that include these inter-digitated frontal control regions are 'sensory biased', they are also recruited to process information in the other sensory modality as needed. Specifically, areas that are always engaged by auditory attention are recruited when visual tasks require processing of temporal structure, but not when the same visual inputs are accessed for tasks requiring processing of spatial information. Conversely, processing of auditory spatial information preferentially engages the visually biased brain network – a network that is traditionally associated with spatial visual attention. This visuo-spatial network includes retinotopically organised spatial maps in parietal cortex. Recent EEG results from Professor Shinn-Cunningham's lab confirm that auditory spatial attention makes use of the parietal maps in the 'visual spatial attention' network. Together, these results reveal that visual networks for attention are a shared resource used by the auditory system.

Professor Barbara Shinn-Cunningham, Carnegie Mellon University, USA

11:20 - 11:40 Discussion
11:40 - 12:00 General discussion

Chair

Professor Stephen David, Oregon Health & Science University, USA

13:30 - 13:50 Objective, reliable, and valid? Measuring auditory attention

Auditory attention is a fascinating feat. For example, it is most astonishing how our brain 'does away' with considerable differences in sound pressure between a behaviourally relevant sound source and other interferences. Meanwhile, auditory attention has remained this elusive phenomenon: do we really understand enough just yet of auditory attention to build machines that attend, or machines that help us attend? Illustrated by behavioural, electrophysiological, and functional imaging data from his own lab and others, Professor Obleser will take stock of the evidence: are top-down selective-attention abilities indeed a stable, trait-like feature of the individual listener, with predictable decline in older adults? And, what are we really getting from our current go-to neural measures of auditory attention, speech tracking aka 'neural entrainment' versus alpha-power fluctuations? Luckily, Professor Obleser will probably be out of time as the talk reaches the main question: what are we measuring when we measure auditory attention?

Professor Jonas Obleser, University of Lübeck, Germany

13:50 - 14:10 Discussion
14:10 - 14:30 Auditory selective attention: lessons from distracting sounds

A fundamental assumption in attention research is that, since processing resources are limited, the core function of attention is to manage these resources and allocate them among concurrent stimuli or tasks, according to current behavioural goals and environmental needs. However, despite decades of research, we still do not have a full characterisation of the nature of these processing limitations, or 'bottlenecks' – ie which processes can be performed in parallel and where the need for attentional selection kicks in. This question is particularly pertinent for the auditory system, which has been studied far less extensively than the visual system and is proposed to have a wider capacity for parallel processing of incoming stimuli.

In this talk Dr Golumbic will discuss a series of experiments studying the depth of processing applied to task-irrelevant sounds and their neural encoding in auditory cortex. She will look at how this is affected by the acoustic properties, temporal structure, and linguistic structure of unattended sounds, as well as by overall acoustic load and task demands, in an attempt to understand which levels of processing suffer most from bottlenecks. In addition, she will discuss what we can learn about the capacity for parallel processing of auditory stimuli from pushing the system to its limits and requiring the division of attention among multiple concurrent inputs.

Dr Elana Golumbic, Bar Ilan University, Israel

14:30 - 14:50 Discussion
14:50 - 15:30 Coffee
15:30 - 15:50 The neuro-computational architecture of auditory attention

Auditory attention is a crucial component of real-life listening and is required, for instance, to enhance a particularly relevant aspect of a sound or to separate a sound of interest from noisy backgrounds. When listening to simple tones, attending to a certain frequency range induces a rapid and specific adaptation of neuronal tuning, which ultimately results in enhanced processing of that frequency range and suppression of the other frequencies. But what are the neural mechanisms enabling attentive selection and enhancement when listening to complex real-life sounds and scenes? At which levels of neural sound representation does attention operate? And how do these mechanisms depend on the specific behavioural requirements? High-resolution fMRI and computational modelling of sound representations both make relevant contributions to addressing these questions. Sub-millimetre fMRI makes it possible to distinguish the activity and connectivity of neuronal populations across cortical layers non-invasively in humans (laminar fMRI). This is required for disentangling feedforward/feedback processing in primary and non-primary auditory areas and the communication between auditory and other areas (eg frontal areas). Modelling of sound representations allows well-defined hypotheses to be formulated about the nature of the simple and complex features processed in the network of auditory areas and how the neural sensitivity for these features is affected by attention and behavioural task demands. The combination of laminar fMRI and sound representation models is thus ideally positioned to unravel the neural circuitry and the computational architecture of auditory attention in naturalistic listening scenarios.

Professor Elia Formisano, Maastricht University, The Netherlands

15:50 - 16:10 Discussion
16:10 - 16:30 How attention modulates processing of mildly degraded speech to influence perception and memory

Professor Johnsrude and colleagues have previously demonstrated that, whereas the pattern of brain (fMRI) activity elicited by clearly spoken sentences does not seem to depend on attention, patterns are markedly different when attending or not attending to highly intelligible but degraded (6-band noise-vocoded) sentences (Wild et al, J Neurosci, 2012). They have replicated and extended this work with sentences that, although slightly degraded (12-band noise-vocoded), can be reported word-for-word with 100% accuracy. Even for these very intelligible materials, a marked dissociation was observed in patterns of brain activity when people attended to the sentences compared to when they were performing a multiple-object-tracking task. Furthermore, in both of these experiments, memory for degraded items was enhanced by attention, whereas memory for clear sentences was not, suggesting that even perfectly intelligible but degraded sentences are processed in a qualitatively different, attentionally gated way, compared to clear sentences. Supported by a Canadian Institutes of Health Research operating grant (MOP 133450) and a Canadian Natural Sciences and Engineering Research Council Discovery grant (3274292012).

Professor Ingrid Johnsrude, Western University, Canada

16:30 - 16:50 Discussion
16:50 - 17:10 General discussion
17:10 - 17:30 Concluding session