The transformation from auditory to linguistic representations across auditory cortex is rapid and attention-dependent
Professor Jonathan Simon, University of Maryland, USA
Professor Simon shows that magnetoencephalography (MEG) responses to continuous speech can be used to directly study lexical as well as acoustic processing. Source-localised MEG responses to passages from narrated stories were modelled as linear responses to multiple simultaneous predictor variables reflecting both acoustic and linguistic properties of the stimuli. Lexical variables were modelled as impulses at each phoneme, with values derived from the phoneme cohort model, including cohort size, phoneme surprisal and cohort entropy.
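As a concrete illustration of these cohort-based predictors, the sketch below computes phoneme surprisal and cohort entropy for a toy lexicon and places them as impulses at phoneme onsets. The lexicon, word frequencies, sampling rate and onset times are all invented for illustration; this is not taken from the study itself.

```python
# Illustrative sketch (not the authors' code): cohort-model values for one word,
# placed as impulses at phoneme onsets to form predictor time series.
import numpy as np

# Toy lexicon with invented word frequencies; letters stand in for phonemes.
lexicon = {"cat": 120.0, "cap": 40.0, "can": 300.0, "cab": 15.0, "dog": 200.0}

def cohort(prefix):
    """Words still consistent with the phonemes heard so far, with frequencies."""
    return {w: f for w, f in lexicon.items() if w.startswith(prefix)}

def surprisal_and_entropy(word):
    """Per-phoneme surprisal, -log2 of the drop in frequency-weighted cohort
    probability mass, and entropy of the cohort remaining after each phoneme."""
    values = []
    for i, ph in enumerate(word):
        prev = cohort(word[:i])          # cohort before this phoneme
        curr = cohort(word[:i + 1])      # cohort after this phoneme
        surprisal = -np.log2(sum(curr.values()) / sum(prev.values()))
        p = np.array(list(curr.values())) / sum(curr.values())
        entropy = -np.sum(p * np.log2(p))
        values.append((ph, surprisal, entropy))
    return values

# Impulse predictors: zero everywhere, cohort values at phoneme onset samples.
fs = 100                                  # predictor sampling rate (Hz), assumed
phoneme_onsets = [0.00, 0.08, 0.17]       # toy onsets of /k/, /ae/, /t/ in seconds
surprisal_pred = np.zeros(int(0.5 * fs))
entropy_pred = np.zeros(int(0.5 * fs))
for (ph, s, h), t in zip(surprisal_and_entropy("cat"), phoneme_onsets):
    surprisal_pred[int(t * fs)] = s
    entropy_pred[int(t * fs)] = h
```

In the study these impulse predictors enter the linear model alongside acoustic predictors, so that distinct response functions can be estimated for each variable.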
Results indicate significant left-lateralised effects of phoneme surprisal and cohort entropy. The response to phoneme surprisal, peaking at ~115 ms, arose from auditory cortex, whereas the response reflecting cohort entropy, peaking at ~125 ms, was more ventral, covering the superior temporal sulcus. These short latencies suggest that acoustic information is rapidly used to constrain the word currently being heard. These differences in localisation and timing are consistent with two stages of lexical processing, with phoneme surprisal being a local measure of how informative each phoneme is, and cohort entropy reflecting the state of lexical activation via lexical competition. An additional left-lateralised response to word onsets peaked at ~105 ms.
The effect of selective attention was also investigated using a two-speaker mixture, with one speaker attended and one ignored. Responses reflected the acoustic properties of both speakers, but reflected lexical processing only for the attended speech. While previous research has shown that responses to semantic properties of words in unattended speech are suppressed, these results indicate that even the processing of word forms is restricted to attended speech.
Bottom-up auditory attention using complex soundscapes
Professor Mounya Elhilali, Johns Hopkins University, USA
Recent explorations of task-driven (top-down) attention in the auditory modality draw a picture of a dynamic system in which attentional feedback modulates the sensory encoding of sounds in the brain to facilitate the detection of events of interest and, ultimately, perception, especially in complex soundscapes. Complementing these processes are mechanisms of bottom-up attention that are dictated by the acoustic salience of the scene itself but still engage a form of attentional feedback. Studies of auditory salience have often relied on simplified or well-controlled auditory scenes to shed light on the acoustic attributes that drive the salience of sound events. Unfortunately, the use of constrained stimuli, together with a lack of well-established benchmarks for salience judgments, hampers the development of comprehensive theories of bottom-up auditory attention. Here, Professor Elhilali will explore auditory salience in complex and natural scenes. She will discuss insights from behavioural, neural and computational explorations of bottom-up attention and their implications for our current understanding of auditory attention in the brain.
Speaker-independent auditory attention decoding without access to clean speech sources
Professor Nima Mesgarani, Columbia University, USA
Speech perception in crowded acoustic environments is particularly challenging for hearing-impaired listeners. Assistive hearing devices can suppress background noises that are sufficiently different from speech; however, they cannot attenuate interfering speakers without knowing which speaker the listener is focusing on. One possible solution is auditory attention decoding, in which the brainwaves of listeners are compared with the sound sources in an acoustic scene to determine the attended source, which can then be amplified to facilitate hearing. In this talk, Professor Mesgarani addresses a major obstacle to actualising this system: the lack of access to clean sound sources in realistic situations where only mixed audio is available. He proposes a novel speech separation algorithm that automatically separates speakers in mixed audio without any need for prior training on those speakers. The separated streams are compared with the evoked neural responses in the listener's auditory cortex to determine, and then amplify, the attended speaker. The results show that auditory attention decoding with automatically separated speakers is as accurate and fast as decoding with clean speech sounds. Moreover, Professor Mesgarani demonstrates that the proposed method significantly improves both the subjective and objective quality of the attended speaker. By combining the latest advances in speech processing technologies and brain-computer interfaces, this study addresses a major obstacle to the actualisation of auditory attention decoding, which could assist individuals with hearing impairment and reduce listening effort for normal-hearing listeners in adverse acoustic environments.
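A minimal sketch of the decoding step is given below, assuming the two speech streams have already been produced by some speaker-independent separation model (the separation step itself is not shown). The backward ridge-regression decoder, lag count and data shapes are illustrative assumptions, not Professor Mesgarani's implementation.

```python
# Illustrative sketch (assumed pipeline): given two automatically separated
# speech envelopes and recorded neural data, label the attended stream by
# correlating each envelope with one reconstructed from the neural responses
# via a linear backward model.
import numpy as np

def lagged(x, n_lags):
    """Time-lagged copies of the neural channels (samples x channels*lags).
    Circular lags via np.roll; edge effects are ignored in this toy."""
    return np.concatenate([np.roll(x, lag, axis=0) for lag in range(n_lags)], axis=1)

def train_decoder(neural, envelope, n_lags=16, ridge=1e3):
    """Ridge regression from lagged neural data to the attended envelope
    (training assumes data where the attended speaker is known)."""
    X = lagged(neural, n_lags)
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ envelope)

def decode_attention(neural, env_a, env_b, weights, n_lags=16):
    """Correlate the reconstructed envelope with each separated stream."""
    recon = lagged(neural, n_lags) @ weights
    corr = lambda a, b: np.corrcoef(a, b)[0, 1]
    return "A" if corr(recon, env_a) > corr(recon, env_b) else "B"

# Toy usage with random stand-ins for real recordings (shapes only).
rng = np.random.default_rng(0)
neural = rng.standard_normal((5000, 32))   # samples x neural channels
env_a = rng.standard_normal(5000)          # envelope of separated speaker A
env_b = rng.standard_normal(5000)          # envelope of separated speaker B
w = train_decoder(neural, env_a)           # pretend speaker A was attended
print(decode_attention(neural, env_a, env_b, w))
```

In a real device the separated stream identified as attended would then be boosted relative to the interfering stream before playback.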
On the encoding and decoding of natural auditory stimulus processing using EEG
Professor Ed Lalor, University of Rochester, USA, and Trinity College Dublin, Ireland
Over the past few years, there has been a surge in efforts to model neurophysiological responses to natural sounds. This has included a variety of methods for decoding brain signals to reveal how a person is engaging with and perceiving the auditory world. In this talk, Professor Lalor will discuss recent efforts to improve these decoding approaches and broaden their utility. In particular, he will focus on three related factors: 1) how the sound stimulus is represented, 2) which features of the data are examined, and 3) how the relationship between stimulus and response is modelled. Professor Lalor will present data from several recent studies in which he has used different stimulus representations, different EEG features and different modelling approaches in an attempt to build more useful decoding models and more interpretable encoding models of brain responses to sound.
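To make the encoding side of this modelling concrete, the sketch below fits a simple ridge-regularised temporal response function (forward model) that predicts each EEG channel from a lagged stimulus representation; swapping the envelope for a spectrogram or phonetic features only changes the predictor columns. The names, data shapes and regularisation values here are illustrative assumptions, not the specific models used in the studies discussed.

```python
# Minimal sketch of a forward (encoding) model: EEG at each channel modelled as
# a lagged linear function of a chosen stimulus representation.
import numpy as np

def design_matrix(stimulus, n_lags):
    """Stack time-lagged copies of the stimulus features (samples x features*lags)."""
    stimulus = np.atleast_2d(stimulus.T).T          # ensure samples x features
    return np.concatenate([np.roll(stimulus, lag, axis=0) for lag in range(n_lags)], axis=1)

def fit_trf(stimulus, eeg, n_lags=40, ridge=1e2):
    """Ridge regression from the lagged stimulus to every EEG channel at once."""
    X = design_matrix(stimulus, n_lags)
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ eeg)

def predict_eeg(stimulus, trf, n_lags=40):
    """Predicted EEG from the fitted response functions."""
    return design_matrix(stimulus, n_lags) @ trf

# Toy usage: random stand-ins with realistic shapes.
rng = np.random.default_rng(1)
envelope = rng.standard_normal(10000)               # 1-D stimulus representation
eeg = rng.standard_normal((10000, 32))              # samples x EEG channels
trf = fit_trf(envelope, eeg)                        # lags x channels
pred = predict_eeg(envelope, trf)

# Model quality is typically summarised as the per-channel correlation between
# predicted and recorded EEG.
r = [np.corrcoef(pred[:, c], eeg[:, c])[0, 1] for c in range(eeg.shape[1])]
```

The three factors highlighted in the talk map directly onto this sketch: the stimulus representation fixes the columns of the design matrix, the chosen EEG features fix the targets, and the regression (or any richer alternative) is the model linking the two.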