Emma Holmes1, Thomas Parr1, Timothy D Griffiths1,2, Karl J Friston1
1 Wellcome Centre for Human Neuroimaging, UCL, London, UK; 2 Biosciences Institute, Newcastle University, Newcastle upon Tyne, UK
Background: Selective attention enables listeners to focus on one talker among a mixture of talkers. Yet it does not appear to be all-or-none: when an attention-directing cue is presented, preparatory attention builds up over time. For example, reaction times progressively improve as the interval between an instructional cue and the onset of the target talker lengthens, and EEG activity increases in amplitude before the target talker speaks. The computational processes underlying this slow induction of attentional set are not fully understood.
Methods: Here, we take a theoretical stance based on active inference (Friston et al., 2017). We introduce a new generative model of selective attention during cocktail-party listening and treat selective attention as an inference problem. We model a simple paradigm in which two spatially separated talkers each speak a different colour and number word. A visual cue directs attention to the left or right talker, and the task is to identify the words spoken by the cued talker. We use this model to test competing hypotheses about how time-sensitive changes in precision affect simulated reaction times and EEG responses.
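To make the role of precision concrete, the sketch below illustrates, with hypothetical names and toy numbers rather than the authors' actual generative model, how an attentional precision parameter can flatten or sharpen the posterior over which colour word the cued talker spoke.

```python
# Minimal sketch (illustrative only, not the model reported here): precision-weighted
# inference about which colour word the cued talker spoke. The word set, likelihood
# values and the precision parameter "gamma" are assumptions for illustration.
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

colours = ["red", "green", "blue", "white"]   # possible colour words
n_states = len(colours)                       # hidden state: the target talker's colour word

# Likelihood A[o, s]: probability of hearing word o given the talker spoke word s.
# Off-diagonal mass stands in for acoustic confusability in the two-talker mixture.
A = np.full((n_states, n_states), 0.1)
np.fill_diagonal(A, 0.7)

prior = np.ones(n_states) / n_states          # flat prior over colour words
observation = colours.index("green")          # word actually heard from the cued side

# Attentional precision gamma scales the log-likelihood of the attended stream:
# low gamma (shortly after the cue) yields a diffuse posterior; high gamma a confident one.
for gamma in (0.5, 1.0, 4.0):
    posterior = softmax(gamma * np.log(A[observation]) + np.log(prior))
    print(f"gamma={gamma}: P(target said 'green') = {posterior[colours.index('green')]:.2f}")
```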
Results: Temporal changes in precision were unnecessary to explain the improvement in reaction times with longer cue-target intervals, but they were needed to explain the increase in EEG responses before the target talker spoke. An exponential increase in precision fitted the EEG data better than the alternative functional forms we tested.
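As a hedged illustration of what an exponential increase in precision might look like, one plausible parameterisation lets precision rise and saturate over the cue-target interval; the constants below are arbitrary, not the fitted values.

```python
# Illustrative sketch only: one way to parameterise an exponential rise in precision
# over the cue-target interval. gamma_0, gamma_max and tau are hypothetical values.
import numpy as np

def precision(t, gamma_0=0.5, gamma_max=4.0, tau=300.0):
    """Precision at time t (ms after the cue), rising exponentially towards gamma_max."""
    return gamma_0 + (gamma_max - gamma_0) * (1.0 - np.exp(-t / tau))

for t in (0, 200, 500, 1000):
    print(f"t = {t:4d} ms: gamma = {precision(t):.2f}")
```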
Conclusion: Overall, this work contributes to our understanding of selective attention. The model generates quantitative (testable) predictions about behavioural, psychophysical and electrophysiological responses, and underlying changes in synaptic efficacy.
In this work, the inputs were discrete words rather than the continuous acoustic signal. Here is a link to the paper I mentioned in the Q&A about inferring words from a continuous acoustic signal: https://doi.org/10.1016/j.heares.2020.107998. That model could readily be combined with the attention model in a hierarchical model, if desired.