Authors: Samuel Smith1, Mark Wallace1, Ananthakrishna Chintanpalli2, Michael Akeroyd1, Michael Heinz3, Christian Sumner4
1Univ. of Nottingham, Nottingham, United Kingdom;
2Birla Inst. of Technol. & Sci., Rajasthan, India;
3Purdue Univ., West Lafayette, IN; 4Nottingham Trent Univ., Nottingham, United Kingdom
Background: Humans can identify a conversation partner’s speech even after it has been obscured by an interfering talker. A common conceptual model holds that the auditory system first performs speech segregation, utilising physical ‘bottom-up’ cues (e.g. pitch differences, onset asynchronies, glimpsing) prior to recognition. We instead ask whether ‘top-down’ recognition better describes perception: do listeners predict and recognise the combined neural representation of overlapping speech directly?
Methods: Three speech-identification paradigms were explored: concurrently presented vowels with differing pitches, overlapping syllables with varying onset times, and syllables embedded amongst a variety of clean and modified sentence segments. For each, neural responses were recorded from the midbrain of anaesthetised guinea pigs. A naïve Bayes classifier was trained to identify neural responses to combinations of speech sounds. Machine recognition was then compared with the performance of human listeners.
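The naïve Bayes decoding step described above can be sketched as follows. This is a minimal illustration only, assuming Gaussian likelihoods over simulated response features: the class labels, the toy feature generator, and all parameters are hypothetical stand-ins, not the study’s actual neural data or classifier settings.

```python
import math
import random

random.seed(0)

# Hypothetical setup: each "neural response" is a small feature vector
# (e.g. firing rates in a few channels); each class label is a
# combination of two overlapping speech sounds.
CLASSES = [("ah", "eh"), ("ah", "ih"), ("eh", "ih")]
N_FEATURES = 4

def simulate_response(class_idx, noise=0.5):
    # Toy generator: each class has a distinct mean response per feature.
    return [class_idx * 1.0 + f * 0.2 + random.gauss(0, noise)
            for f in range(N_FEATURES)]

def train(X, y, n_classes):
    # Per-class, per-feature Gaussian parameters (mean, variance) and a prior.
    stats = []
    for c in range(n_classes):
        rows = [x for x, label in zip(X, y) if label == c]
        cols = list(zip(*rows))
        means = [sum(col) / len(rows) for col in cols]
        varis = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-6
                 for col, m in zip(cols, means)]
        stats.append((means, varis, len(rows) / len(X)))
    return stats

def log_gauss(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def posterior(stats, x):
    # Log joint for each class, then normalised posterior probabilities;
    # probabilistic output is what allows trial-by-trial comparison
    # with individual listener decisions.
    logs = [math.log(prior) + sum(log_gauss(v, m, s)
                                  for v, m, s in zip(x, means, varis))
            for means, varis, prior in stats]
    mx = max(logs)
    exps = [math.exp(l - mx) for l in logs]
    z = sum(exps)
    return [e / z for e in exps]

# Train on simulated responses, then classify one held-out response.
X, y = [], []
for c in range(len(CLASSES)):
    for _ in range(50):
        X.append(simulate_response(c))
        y.append(c)
stats = train(X, y, len(CLASSES))
probs = posterior(stats, simulate_response(1))
best = max(range(len(probs)), key=lambda i: probs[i])
print(CLASSES[best], round(probs[best], 3))
```

In this sketch the classifier returns a full posterior over speech-sound combinations rather than a single hard label, which is the property that would let a decoder of this kind be compared against listeners’ graded, trial-level responses.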
Results: The neural classifier accurately predicted human recognition of overlapping speech sounds. It quantitatively predicted the improved identification of concurrent vowels with increasing pitch differences, of overlapping syllables with increasing temporal onset lag, and of syllables amongst sentence segments with increasing spectro-temporal glimpses. Further, the probabilistic classifier predicted listeners’ specific micro-decisions.
Conclusions: Machine recognition of overlapping speech encoded in the midbrain mimicked human perception. The advantages of basic auditory cues emerged from a general prediction-driven strategy that had no explicit knowledge of those cues.