Comparing phonemic information transmission with cochlear implants between human listeners and an end-to-end computational model of speech perception

Tim Brochier (1), Josef Schlittenlacher (2), Iwan Roberts (1), Tobias Goehring (3), Chen Jiang (1, 4), Deborah Vickers (1), Manohar Bance (1)

(1) Cambridge Hearing Group, Department of Clinical Neurosciences, University of Cambridge, UK; (2) Division of Human Communication, Development, and Hearing, University of Manchester, UK; (3) Cambridge Hearing Group, MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK; (4) Department of Electronic Engineering, Tsinghua University, Beijing, China

Background: Speech perception in cochlear implant (CI) listeners is affected by the degradation of spectral and temporal information through the CI. Computational models of CI speech perception can be used to rapidly and objectively evaluate strategies that may improve information transmission in CIs. To succeed, these models must rely on phonemic cues similar to those used by CI listeners. Our research aims to replicate phoneme-level CI speech perception patterns using an end-to-end computational model.

Methods: We combined a finite element model of a cochlea, a computational model of the auditory nerve, and an automatic speech recognition (ASR) neural network to generate predictions of CI speech perception. The ASR network was trained and tested on neural activation patterns generated by the initial stages of the model, and phonemic information transmission was evaluated. Results were compared to data measured in CI listeners (Donaldson and Kreft, 2006). The consonant features assessed were manner, place, and voicing; the vowel features were the first and second formants, tenseness, and duration. The model was also used to investigate information degradation through the CI signal processing chain.
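
The abstract does not spell out how phonemic information transmission was computed. As an illustration only, the sketch below assumes the relative-information-transmitted measure commonly used in phoneme feature analysis: the mutual information between stimulus and response feature categories, divided by the stimulus entropy. The function name, the toy confusion counts, and the feature labels are hypothetical and are not taken from the study.

```python
import math
from collections import defaultdict


def relative_information_transmitted(confusions, feature_of):
    """Relative information transmitted for one phonemic feature.

    confusions: dict mapping (stimulus, response) phoneme pairs to counts.
    feature_of: dict mapping each phoneme to its category for the feature.
    Returns mutual information / stimulus entropy (0 = none, 1 = perfect).
    """
    # Collapse the phoneme confusion counts into feature-level counts.
    joint = defaultdict(float)
    for (stim, resp), count in confusions.items():
        joint[(feature_of[stim], feature_of[resp])] += count
    total = sum(joint.values())

    # Marginal probabilities of stimulus and response categories.
    p_stim = defaultdict(float)
    p_resp = defaultdict(float)
    for (s, r), count in joint.items():
        p_stim[s] += count / total
        p_resp[r] += count / total

    # Mutual information between stimulus and response categories, in bits.
    mutual_info = 0.0
    for (s, r), count in joint.items():
        if count > 0:
            p = count / total
            mutual_info += p * math.log2(p / (p_stim[s] * p_resp[r]))

    # Normalise by the stimulus entropy.
    h_stim = -sum(p * math.log2(p) for p in p_stim.values() if p > 0)
    return mutual_info / h_stim if h_stim > 0 else 0.0


# Hypothetical toy data: voicing is always preserved, place is at chance.
toy_confusions = {
    ("b", "b"): 5, ("b", "d"): 5, ("d", "b"): 5, ("d", "d"): 5,
    ("p", "p"): 5, ("p", "t"): 5, ("t", "p"): 5, ("t", "t"): 5,
}
voicing = {"b": "voiced", "d": "voiced", "p": "voiceless", "t": "voiceless"}
place = {"b": "labial", "p": "labial", "d": "alveolar", "t": "alveolar"}

voicing_rit = relative_information_transmitted(toy_confusions, voicing)
place_rit = relative_information_transmitted(toy_confusions, place)
```

In this toy case the voicing feature is fully transmitted and the place feature carries no information. It is a deliberately extreme version of the pattern reported for consonants, where manner and voicing cues were transmitted better than place cues.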

Results: No significant differences were found between the model and the CI listener data for any consonant or vowel feature. For consonants, manner and voicing cues were transmitted better than place cues. Model predictions and CI listener data for consonant recognition accuracies were correlated (R = 0.641, p = 0.001), suggesting that the model captures between-consonant differences in perceptibility. For vowels, both the model and CI listeners prioritized the first and second formant cues. The bottleneck of information flow occurred at the electrode-neural interface.
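
The reported correlation between model and listener consonant accuracies can be illustrated with a minimal sketch. It assumes a plain Pearson correlation over per-consonant accuracies, and the abstract does not state which correlation variant or software was used. The function name and the accuracy values below are hypothetical placeholders, not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-consonant accuracies (proportion correct), for illustration only.
model_accuracy = [0.82, 0.45, 0.67, 0.91, 0.38]
listener_accuracy = [0.78, 0.52, 0.60, 0.88, 0.49]
r = pearson_r(model_accuracy, listener_accuracy)
```

A value of r near 1 would mean that consonants the model recognizes well are also the ones listeners recognize well, which is the sense in which the correlation reflects between-consonant differences in perceptibility.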

Conclusion: A computational model replicated CI speech perception patterns and quantified information degradation through the CI. The model will help to develop, optimize, and predict the efficacy of new CI processing strategies.

Figure (Brochier et al.): A finite element model of the implanted cochlea, showing the voltage spread caused by stimulation of the implanted electrode.