Design of an End-to-End Deep Learning Sound Coding Strategy for Cochlear Implants and Validation through an Objective Metric based on Mutual Information

Authors: Tom Gajecki1, Franklin Alvarez1, Waldo Nogueira2

1Medizinische Hochschule Hannover
2Medical University Hannover, Cluster of Excellence Hearing4all

A cochlear implant (CI) is a surgically implanted medical device designed to restore hearing in individuals with deafness by converting sounds into electrical pulses that stimulate the auditory nerve, creating the perception of sound. While CI users generally understand speech well in quiet settings, background noise significantly degrades their comprehension. To address this issue, speech enhancement algorithms preprocess the speech signal before it is processed by the CI. However, these techniques can introduce processing delays that may adversely affect the user’s auditory experience.

In this work, we explore Deep ACE, an end-to-end speech-denoising system built on a deep neural network. Deep ACE aims to replace the traditional advanced combination encoder (ACE) sound coding strategy of CIs, removing background noise without introducing additional delay. We use the spike activity mutual information index (SAMII) to objectively evaluate the speech enhancement capabilities of Deep ACE. SAMII is an objective metric of speech intelligibility that uses a physiological model of the human peripheral auditory system to compute the mutual information between neural activity in ideal and noisy conditions. A vocoder, integrated into the front end of SAMII (vSAMII), is used to emulate electrical hearing.
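To make the mutual-information step concrete, the following is a minimal sketch of a plug-in estimate of the mutual information between two discretized spike trains, one from an ideal condition and one from a noisy condition. It is not the SAMII implementation: SAMII relies on a physiological model of the auditory periphery, which is omitted here, and the function name `mutual_information_bits`, the histogram-based estimator, and the toy binary inputs are assumptions made purely for illustration.

```python
import numpy as np


def mutual_information_bits(x, y):
    """Plug-in estimate of I(X;Y) in bits for two equal-length discrete sequences.

    Hypothetical helper for illustration only; not part of SAMII.
    """
    x = np.asarray(x).ravel()
    y = np.asarray(y).ravel()
    if x.shape != y.shape:
        raise ValueError("x and y must have the same length")

    # Map each distinct symbol to a contiguous integer index.
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    xi = xi.ravel()
    yi = yi.ravel()

    # Joint histogram, normalised to a joint probability table.
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1.0)
    joint /= joint.sum()

    # Marginals and their outer product p(x) p(y).
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    indep = px @ py

    # Sum only over cells with non-zero joint probability.
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / indep[nz])))


# Toy binary spike trains (1 = spike in a time bin); assumed data.
ideal = [0, 1, 0, 1, 0, 1, 0, 1]
noisy_same = list(ideal)                    # noise changed nothing
noisy_unrelated = [0, 0, 1, 1, 0, 0, 1, 1]  # statistically independent of `ideal`

mi_same = mutual_information_bits(ideal, noisy_same)
mi_unrelated = mutual_information_bits(ideal, noisy_unrelated)
```

In this toy setting, a noisy response that preserves the ideal spike pattern retains all of its information, while an unrelated response retains none; an intelligibility index built on this idea would fall between those extremes.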

The German HSM sentence test was used to compare Deep ACE’s performance with another front-end speech enhancement algorithm, TasNet. We also conducted perceptual tests with real CI users to validate the objective findings. The speech stimuli were mixed with speech-shaped noise and babble noise at signal-to-noise ratios of 0 dB and 5 dB. Both vSAMII and the perceptual tests revealed a significant improvement in speech perception with the enhancement algorithms (Deep ACE or TasNet) over traditional ACE. Deep ACE yielded better results than TasNet in the perceptual tests, although vSAMII scores showed no significant difference between the two.
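As an illustration of how stimuli can be mixed at a target signal-to-noise ratio, the sketch below scales a noise signal so that the ratio of average speech power to average noise power matches a requested value in dB. The function name `mix_at_snr`, the use of whole-signal average power, and the random stand-in signals are assumptions for this example; the abstract does not specify how the mixing was actually carried out.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR in dB.

    Hypothetical helper: SNR is defined here from average power over the
    whole signal, which is an assumption rather than the authors' procedure.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]

    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    if p_noise == 0:
        raise ValueError("noise has zero power")

    # Choose gain g so that p_speech / (g^2 * p_noise) = 10^(snr_db / 10).
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    scaled_noise = gain * noise
    return speech + scaled_noise, scaled_noise


# Random stand-ins for a speech recording and a noise recording (assumed data).
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
noise = rng.standard_normal(20000)

mixture_0db, noise_0db = mix_at_snr(speech, noise, 0.0)
mixture_5db, noise_5db = mix_at_snr(speech, noise, 5.0)

# Measured SNR of each mixture, recomputed from the scaled noise.
snr_0 = 10 * np.log10(np.mean(speech ** 2) / np.mean(noise_0db ** 2))
snr_5 = 10 * np.log10(np.mean(speech ** 2) / np.mean(noise_5db ** 2))
```

At 0 dB the scaled noise carries the same average power as the speech; at 5 dB the speech power is roughly 3.16 times the noise power.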