Authors: Clément Gaultier1*; Tobias Goehring1
1University of Cambridge
Background Noise and reverberation can impair speech perception for cochlear implant (CI) recipients. Significant improvements in speech intelligibility for CI recipients in noisy situations have been reported with deep-neural-network (DNN) algorithms. With improvements in end-to-end learning techniques and multi-microphone approaches, powerful speech enhancement (SE) strategies can now be developed to cope with more realistic and difficult situations that combine noise and reverberation.
Method Three algorithms were developed to jointly alleviate noise and reverberation. The algorithms were trained on simulated sound scenes using either single- or multi-microphone recordings from behind-the-ear devices. Performance was assessed using objective measures and a listening test on reverberant mixtures of speech in babble noise at several signal-to-noise ratios (SNRs). We evaluated speech reception thresholds (SRTs) for the three algorithms. Both CI listeners and participants using CI simulations took part in the study.
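As an illustration of what presenting speech in babble noise at a given SNR involves, the following is a minimal sketch that scales a noise signal so that the speech-to-noise power ratio matches a target value in dB. It is not the pipeline used in this study: the function names `mean_power`, `mix_at_snr` and `measured_snr_db` are hypothetical, the signals are synthetic stand-ins (a tone for speech, Gaussian noise for babble), and reverberation is not modelled.

```python
import math
import random


def mean_power(signal):
    """Average power of a signal given as a list of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`.

    Returns the mixture and the scaled noise. Hypothetical helper for
    illustration only; assumes both signals share one sampling rate.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    # Noise power required to reach the target SNR.
    target_noise_power = mean_power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


def measured_snr_db(speech, noise):
    """SNR in dB computed from the separate speech and noise signals."""
    return 10 * math.log10(mean_power(speech) / mean_power(noise))


# Synthetic stand-ins: one second at an assumed 16 kHz sampling rate.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
babble = [rng.gauss(0.0, 1.0) for _ in range(16000)]

for target in (-5, 0, 5):
    _, scaled = mix_at_snr(speech, babble, target)
    print(target, round(measured_snr_db(speech, scaled), 6))
```

Scaling the noise rather than the speech keeps the target level fixed across conditions, which is one common convention; an adaptive SRT procedure would vary the target SNR from trial to trial.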
Results All approaches showed improved objective scores over the unprocessed condition. Metrics indicated better signal-to-distortion ratios and predicted intelligibility for the multi-microphone than for the single-microphone approach, especially in the noisiest situations. Experimental results from participants with typical hearing using CI simulations (n=15, shown in Figure 1) showed significant improvements of up to 7 dB in SRTs for the multi-microphone approaches over both the unprocessed and single-microphone cases.
Conclusion Multi-microphone SE algorithms based on deep learning showed strong potential to improve speech intelligibility for cochlear implant listeners in realistic situations with babble noise and reverberation. This was the case even when processing was restricted to unilateral microphones. Further work should investigate how well such strategies preserve auditory awareness of the environment whilst enhancing speech.