Use of a deep recurrent neural network to reduce transient noise: Effects on subjective speech intelligibility and comfort

Mahmoud Keshavarzi (1,2,3), Tobias Reichenbach (3,4), Brian C. J. Moore (2)

(1) Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, UK; (2) Cambridge Hearing Group, Department of Psychology, University of Cambridge, Cambridge, UK; (3) Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, London, UK; (4) Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nuremberg, 91056 Erlangen, Germany

Background: Hearing aid users often complain about discomfort and reduced speech intelligibility caused by transient sounds such as a knife hitting a plate. Here, a deep recurrent neural network (RNN) for reducing transient sounds was developed and its effects on subjective speech intelligibility and listening comfort were evaluated. The RNN was trained using many sentences spoken with different accents and corrupted by different transient sounds. It was then tested using sentences spoken by unseen speakers and corrupted by unseen transient sounds.

Methods: A paired-comparison procedure was used to compare all possible pairs of three conditions in terms of subjective speech intelligibility and listening comfort, at each of two relative levels of the transients. The conditions were processing using the RNN, processing using a multi-channel transient reduction method (MCTR, [1]), and no processing (NP). Ten native English-speaking participants with normal hearing and ten with mild-to-moderate hearing loss were tested.
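As an illustration of the size of this design, the sketch below enumerates the comparison cells implied by the description above: every pair of the three conditions, for each attribute and each transient level. It is a minimal sketch and not the authors' test software. The level labels "lower" and "higher" and the function name are assumptions made for illustration, and the number of sentences or repetitions per cell is not stated in the abstract, so it is not modelled.

```python
from itertools import combinations, product

# Condition names as given in the abstract.
CONDITIONS = ("RNN", "MCTR", "NP")
# The two attributes that participants judged.
ATTRIBUTES = ("intelligibility", "comfort")
# Assumed labels: the abstract only says "two relative levels".
TRANSIENT_LEVELS = ("lower", "higher")


def paired_comparison_cells():
    """Return one entry per (attribute, level, condition pair) cell."""
    # All unordered pairs of the three conditions: 3 pairs in total.
    pairs = list(combinations(CONDITIONS, 2))
    return [
        {"attribute": attribute, "level": level, "pair": pair}
        for attribute, level, pair in product(ATTRIBUTES, TRANSIENT_LEVELS, pairs)
    ]


cells = paired_comparison_cells()
print(len(cells))  # 3 pairs x 2 attributes x 2 levels = 12
```

Under these assumptions, each participant would contribute preference judgements to 12 distinct cells.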

Results & Conclusion: For the normal-hearing participants, RNN processing was significantly preferred over NP for both subjective intelligibility and comfort, RNN processing was significantly preferred over MCTR processing for comfort, and MCTR processing was significantly preferred over NP for comfort, but only at the higher transient level. For the hearing-impaired participants, RNN processing was significantly preferred over NP for both subjective intelligibility and comfort, RNN processing was significantly preferred over MCTR processing for comfort, and MCTR processing was significantly preferred over NP for comfort. Overall, the results indicate that the RNN was more effective than the MCTR.

[1] Keshavarzi, M., Baer, T. & Moore, B. C. J. (2018). Evaluation of a multi-channel algorithm for reducing transient sounds. Int J Audiol, 57, 624-631.

 This work was supported by the RNID (UK, Flexi Grant No. 99).
