Using auditory texture statistics for domain-neutral removal of background sounds

Authors: Artoghrul Alishbayli¹, Noah Schlegel¹, Bernhard Englitz¹

¹Radboud University

Background. Human communication often occurs in the presence of interfering background noise, which can significantly affect speech perception. Recent research has highlighted the importance of auditory textures in the recognition and suppression of such interfering noise.

Methods. Here, we propose Statistical Sound Filtering (SSF), a fast, domain-free noise suppression method that exploits the stationarity and spectral similarity of the sound sources that make up sound textures. SSF builds a library of spectrotemporal features of the background noise and compares it against instants in speech-noise mixtures to subtract contributions that are statistically consistent with the interfering noise. We evaluated SSF on the TIMIT corpus of speech utterances using multiple quality measures and human listeners.
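The general idea of comparing mixture instants against stored noise statistics can be illustrated with a simplified sketch. The code below is not the SSF implementation. It substitutes the per-frequency mean and standard deviation of the noise magnitude spectrum for the feature library, and plain magnitude subtraction for the filtering step, which makes it closer to classical spectral subtraction. All names and parameters (`fit_noise_stats`, `suppress`, `z_max`, `floor`, the frame and hop sizes) are assumptions chosen for illustration, and a sine tone stands in for speech.

```python
import numpy as np

FRAME = 512   # samples per analysis frame (assumed value)
HOP = 256     # 50% overlap between consecutive frames
# Square-root Hann window: applied at analysis and synthesis, the squared
# windows of overlapping frames sum to 1, so overlap-add needs no rescaling.
WINDOW = np.sin(np.pi * np.arange(FRAME) / FRAME)


def stft(x):
    """Windowed short-time spectra, shape (n_frames, FRAME // 2 + 1)."""
    n_frames = 1 + (len(x) - FRAME) // HOP
    frames = np.stack([x[i * HOP:i * HOP + FRAME] for i in range(n_frames)])
    return np.fft.rfft(frames * WINDOW, axis=1)


def istft(spec):
    """Overlap-add resynthesis; the first and last HOP samples stay tapered."""
    frames = np.fft.irfft(spec, n=FRAME, axis=1) * WINDOW
    out = np.zeros((len(spec) - 1) * HOP + FRAME)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + FRAME] += frame
    return out


def fit_noise_stats(noise):
    """Hypothetical 'library': per-bin mean and std of the noise magnitude."""
    mag = np.abs(stft(noise))
    return mag.mean(axis=0), mag.std(axis=0)


def suppress(mixture, noise_mean, noise_std, z_max=2.0, floor=0.05):
    """Subtract the noise mean from bins that look statistically like noise."""
    spec = stft(mixture)
    mag, phase = np.abs(spec), np.angle(spec)
    # z-score of each time-frequency bin under the stored noise statistics
    z = (mag - noise_mean) / (noise_std + 1e-12)
    consistent = z < z_max
    # keep a small spectral floor so subtracted bins never go negative
    reduced = np.maximum(mag - noise_mean, floor * mag)
    cleaned = np.where(consistent, reduced, mag)
    return istft(cleaned * np.exp(1j * phase))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 16000
    t = np.arange(fs) / fs
    target = np.sin(2 * np.pi * 440 * t)            # stand-in for speech
    noise_only = 0.1 * rng.standard_normal(fs)      # noise-only training excerpt
    mixture = target + 0.1 * rng.standard_normal(fs)

    mean, std = fit_noise_stats(noise_only)
    cleaned = suppress(mixture, mean, std)

    inner = slice(HOP, len(cleaned) - HOP)          # skip the tapered edges
    print("MSE before:", np.mean((mixture[inner] - target[inner]) ** 2))
    print("MSE after: ", np.mean((cleaned[inner] - target[inner]) ** 2))
```

Because the statistics are fitted on a separate noise excerpt, the sketch relies on the same stationarity assumption the method exploits: noise in the mixture is expected to resemble the noise seen during fitting. Refitting on a new excerpt only requires calling `fit_noise_stats` again.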

Results. Our results demonstrate that SSF significantly improves sound quality across all performance metrics, each of which captures a different aspect of the sound. Additionally, human participants reported reduced background noise levels after filtering, without significant damage to speech quality. SSF executes quickly (~100x real-time) and can be retrained rapidly and continuously in changing acoustic contexts.

Conclusions. Our proposed method, SSF, is a fast and effective approach for noise suppression in adverse acoustical conditions. SSF is suitable for integration into hearing aids where power-efficient, fast and adaptive training and execution are critical, providing a promising avenue for improving speech perception in noisy environments.