Authors: A. Van Den Broucke1, F. Drakopoulos1, D. Baby1, S. Verhulst1
1Ghent University, Technologiepark 126, 9052 Zwijnaarde, Belgium
Background: Although auditory models have progressed to capture the biophysical and nonlinear properties of human hearing accurately, they typically consist of a coupled set of ordinary differential equations (ODEs) arranged in a transmission-line (TL) description of the cochlea. These biophysically realistic cochlear models can capture longitudinal coupling, cochlear nonlinearities and human frequency selectivity, but they are slow to compute and cannot easily be differentiated. As a result, less accurate descriptions of cochlear processing (e.g., gammatone, DRNL, MFCC) remain the standard for feature extraction, auditory front-ends and hearing-impairment simulation, even though they provide only a rough approximation of auditory physiology.
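To make the computational bottleneck concrete, here is a minimal toy sketch (not the authors' model; all parameter values are arbitrary) of why TL descriptions are slow: every time sample requires updating a coupled system of oscillator ODEs across all cochlear sections, forcing a long serial solver loop.

```python
import numpy as np

# Toy transmission-line sketch (illustrative only): the cochlea is
# discretised into N coupled oscillator sections that must be
# time-stepped serially, sample by sample.
N = 1000                                  # cochlear sections, base to apex
fs = 100e3                                # solver rate (Hz); TL models need fine steps
w = 2 * np.pi * np.logspace(np.log10(12e3), np.log10(100.0), N)  # section CFs
d = 0.05 * w                              # per-section damping (arbitrary)
k = 1e5                                   # longitudinal coupling (arbitrary)

y = np.zeros(N)                           # basilar-membrane displacement
v = np.zeros(N)                           # basilar-membrane velocity
t = np.arange(int(0.01 * fs)) / fs
stimulus = np.sin(2 * np.pi * 1e3 * t)    # 10 ms, 1-kHz toy input

for x in stimulus:                        # serial loop: no trivial parallelism
    drive = np.zeros(N)
    drive[0] = x                          # stapes drive enters at the base
    coupling = k * (np.roll(y, 1) - 2 * y + np.roll(y, -1))
    coupling[0] = coupling[-1] = 0.0      # crude boundary handling
    a = drive + coupling - d * v - w**2 * y
    v += a / fs                           # semi-implicit Euler update
    y += v / fs
```

A CNN surrogate, by contrast, can process the whole waveform in a handful of parallel tensor operations, which is what motivates the approach below.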
Methods: Here, we combine convolutional neural network (CNN) techniques with computational modelling to yield a real-time model of the human auditory periphery (CoNNear), which benefits from the real-time capabilities of CNNs while preserving the biophysical accuracy of TL models. To this end, a CNN was trained on speech-corpus material to mimic the basilar-membrane vibrations of a state-of-the-art biophysical TL model. Since the original TL model can simulate different degrees of sensorineural hearing loss, we specifically focus on how the training procedure can be optimized to render normal-hearing CNN models hearing impaired.
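As a sketch of what such a trained surrogate might look like, the PyTorch code below sets up a small 1-D convolutional encoder-decoder and one training step against TL-simulated basilar-membrane targets. Layer sizes, names and hyperparameters are hypothetical choices for illustration, not the published CoNNear architecture, and random tensors stand in for the speech corpus and the TL-model outputs.

```python
import torch
import torch.nn as nn

class CoNNearSketch(nn.Module):
    """Hypothetical stand-in for a CoNNear-like architecture: a 1-D
    encoder-decoder CNN mapping an audio waveform (batch, 1, time) to
    basilar-membrane vibrations at n_channels cochlear sections."""
    def __init__(self, n_channels=201, width=64, depth=4):
        super().__init__()
        enc, dec = [], []
        c_in = 1
        for _ in range(depth):            # strided convs halve the time axis
            enc.append(nn.Conv1d(c_in, width, kernel_size=15, stride=2, padding=7))
            enc.append(nn.Tanh())
            c_in = width
        for _ in range(depth):            # transposed convs restore it
            dec.append(nn.ConvTranspose1d(c_in, width, kernel_size=15, stride=2,
                                          padding=7, output_padding=1))
            dec.append(nn.Tanh())
        self.encoder = nn.Sequential(*enc)
        self.decoder = nn.Sequential(*dec)
        self.head = nn.Conv1d(width, n_channels, kernel_size=1)

    def forward(self, audio):
        return self.head(self.decoder(self.encoder(audio)))

model = CoNNearSketch()
loss_fn = nn.L1Loss()                     # match TL-simulated BM vibrations
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

audio = torch.randn(8, 1, 2048)           # stand-in for speech-corpus frames
bm_target = torch.randn(8, 201, 2048)     # stand-in for TL-model output
opt.zero_grad()
loss = loss_fn(model(audio), bm_target)
loss.backward()
opt.step()
```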
Results: The performance of various CNN architectures and hyperparameters was assessed by comparing the CoNNear simulations against human data and against simulations of the original TL model for basic auditory stimuli. We found that CNN architectures can accurately capture the level-dependent properties of cochlear mechanics while speeding up calculations by up to a factor of 100 on a GPU, achieving real-time latencies (<10 ms). At the same time, the normal-hearing CNN model was easily adjusted using transfer learning to simulate different profiles of outer-hair-cell cochlear gain loss.
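The transfer-learning step could look like the sketch below, which continues the previous code (reusing model, audio and loss_fn; the checkpoint path and loop length are hypothetical): instead of retraining from scratch, the normal-hearing weights are fine-tuned on TL simulations of a cochlea with an outer-hair-cell gain-loss profile.

```python
import torch

# Continues the previous sketch (model, audio, loss_fn). Hypothetical
# transfer-learning step: load the trained normal-hearing weights and
# fine-tune on hearing-impaired TL targets instead of training anew.
model.load_state_dict(torch.load("connear_nh.pt"))   # hypothetical checkpoint
opt = torch.optim.Adam(model.parameters(), lr=1e-5)  # small LR for fine-tuning

bm_impaired = torch.randn(8, 201, 2048)  # stand-in for impaired TL output
for _ in range(100):                     # short fine-tuning loop
    opt.zero_grad()
    loss = loss_fn(model(audio), bm_impaired)
    loss.backward()
    opt.step()
```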
Conclusion: The resulting CNN-based cochlear model (CoNNear) allows for real-time, parallel and differentiable computations, which can serve the next generation of backpropagation-based hearing-aid and machine-hearing applications.
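One way the differentiability could be exploited, again continuing the sketch above: freeze the CNN cochlea and backpropagate through it to fit an upstream processor (here a toy learnable FIR filter, purely illustrative) so that its output drives the model toward normal-hearing targets.

```python
import torch
import torch.nn as nn

# Continues the previous sketches. Freeze the CNN cochlea and let
# gradients flow through it into a toy upstream "hearing-aid" filter.
for p in model.parameters():
    p.requires_grad_(False)

fir = nn.Conv1d(1, 1, kernel_size=64, padding=32, bias=False)  # toy frontend
opt = torch.optim.Adam(fir.parameters(), lr=1e-3)

bm_normal = torch.randn(8, 201, 2048)    # stand-in normal-hearing target
opt.zero_grad()
processed = fir(audio)[..., :2048]       # trim padding to keep length
loss = loss_fn(model(processed), bm_normal)  # gradients pass through the CNN
loss.backward()
opt.step()
```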
Work supported by European Research Council grant ERC-StG-678120 (RobSpear).