Opening the Black Box of Binaural Neural Networks

Authors: Alex Tichter1, Marc van Wanrooij2, Jan-Willem Wasmann3, Yagmur Güçlütürk4
1Master Artificial Intelligence Internship
2Department of Biophysics, Donders Institute for Brain, Cognition and Behaviour, Radboud University
3Department Otolaryngology, RadboudUMC
4Department of Cognitive Artificial Intelligence, Radboud University

Background: Recently, it has been shown that artificial neural networks can mimic human sound localization abilities under different listening conditions [1]. However, how such a network arrives at these similar outcomes has not yet been analyzed. While artificial neural networks often score well on a specified test set, they are also prone to overfitting the training data without the researcher noticing [2]. To address this black box problem, we present analysis methods that investigate the biological plausibility of the listening strategy the network employs.

Methods: We trained four binaural neural networks to localize sound sources in the frontal azimuth semicircle. The data were generated synthetically by convolving Gaussian white noise with the HRTFs of the KEMAR head. The resulting frequency arrays were fed into the binaural network and mapped, via a hidden layer with a varying number of hidden nodes (2, 20, 40, or 100), to a single output node indicating the azimuth of the sound source. First, we validated overall performance with standard localization plots for broadband, high-pass, and low-pass noise and compared it with human performance. Afterwards, we analyzed the spatial and frequency tuning of the hidden neurons and compared the learned weights to the ILD contours of the HRTFs.
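The pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: `toy_hrir` is a hypothetical stand-in for the measured KEMAR responses, the inputs are assumed to be magnitude spectra in dB, the hidden activation is assumed to be tanh, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

N_BINS = 1015                  # frequency bins per ear, as in Fig. 1
N_HIDDEN = 2                   # smallest of the tested hidden-layer sizes
N_SAMPLES = 2 * (N_BINS - 1)   # an rfft of this length yields N_BINS bins


def toy_hrir(azimuth_deg, ear, length=64):
    """Hypothetical impulse response: a pure level cue, NOT KEMAR data."""
    side = 1.0 if ear == "right" else -1.0
    gain = 1.0 + 0.5 * side * np.sin(np.deg2rad(azimuth_deg))
    hrir = np.zeros(length)
    hrir[0] = gain
    return hrir


def binaural_input(azimuth_deg):
    """Convolve Gaussian white noise with each ear's response and
    return the concatenated left/right magnitude spectra in dB."""
    noise = rng.standard_normal(N_SAMPLES)
    spectra = []
    for ear in ("left", "right"):
        signal = np.convolve(noise, toy_hrir(azimuth_deg, ear))[:N_SAMPLES]
        magnitude = np.abs(np.fft.rfft(signal))
        spectra.append(20.0 * np.log10(magnitude + 1e-12))
    return np.concatenate(spectra)


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer mapped to a single azimuth output node.
    The tanh activation is an assumption of this sketch."""
    hidden = np.tanh(w_hidden @ x + b_hidden)
    return float(w_out @ hidden + b_out)


# Untrained, randomly initialized weights, for shape checking only.
w_hidden = rng.normal(scale=1e-3, size=(N_HIDDEN, 2 * N_BINS))
b_hidden = np.zeros(N_HIDDEN)
w_out = rng.normal(size=N_HIDDEN)
b_out = 0.0

x = binaural_input(30.0)
prediction = forward(x, w_hidden, b_hidden, w_out, b_out)
```

Because the toy response only scales the noise, the difference between the right and left halves of `x` is a constant ILD across frequency. Measured HRTFs would instead give the frequency-dependent ILD contours against which the learned weights are compared.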

Results: All networks reached a target/response Pearson correlation above 0.98 for broadband stimuli. However, the fewer hidden nodes a network has, the more level-dependent its localization performance becomes. Analysis of the weights showed that the 2-hidden-neuron model bases its predictions on ipsilateral excitation and contralateral inhibition across an HRTF-like frequency spectrum (Fig. 1). Its spatial tuning is in line with the current theory of ILD processing in mammals [3]. However, the 2-hidden-neuron model lacks sharp frequency tuning, which emerges as the number of hidden nodes grows.
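For readers unfamiliar with the target/response measure, the following sketch shows how a Pearson correlation can be computed from a localization plot. The `responses` values are hypothetical, not data from this study, and the additional gain and bias fit is an assumption of this sketch rather than a reported analysis.

```python
import math


def pearson_r(targets, responses):
    """Pearson correlation between target and response azimuths."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_r = sum(responses) / n
    cov = sum((t - mean_t) * (r - mean_r) for t, r in zip(targets, responses))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_r = sum((r - mean_r) ** 2 for r in responses)
    return cov / math.sqrt(var_t * var_r)


def gain_and_bias(targets, responses):
    """Least-squares fit of response = gain * target + bias."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_r = sum(responses) / n
    cov = sum((t - mean_t) * (r - mean_r) for t, r in zip(targets, responses))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    gain = cov / var_t
    return gain, mean_r - gain * mean_t


# Frontal azimuth semicircle in 10-degree steps.
targets = list(range(-90, 91, 10))
# Hypothetical responses: compressed by 10% and shifted by 2 degrees.
responses = [0.9 * t + 2.0 for t in targets]

r = pearson_r(targets, responses)
gain, bias = gain_and_bias(targets, responses)
```

Note that the correlation alone is insensitive to a constant gain or offset in the responses, so the hypothetical responses above still correlate perfectly with the targets. This is one reason localization plots are usually inspected alongside the correlation.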

Conclusion: As the number of hidden nodes increases, the network becomes increasingly independent of sound level and thereby localizes more accurately. In addition, the weight analysis indicates that sharp frequency tuning is necessary to extract meaningful ILD information from any input sound. These results provide some evidence against the long-standing level-meter model and support the sharp frequency tuning found in the LSO of cats.
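One way to picture level dependence is with a toy unit that sums weighted spectra from both ears. The sketch below is purely illustrative and is not the analysis performed here: it assumes the inputs are levels in dB and that the unit's net input is linear, and all spectra and weights are hypothetical. Under those assumptions, raising the overall level by g dB shifts the net input by g times the sum of all weights, so the shift vanishes only when excitation and inhibition balance.

```python
def net_input(left_db, right_db, w_left, w_right):
    """Weighted sum of both ears' dB spectra (linear net input, assumed)."""
    left = sum(w * x for w, x in zip(w_left, left_db))
    right = sum(w * x for w, x in zip(w_right, right_db))
    return left + right


def add_gain(spectrum_db, gain_db):
    """Raise the overall sound level by gain_db in every frequency bin."""
    return [x + gain_db for x in spectrum_db]


# Hypothetical four-bin spectra for a source on the right side.
left_db = [40.0, 42.0, 38.0, 35.0]
right_db = [46.0, 47.0, 45.0, 44.0]

# (left weights, right weights): contralateral inhibition, ipsilateral excitation.
balanced = ([-0.25] * 4, [0.25] * 4)     # weights sum to zero
unbalanced = ([-0.10] * 4, [0.25] * 4)   # weights sum to 0.6


def level_shift(weights, gain_db=20.0):
    """Change in net input when both ears get gain_db louder."""
    w_left, w_right = weights
    quiet = net_input(left_db, right_db, w_left, w_right)
    loud = net_input(add_gain(left_db, gain_db), add_gain(right_db, gain_db),
                     w_left, w_right)
    return loud - quiet


shift_balanced = level_shift(balanced)
shift_unbalanced = level_shift(unbalanced)
```

In this toy setting the balanced unit responds identically at both levels, while the unbalanced unit's net input grows with overall level, which would bias an azimuth estimate derived from it.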

Figure 1: (Top left, light blue) Overview of the binaural neural network. Red balls: 1015 frequency bins from the simulated left ear. Blue balls: 1015 frequency bins from the simulated right ear. Green background: color-coded weights (frequency tuning analysis). Yellow background: hidden layer (spatial tuning analysis).
(Top right, yellow) Spatial tuning analysis: sound location in degrees (x-axis) against hidden neuron activity (y-axis). Neuron 1 codes for sounds coming from the right side; Neuron 2 is sensitive to sounds coming from the left side.
(Bottom, green) Frequency tuning for each neuron, with scaled reference HRTF (green line). Neuron 1 (top): ipsilateral/right ear excitation (light blue), contralateral/left ear inhibition (red). Neuron 2 (bottom): ipsilateral/left ear excitation (violet), contralateral/right ear inhibition (blue).

[1] Ausili, S. A. (2019). Spatial hearing with electrical stimulation listening with cochlear implants (Doctoral thesis).
[2] Ribeiro, M. T., Singh, S., & Guestrin, C. (2016, August). "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135-1144).
[3] Grothe, B., Pecka, M., & McAlpine, D. (2010). Mechanisms of sound localization in mammals. Physiological reviews, 90(3), 983-1012.