Authors: Sina Tahmasebi1, Tom Gajęcki1 and Waldo Nogueira1
1Department of Otolaryngology, Medical University Hannover and Cluster of Excellence “Hearing4all”, Hannover, Germany
Background: Listening to music is an important and enjoyable part of many people’s lives that can improve quality of life through its recreational and rehabilitative functions. However, cochlear implant (CI) users face severe limitations in music perception and appreciation. It has been shown that enhancing the singing voice in music improves music appreciation for CI users [e.g. Buyens et al. 2014; Pons et al. 2016]. To this end, different signal processing algorithms have been proposed to make music more accessible for CI users [Nogueira et al. 2019], either by reducing the complexity of music signals [Nagathil et al. 2017] or by remixing them to enhance certain components, such as the lead singing voice [Tahmasebi et al. 2020; Gajęcki et al. 2018]. Non-negative matrix factorization, multilayer perceptrons (MLPs), deep recurrent neural networks, and convolutional autoencoders are some of the advanced approaches that have been used to separate different sources within an audio mixture [Pons et al. 2016; Gajęcki et al. 2018].
Method: This work presents a deep neural network that performs real-time audio source separation to remix music for CI users. The implementation is based on a multilayer perceptron (MLP) and has been evaluated using objective instrumental measurements to ensure that the sources are estimated with distortions below the just-noticeable threshold for CI users. Following source separation, the algorithm adjusts the relative level of the singing voice and the background instruments through a single parameter, the vocals-to-instruments ratio (VIR), expressed in decibels (dB) [Audio examples]. Experiments with 10 normal-hearing (NH) listeners and 13 CI users were conducted to investigate how CI users and the NH control group adjust the VIR in realistic and non-realistic acoustic environments, with and without visual information.
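As a rough illustration of the remixing step, the sketch below applies a VIR to two already separated stems. It is not the authors' implementation: the function name `remix`, the list-of-samples representation, and the convention that a VIR of 0 dB leaves the original balance unchanged are all assumptions made for this example.

```python
def remix(vocals, instruments, vir_db):
    """Mix two separated stems, scaling the vocals by vir_db decibels.

    Assumption (not from the abstract): vir_db is applied relative to the
    original balance, so vir_db = 0 reproduces the unmodified mixture.
    """
    if len(vocals) != len(instruments):
        raise ValueError("stems must have the same number of samples")
    gain = 10.0 ** (vir_db / 20.0)  # convert dB to a linear amplitude factor
    return [gain * v + i for v, i in zip(vocals, instruments)]


# Hypothetical sample values, chosen only to show the call.
vocals = [0.10, -0.20, 0.05]
instruments = [0.30, 0.10, -0.15]
enhanced = remix(vocals, instruments, 8.0)  # vocals raised by 8 dB
```

Under this convention, a VIR of 8 dB corresponds to multiplying the vocal amplitude by roughly 2.5 before summing it with the instruments.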
Results: The objective instrumental results meet the benchmark reported in previous studies [Pons et al. 2016; Gajęcki et al. 2018], in that the distortions introduced are not perceived by CI users. The experimental results show that CI users prefer the vocals enhanced by 8 dB with respect to the background instruments (a VIR of 8 dB), independent of the acoustic scenario and of visual information. In contrast, NH listeners did not prefer a VIR different from 0 dB. More details can be found in Tahmasebi et al. 2020.
Conclusion: The results confirm that CI users find music more enjoyable when the vocals are enhanced with respect to the instruments. Moreover, we show that vocals and background instruments can be separated in real time using relatively simple neural networks, and that this technology is applicable across different acoustic scenarios, with or without visual information.
References:
- Buyens, W., van Dijk, B., Moonen, M., & Wouters, J. (2014). Music mixing preferences of CI recipients: A pilot study. International Journal of Audiology, 53(5), 294-301.
- Gajęcki, T., & Nogueira, W. (2018). Deep learning models to remix music for cochlear implant users. The Journal of the Acoustical Society of America, 143(6), 3602-3615.
- Nagathil, A., Weihs, C., Neumann, K., & Martin, R. (2017). Spectral complexity reduction of music signals based on frequency-domain reduced-rank approximations: An evaluation with cochlear implant listeners. The Journal of the Acoustical Society of America, 142(3), 1219-1228.
- Nogueira, W., Nagathil, A., & Martin, R. (2019). Making music more accessible for cochlear implant listeners: Recent developments. IEEE Signal Processing Magazine, 36(1), 115-127. doi: 10.1109/MSP.2018.2874059.
- Pons, J., Janer, J., Rode, T., & Nogueira, W. (2016). Remixing music using source separation algorithms to improve the musical experience of cochlear implant users. The Journal of the Acoustical Society of America, 140(6), 4338-4349.
- Tahmasebi, S., Gajęcki, T., & Nogueira, W. (2020). Design and evaluation of a real-time audio source separation algorithm to remix music for cochlear implant users. Frontiers in Neuroscience, 14, 434.