Authors: Sigrid Polspoel¹, Sophia E. Kramer¹, Bas Van Dijk², Cas Smits¹
¹Amsterdam UMC, Vrije Universiteit Amsterdam, Otolaryngology – Head and Neck Surgery, Ear & Hearing, Amsterdam Public Health research institute, De Boelelaan 1117, Amsterdam, Netherlands
²Cochlear, Advanced Innovation – Algorithms and Application – Cochlear Technology Centre, Schaliënhoevedreef 20i, 2800 Mechelen, Belgium
Background
The digits-in-noise (DIN) test is a successful hearing test that is used as a screening instrument, as a diagnostic tool in clinics, and as a self-administered home test for cochlear implant (CI) users. Its main limitation is that the speech stimuli are language specific, so the test has to be developed separately for each language. This makes development time-consuming and expensive, and leaves room for improvement. A further limitation is that the DIN test is not customized for CI users, which yields less accurate test results in this group. This project will tackle these issues by applying artificial intelligence techniques to automate the entire development procedure.
Goal
The aim of the Automatic LAnguage-independent Development of the Digits-In-Noise test (Aladdin) project is to create a procedure for automatically generating digits-in-noise tests. This procedure will employ text-to-speech (TTS) and automatic speech recognition (ASR) systems to design DIN tests in various languages and for different target populations, such as CI users. Because all new DIN tests will follow the same development procedure, test results will be more comparable across languages than is currently the case. Moreover, by drastically reducing development costs, this project has the potential to make the DIN test affordable for low- and middle-income countries.
Method
Multiple studies will be conducted to assess whether the current development procedure (Smits et al., 2013) can be replaced by an automatic one. First, we will evaluate whether speech produced by a TTS system can replace a human voice in the context of hearing tests. Next, speech recognition functions for the speech items will be obtained from three target groups (normal-hearing listeners, listeners with hearing loss, and CI users) to serve as a future reference for the ASR system. Finally, ASR systems will be trained to construct speech recognition functions for synthesized speech material, including stimuli that have been processed by a CI processor. The speech recognition functions obtained with the ASR systems will be compared with those obtained from human listeners. The ultimate result will be a system in which the TTS system creates the spoken digits and the ASR system equalizes the recognition of the individual digits, resulting in accurate DIN tests in any language (Figure 1). We aim to complete the Aladdin project by the end of 2023.
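The abstract does not specify how the speech recognition functions are modelled or how recognition of the individual digits is equalized. The sketch below is illustrative only and rests on several assumptions. It assumes a logistic psychometric function per digit, fitted by a coarse grid-search maximum-likelihood fit. It also assumes that equalization is done by changing each digit's presentation level so that its speech reception threshold (SRT, the SNR at 50% correct) matches the mean SRT across digits. All names (`RecognitionFunction`, `fit_recognition_function`, `level_corrections`, `mean_abs_srt_difference`) and the grid ranges are hypothetical. The last function illustrates one possible way to compare ASR-derived functions with those from human listeners.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class RecognitionFunction:
    """Hypothetical per-digit speech recognition function (assumed logistic)."""
    srt: float    # SNR (dB) at which 50% of presentations are recognized
    slope: float  # slope at the SRT, in proportion correct per dB


def logistic(snr: float, srt: float, slope: float) -> float:
    """Logistic psychometric function whose derivative at `srt` equals `slope`."""
    return 1.0 / (1.0 + math.exp(-4.0 * slope * (snr - srt)))


def fit_recognition_function(snrs, n_correct, n_trials) -> RecognitionFunction:
    """Maximum-likelihood fit over an assumed grid (0.1 dB SRT steps)."""
    best_ll, best_fit = -math.inf, None
    for srt_step in range(-200, 1):          # SRT from -20.0 to 0.0 dB
        srt = srt_step / 10
        for slope_step in range(1, 41):      # slope from 0.01 to 0.40 per dB
            slope = slope_step / 100
            ll = 0.0
            for snr, k, n in zip(snrs, n_correct, n_trials):
                p = min(max(logistic(snr, srt, slope), 1e-9), 1 - 1e-9)
                # Binomial log-likelihood (constant term omitted).
                ll += k * math.log(p) + (n - k) * math.log(1 - p)
            if ll > best_ll:
                best_ll, best_fit = ll, RecognitionFunction(srt, slope)
    return best_fit


def level_corrections(functions: dict[str, RecognitionFunction]) -> dict[str, float]:
    """Gain (dB) per digit so that every digit's SRT moves to the mean SRT.

    Presenting a digit g dB louder at a fixed noise level shifts its function
    g dB toward lower SNRs, so a digit with SRT above the mean gets +(srt - mean).
    """
    mean_srt = sum(f.srt for f in functions.values()) / len(functions)
    return {digit: f.srt - mean_srt for digit, f in functions.items()}


def mean_abs_srt_difference(human: dict[str, RecognitionFunction],
                            asr: dict[str, RecognitionFunction]) -> float:
    """Average |SRT_human - SRT_asr| over the digits present in both sets."""
    shared = human.keys() & asr.keys()
    if not shared:
        raise ValueError("no digits in common")
    return sum(abs(human[d].srt - asr[d].srt) for d in shared) / len(shared)
```

For example, a digit whose fitted SRT lies 1 dB above the mean SRT receives a correction of +1 dB, meaning it would be presented 1 dB louder. Under this assumed rule the corrections always sum to zero, so the average presentation level is unchanged.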
Smits C., Theo Goverts S., Festen J. M., The digits-in-noise test: Assessing auditory speech recognition abilities in noise. J. Acoust. Soc. Am. 133, 1693–1706 (2013).