Automated speech audiometry for children using Kaldi-NL automatic speech recognition

Authors: Gloria A Araiza Illan1*; Luke Meyer1; Khiet P. Truong2; Deniz Başkent1

1University Medical Center Groningen

2University of Twente

Background The digits-in-noise (DIN) test is widely used for speech audiometry in the Netherlands with two setups: 1) a clinician running the test and scoring the participant’s spoken responses, and 2) the participant running the test online and manually entering the heard digits themselves. We have previously introduced an alternative setup in which a laptop presents the digits and the open-source automatic speech recognition (ASR) toolkit Kaldi-NL automatically scores the participant’s responses, and evaluated its performance with normal-hearing, native Dutch-speaking adults. The system showed high accuracy and a low word error rate (WER) in recognising the recorded spoken responses, and the Kaldi-NL errors had a low impact on the DIN test score. As a follow-up study, we explore the performance of the same setup with normal-hearing children.

Method Twenty-three normal-hearing, Dutch-speaking children (5-17 years old, M = 11.6, SD = 2.8) completed the DIN test with the proposed setup (laptop and ASR). Their spoken responses were recorded and then manually compared with the digits recognised by Kaldi-NL.

Results Across participants, preliminary results showed accuracy ranging from 32.3% to 98.5% (M = 83.2%, SD = 19.1%) and WER ranging from 2.8% to 69.2% (M = 18%, SD = 18.5%). An analysis of these scores as a function of age indicated that the system’s accuracy increased and its WER decreased with increasing age.
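
The abstract does not state the exact scoring formula, so the following is only a minimal sketch, assuming the standard WER definition (substitutions + deletions + insertions, divided by the number of reference words) computed by word-level edit distance. The function name `word_error_rate` and the example digit strings are hypothetical and are not taken from the study data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    `reference` is the manually transcribed response, `hypothesis` the ASR
    output; both are whitespace-separated words (here, spoken digits).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing in ASR output)
                curr[j - 1] + 1,     # insertion (extra word in ASR output)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the child says "drie zeven een" and the recogniser
# returns "drie negen een", i.e. one substitution out of three digits.
example_wer = word_error_rate("drie zeven een", "drie negen een")
```

Because insertions count as errors, WER defined this way can exceed 100% and need not equal 100% minus accuracy.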

Conclusion The accuracy and WER of Kaldi-NL changed as a function of children’s age, approaching, for the oldest participants, the performance we previously obtained with normal-hearing adults. The next stage is to conduct the test with a more representative sample of the child population to characterise the influence of the Kaldi-NL errors on the final DIN test result and to explore the feasibility of clinical applications.

We thank Cas Smits for sharing the original DIN stimuli with us.