An Automated Digits-In-Noise Hearing Test Using Automatic Speech Recognition and Text-To-Speech: A Proof-of-Concept Study

Authors: Mohsen Fatehifar1, Kevin Munro1, Michael Stone1, David Wong2, Timothy Cootes1, Josef Schlittenlacher3

1University of Manchester
2University of Leeds
3University College London

Background: Advances in machine learning have enabled hearing tests that are administered automatically by computers. The aim of this study was to build and evaluate a tool for AI-powered speech-in-noise (SIN) hearing tests, specifically a digits-in-noise (DIN) test that uses Automatic Speech Recognition (ASR) and Text-To-Speech (TTS). Validity and reliability were compared against a benchmark test.

Methods: Three DIN test methods were compared: 1. A benchmark test using pre-recorded stimuli and a graphical user interface. 2. A keyboard-based test using pre-recorded stimuli but run on code written by the researchers (before adding AI). 3. An AI-powered test using TTS to synthesise the stimuli and ASR to transcribe responses. Apart from stimulus vocalisation and response capture, its underlying code was identical to that of the keyboard-based test. In each test, the participant's task was to listen to three digits and repeat them in the order presented. A two-down/one-up adaptive staircase was used to estimate the signal-to-noise ratio (SNR) for the criterion performance of 71% correct. The benchmark and AI-powered tests were each administered twice (test-retest), and all tests were completed in a single 90-minute session. The results were compared using Bland-Altman analyses.
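For illustration, the following is a minimal Python sketch of such an adaptive staircase. It assumes a two-down/one-up rule (which converges near 71% correct), a fixed step size, a fixed trial count, and a `present_trial` callback standing in for stimulus presentation and scoring; none of these specifics are stated in the abstract.

```python
def run_staircase(present_trial, n_trials=24, start_snr=0.0, step_db=2.0):
    """Minimal two-down/one-up adaptive staircase on signal-to-noise ratio (SNR).

    `present_trial(snr)` presents a three-digit triplet at the given SNR and
    returns True if the listener repeated all digits correctly. Trial count,
    start SNR and step size are illustrative assumptions.
    """
    snr = start_snr
    reversal_snrs = []
    consecutive_correct = 0
    last_direction = None  # +1 = SNR increased (easier), -1 = decreased (harder)

    for _ in range(n_trials):
        if present_trial(snr):
            consecutive_correct += 1
            if consecutive_correct < 2:
                continue                  # wait for a second correct response
            direction = -1                # two correct in a row -> make harder
            consecutive_correct = 0
        else:
            direction = +1                # one error -> make easier
            consecutive_correct = 0

        if last_direction is not None and direction != last_direction:
            reversal_snrs.append(snr)     # record SNR at each reversal
        last_direction = direction
        snr += direction * step_db

    # Speech reception threshold: mean SNR over the later reversals
    tail = reversal_snrs[-6:]
    return sum(tail) / max(len(tail), 1)
```

In the AI-powered test described above, `present_trial` would synthesise the digit triplet with TTS, mix it with noise at the requested SNR, and score the ASR transcription of the spoken response; in the other two tests it would play pre-recorded stimuli and score button or keyboard input.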

Results: For reliability, the benchmark test-retest comparison showed a mean difference of -0.5 dB with 95% limits of agreement (LoA) of ±3.8 dB, while the AI-powered test-retest comparison showed a mean difference of -0.8 dB with a 95%-LoA of ±6.0 dB. For validity, the Bland-Altman comparison of the benchmark and keyboard-based tests showed a mean difference of -0.8 dB with a 95%-LoA of ±5.9 dB. The AI-powered test compared with the benchmark showed a mean difference of +0.3 dB with a 95%-LoA of ±6.3 dB, and compared with the keyboard-based test a mean difference of -1.1 dB with a 95%-LoA of ±6.0 dB.
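As a reminder of how these agreement statistics are obtained, the sketch below computes the bias (mean difference) and 95% limits of agreement for two paired sets of thresholds. The numeric values are purely illustrative and are not the study data.

```python
import numpy as np

def bland_altman(a, b):
    """Bland-Altman agreement statistics for two paired measurements.

    Returns the mean difference (bias) and the 95% limits of agreement,
    i.e. bias +/- 1.96 standard deviations of the paired differences.
    """
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)
    return bias, (bias - half_width, bias + half_width)

# Illustrative values only (dB SNR thresholds), not the study data.
test   = [-8.2, -7.5, -9.1, -6.8, -8.0]
retest = [-7.9, -8.1, -8.6, -7.2, -8.4]
bias, (lower, upper) = bland_altman(test, retest)
print(f"bias = {bias:+.1f} dB, 95% LoA = [{lower:+.1f}, {upper:+.1f}] dB")
```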

Conclusion: The results show that the developed tool is valid and reliable: it adds little error relative to the test-retest reliability of the benchmark test. They also support the use of AI and indicate that ASR and TTS may be viable components of a SIN test.