Tilak Ratnanather1, Lydia C. Wang1, Seung-Ho Bae1, Erin R. O’Neill2, Elad Sagi3, Daniel J. Tward1,4
1 Center for Imaging Science and Institute for Computational Medicine, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA; 2 Department of Psychology, University of Minnesota, Minneapolis, USA; 3 Department of Otolaryngology, New York University School of Medicine, New York, USA; 4 Departments of Computational Medicine and Neurology, University of California Los Angeles, Los Angeles, USA
Background: While clinical speech tests assess the ability of people with hearing loss to hear with a hearing aid or cochlear implant, these tests usually examine speech perception only at the word or sentence level. Because few tests analyze perceptual errors at the phoneme level, there is a need for an automated program that computes and visualizes phoneme accuracy in responses to speech tests.
Method: The program reads in stimulus-response pairs and obtains their phonetic representations from a digital pronouncing dictionary. Response and stimulus phonemes are globally aligned with a Levenshtein minimum edit distance algorithm using costs for insertions, omissions, and substitutions. Accuracy for each phoneme is computed as a modified F-score; these scores are then averaged and visualized with respect to place and manner of articulation (consonants) or height (vowels). Confusion matrices of stimulus-response phoneme pairs undergo information transfer analysis based on ten prescribed phonological features, and a histogram of the relative information transfer for these features is displayed as a phonemegram.
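To make the alignment step concrete, here is a minimal sketch in Python, assuming unit costs for insertions, omissions, and substitutions and ARPAbet phoneme symbols; the function name, cost weights, and implementation language are illustrative assumptions, since the abstract specifies only that a Levenshtein minimum edit distance is used.

```python
def align_phonemes(stimulus, response, ins=1, omit=1, sub=1):
    """Globally align two phoneme sequences by Levenshtein minimum edit
    distance; returns aligned (stimulus, response) pairs, with None marking
    an omission or insertion gap. Unit costs here are an assumption."""
    m, n = len(stimulus), len(response)
    # dp[i][j] = minimum cost of aligning stimulus[:i] with response[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i * omit
    for j in range(1, n + 1):
        dp[0][j] = j * ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            match = 0 if stimulus[i - 1] == response[j - 1] else sub
            dp[i][j] = min(dp[i - 1][j - 1] + match,  # match/substitution
                           dp[i - 1][j] + omit,       # omission
                           dp[i][j - 1] + ins)        # insertion
    # Trace back from dp[m][n] to recover the alignment
    pairs, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (
                0 if stimulus[i - 1] == response[j - 1] else sub):
            pairs.append((stimulus[i - 1], response[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + omit:
            pairs.append((stimulus[i - 1], None))  # omitted phoneme
            i -= 1
        else:
            pairs.append((None, response[j - 1]))  # inserted phoneme
            j -= 1
    return list(reversed(pairs))

# Example: "cat" /K AE T/ heard as "hat" /HH AE T/
print(align_phonemes(["K", "AE", "T"], ["HH", "AE", "T"]))
# [('K', 'HH'), ('AE', 'AE'), ('T', 'T')]
```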
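Per-phoneme accuracy can then be scored from the aligned pairs. The abstract describes a modified F-score without giving the modification, so the sketch below computes a standard F1 per phoneme; the function and example data are hypothetical.

```python
from collections import Counter

def phoneme_f_scores(aligned_pairs):
    """Per-phoneme F1 from aligned (stimulus, response) pairs: precision
    counts how often a phoneme in the response was correct, recall how
    often the phoneme was heard correctly when presented."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for s, r in aligned_pairs:
        if s == r:
            tp[s] += 1
        else:
            if s is not None:
                fn[s] += 1  # stimulus phoneme omitted or substituted
            if r is not None:
                fp[r] += 1  # response phoneme inserted or substituted in
    scores = {}
    for p in set(tp) | set(fp) | set(fn):
        prec = tp[p] / (tp[p] + fp[p]) if tp[p] + fp[p] else 0.0
        rec = tp[p] / (tp[p] + fn[p]) if tp[p] + fn[p] else 0.0
        scores[p] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return scores

pairs = [("K", "HH"), ("AE", "AE"), ("T", "T")]  # from the alignment sketch
print(phoneme_f_scores(pairs))
# e.g. {'AE': 1.0, 'T': 1.0, 'K': 0.0, 'HH': 0.0}
```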
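For the phonemegram, information transfer analysis of confusion matrices is commonly computed following Miller and Nicely's relative information transmission: the phoneme confusion matrix is collapsed by one phonological feature (e.g., voicing), and the transmitted information T(x;y) is normalized by the stimulus entropy H(x). The sketch below assumes that formulation; the abstract's ten prescribed features are not enumerated here, and the voicing example is hypothetical.

```python
import math
from collections import defaultdict

def relative_info_transfer(confusions, feature_of):
    """confusions: dict mapping (stimulus_phoneme, response_phoneme) -> count.
    feature_of: maps a phoneme to its value for one phonological feature.
    Returns T(x;y)/H(x) in [0, 1], following Miller & Nicely (1955)."""
    # Collapse the phoneme confusion matrix into a feature confusion matrix
    counts = defaultdict(float)
    for (s, r), n in confusions.items():
        counts[(feature_of[s], feature_of[r])] += n
    total = sum(counts.values())
    # Marginal probabilities over stimulus and response feature values
    p_s, p_r = defaultdict(float), defaultdict(float)
    for (fs, fr), n in counts.items():
        p_s[fs] += n / total
        p_r[fr] += n / total
    # Mutual information T(x;y) and stimulus entropy H(x), in bits
    T = sum((n / total) * math.log2((n / total) / (p_s[fs] * p_r[fr]))
            for (fs, fr), n in counts.items() if n > 0)
    H = -sum(p * math.log2(p) for p in p_s.values() if p > 0)
    return T / H if H > 0 else 0.0

# Example: voicing feature, with /p/ and /b/ sometimes confused
conf = {("P", "P"): 8, ("P", "B"): 2, ("B", "B"): 7, ("B", "P"): 3}
voicing = {"P": "voiceless", "B": "voiced"}
print(relative_info_transfer(conf, voicing))  # ~0.19 bits transferred per bit
```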
Results: The program was applied to a dataset of stimulus-response sentence pairs from six volunteers with varying degrees of hearing loss. Four volunteers listened to sentences from a mobile auditory training app, while two listened to sentences from a clinical speech test. Stimulus-response word pairs from three word lists were also analyzed. In all cases, visualization of phoneme accuracy was obtained in real time.
Conclusion: It is possible to automate, in real time, the alignment of phonemes in stimulus-response pairs from speech tests, making it easier to visualize response accuracy in terms of phonetic features. Such visualization of phoneme alignment and accuracy could aid speech-language pathologists and audiologists working either in person or virtually.
