A classification approach to listening effort: combining features from the pupil and cardiovascular system

Bethany Plain1,2, Hidde Pielage1,2, Michael Richter3, Tanveer Bhuiyan4, Thomas Lunner2, Sophia E. Kramer1, Adriana A. Zekveld1

1 Amsterdam UMC, Vrije Universiteit Amsterdam, Otolaryngology Head and Neck Surgery, Ear & Hearing, Amsterdam Public Health Research Institute, Amsterdam, the Netherlands; 2 Eriksholm Research Centre, Snekkersten, Denmark; 3 School of Psychology, Liverpool John Moores University, Liverpool, United Kingdom; 4 Demant A/S, Kongebakken, Smørum, Denmark

Background: Physiological markers of autonomic nervous system activity are evaluated as indicators of listening effort (LE). Different physiological measures are often uncorrelated with one another and with participants’ subjective ratings. This mismatch may arise because the measures reflect different aspects of LE. Here we trained a classifier on pupil and cardiovascular features to predict the experimental condition. A second classifier was trained on the same features to predict the participant’s subjective perception during the experiment.

Methods: Twenty-nine hearing-impaired listeners undertook a speech-in-noise task (Danish HINT) at two signal-to-noise ratios, individually adapted to 50% and 80% intelligibility levels. The task was performed alone and in the presence of two observers. Per sentence, seven features were extracted: four from the cardiovascular system (inter-beat interval, pulse transit time, blood volume pulse amplitude and pre-ejection period) and three from the pupil (peak pupil dilation, mean pupil dilation and baseline pupil size). After the experiment, participants gave a semi-structured interview describing the experience. The interview transcripts were reviewed to determine whether each participant was affected by the observers’ presence (yes/no). The seven physiological features were fed into k-nearest neighbor classifiers with 50-fold cross-validation, to predict 1) the social state and intelligibility and 2) whether the participant was affected by the observers’ presence.
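The classification step described above can be illustrated with a minimal sketch of a k-nearest neighbor classifier evaluated with 50-fold cross-validation on seven features per sentence. This is not the authors' implementation. The abstract does not state the number of neighbors, the distance metric, or any feature scaling, so k = 5, Euclidean distance and per-fold z-scoring are assumptions here, and the data are synthetic stand-ins generated only so the example runs. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def fit_scaler(rows):
    """Per-feature mean and standard deviation, computed on training rows only."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(row, means, stds):
    """Z-score one feature vector (assumed preprocessing, not stated in the abstract)."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training rows closest to the query (Euclidean)."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_validated_accuracy(x, y, k=5, n_folds=50, seed=0):
    """Shuffle once, split into n_folds folds, and score every held-out fold."""
    order = list(range(len(x)))
    random.Random(seed).shuffle(order)
    folds = [order[i::n_folds] for i in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in order if i not in held_out]
        means, stds = fit_scaler([x[i] for i in train_idx])
        train_x = [scale(x[i], means, stds) for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in fold:
            prediction = knn_predict(train_x, train_y, scale(x[i], means, stds), k)
            correct += prediction == y[i]
    return correct / len(x)


def make_synthetic(n_per_class=100, n_features=7, shift=3.0, seed=1):
    """Synthetic stand-in data: two well-separated Gaussian classes, seven features."""
    rng = random.Random(seed)
    x, y = [], []
    for label, offset in (("alone", 0.0), ("observed", shift)):
        for _ in range(n_per_class):
            x.append([rng.gauss(offset, 1.0) for _ in range(n_features)])
            y.append(label)
    return x, y


features, labels = make_synthetic()
accuracy = cross_validated_accuracy(features, labels)
print(f"cross-validated accuracy: {accuracy:.3f}")
```

Fitting the scaler inside each fold keeps information from the held-out sentences out of the training step. With real data, sentences from the same participant would likely need to be kept in the same fold to avoid optimistic accuracy estimates; whether the study did this is not stated in the abstract.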

Results: The social state (alone/observed) was predicted with an accuracy of 77.5% and the intelligibility (50/80%) was predicted with an accuracy of 61.2%. Different features contributed to these models. The participants’ response to the observers was predicted with an accuracy of 92.9%.

Conclusion: A combination of features may be preferable to a single dependent variable during LE studies. Some features were better suited to predicting intelligibility and others to predicting observers’ presence.

Figure (Plain et al.): Schematic showing the experimental set-up, including the placement of the participant, loudspeakers and two observers.