Do you know a useful tool? Please send your suggestion to firstname.lastname@example.org
A central hub to share resources related to computational audiology
Here we provide options to share resources that are published on general platforms including OSF, Zenodo, Hugging Face, and GitHub, among others. We suggest developing this further by embracing the concept of the distributed data mesh. We also collected dedicated auditory toolboxes such as the Auditory Modeling Toolbox (AMT) and the Psychoacoustic Software Package. Remote testing examples include the Remote Testing Wiki, Portable Automated Rapid Testing (PART), Ecological Momentary Assessment (EMA), and the NIOSH Sound Level Meter. The basic idea is to use computationalaudiology.com as a central hub to share resources that are useful for researchers and clinicians.
- sharing of research software, tools, and models
- sharing best practices (data policies, software licensing), inspiring peers, and increasing transparency
- facilitating cooperation across centers to increase sample sizes and strengthen the robustness of experimental evaluations
- building a community that fosters effective collaboration and uses similar tools and data sharing pipelines
Examples of domain-specific tools and models:
- Clinical audiology tools (e.g. interfaces for patients to log their own data)
- Tools for research scientists who are coding up experiments (e.g. Psychophysics toolbox)
- Computational models of perception/nervous system (e.g. Auditory Modeling Toolbox)
- Datasets for research (e.g. Zenodo)
The way to go forward might be the data mesh approach explained below.
A burgeoning concept aimed at improving the way data can be leveraged beyond its original narrowly scoped intended purpose is the distributed data mesh.
The principal novelty of such an approach is this:
Treat the data you produce like it’s a flagship product.
When the implications of this idea on day-to-day work in research are thoroughly considered, we can see how this drastically differs from the normal process of scientific publication. Consider what would be expected if data acquisition or processing were outsourced to a company. How should we expect them to serve the results back to us? What characteristics should it have? We should certainly be able to trust it. To fulfill this, we should know exactly how it was made, what it is, and how to use it. Accordingly we would expect metadata including exhaustive provenance details and annotations such that the data would be self-evident, i.e. we don’t need help understanding what we’re looking at. It should also be resilient and easy to access (not easy to lose in an accidentally deleted email attachment with no backups). What else would we expect?
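One way to make these expectations concrete is a machine-readable descriptor that ships with every dataset. The sketch below is a minimal illustration under stated assumptions, not an established standard: the class names, the field names (`title`, `creators`, `license`, `access_url`, `provenance`) and all example values are hypothetical.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class ProvenanceStep:
    """One processing step applied to the data (hypothetical schema)."""
    description: str
    software: str
    version: str


@dataclass
class DataProduct:
    """Descriptor that travels with a dataset (field names are illustrative)."""
    title: str
    creators: list
    license: str
    access_url: str
    provenance: list = field(default_factory=list)

    def missing_fields(self):
        """Return the names of descriptive fields that are still empty."""
        return [name for name, value in asdict(self).items() if not value]


# Placeholder values, used only to show the structure.
product = DataProduct(
    title="Example speech-in-noise thresholds",
    creators=["A. Researcher"],
    license="CC-BY-4.0",
    access_url="https://example.org/dataset",
    provenance=[ProvenanceStep("Adaptive threshold estimation", "example-toolbox", "1.0")],
)

# Serialize the descriptor so that people and computers can both read it.
descriptor_json = json.dumps(asdict(product), indent=2)
```

A check such as `product.missing_fields()` could run before a dataset is published, so that incomplete provenance is caught by the producer rather than by the consumer.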
The idea of a data mesh was proposed to counter the urge to consolidate data and resources into centrally managed data warehouses or “data lakes.” This was due to the realization that a centralized team could not possibly know your data better than you do. To be beneficial, however, the new approach requires careful implementation of federation. That is, a balance needs to be maintained between the intellectual and operational autonomy of those producing the data products and community-level consensus on global standards. People and computers need to be able to easily find, access, and understand what’s out there without things becoming too brittle when new ideas come up.
Only recently have journals begun to require data and code as supplements to publication. This is not universal, and even when in place, compliance and enforcement are variable. As encouraging as the progress is when raw datasets and implementations of computational methods are shared with publications, there is vast potential for improvement. For now, it might be enough to begin asking questions like,
- “If I had access to the data these researchers used in their publication, what could I do with it that they hadn’t thought of?”
- “How could they have prepared and served up that data to be easier for me to get my hands on?”
- “Where are the loose ends in the community that standards might help to fix that would help me integrate that data with what I already have?”
Then we can use answers to these to consider and talk about how we could change our own ways of doing things. Thoughtfully identifying and outlining the technical, managerial, institutional and other problems preventing the implementation of this approach is an important prerequisite to finding solutions.
Here is the definition of open science found on Wikipedia: “the movement to make scientific research (including publications, data, physical samples, and software) and its dissemination accessible to all levels of an inquiring society, amateur or professional. Open science is transparent and accessible knowledge that is shared and developed through collaborative networks. It encompasses practices such as publishing open research, campaigning for open access, encouraging scientists to practice open-notebook science, and generally making it easier to publish and communicate scientific knowledge”. There are multiple large initiatives to stimulate open science. For instance, the Open Science Framework (OSF), maintained by the non-profit Center for Open Science, offers a range of tools to disseminate knowledge. Zenodo is a large open repository operated by CERN that originated in the European OpenAIRE project.
Open Science Framework (OSF)
Below is a short clip that highlights some of the tools offered by OSF.
We created a Zenodo community for computational audiology. The aim of this community is to share data, code and tools useful for computational audiology and related fields such as digital hearing health care and AI in health care. Resources will be integrated on the computationalaudiology.com forum in order to make the data, code and tools easier to find for researchers and clinicians. Researchers that have shared a repository or project on Zenodo can edit their repository to request to add their work to the community. To add an existing repository you need to edit the existing entry, and then you see the screen below (example).
Look for the community field and search for the community you want to add, here “computational audiology”. When found, press enter to add the community. Then press “save” at the top of the form. After pressing “save” you have to press “publish” for the saved changes to take effect. The curator will receive a message regarding the inclusion of the entry in the community. For a new entry the form looks the same, including the community section. A new repository can be added via this link.
Zenodo Repositories from the Computational Audiology community
- Alejandro Osses, Léo Varnet. (May, 2023). fastACI toolbox: the MATLAB toolbox for investigating auditory perception using reverse correlation (Version v1.3). Zenodo. https://doi.org/10.5281/zenodo.7888588
- Hendrikse, Maartje M. E., Dingemanse, Gertjan, Goedegebure, André. (April, 2023). Virtual audiovisual scenes for hearing device fine-tuning. Zenodo. https://doi.org/10.5281/zenodo.7794257
- Andrea Gulli, Federico Fontana, Michele Geronazzo. (April, 2023). Interaural Time Difference discrimination threshold determined through three alternative 2I-2AFC procedures. Zenodo. https://doi.org/10.5281/zenodo.7808559
- Daniel Kipping. (November, 2022). APGDHZ/Single-fiber-EAS-model: v1.0.2 (Version v1.0.2). Zenodo. https://doi.org/10.5281/zenodo.7331364
- Ibelings, Saskia, Brand, Thomas, Holube, Inga. (May, 2022). Synthetic Göttingen Sentence Test material created with a text-to-speech system (Version 1). Zenodo. https://doi.org/10.5281/zenodo.6513570
- tobiasherzke, Paul Maanen, frasherloshaj, hendrikkayser, Marc Joliet, Giso Grimm, steffendasenbrock, Zain Sohail. (February, 2022). HoerTech-gGmbH/openMHA: Release 4.17.0 (Version v4.17.0). Zenodo. https://doi.org/10.5281/zenodo.6281149
- Kayser, Hendrik, Herzke, Tobias, Maanen, Paul, Zimmermann, Max, Grimm, Giso, Hohmann, Volker. (December, 2021). Open community platform for hearing aid algorithm research: open Master Hearing Aid (openMHA). Zenodo. https://doi.org/10.1016/j.softx.2021.100953
- Thalmeier, Dominik, Miller, Gregor, Schneltzer, Elida, Hurt, Anja, Hrabe de Angelis, Martin, Becker, Lore, Müller, Christian L., Maier, Holger. (December, 2021). ABR raw data and results from automated hearing threshold detection. Zenodo. https://doi.org/10.5281/zenodo.5779876
- Hendrik Kayser, Tobias Herzke, Paul Maanen, Max Zimmermann, Giso Grimm, Volker Hohmann. (December, 2021). Open community platform for hearing aid algorithm research: open Master Hearing Aid (openMHA). Zenodo. https://doi.org/10.5281/zenodo.5770153
- Volker Hohmann. (November, 2021). The Period-Modulated Harmonic Locked Loop (PM-HLL): A low-effort algorithm for rapid time-domain multi-periodicity estimation. Zenodo. https://doi.org/10.5281/zenodo.5727778
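A listing like the one above can also be retrieved programmatically through Zenodo's public REST API. The following is a minimal sketch that uses only the Python standard library. The community identifier passed to the functions is an assumption (look it up on the community page), and the parser reads only the `hits`, `metadata`, `title` and `doi` fields of the search response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ZENODO_RECORDS_API = "https://zenodo.org/api/records"


def build_query_url(community_id, size=10):
    """Build a search URL for the most recent records of one community."""
    params = {"communities": community_id, "size": size, "sort": "mostrecent"}
    return f"{ZENODO_RECORDS_API}?{urlencode(params)}"


def summarize(response):
    """Extract (title, DOI) pairs from a decoded search response."""
    summaries = []
    for record in response.get("hits", {}).get("hits", []):
        metadata = record.get("metadata", {})
        doi = record.get("doi") or metadata.get("doi", "")
        summaries.append((metadata.get("title", ""), doi))
    return summaries


def fetch_community_records(community_id, size=10):
    """Query Zenodo over the network and return (title, DOI) pairs."""
    with urlopen(build_query_url(community_id, size), timeout=30) as reply:
        return summarize(json.load(reply))
```

Calling `fetch_community_records("<community-identifier>")` requires network access. The URL builder and the parser are kept separate so they can be checked offline.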
Hugging Face is an AI research organization and platform, primarily known for its contributions to the field of natural language processing (NLP). The platform specializes in developing open-source tools, libraries, and pre-trained models for a variety of NLP tasks, such as machine translation, text summarization, and question-answering, among others.
One of their flagship products is the Transformers library, which provides an extensive collection of pre-trained models and state-of-the-art architectures like BERT, GPT, and RoBERTa. The library is designed to be user-friendly, making it easy for developers and researchers to build, train, and deploy cutting-edge NLP models.
Developers can use Hugging Face to share models by creating an account on their Model Hub. The Model Hub is a collaborative platform that allows developers to upload, fine-tune, and share pre-trained models with the community. By sharing models, developers can contribute to the ongoing growth and diversification of NLP applications, while also benefiting from the expertise of others in the field. Additionally, the Model Hub enables users to explore and directly integrate pre-trained models into their own projects, simplifying the process of model deployment and experimentation.
Automatic Speech Recognition
Automatic Speech Recognition (ASR) technology translates spoken language into written text by converting sequences of audio input into textual output. This technology powers virtual assistants such as Siri and Alexa, providing daily assistance to users. For developers and users focusing on solutions for the hearing impaired, ASR models offer invaluable applications like real-time captioning and automated transcription during meetings, enabling improved accessibility and communication for those with hearing challenges.
Wav2Vec2-Base-960h is a powerful ASR model developed by Facebook, pre-trained and fine-tuned on 960 hours of Librispeech using 16kHz sampled speech audio. It is currently the most downloaded ASR model on Hugging Face. Wav2Vec2 demonstrates that learning robust representations from speech audio alone, followed by fine-tuning on transcribed speech, can outperform semi-supervised methods while maintaining simplicity.
This groundbreaking model has proven effective in scenarios with limited labeled data, making it a valuable tool for enhancing accessibility and communication for individuals with hearing challenges.
Whisper is a pre-trained ASR and speech translation model developed by OpenAI, boasting a strong generalization capacity across various datasets and domains without requiring fine-tuning. It is a Transformer-based encoder-decoder or sequence-to-sequence model, trained on 680k hours of labeled speech data using large-scale weak supervision.
The model is available in five configurations of increasing size. The four smallest are offered in both English-only and multilingual versions; the largest checkpoints are exclusively multilingual. English-only models focus on speech recognition, while multilingual models cater to both speech recognition and translation. All ten pre-trained checkpoints can be found on the Hugging Face Hub.
Whisper is particularly valuable for developers and users working on audio solutions for the hearing impaired, as it can significantly improve accessibility and communication through real-time captioning and automated transcription services.
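As a rough sketch of how the two models described above might be tried out, the example below wraps the `pipeline` helper of the `transformers` library. It assumes that `transformers`, a deep learning backend such as PyTorch, and `ffmpeg` (for decoding audio files) are installed. The Whisper checkpoint named here is just one of the available sizes, and the explicit sample-rate check is a conservative choice for this example rather than a requirement of the library.

```python
import wave

# Model identifiers on the Hugging Face Hub.
WAV2VEC2_MODEL = "facebook/wav2vec2-base-960h"
WHISPER_MODEL = "openai/whisper-small"  # one of several available sizes

EXPECTED_SAMPLE_RATE = 16000  # both models work on 16 kHz audio


def has_expected_sample_rate(wav_path):
    """Return True if the WAV file is sampled at 16 kHz."""
    with wave.open(wav_path, "rb") as wav_file:
        return wav_file.getframerate() == EXPECTED_SAMPLE_RATE


def transcribe(wav_path, model_id=WAV2VEC2_MODEL):
    """Transcribe a WAV file; the model is downloaded on first use."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    if not has_expected_sample_rate(wav_path):
        raise ValueError("Resample the recording to 16 kHz first.")
    recognizer = pipeline("automatic-speech-recognition", model=model_id)
    return recognizer(wav_path)["text"]
```

For example, `transcribe("recording.wav", WHISPER_MODEL)` would return the recognized text for a hypothetical local file `recording.wav`.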
Audio Spectrogram Transformer (AST)
The Audio Spectrogram Transformer (AST) is an innovative model introduced in the paper “AST: Audio Spectrogram Transformer” by Yuan Gong, Yu-An Chung, and James Glass. Instead of using convolutional neural networks (CNNs), AST leverages a Vision Transformer approach by converting audio into spectrograms (images) for audio classification tasks. The model achieves state-of-the-art results in various audio classification benchmarks.
For developers and users working on audio solutions for the hearing impaired, the AST model can be fine-tuned on custom datasets for audio classification applications such as recognizing environmental sounds. When fine-tuning, it’s crucial to normalize the input data (mean of 0 and std of 0.5) using ASTFeatureExtractor, which defaults to the AudioSet mean and std. Additionally, it’s important to select a suitable learning rate and learning rate scheduler, as AST requires a low learning rate and converges quickly.
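The normalization target mentioned above (mean of 0, standard deviation of 0.5) can be illustrated with a few lines of plain Python. This is a sketch of the arithmetic only, with made-up toy values. In practice the dataset statistics would be computed over the spectrograms of the custom dataset and passed to `ASTFeatureExtractor` through its `mean` and `std` arguments.

```python
from statistics import fmean, pstdev


def dataset_stats(spectrogram_values):
    """Mean and standard deviation over all spectrogram values of a dataset."""
    return fmean(spectrogram_values), pstdev(spectrogram_values)


def normalize(spectrogram_values, mean, std):
    """Shift to zero mean and scale to a standard deviation of 0.5."""
    return [(value - mean) / (2 * std) for value in spectrogram_values]


# Toy log-mel values standing in for a custom dataset (illustrative numbers).
toy_values = [-8.0, -6.5, -4.0, -3.5, -1.0, 0.5]
mean, std = dataset_stats(toy_values)
normalized = normalize(toy_values, mean, std)
```

Dividing by twice the standard deviation is what yields a spread of 0.5 instead of 1.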
GitHub is the largest platform for code hosting that enables version control and collaboration. It lets you and others work together on projects from anywhere. Here we will collect useful code and repositories for auditory experiments, modeling, data processing, and analyses. You can make your existing repository easier to find by adding a topic, which acts as a tag or label. We recommend adding the topic ‘Computational Audiology’, and possibly additional labels such as ‘Cochlear Model’ / [specific topic] / etc., to GitHub repositories you wish to share with the computational audiology community.
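GitHub stores topics in lowercase with hyphens instead of spaces, so the label ‘Computational Audiology’ becomes `computational-audiology`. The sketch below shows how a label might be converted and how a repository search for that topic could be addressed through GitHub's REST search endpoint. The helper names are hypothetical and no request is actually sent.

```python
import re
from urllib.parse import urlencode


def to_topic_slug(label):
    """Convert a free-text label to GitHub's topic format."""
    slug = re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")
    return slug[:50]  # topics are limited to 50 characters


def topic_search_url(label):
    """URL of a GitHub REST search for repositories carrying the topic."""
    query = urlencode({"q": f"topic:{to_topic_slug(label)}"})
    return f"https://api.github.com/search/repositories?{query}"
```

Opening the resulting URL in a browser or HTTP client returns a JSON list of matching public repositories.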
AIDA is an Active Inference-based Design Agent that aims at real-time situated client-driven design of audio processing algorithms. Here is a link to the accompanying paper (Podusenko et al., 2022).
At the McDermott lab, a headphone screening task was developed to facilitate web-based experiments employing auditory stimuli. The efficacy of this screening task has been demonstrated. The headphone check is intended to precede the main task(s) and should be placed at or near the beginning of an online experiment. Participants who pass are allowed through to the remainder of the experiment; those who do not pass should instead be routed to an ending page and leave the experiment after screening.
Computing principles for scientific researchers
naplib-python, developed by Gavin Mischler, Vinay Raghavan, Menoua Keshishian, and Nima Mesgarani, is a Python module aimed at promoting transparency and reproducibility in auditory neuroscience (Mischler et al., 2023). It offers a general data structure for handling neural recordings and stimuli, along with comprehensive preprocessing, feature extraction, and analysis tools. The package simplifies complexities associated with the field, such as varying trial durations and multi-modal stimuli, and provides a versatile analysis framework compatible with existing toolboxes. Developed under the MIT License, naplib-python is accessible through its GitHub repository and supports various operating environments, including Linux, macOS, and Windows. naplib-python’s comprehensive manual can be found at https://naplib-python.readthedocs.io, providing detailed guidance on utilizing the package. For any questions or support, you can reach out to Nima Mesgarani at email@example.com.
Spectral and temporal modulation detection
Adam Bosen implemented three-alternative forced-choice (3AFC) adaptive modulation detection tasks in jsPsych to estimate detection thresholds, which allows these tasks to be conducted through a web browser.
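To illustrate what an adaptive threshold procedure does, here is a minimal sketch of a two-down one-up staircase, a common rule that converges on the stimulus level yielding about 70.7% correct responses. It is written in Python for brevity and is not taken from the jsPsych implementation. The tracking rule, step size, stopping criterion and simulated listener are all assumptions made for this example.

```python
def run_staircase(respond, start_level, step, max_reversals=6, max_trials=200):
    """Two-down one-up track; returns the levels at which direction reversed."""
    level = start_level
    correct_in_a_row = 0
    last_direction = 0  # -1 = task getting harder, +1 = task getting easier
    reversals = []
    for _ in range(max_trials):
        if len(reversals) == max_reversals:
            break
        if respond(level):
            correct_in_a_row += 1
            if correct_in_a_row < 2:
                continue  # two correct answers are needed before stepping down
            correct_in_a_row = 0
            direction = -1
        else:
            correct_in_a_row = 0
            direction = +1
        if last_direction and direction != last_direction:
            reversals.append(level)
        last_direction = direction
        level += direction * step
    return reversals


def estimate_threshold(reversals):
    """Average the reversal levels to obtain a threshold estimate."""
    return sum(reversals) / len(reversals)


# Simulated listener who detects modulation depths of -12 dB and above.
def simulated_listener(level_db):
    return level_db >= -12


reversal_levels = run_staircase(simulated_listener, start_level=0, step=2)
threshold_db = estimate_threshold(reversal_levels)
```

With a real participant, `respond` would present the three intervals at the given modulation depth and return whether the target interval was chosen.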
Tilak J. Ratnanather, Lydia C. Wang, Seung-Ho Bae, Erin R. O’Neill, Elad Sagi and Daniel J. Tward created scripts for analyzing phoneme errors from speech perception tests. To appear in Frontiers in Neurology: Digital Hearing Healthcare, ‘Visualization of Speech Perception Analysis via Phoneme Alignment: a pilot study.’
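The general idea behind phoneme alignment can be sketched with a standard minimum-edit-distance alignment of the presented and the reported phoneme sequence. The example below is a generic textbook technique with unit costs, shown for orientation only. It is not the authors' scripts, and the phoneme symbols in the usage note are placeholders.

```python
def align_phonemes(target, response):
    """Align two phoneme lists with minimum edit distance (unit costs)."""
    rows, cols = len(target) + 1, len(response) + 1
    cost = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        cost[i][0] = i
    for j in range(cols):
        cost[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            mismatch = target[i - 1] != response[j - 1]
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or substitution
                cost[i - 1][j] + 1,             # deletion
                cost[i][j - 1] + 1,             # insertion
            )
    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs, i, j = [], len(target), len(response)
    while i > 0 or j > 0:
        diagonal = i > 0 and j > 0 and (
            cost[i][j] == cost[i - 1][j - 1] + (target[i - 1] != response[j - 1])
        )
        if diagonal:
            pairs.append((target[i - 1], response[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((target[i - 1], None))    # phoneme was not reported
            i -= 1
        else:
            pairs.append((None, response[j - 1]))  # extra phoneme was reported
            j -= 1
    return pairs[::-1]
```

For instance, `align_phonemes(["s", "t", "o", "p"], ["s", "o", "p"])` pairs the unreported "t" with `None`, which marks it as a deletion. Pairs with two different symbols are substitutions and can be tallied into a confusion matrix.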
Here is the Python code from the WHISPER (Widespread Hearing Impairment Screening and PrEvention of Risk) project. You can use it for training and evaluation of machine learning models for hearing loss detection through speech-in-noise testing and post-hoc explainability analysis (SHapley Additive exPlanations-SHAP, Partial Dependence Plots-PDPs, and Feature Permutation Importance) applied to non-natively explainable models (e.g., Random Forests). The code is made available by Alessia Paglialonga and Marta Lenatti.
3D Audio Spatialiser and Hearing Aid and Hearing Loss Simulation
The 3DTI Toolkit is a standard C++ library for audio spatialisation and simulation using headphones developed within the 3D Tune-In (3DTI) project (http://www.3d-tune-in.eu). The Toolkit allows the design and rendering of highly realistic and immersive 3D audio, and the simulation of virtual hearing aid devices and of different typologies of hearing loss.
Technical details about the 3D Tune-In Toolkit spatialiser are described in:
Cuevas-Rodríguez M, Picinali L, González-Toledo D, Garre C, de la Rubia-Cuestas E, Molina-Tanco L and Reyes-Lecuona A. (2019) 3D Tune-In Toolkit: An open-source library for real-time binaural spatialisation. PLOS ONE 14(3): e0211899. https://doi.org/10.1371/journal.pone.0211899
Music Modeling and Music Generation with Deep Learning
Music Modeling and Music Generation with Deep Learning have made significant advancements, enabling the creation of intricate and captivating compositions. Tristan Behrens has collected various models, datasets, and valuable resources that contribute to this rapidly evolving field. The latest additions to his Github repository include research papers such as “AudioLM: a Language Modeling Approach to Audio Generation,” “MusicLM: Generating Music From Text,” and “ERNIE-Music: Text-to-Waveform Music Generation with Diffusion Models.” To explore these cutting-edge resources and stay informed on the latest developments, visit Tristan Behrens’ Github repository or connect with him on LinkedIn to follow his work and insights on deep learning in music generation and modeling.
Converting mono recordings to stereo
monotoSTEREO.info, curated by Christopher Kissel, offers a compilation of resources for upmixing mono music recordings to stereo. The site focuses on techniques such as audio spectral editing and sound source separation to create stereo mixes that closely mimic those made from multitrack session tapes.
Human cochlear filter models
Several advanced (non-linear) models have been developed to simulate characteristics of the human auditory system. Saremi et al. (2016) compared seven contemporary models; see the figure below. By sharing the computer code we hope to improve the understanding of the many intricacies of the models, to facilitate direct comparisons, and to help you select the appropriate model for your aim. A number of these models are implemented in the Auditory Modeling Toolbox.
Software for “Frequency analysis and synthesis using a Gammatone filterbank”
In the Zenodo community, you can find the Matlab implementation of the gammatone filterbank described in Volker Hohmann’s paper ‘Frequency analysis and synthesis using a Gammatone filterbank’ (Hohmann, 2002). It uses numerical methods described in ‘Improved numerical methods for gammatone filterbank analysis and synthesis’ by T. Herzke and V. Hohmann (Herzke & Hohmann, 2007).
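Gammatone filterbanks typically place their center frequencies at equal distances on an ERB (equivalent rectangular bandwidth) scale. The sketch below uses the widely cited Glasberg and Moore (1990) approximation to show that spacing. It is a generic illustration in Python, not a port of the Matlab package, and the frequency range and number of bands in the usage note are arbitrary.

```python
import math


def erb_bandwidth(frequency_hz):
    """Equivalent rectangular bandwidth in Hz at a given center frequency."""
    return 24.7 * (4.37 * frequency_hz / 1000.0 + 1.0)


def hz_to_erb_number(frequency_hz):
    """Position of a frequency on the ERB-number scale."""
    return 21.4 * math.log10(4.37 * frequency_hz / 1000.0 + 1.0)


def erb_number_to_hz(erb_number):
    """Inverse of hz_to_erb_number."""
    return (10.0 ** (erb_number / 21.4) - 1.0) * 1000.0 / 4.37


def center_frequencies(low_hz, high_hz, count):
    """Center frequencies equally spaced on the ERB-number scale (count >= 2)."""
    low, high = hz_to_erb_number(low_hz), hz_to_erb_number(high_hz)
    step = (high - low) / (count - 1)
    return [erb_number_to_hz(low + k * step) for k in range(count)]
```

For example, `center_frequencies(100, 8000, 30)` returns 30 frequencies that are densely spaced at the low end and increasingly sparse toward 8 kHz, mirroring the frequency resolution of the auditory periphery.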
The CAR-FAC (cascade of asymmetric resonators with fast-acting compression) is a cochlear model implemented as an efficient sound processor for mono, stereo, or multi-channel sound inputs. It was created by Richard F. Lyon. The model is introduced in Richard’s book ‘Human and Machine Hearing: Extracting Meaning from Sound’ and in Lyon (2011). This package (upper GitHub repository) includes Matlab and C++ implementations of the CARFAC model as well as code for computing Stabilized Auditory Images (SAIs). A Jupyter Notebook version was written by Andre van Schaik. See the design doc for a more detailed discussion of the software design.
A CARFAC object knows how to design its details from a modest set of parameters, and knows how to process sound signals to produce “neural activity patterns” (NAPs). It includes various sub-objects representing the parameters, designs, and states of different parts of the CAR-FAC model. The CARFAC class includes a vector of “ear” objects — one for mono, two for stereo, or more. The three main subsystems of the EAR are the cascade of asymmetric resonators (CAR), the inner hair cell (IHC), and the automatic gain control (AGC). These are not intended to work independently, but each part has three groups of data associated with it. A recent description of CARFAC and how to address quadratic distortions is provided in Saremi & Lyon (2018).
Further reading: Lyon, R. F. (2017). Human and Machine Hearing. Cambridge University Press.
A Jupyter Notebook version (GitHub repository below) was written by Andre van Schaik based on Dick Lyon’s book Human and Machine Hearing (this links to his blog).
Physiological models for auditory-nerve and midbrain neurons
The University of Rochester (UR) developed a graphical user interface (GUI) to provide visualizations of population responses of auditory-nerve (AN) and inferior colliculus (IC) model neurons. The underlying models have been included in the AMT (see below) and ModelDB (see below). A cloud-based version of the UR_EAR GUI is now available as a web app (https://urhear.urmc.rochester.edu). Open-source MATLAB code is available on the Open Science Framework (OSF) site: https://osf.io/6bsnt/.
ModelDB provides a well-curated accessible and searchable database of computational models of neurons including files for running the models. You can browse or search through over 1740 models. A ModelDB entry contains a model’s source code, description, metadata, and a citation of the accompanying peer-reviewed publication. The model’s source code can be in any programming language for any environment. The database was created and is maintained by the SenseLab. For further information, see model sharing in general and ModelDB in particular.
The Auditory Modeling Toolbox (AMT)
In December 2021 the Auditory Modeling Toolbox was updated with new models (mostly dealing with binaural speech intelligibility), updates of other models, and many bug fixes and improvements. The detailed changes are listed here. The AMT 1.1 is a community-driven project and would not exist without your help! If you find a bug or would like to suggest an improvement, please create a ticket in the “Bug and feature request” section of the project.
Currently, the AMT core team consists of Piotr Majdak, Clara Hollomey, Robert Baumgartner, and Michael Mihocic. In addition, a very large group of contributors is listed on the AMT website.
Psychophysics Toolbox (PTB-3)
After installation, type >> help PsychDemos in Matlab. Matlab’s help feature is one of its better features and makes it easier to learn by trial and error.
The Psychtoolbox core development team consists of David Brainard, Mario Kleiner, Denis Pelli and Tobias Wolf. Follow this link to get in touch.
Below you find the main GitHub repository for development of Psychtoolbox-3. It is meant for developers or alpha-testers only, not for regular users! Regular users please download the toolbox here.
Psychoacoustic Software Package (by Moore & Sek)
Professor Aleksander Sęk and Professor Brian Moore developed a software package that allows a wide variety of experiments in psychoacoustics without the need for time-consuming programming or technical expertise. The only requirements are a personal computer (PC, not Apple) with a good-quality sound card (preferably an external sound card) and a set of headphones. The software is intended for students of psychoacoustics and related disciplines, such as audiology and audio engineering, who want to try psychoacoustic experiments for themselves. The software should also be useful for researchers who want to run experiments without the need to spend time writing computer software to generate the stimuli, run the experiment, and gather the data. The software can be downloaded here, with English and Polish instructions. We recommend buying the accompanying book “Guide to PSYCHOACOUSTICS” by Sęk, A. P., and Moore, B. C. J. (2021). (Adam Mickiewicz University Press, Poznan, Poland), pp. 348. DOI: 10.14746/amup.9788323239321. You can buy the book via the publisher’s website.
Remote testing wiki
The Task Force on Remote Testing, an initiative of the Technical Committee on Psychological and Physiological Acoustics (P&P) of the Acoustical Society of America (ASA), created a wiki with information about, and examples of, approaches to data collection outside the lab, for example in participants’ own homes. The wiki also describes advantages of remote testing, such as large-N studies and access to special populations. See https://www.spatialhearing.org/remotetesting/Main/HomePage
Last year Chris Stecker, chair of the ASA P&P Task Force on Remote Testing, performed a survey to collect experiences and resources for remote testing. A wiki-based webpage was created that contains discussions, best practices, and links to other resources related to remote testing.
Other web-based methods
Adam Bosen developed web-based methods for testing speech recognition, psychophysics, and working memory. See his personal website and some of the GitHub repositories above.
olMEGA – an open-source toolkit for Ecological Momentary Assessment
Ecological Momentary Assessment (EMA) is a method for collecting data momentarily and repeatedly in natural environments with electronic devices (Holube et al., 2020). olMEGA is a toolkit for EMA developed at the Institute of Hearing Technology and Audiology of Jade University of Applied Sciences, Oldenburg, Germany. The olMEGA system is described in Kowalk et al. (2020) and was used to evaluate hearing aid benefit in real life (von Gablenz et al., 2021). The open-source toolkit offers software for the creation of questionnaires running on Android smartphones as well as software for analysis and data-handling. All tools are available at https://github.com/ol-MEGA. The latest builds of some olMEGA-Tools can be found here: https://tgm.jade-hs.de/media/olMEGA/.
The following list describes the main tools:
- olMEGA_MobileSoftware_V2: The source code for the Android APK file (Java)
- olMEGA_DataService_Server: SQL Databank server software (Django based, Python)
- olMEGA_DataService_Client: The client for the SQL server (Python)
- olMEGA_DataExtraction: Tool to import the data from mobile devices to the computer (Matlab, adb)
- Tools for Own Voice Detection (Matlab)
- Tools for data analysis (Matlab)
Cognition and Natural Sensory Processing Workshop (CNSP) resources
For researchers interested in studying natural speech or music perception with EEG/MEG/ECoG, the organizers of the CNSP collected a number of resources including tutorials and lectures. For data sharing and standardization purposes, the group adopted the Continuous-event Neural Data (CND) format. Please check out the CND format here. Detailed insights on the format can be found in the data-preparation guidelines.
Deep learning and Music
Jukebox from OpenAI is a neural net that generates music, including rudimentary singing, in different genres and artist styles as raw audio. OpenAI intends to make the model weights, code, and a sample exploration tool available to the public.
e-Audiology tools for clinicians
Below you find several online tools that could be used in clinics to administer hearing tests, screen for hearing loss, or determine safe listening levels.
For the 13th ARO symposium in February 2021, we created a demonstration page to administer a Digits-In-Noise (DIN) Test Using Antiphasic Stimuli online. The researchers found that the antiphasic digit presentation improved the sensitivity of the DIN test to detect sensorineural hearing loss (De Sousa et al., 2020). In addition, the test can distinguish conductive hearing loss from sensorineural hearing loss, while keeping test duration to a minimum by testing binaurally.
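In an antiphasic presentation, the masking noise is identical in both ears while the digits are phase-inverted in one ear relative to the other. The following sketch shows how such a stereo mix could be assembled from a speech and a noise signal of equal length. It is a simplified illustration under stated assumptions, not the code of the demonstration page: which ear receives the inverted speech, the RMS-based scaling and the sample values in the usage note are choices made for this example.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_antiphasic(speech, noise, snr_db):
    """Return (left, right): identical noise, speech inverted in the right ear."""
    # Scale the speech so that its RMS sits snr_db above (or below) the noise RMS.
    gain = rms(noise) / rms(speech) * 10 ** (snr_db / 20)
    scaled = [gain * s for s in speech]
    left = [n + s for n, s in zip(noise, scaled)]
    right = [n - s for n, s in zip(noise, scaled)]
    return left, right
```

Calling `mix_antiphasic([1.0, -1.0, 1.0, -1.0], [0.5, 0.5, -0.5, -0.5], snr_db=0)` yields two channels whose sum contains only noise and whose difference contains only speech, which is the property a binaural listener can exploit.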
Here is a short 2-minute video from HearX that shared the online DIN-test.
The Hörtech expert center also developed a web-based implementation of the DIN test.
The Basic Auditory Skills Evaluation (BASE) battery comprises 17 brief online tests. It was designed by Shafiro et al. to provide a comprehensive assessment of patient performance at-home or in-clinic. Please have a look at the presentation at the Internet&Audiology 2021 conference for further background.
Portable Automated Rapid Testing (PART) includes a wide variety of auditory tasks, all of which have been shown to have utility in assessing auditory function in the laboratory. The set of tasks were chosen by the PART development team, which is led by Frederick J. Gallun at the Oregon Health & Science University, David Eddins at the University of South Florida, and Aaron Seitz at the University of California Riverside (UCR). The program is a production of the UCR Brain Game Center, a research unit focused on brain fitness methods and applications.
NIOSH Sound Level Meter App
The National Institute for Occupational Safety and Health (NIOSH) has developed a Sound Level Meter (SLM) app that combines the best features of professional sound level meters and noise dosimeters into a simple, easy-to-use package. The app was developed to help workers make informed decisions about their noise environment and promote better hearing health and prevention efforts. Download the app for iOS. More information from NIOSH.
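To give an idea of what a noise dosimeter computes, the sketch below applies the NIOSH recommended exposure limit of 85 dBA as an 8-hour time-weighted average with a 3 dB exchange rate. It illustrates the arithmetic only and is not taken from the app. The function names and the exposure values in the usage note are hypothetical.

```python
CRITERION_LEVEL_DBA = 85.0  # NIOSH recommended exposure limit (8-hour TWA)
EXCHANGE_RATE_DB = 3.0      # each 3 dB increase halves the allowed time
CRITERION_HOURS = 8.0


def allowed_hours(level_dba):
    """Maximum recommended daily exposure time at a constant level."""
    return CRITERION_HOURS / 2 ** ((level_dba - CRITERION_LEVEL_DBA) / EXCHANGE_RATE_DB)


def daily_dose_percent(exposures):
    """Dose in percent for an iterable of (level in dBA, duration in hours)."""
    return 100.0 * sum(hours / allowed_hours(level) for level, hours in exposures)
```

For example, `daily_dose_percent([(85, 4), (88, 2)])` gives 100.0: four hours at 85 dBA use half of the daily allowance and two hours at 88 dBA use the other half.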
Automated Speech Recognition apps
Several apps have been developed for people with hearing loss to transcribe speech to text. Here are links to download the apps on your smartphone or tablet: AVA (iOS, Android), Earfy (iOS, Android), Live Transcribe (Android), Speechy (iOS), and NALscribe (iOS). NALscribe was specifically developed for use in audiology centers. Have a look at the 2-minute video about the design and development of the app. Pragt et al. (2021) evaluated the audiological performance of four of the above apps. Loizidis et al. (2020) describe novel use cases for such apps, including communication with people wearing a facemask or through closed glass surfaces or doors.
We would like to thank Giovanni Di Liberto, Laurel H. Carney, Inga Holube, Richard F. Lyon, Alessia Paglialonga, Marta Lenatti, Piotr Majdak, Clara Hollomey, Josh McDermott, Elle O’Brien, Adam Bosen, Tilak J. Ratnanather, Frederick J. Gallun, Valeriy Shafiro, Brian C.J. Moore, Volker Hohmann, and Raul Sanchez-Lopez for making software and data freely available. We also acknowledge GPT4 for editing part of the text above.
Cuevas-Rodríguez M, Picinali L, González-Toledo D, Garre C, de la Rubia-Cuestas E, Molina-Tanco L and Reyes-Lecuona A. (2019) 3D Tune-In Toolkit: An open-source library for real-time binaural spatialisation. PLOS ONE 14(3): e0211899. https://doi.org/10.1371/journal.pone.0211899
Gong, Y., Chung, Y.-A., & Glass, J. (2021). AST: Audio Spectrogram Transformer. arXiv preprint arXiv:2104.01778.
Herzke, T., & Hohmann, V. (2007). Improved numerical methods for gammatone filterbank analysis and synthesis. Acta Acustica United with Acustica, 93(3), 498–500.
Hohmann, V. (2002). Frequency analysis and synthesis using a Gammatone filterbank. Acta Acustica United with Acustica, 88(3), 433–442.
Holube, I., von Gablenz, P., & Bitzer, J. (2020). Ecological Momentary Assessment in Hearing Research: Current State, Challenges, and Future Directions. Ear and Hearing, 41, 79S. https://doi.org/10.1097/AUD.0000000000000934
Kowalk, U., Franz, S., Groenewold, H., Holube, I., von Gablenz, P., & Bitzer, J. (2020). olMEGA: An open source android solution for ecological momentary assessment. GMS Zeitschrift Für Audiologie – Audiological Acoustics, 2, Doc08. https://doi.org/10.3205/zaud000012
Lyon, R. F. (2011). Cascades of two-pole–two-zero asymmetric resonators are good models of peripheral auditory function. The Journal of the Acoustical Society of America, 130(6), 3893–3904.
Lyon, R. F. (2017). Human and Machine Hearing. Cambridge University Press.
Majdak, P., Hollomey, C., & Baumgartner, R. (2021). AMT 1.0: The toolbox for reproducible research in auditory modeling. Submitted to Acta Acustica.
Mischler, G., Raghavan, V., Keshishian, M., & Mesgarani, N. (2023). naplib-python: Neural Acoustic Data Processing and Analysis Tools in Python. arXiv preprint arXiv:2304.01799.
Podusenko, A., van Erp, B., Koudahl, M., & de Vries, B. (2022). AIDA: An Active Inference-based Design Agent for Audio Processing Algorithms. Frontiers in Signal Processing, 2, 842477. https://doi.org/10.3389/frsip.2022.842477
Ratnanather, J., Wang, L., Bae, S.-H., O’Neill, E., Sagi, E., & Tward, D. (n.d.). Visualization of Speech Perception Analysis via Phoneme Alignment: A Pilot Study. Front. Neurol., 12:724800. https://doi.org/10.3389/fneur.2021.724800
Saremi, A., Beutelmann, R., Dietz, M., Ashida, G., Kretzberg, J., & Verhulst, S. (2016). A comparative study of seven human cochlear filter models. The Journal of the Acoustical Society of America, 140(3), 1618–1634.
Saremi, A., & Lyon, R. F. (2018). Quadratic distortion in a nonlinear cascade model of the human cochlea. The Journal of the Acoustical Society of America, 143(5), EL418–EL424.
De Sousa, K. C., Swanepoel, D. W., Moore, D. R., Myburgh, H. C., & Smits, C. (2020). Improving Sensitivity of the Digits-In-Noise Test Using Antiphasic Stimuli. Ear and Hearing, 41(2), 442–450. https://doi.org/10.1097/AUD.000000000000077
von Gablenz, P., Kowalk, U., Bitzer, J., Meis, M., & Holube, I. (2021). Individual Hearing Aid Benefit in Real Life Evaluated Using Ecological Momentary Assessment. Trends in Hearing, 25, 2331216521990288. https://doi.org/10.1177/2331216521990288
January 6, 2022 by Jan-Willem Wasmann. Updates: AMT 1.1, WHISPER, CARFAC, PTB-3.
January 14, 2022 by Jan-Willem Wasmann. Updates: Olmega
February 14, Whisper Github link fixed.
July 2, UR_EAR and CNSP resources added
July 4, 3D tune-in model and ModelDB added
January 30 JukeBox and Mono-stereo converter added
April 8, 2023 Hugging Face, ASR and AST added.