Accuracy and Repeatability of Chatbot Responses in Audiology

Authors: Wiktor Jedrzejczak1, Krzysztof Kochanek1

1Institute of Physiology and Pathology of Hearing

Background: The development of AI-driven conversational tools, particularly chatbots, has sparked discussion about their utility and the ethical issues they raise. This study aims to assess the ability of three prominent systems, OpenAI ChatGPT, Microsoft Copilot, and Google Gemini, to answer audiological questions.

Methods: The chatbots were tested on both open-ended and multiple-choice, single-best-answer questions. Responses to open-ended questions were rated on a Likert scale from 1 to 5. The assessment also recorded errors or inaccuracies in the answers, together with other features of the responses, such as word count, inclusion of references, and whether consultation with a specialist was suggested.
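
The abstract does not specify the exact scoring procedure, so the following is only a minimal Python sketch of how Likert ratings and repeatability across repeated presentations of the same question might be summarized; the data values and the helper functions summarize_likert and response_variability are hypothetical, not taken from the study.

from collections import Counter
from statistics import mean

# Hypothetical example data: Likert ratings (1-5) given to one chatbot's
# open-ended answers, and the answers the same chatbot gave when one
# multiple-choice question was asked several times.
likert_ratings = [5, 4, 4, 3, 5, 2, 4]            # placeholder values
repeated_mcq_answers = ["B", "B", "C", "B", "B"]  # placeholder values

def summarize_likert(ratings, satisfactory_threshold=3):
    """Mean rating and share of answers rated satisfactory or higher."""
    share_ok = sum(r >= satisfactory_threshold for r in ratings) / len(ratings)
    return mean(ratings), share_ok

def response_variability(answers):
    """Fraction of repeated answers that differ from the most common one."""
    most_common_count = Counter(answers).most_common(1)[0][1]
    return 1 - most_common_count / len(answers)

mean_rating, share_satisfactory = summarize_likert(likert_ratings)
print(f"Mean Likert rating: {mean_rating:.2f}")
print(f"Rated satisfactory or higher: {share_satisfactory:.0%}")
print(f"Variability across repeats: {response_variability(repeated_mcq_answers):.0%}")

Under this illustrative definition, a variability of 20% would correspond, for example, to one divergent answer among five repetitions of the same question.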

Results: Most responses from each chatbot were rated satisfactory or better, although all chatbots produced some errors or inaccuracies. ChatGPT achieved the highest overall score but did not provide information about its sources. Accuracy on multiple-choice questions was lower than on open-ended questions, and response variability was considerable, reaching 20%.

Conclusions: Chatbots offer an intriguing way of accessing basic information in specialized fields such as audiology. However, caution is necessary, as accurate information may be intermixed with subtle errors that can go unnoticed without a solid understanding of the subject. This is particularly concerning for ChatGPT, which does not routinely provide sources, making it difficult to assess the reliability of the information it presents.