What can we learn about tinnitus from social media posts?

Manon Revel1, Vinaya Manchaiah2, Alain Londero3, Guillaume Palacios4, Aniruddha K. Deshpande5, Ryan Boyd6, Pierre Ratinaud7

1 Institute for Data, Systems and Society, Massachusetts Institute of Technology, Cambridge, USA; 2 Department of Speech and Hearing Sciences, Lamar University, Beaumont, USA; 3 Service d’Oto-Rhino-Laryngologie et Chirurgie Cervico-Faciale, Paris, FR; 4 PainkillAR, TELECOM ParisTech, Paris, FR; 5 Department of Speech-Language-Hearing Sciences, Hofstra University and Long Island AuD Consortium, Hempstead, USA; 6 Department of Psychology, the Data Science Institute, and Security Lancaster, Lancaster University, Lancaster, UK; 7 Laboratory of Applied Studies and Research in Social Sciences, University of Toulouse, Toulouse, FR

Background: Individuals with tinnitus are highly heterogeneous in terms of etiology, symptom manifestation, and coping mechanisms. Most of these patients are likely to seek hearing health information and social support online, through websites or social media platforms, where information is easily accessible. In the absence of evidence-based care, patients with similar symptoms gather, share experiences, and exchange tips. The present study examines such discussions around tinnitus in free-text Reddit posts.

Methods: The study uses a cross-sectional design. A total of 130,000 posts were extracted through Reddit’s application programming interface. The 101,000 unique posts were analyzed using automated natural language processing (NLP) techniques: hierarchical cluster analysis; an unsupervised machine learning algorithm, Latent Dirichlet Allocation (LDA); and supervised sentiment analysis. After the main topics in the corpus were identified and their sentiment scores assessed, sub-themes were probed within selected topics using further LDA runs. Reddit users’ interactions were used to measure the co-occurrence of topics in messages and threads.
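As an illustration of the co-occurrence step only, the following minimal sketch counts how often pairs of topics appear in the same thread. It is not the study's implementation: the function name `topic_cooccurrence`, the input layout (thread identifier mapped to the topic labels of its posts), and the toy data are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def topic_cooccurrence(thread_topics):
    """Count how often each unordered pair of topics shares a thread.

    thread_topics maps a thread id to the list of topic labels assigned to
    its posts (hypothetical input; in practice the labels would come from
    a topic model such as LDA).
    """
    pair_counts = Counter()
    for topics in thread_topics.values():
        # Count each topic once per thread; sorting makes (a, b) and (b, a)
        # map to the same key.
        for pair in combinations(sorted(set(topics)), 2):
            pair_counts[pair] += 1
    return pair_counts


# Toy data, invented for illustration only.
threads = {
    "t1": ["TMJ", "Music Volume", "TMJ"],
    "t2": ["TMJ", "Music Volume"],
    "t3": ["Supplements", "TMJ"],
    "t4": ["Supplements"],
}
counts = topic_cooccurrence(threads)
```

Counting each topic at most once per thread keeps long threads from dominating the totals; the same function could be applied at message level by passing one entry per message instead of per thread.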

Results: The cluster and LDA analyses both yield a 16-topic solution with comparable clusters. The sentiment analysis shows heterogeneity among different factors. LDA runs on selected topics reveal sub-themes; for example, the Temporo-Mandibular Joint (TMJ) topic splits into three distinct sub-themes: teeth and jaws; neck and back stress; and muscular and neuralgic pains. Finally, overlap between themes emerges from their co-occurrence in users’ activity. For instance, the TMJ topic appeared to be discussed often alongside the Music Volume topic.

Conclusion: The study maps topics discussed on social media, some of which have not been explored in the literature (e.g., supplements, personal timeline). It also investigates how topics interconnect in spontaneous discussions. The findings enrich the understanding of patients’ interplay with their conditions and inform the development of appropriate patient-centered strategies to support individuals with tinnitus.

Revel et al
Topics found with Latent Dirichlet Allocation on 100k Reddit posts. The top rectangle of each block gives the topic’s name, assigned from the algorithm’s output, namely the lemmatised words shown in the middle rectangle. The lower rectangle shows the proportion of messages that mention the topic.