Natural language processing analysis of the theories of people with multiple sclerosis about causes of their disease



C. Haag, N. Steinemann, et al.

How do individuals with multiple sclerosis (MS) theorize about the causes of their condition, and how might those theories influence their mental health and treatment adherence? Researchers from the University of Zurich examined theories held by 486 participants and found that mental distress, stress, and hereditary factors were among the most frequently cited causes. The study underscores the importance of communication between healthcare professionals and individuals with MS.

Introduction
The study investigates how persons with multiple sclerosis (MS) conceptualize the causes of their disease and how these beliefs relate to their wellbeing and health behaviors. Genetic and environmental risk factors for MS, such as HLA-DRB1*15:01, Epstein-Barr virus (EBV) infection, smoking, low vitamin D levels, and low ultraviolet exposure, are well established, but little is known about patients’ own theories. Understanding these beliefs matters because they may influence mental health, coping, treatment adherence, and the perceived legitimacy of evidence-based therapies, as framed by the Health Belief Model and the Cognitive Theory of Adaptation. Prior research has shown that illness perceptions and self-efficacy relate to wellbeing and coping in MS. Topic modeling of large-scale free-text data offers a way to systematically explore such beliefs. This study, co-designed with persons with MS through the Swiss MS Registry (SMSR), aimed to identify and quantify the themes in patients’ etiological theories and to group them into higher-level categories. The authors expected mentions of both life-experience-based ideas and established scientific risk factors (genetics, EBV, smoking, vitamin D).
Literature Review
The paper situates the research within established knowledge on MS risk factors and theoretical frameworks of health behavior and adaptation. It reviews: (1) genetic factors, especially familial aggregation and HLA-DRB1*15:01; (2) environmental factors, including smoking, vitamin D deficiency/low UV exposure, and EBV infection; (3) the Health Belief Model, which links beliefs about disease and about benefits and barriers to health behaviors; and (4) the Cognitive Theory of Adaptation, which emphasizes meaning-making, perceived control, and self-esteem after illness. It also references prior applications of natural language processing and topic modeling in MS, such as large-scale analyses of life stories and COVID-19 lockdown experiences, and contrasts them with manual qualitative analyses. Advances in transformer-based topic modeling (e.g., BERTopic) are noted as enabling higher-quality, scalable text exploration. Discrepancies between evidence and public beliefs (e.g., about vaccinations) are highlighted as an ongoing issue in the literature and public health.
Methodology
Design and participants: A survey within the Swiss MS Registry (SMSR), a longitudinal, participatory citizen-science registry, collected free-text responses on personal theories about MS etiology (2020–March 2023). Ethics approval was obtained (Canton of Zurich PB-2016-00894; BASEC-no: 2019-01027) and participants provided informed consent. Of ~2700 invitees, 603 responded; 486 contributed analyzable theories after exclusion of those with no theory or non-informative responses.
Measures: Two open-ended questions elicited general assumptions (Q1) and specific risk factors together with the reasoning behind them (Q2). Because the answers overlapped, texts from both questions were combined per participant.
Preprocessing: Responses in German (78.6%), French (17.7%), Italian (3.3%), and English (0.4%) underwent language-specific manual spell-checking. French/Italian texts were machine-translated to German using Hugging Face Helsinki-NLP models, then reviewed by native speakers. Lemmatization and part-of-speech tagging, focused on nouns, used spaCy de_dep_news_trf.
Topic modeling: BERTopic was used with the paraphrase-multilingual-MiniLM-L12-v2 sentence-transformer to embed texts, UMAP for dimensionality reduction, and HDBSCAN for hard clustering (with outlier handling). Tokenization used scikit-learn CountVectorizer; topic representations used class-based TF-IDF (c-TF-IDF) with Maximal Marginal Relevance to reduce redundancy. Overlapping topics were merged with BERTopic’s merge_topics. Soft clustering then assigned outliers to their most probable topic, using a probability threshold of 0.05; texts below the threshold remained outliers.
Manual review: All classifications and outliers were manually checked; 183 of 1494 text segments were reassigned, which corresponds to 87.8% accuracy (1311 of 1494 segments left unchanged) by classifier evaluation criteria. An additional 42 segments received a second or third topic label.
Topic co-occurrence: The presence/absence of each topic per participant was correlated across topic pairs using Pearson correlations.
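The topic co-occurrence step can be illustrated with a minimal sketch. It assumes each topic is coded as a 0/1 indicator per participant (for binary variables the Pearson correlation equals the phi coefficient). The toy data, topic names, and function names below are hypothetical and are not taken from the study.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when a topic is never or always present
    return cov / sqrt(var_x * var_y)


def topic_cooccurrence(presence):
    """Correlate every pair of topics.

    presence maps a topic name to a list of 0/1 flags, one per participant.
    """
    return {
        (a, b): pearson(presence[a], presence[b])
        for a, b in combinations(presence, 2)
    }


# Hypothetical toy data: six participants, 1 = topic mentioned.
presence = {
    "Stress": [1, 1, 0, 0, 1, 0],
    "Sleep Deprivation": [1, 0, 0, 0, 1, 0],
    "Heredity": [0, 0, 1, 1, 0, 1],
}

correlations = topic_cooccurrence(presence)
for (a, b), r in correlations.items():
    print(f"{a} ~ {b}: r = {r:.2f}")
```

With real data, each list would hold one flag per participant (486 in this study), and only the stronger pairs would typically be reported.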
Validation: A separate thematic analysis of the full dataset served as a benchmark to assess overlap with the data-driven model, noting greater concordance in well-defined areas (e.g., smoking, EBV, vaccinations) and less in vaguer domains (e.g., mental distress, trauma).
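The class-based TF-IDF (c-TF-IDF) weighting mentioned under topic modeling can be sketched in plain Python. This is a simplified illustration of the published formulation, in which a term's frequency within a topic is multiplied by log(1 + A / f), with A the average number of words per topic and f the term's frequency across all topics. It is not the BERTopic implementation, it omits Maximal Marginal Relevance, and the toy tokens and function names are hypothetical.

```python
from collections import Counter
from math import log


def c_tf_idf(docs_by_topic):
    """Compute simplified c-TF-IDF weights.

    docs_by_topic maps a topic name to a list of token lists.
    Returns a dict: topic -> {term: weight}.
    """
    if not docs_by_topic:
        raise ValueError("at least one topic is required")
    # Treat all documents of a topic as one concatenated "class document".
    class_counts = {
        topic: Counter(tok for doc in docs for tok in doc)
        for topic, docs in docs_by_topic.items()
    }
    # Frequency of each term across all topics.
    total = Counter()
    for counts in class_counts.values():
        total.update(counts)
    avg_words = sum(total.values()) / len(class_counts)

    weights = {}
    for topic, counts in class_counts.items():
        n_words = sum(counts.values())
        weights[topic] = {
            term: (freq / n_words) * log(1 + avg_words / total[term])
            for term, freq in counts.items()
        }
    return weights


def top_terms(weights, k=2):
    """Return the k highest-weighted terms per topic (ties broken alphabetically)."""
    return {
        topic: [t for t, _ in sorted(w.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for topic, w in weights.items()
    }


# Hypothetical, already lemmatized noun tokens grouped by assigned topic.
docs_by_topic = {
    "stress": [["stress", "work", "exhaustion"], ["stress", "work", "factor"]],
    "heredity": [["family", "gene", "factor"], ["family", "mother", "gene"]],
}

weights = c_tf_idf(docs_by_topic)
print(top_terms(weights))
```

Terms shared across topics (here "factor") receive lower weights than terms concentrated in one topic, which is why the top terms tend to characterize a topic rather than the corpus as a whole.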
Key Findings
- Sample: 486 participants (mean age 52.15±12.48 years; 80.3% female); average cleaned entry length 27.46 words. Most had relapsing-remitting MS (n=339).
- Topics: BERTopic identified 19 distinct theory topics; individuals mentioned 0–13 topics (median=2; mean=2.27, SD=1.54; after later assignments, mean=2.40, SD=1.60).
- Most frequent topics: Mental Distress (31.5%), Stress (Exhaustion, Work) (29.8%), Heredity/Familial Aggregation (27.4%), Diet, Obesity (16.0%).
- High-level categories and prevalence across participants: physical health (56.2%), mental health (53.7%), established scientific risk factors (genetics, EBV, smoking, vitamin D deficiency/low sunlight exposure; 47.7%), fate/coincidence (3.1%).
- Topic co-occurrence (Pearson r): Diet, Obesity with Smoking & Alcohol (r=0.26); Stress (Exhaustion, Work) with Sleep Deprivation (r=0.24); EBV with Vitamin D Deficiency/Sunlight Exposure (r=0.21).
- Classification quality: After soft clustering and manual review, 87.8% accuracy; 183/1494 segment reassignments; 42 segments received multiple topic labels.
- Thematic validation: Good overlap for concrete, evidence-linked topics (smoking, alcohol/drugs, EBV, vaccinations); less for broader, psychosocial topics (mental distress, relationships, trauma).
- Notable discrepancy with evidence: Frequent mentions of vaccinations as a risk despite contrary scientific consensus.
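The reported classification accuracy follows directly from the reassignment counts. The short check below assumes that accuracy is defined as the share of segments whose automatic topic assignment was left unchanged in manual review; the function name is illustrative.

```python
def agreement_rate(total_segments, reassigned):
    """Share of segments whose automatic label survived manual review."""
    if total_segments <= 0 or not 0 <= reassigned <= total_segments:
        raise ValueError("counts must satisfy 0 <= reassigned <= total_segments")
    return (total_segments - reassigned) / total_segments


# Counts reported in the summary: 183 of 1494 segments were reassigned.
accuracy = agreement_rate(1494, 183)
print(f"{accuracy:.1%}")  # 87.8%
```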
Discussion
Findings address the core question by showing that persons with MS attribute disease causation to a diverse set of factors, blending personal experiences with both evidence-based and non-evidence-based ideas. Mental health-related themes (stress, distress, trauma, relationship issues) were highly prevalent, underscoring the importance of psychosocial factors in patients’ explanatory models. Established scientific risk factors (genetics/family history, EBV, smoking, vitamin D/sunlight) were also commonly cited, reflecting alignment with public information and lived family experiences. The prominence of mental health themes suggests an unmet need for integrating mental health discussion and support within MS care, potentially improving coping, self-efficacy, and adherence. Discrepancies—particularly beliefs about vaccinations—have implications for public health messaging and clinical communication to prevent undermining prevention efforts. The results support engaging patients in evidence-informed dialogue, acknowledging their experiences while clarifying current science, consistent with the Health Belief Model and Cognitive Theory of Adaptation. The observed topic co-occurrences (e.g., diet with smoking/alcohol; work stress with sleep deprivation; EBV with vitamin D/sunlight) mirror intuitive patient narratives about multiple interacting factors and triggers versus underlying predispositions.
Conclusion
This study applies transformer-based topic modeling to characterize, at scale, the theories persons with MS hold about disease causation. Nineteen topics emerged, dominated by mental distress and work-related stress alongside heredity and diet/obesity, grouped into four overarching categories (physical health, mental health, established scientific risk factors, fate/coincidence). The work highlights the centrality of mental health in patients’ explanatory frameworks and the need for clear, empathetic communication about MS pathogenesis and evidence. Future research should examine how etiological beliefs relate to actual health behaviors, treatment adherence, and outcomes; investigate longitudinal changes in beliefs; and further integrate mental health considerations into MS risk and course research, including interventions to enhance self-efficacy and coping.
Limitations
- Retrospective, cross-sectional assessment: Recall and reconstruction biases likely influence how participants remember pre-onset factors and interpret causality.
- Variable engagement with the topic: Some participants may have reflected on etiology for the first time; others had longstanding beliefs, leading to heterogeneous detail and conviction levels.
- Potential self-selection: The survey may attract individuals with stronger views; respondents were more often female and slightly older than non-respondents, limiting representativeness.
- Generalizability: Frequency estimates of specific theories may not generalize to the broader Swiss MS population or other settings.
- Behavioral linkage not assessed: The study did not analyze how etiological beliefs translate into health behaviors or adherence.
- Language/processing constraints: Most analyses were conducted in German after translation; although translations were reviewed, translation and NLP pipeline choices (embedding model, clustering thresholds, noun-focused lemmatization) may shape topic structures.