Identification and epidemiological characterization of Type-2 diabetes sub-population using an unsupervised machine learning approach

Medicine and Health

S. Bej, J. Sarkar, et al.

This study identifies sub-populations among Type-2 Diabetes Mellitus (T2DM) patients in India, including predominantly non-obese clusters and a cluster with very low non-vegetarian food intake. The authors, Saptarshi Bej, Jit Sarkar, Saikat Biswas, Pabitra Mitra, Partha Chakrabarti, and Olaf Wolkenhauer, argue that T2DM screening criteria for rural inhabitants should be reconsidered.

Abstract
Background: Type-2 Diabetes Mellitus (T2DM) exhibits heterogeneous sub-populations, but identification of sub-populations within epidemiological datasets is underexplored. This study focuses on detecting T2DM clusters in the Indian National Family Health Survey-4 (NFHS-4) dataset containing 10,125 T2DM patients with diverse features spanning medical history, diet, addictions, socio-economic, and lifestyle factors.
Methods: Conventional UMAP for dimensionality reduction performed poorly due to mixed data types. The authors implemented a feature-type-distributed workflow applying UMAP separately to continuous, ordinal, and nominal features using appropriate similarity metrics (Euclidean for continuous, Canberra for ordinal, Hamming for nominal). The resulting low-dimensional embeddings (2D for continuous and ordinal; 1D for nominal) were integrated into a five-dimensional representation and clustered using DBSCAN.
Results: Seven clusters (with 261 outliers) were detected; four significant clusters contained 2898, 2301, 2226, and 1315 individuals. Two clusters were predominantly non-obese, characterized by lower mean age and BMI and a higher proportion of rural residents and lower wealth status. One cluster showed very low non-vegetarian intake (about 90% reporting no egg, fish, chicken/meat intake).
Conclusions: Feature-type-distributed UMAP clustering is effective for heterogeneous epidemiological data. The presence of significant non-obese T2DM sub-populations with younger age and economic disadvantage highlights the need for different T2DM screening criteria among rural inhabitants and tailored dietary guidance.
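The Methods pair each feature type with its own dissimilarity measure before UMAP is applied. The sketch below is a minimal illustration of those three measures in plain Python, not the authors' code. The two patient records, their feature names and values, and the helper per_type_distances are hypothetical. The Hamming distance is written as the fraction of mismatching positions and Canberra terms with a zero denominator are skipped; both are common conventions assumed here, not details stated in the abstract.

```python
import math


def euclidean(x, y):
    """Straight-line distance, suited to continuous features (e.g. age, BMI)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def canberra(x, y):
    """Sum of |a - b| / (|a| + |b|); terms where both values are 0 are skipped."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


def hamming(x, y):
    """Fraction of positions whose categories differ (nominal features)."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def per_type_distances(p, q):
    """Hypothetical helper: one distance per feature type for two records."""
    return {
        "continuous": euclidean(p["continuous"], q["continuous"]),
        "ordinal": canberra(p["ordinal"], q["ordinal"]),
        "nominal": hamming(p["nominal"], q["nominal"]),
    }


# Hypothetical records; values are made up for illustration only.
patient_a = {
    "continuous": [52.0, 24.5],        # e.g. age, BMI
    "ordinal": [1, 3, 0],              # e.g. coded intake frequencies
    "nominal": ["rural", "no", "yes"],
}
patient_b = {
    "continuous": [49.0, 20.5],
    "ordinal": [3, 1, 0],
    "nominal": ["urban", "no", "yes"],
}

distances = per_type_distances(patient_a, patient_b)
```

In the workflow described above, each feature block would be embedded separately under its own metric (the umap-learn package accepts "euclidean", "canberra", and "hamming" as metric names), and the 2D, 2D, and 1D embeddings would then be concatenated into the five-dimensional representation that DBSCAN clusters.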
Publisher
Nutrition & Diabetes
Published On
Dec 27, 2022
Authors
Saptarshi Bej, Jit Sarkar, Saikat Biswas, Pabitra Mitra, Partha Chakrabarti, Olaf Wolkenhauer
Tags
Type-2 Diabetes Mellitus
T2DM
National Family Health Survey
clustering
epidemiology
dietary habits
rural health