DUBStepR is a scalable correlation-based feature selection method for accurately clustering single-cell data

Biology

B. Ranjan, W. Sun, et al.

DUBStepR is a feature selection algorithm that uses gene-gene correlations to improve the accuracy of single-cell data clustering. In the authors' benchmarks it outperforms existing feature selection methods, and it deconvolves cell heterogeneity in rheumatoid arthritis patient data. It scales to over a million cells and can be applied to other data types such as single-cell ATAC-seq.
Abstract
Feature selection (marker gene selection) is widely believed to improve clustering accuracy, and is thus a key component of single-cell clustering pipelines. Existing feature selection methods perform inconsistently across datasets, occasionally even resulting in poorer clustering accuracy than without feature selection. Moreover, existing methods ignore information contained in gene-gene correlations. Here, we introduce DUBStepR (Determining the Underlying Basis using Stepwise Regression), a feature selection algorithm that leverages gene-gene correlations with a novel measure of inhomogeneity in feature space, termed the Density Index (DI). Despite selecting a relatively small number of genes, DUBStepR substantially outperformed existing single-cell feature selection methods across diverse clustering benchmarks. Additionally, DUBStepR was the only method to robustly deconvolve T and NK heterogeneity by identifying disease-associated common and rare cell types and subtypes in PBMCs from rheumatoid arthritis patients. DUBStepR is scalable to over a million cells, and can be straightforwardly applied to other data types such as single-cell ATAC-seq. We propose DUBStepR as a general-purpose feature selection solution for accurately clustering single-cell data.
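To make the idea of correlation-based feature selection concrete, here is a minimal Python sketch. It is not the DUBStepR algorithm: it omits the stepwise regression and the Density Index, and the scoring rule (a gene's strongest absolute correlation with any other gene), the function name `select_correlated_genes`, and the synthetic data are all assumptions made purely for illustration.

```python
import numpy as np


def select_correlated_genes(expr, n_keep):
    """Return indices of the n_keep genes with the highest correlation score.

    expr is a (cells x genes) expression matrix. The score used here is a
    hypothetical stand-in, NOT the DUBStepR criterion: each gene is scored by
    its strongest absolute Pearson correlation with any other gene.
    """
    # Gene-gene Pearson correlation matrix (genes are the columns).
    corr = np.corrcoef(expr, rowvar=False)
    # Constant genes produce NaN correlations; treat them as uncorrelated.
    corr = np.nan_to_num(corr)
    # Ignore each gene's correlation with itself.
    np.fill_diagonal(corr, 0.0)
    score = np.abs(corr).max(axis=0)
    # Indices of the highest-scoring genes, best first.
    return np.argsort(score)[::-1][:n_keep]


# Synthetic data (illustrative only): 200 cells, 20 genes, two cell groups.
rng = np.random.default_rng(0)
n_cells, n_genes = 200, 20
group = np.repeat([0.0, 1.0], n_cells // 2)
expr = rng.normal(size=(n_cells, n_genes))
# Genes 0, 1 and 2 act as marker genes: they shift together with the group.
expr[:, :3] += 5.0 * group[:, None]

selected = select_correlated_genes(expr, n_keep=3)
print(sorted(selected.tolist()))
```

In this toy setting the marker genes co-vary with one another across cells while the noise genes do not, so a correlation-based score separates them without any cluster labels. That is the kind of information the abstract says existing methods ignore.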
Publisher
Nature Communications
Published On
Oct 06, 2021
Authors
Bobby Ranjan, Wenjie Sun, Jinyu Park, Kunal Mishra, Florian Schmidt, Ronald Xie, Fatemeh Alipour, Vipul Singhal, Ignasius Joanito, Mohammad Amin Honardoost, Jacy Mei Yun Yong, Ee Tzun Koh, Khai Pang Leong, Nirmala Arul Rayan, Michelle Gek Liang Lim, Shyam Prabhakar
Tags
feature selection
single-cell data
gene-gene correlations
DUBStepR
clustering
rheumatoid arthritis
scalability