Accessible data curation and analytics for international-scale citizen science datasets

B. Murray, E. Kerfoot, et al.

The Covid Symptom Study led to the creation of ExeTera, an open-source Python software package for data analytics on datasets approaching terabyte scale. The tool addresses data curation challenges and supports international collaboration through scalable, reproducible analysis. The work was conducted by a team of researchers including Benjamin Murray, Eric Kerfoot, and Andrew T. Chan, among others.
Abstract
The Covid Symptom Study (CSS) is a smartphone-based citizen science project that, by May 23, 2021, had amassed over 360 million self-assessments from more than 5 million participants. Its scale poses major curation challenges: standard Python tools (e.g., Pandas) cannot readily process such data on commodity hardware, while alternative scalable technologies increase technical complexity. The authors present ExeTera, an open-source Python package that delivers Pandas-like analytics for datasets approaching terabyte scale. ExeTera’s design enables scalable, reproducible curation and analysis by emphasizing columnar storage, streaming operations, and sorted-key algorithms. The authors demonstrate ExeTera’s role as a core component of an end-to-end curation pipeline for CSS, enabling reproducible research across an international collaboration.
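To make the idea of a sorted-key algorithm over columnar data concrete, here is a minimal sketch in plain Python. It is not ExeTera's API or implementation: the function name `merge_join_sorted` and the sample patient and assessment columns are hypothetical, chosen only for illustration. It assumes both key columns are sorted in ascending order and that the right-hand keys are unique, which lets the join run in a single forward pass while holding only one left-hand row at a time.

```python
from typing import Iterable, Iterator, Optional, Sequence


def merge_join_sorted(
    left_keys: Iterable[int],
    right_keys: Sequence[int],
    right_values: Sequence[int],
    default: Optional[int] = None,
) -> Iterator[Optional[int]]:
    """Yield the right-hand value for each left-hand key.

    Assumptions (illustrative only): both key columns are sorted
    ascending and right_keys contains no duplicates.
    """
    j = 0
    n = len(right_keys)
    for key in left_keys:
        # Advance the right-hand cursor; it never moves backwards,
        # so the whole join is one pass over each column.
        while j < n and right_keys[j] < key:
            j += 1
        if j < n and right_keys[j] == key:
            yield right_values[j]
        else:
            yield default


# Hypothetical columns: one row per patient, many assessments per patient.
patient_ids = [10, 20, 30]
patient_ages = [34, 51, 67]
assessment_patient_ids = [10, 10, 20, 30, 30, 40]

ages_per_assessment = list(
    merge_join_sorted(assessment_patient_ids, patient_ids, patient_ages)
)
print(ages_per_assessment)  # [34, 34, 51, 67, 67, None]
```

Because the left-hand keys are consumed as an iterable, they could equally be read chunk by chunk from disk rather than held in memory, which is the general motivation for combining sorted keys with streaming.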
Publisher
Scientific Data
Published On
Nov 22, 2021
Authors
Benjamin Murray, Eric Kerfoot, Liyuan Chen, Jie Deng, Mark S. Graham, Carole H. Sudre, Erika Molteni, Liane S. Canas, Michela Antonelli, Kerstin Klaser, Alessia Visconti, Alexander Hammers, Andrew T. Chan, Paul W. Franks, Richard Davies, Jonathan Wolf, Tim D. Spector, Claire J. Steves, Marc Modat, Sebastien Ourselin
Tags
Covid Symptom Study, ExeTera, data analytics, Python, reproducible research, collaboration, datasets