Citizen scientists—practices, observations, and experience
M. O'Grady and E. Mangina
Citizen Science (CS) is increasingly viewed as a viable methodology for scientific research, either as a bottom-up initiative or as a collaboration with the professional scientific community, NGOs, or government organizations. Its importance is acknowledged in legislative contexts, for example, in the EU Open Science policy and the Crowdsourcing and Citizen Science Act in the USA. The importance of CS throughout history is undisputed—the era of professional science is a modern phenomenon. Traditionally, CS was often perceived as an exercise in data collection. However, citizen scientists have increasingly undertaken epistemic roles such as analysis and interpretation, with platforms like Zooniverse being exemplary. CS is sometimes seen as a vehicle for democratizing science, for which effective data stewardship is vital, though the concept of democratization has been challenged. A viable but underexplored objective for CS communities is collaboration with national and local government agencies to influence policy, and conversely, CS offers a tool for agencies to engage with local communities. Citizen scientists can contribute to monitorial, deliberative, and participatory models of democratic action by collecting, analyzing, and interpreting data for evidence-based policy, provided data quality and quantity are adequate. CS initiatives face challenges including trust and data quality, inclusivity, and polarization. Projects often know little about their participants, possibly due to hesitancy to collect personal data, yet understanding participant demographics, experiences, and data knowledge is essential for meaningful outcomes. This paper reports on a survey to obtain such an understanding, focusing on participant experience and data-related knowledge and practices.
There is no universally agreed definition of citizen science; one study identified 35 definitions. This ambiguity is problematic for policy, yet a narrow definition risks excluding valid activities. A standardized international definition has been called for. For this work, CS is considered the pursuit of scientific knowledge undertaken or contributed to by those with no direct or indirect scientific role in their professional lives, while acknowledging related terms such as community science. Studies show growing interest in CS practice and volunteers. For example, in Greece, awareness of the term was relatively low but understanding of the concept was high, with concerns about data quality. Motivations to participate have been studied, and participant diversity is identified as urgently needed to maximize impact. While CS can empower marginalized communities, it may also risk reinforcing inequality unless contexts are considered. Surveys in Germany indicate projects often know little about their volunteers, which has implications for impact and cannot be decoupled from data quality. Effective, verifiable data collection and management are essential; prior work has surveyed the data lifecycle in CS and concluded that common frameworks only partially fulfill CS needs. From a data justice perspective, citizen scientists may benefit less than professional scientists and governments, raising power-balance concerns and ethical decisions relating to open data practices and governance. Skepticism of CS-derived data persists; while CS can help measure the Sustainable Development Goals (SDGs), data quality is a major obstacle. Non-methodological factors such as emotional attachment can bias assessments (e.g., of water quality). Communicating data management practices can help, and comprehensive quality control and assurance across the data lifecycle is advocated. The role of curated data reviewers in building trust is critical.
Data quality is multifaceted and ambiguous; biases and limitations exist in all datasets, and a double standard toward CS data has been noted. Complementarity between professional and participatory data has been suggested, especially in protected areas. Proposed solutions include permissioned blockchain for data ownership and provenance and AI to improve data quality in mobile apps. There remains a need to understand participant expectations regarding data, open data, and intellectual property rights. A study found CS data often carry restrictive licensing, limiting reuse and impact, underscoring the need for awareness of data issues and robust, transparent data management policies. Earlier surveys include Wiggins et al. (2011) on data quality and validation mechanisms and an EU JRC survey (2016) on data management among citizen scientists. The present study complements and extends these by broadening demographic and project-specific details and deepening examination of participants' understanding of data, project data management, alignment with open science, awareness of FAIR principles, and training experiences.
A descriptive survey approach comprising two distinct but overlapping questionnaires was used, administered in two phases. Phase 1 targeted the citizen science community with a 47-question instrument designed primarily for quantitative analysis, covering demography; project characteristics; experience as a citizen scientist; data collection; data management; data dissemination; open research, including Responsible Research and Innovation (RRI); and training received. Responses were mainly closed-ended (e.g., Yes/No/I do not know), with a mix of single- and multiple-choice options; most questions were compulsory. The survey was built in Google Forms and translated into French, German, Greek, Italian, Polish, Portuguese, Spanish, and Turkish. Participants were given background information and the motivations for the study, and were told that neither personally identifiable information nor identifiable project details would be collected; they were also discouraged from volunteering such information. Participants consented before accessing the survey; data were stored only upon final submission. The survey required approximately 30 minutes. Recruitment leveraged online channels and fora for citizen scientists, including the European Citizen Science Association (ECSA) and Zooniverse. The survey was anonymous and uncompensated; a donation to UNICEF was made as a token of appreciation.
Phase 2 targeted the general public to establish a baseline for comparison. It reused the Phase 1 questionnaire but focused on data concepts and training, excluding core CS-specific questions. The instrument was also implemented in Google Forms, with participants recruited via Prolific Academic Ltd. to control population characteristics (geography, gender balance). Participants were generally multilingual with English as a second language; the survey took about six minutes on average. As in Phase 1, participants were informed about the study and about data use and sharing, and consent was obtained before commencement.
Data processing: In Phase 1, 120 citizen scientists completed the survey; after rigorous quality checks, 100 consistent submissions were retained. The dataset was encoded and analyzed using Microsoft Excel. In Phase 2, 115 members of the public completed the survey; after quality checks, 108 submissions were retained.
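The retention figures above can be turned into rates with a few lines of Python. This is a minimal sketch for illustration only: the study reports using Microsoft Excel, and the dictionary keys and function names below are hypothetical, not taken from the paper. Only the four counts come from the text.

```python
# Submission counts as reported in the text. The phase labels are
# hypothetical names chosen for this sketch.
PHASES = {
    "phase_1_citizen_scientists": {"completed": 120, "retained": 100},
    "phase_2_general_public": {"completed": 115, "retained": 108},
}


def retention_rate(completed: int, retained: int) -> float:
    """Return the share of completed submissions kept after quality checks."""
    if completed <= 0:
        raise ValueError("completed must be a positive count")
    if not 0 <= retained <= completed:
        raise ValueError("retained must lie between 0 and completed")
    return retained / completed


def summarize(phases: dict) -> dict:
    """Map each phase label to its retention rate in percent (one decimal)."""
    return {
        name: round(100 * retention_rate(**counts), 1)
        for name, counts in phases.items()
    }


if __name__ == "__main__":
    for name, pct in summarize(PHASES).items():
        print(f"{name}: {pct}% of completed submissions retained")
```

On these counts, roughly 83% of Phase 1 submissions and 94% of Phase 2 submissions passed the quality checks, so the citizen scientist sample lost proportionally more responses.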
Citizen scientist sample (n=100 after QC): Gender: 53% female, 45% male, 2% preferred not to say. Age: 18–65+, with 31% aged 35–44 and 13% aged 65+. Geography: 15 European countries represented; 5% from outside Europe. Roles: more than 50% identified as active citizen scientists; 40% identified as project leaders. Domains: biodiversity, earth science, and environmental science accounted for almost 80% of projects. Geographic scope: neighborhood to continental; regional (32%) and country (23%) were most common. Duration: 1–4 years (44%) and >4 years (35%). Funding: international 32%, national 19%, and 13% unaware of funding source. Leadership: 46% led by academic institutions; 19% by NGOs. Collaboration: 55% collaborated with the project leader; 38% collaborated with people known only through CS activities. Decision-making: over 63% contributed to project management or decision-making. Motivations: conservation and nature protection (66%), education and learning (62%). Activities: data collection predominant for 85%; participants contributed across problem definition, analysis, and interpretation. Data collection tools: 45% used mobile apps; 19% used paper-based approaches. Consent: 20% could not recall how informed consent for data use was obtained. Data management knowledge: 26% unaware of a data management plan; 24% unaware of quality control processes; 26% unaware of metadata/documentation availability. Licensing: 43% unsure of the data license governing their project's data. Contact: 73% knew of a dedicated contact for data queries. Data dissemination: 37% reported that data were made publicly available as datasets, mainly post-processed (34%); 22% reported data were not publicly available. Open research awareness: mixed; open access and open data were better known; open science awareness was 54%; open innovation was less often encountered. Regulatory and principles awareness: GDPR awareness was good; awareness of FAIR principles (37%) and RRI (30%) was relatively low; participation in CS contributed to knowledge of these concepts. Training: participants received formal and informal training, especially on data collection protocols, analysis, and protection; training covered RRI aspects but was predominantly informal (except for data collection protocols) and shallow in ethics, gender, and legal topics. Repositories: good awareness of open data repositories; 55% accessed them outside CS and 38% within CS initiatives. Data sharing attitudes: generally positive except toward for-profit organizations.
General public sample (n=108 after QC): Gender: 52% female, 47% male, 1% preferred not to say. Age: dominated by 25–34 (38%). Geography: 21 European countries; survey completed in English; all claimed proficiency (usually as a second language). Democratic models: citizen scientists were more aware of participative democracy (59%) than the public (42%). Awareness of CS: 88% of the public had not encountered the term “Citizen Science”; 56% had not encountered alternative models/synonyms. Citizen scientists were more familiar with “community science” (53%) and “participatory science” (51%). Preferred definition: National Geographic’s definition was most popular among both groups. Open research pillars: over half of the public were aware of open access (63%) and open data (56%); overall awareness of all pillars was greater among the CS community. Concepts and terms: the public had high GDPR awareness (68%) but lower familiarity with FAIR, EOSC, and RRI than the CS community. Training: the public reported more formal and informal training in data protection, ethics, gender, and legal issues; citizen scientists reported more training in public engagement, open science, and governance.
The study set out to surface the experiences and data-related knowledge and practices of citizen scientists, addressing concerns that participant voices are often underrepresented and that limited understanding of data stewardship may compromise consent, sharing, and overall impact. Findings show that citizen scientists predominantly engage in data collection but frequently lack clarity on critical elements of the data lifecycle, including data management plans, quality control, metadata availability, and especially licensing. This gap directly undermines informed consent and hinders data reuse and impact, affirming prior literature on CS data skepticism and restrictive licensing. While GDPR is well-known, awareness of Open Science as a holistic concept, FAIR principles, and RRI is modest, suggesting a need for targeted training to equip participants to make informed decisions and to align projects with open research best practices. The relatively positive attitudes toward data sharing (excluding for-profit contexts) and good awareness of repositories indicate a foundation upon which to build better open data practices. Demographically, the sample does not decisively confirm or refute common stereotypes, underscoring the ongoing need to improve inclusion and diversity to bolster credibility, representativeness, and policy relevance. The general public’s very low awareness of CS and its synonyms suggests projects are not effectively communicating their identity and goals, which may limit recruitment, legitimacy, and the potential of CS as a vehicle for democratic participation. Training patterns reveal emphasis on operational skills (collection, analysis) with insufficient depth in ethics, gender, legal issues, and open science, which are essential for responsible, equitable, and impactful CS. 
Overall, the results support enhancing RRI-aligned practices (public engagement, gender, governance, ethics), improving data governance and transparency across the lifecycle, and proactively promoting CS to the public to maximize scientific and policy impact.
Citizen science is increasingly embedded in modern scientific culture and offers opportunities to increase scientific literacy and contribute to democratic processes and policy formation. This study provides a snapshot of citizen scientists’ experiences, highlighting limited understanding of data management principles, mixed awareness of open research concepts, and training gaps in key RRI areas. Concrete recommendations include strengthening RRI (public engagement, gender, education, open science, ethics, governance), improving data management transparency (availability, licensing, access, sharing, quality control, informed consent), prioritizing diversity and inclusion, and increasing public awareness of CS and related models. With adequate support and appropriate training, citizen scientists and their projects can enhance their scientific contribution and societal impact. Future work should include larger, country-level replications across Europe to capture local conditions for a European strategy on CS in policy and governance; development of competence frameworks for training (e.g., akin to FabCitizen); and deeper qualitative inquiry, including phenomenological studies, to understand citizen scientist identities.
The study is limited by its population size, making findings indicative rather than definitive, though comparable to prior surveys in this field. As an online survey, it excluded individuals lacking computer literacy, potentially biasing the sample.
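To give a rough sense of why samples of this size yield indicative rather than definitive figures, the sketch below computes a normal-approximation margin of error for a reported proportion. This calculation is not part of the study. It assumes simple random sampling, which a self-selected online survey does not satisfy, so the results are at best a lower bound on the real uncertainty. The function name and the choice of a 95% level are assumptions made for illustration.

```python
import math

# Approximate 97.5th percentile of the standard normal distribution,
# giving a two-sided 95% interval. The 95% level is an assumption here.
Z_95 = 1.96


def margin_of_error(p: float, n: int, z: float = Z_95) -> float:
    """Normal-approximation margin of error for a sample proportion.

    p is the observed proportion (0..1) and n the sample size.
    Assumes simple random sampling, which the survey does not guarantee.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return z * math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    # Proportions and sample sizes taken from the reported results.
    examples = [
        ("citizen scientists unsure of data license", 0.43, 100),
        ("public who had not encountered the term", 0.88, 108),
    ]
    for label, p, n in examples:
        moe = margin_of_error(p, n)
        print(f"{label}: {p:.0%} +/- {100 * moe:.1f} points (n={n})")
```

Under these assumptions, a proportion near 50% measured on 100 respondents carries a margin of close to ten percentage points, which is consistent with treating small differences between groups cautiously.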