Exploring excitement counterbalanced by concerns towards AI technology using a descriptive-prescriptive data processing method
S. Oprea and A. Bâra
This research conducted by Simona-Vasilica Oprea and Adela Bâra delves into how the public perceives AI technologies, such as facial recognition and driverless cars. Through a hybrid data analysis approach, distinct clusters of excitement and concern were identified, providing valuable insights for informed policy discussions on AI adoption.
Introduction
The study addresses how the public perceives AI technologies that are increasingly embedded in daily life, including social media content moderation, facial recognition, and driverless cars. Public attitudes balance optimism about societal benefits with concerns around autonomy, job loss, unintended consequences, and surveillance. Leveraging a Pew Research Center (PRC) survey (November 2021) on three debated AI developments—facial recognition by law enforcement, social media algorithms to detect misinformation, and driverless passenger vehicles—the paper explores acceptance, fairness perceptions, and how these relate to demographics (income, ideology, political affiliation). The research question is: how do different groups perceive AI technologies in these domains, what are their primary concerns, and how can these perceptions be predicted for new respondents? The purpose is to provide actionable insights for policy and governance by profiling respondent groups and forecasting cluster membership for new cases.
Literature Review
Prior work on AI acceptance spans multiple sectors and methods. Key examples include:
• Manufacturing in Malaysia (n=93): lack of talent and in-house expertise identified as adoption barriers.
• Accounting students (n=824): technology readiness affects AI adoption, mediated by perceived ease of use (PEOU) and perceived usefulness (PU).
• Radiology patients (Netherlands): EFA identified five factors (distrust/accountability, procedural knowledge, personal interaction, efficiency, being informed).
• Medical students (Malaysia): readiness correlated with age, academic year, and prior training.
• NLP pipelines for patient experience analysis: F1 = 0.97 for positive and 0.63 for negative sentiment.
• HR managers (India): adoption driven by cost-effectiveness, relative advantage, top management support, HR readiness, competitive pressure, and vendor support; privacy/security concerns acted as deterrents.
• UAE energy sector: business innovation alignment and usability emphasized.
• Banking customers (Malaysia, n=302): intention driven by attitude, PU, perceived risk, trust, and norms.
• Saudi manufacturing: knowledge sharing mediates digital technology's effect on AI adoption.
• AI-enabled hospitality services: anthropomorphic, entertainment, functional, and information attributes identified.
• Librarians (North America): expected AI to change library functions and desired training.
• AI curriculum co-creation: improved student competence and attitudes.
• AI teachers: acceptance influenced by robot use anxiety, PU, PEOU, and task difficulty.
• Autonomous vehicles: studies explored ethics, user traits, and behavioral determinants (PEOU, norms, control, environmental/technology attitudes).
• Airline passengers during COVID-19: favorable toward several digital services, less so toward facial recognition.
• Broader technical literature: autonomous driving safety, federated learning, traffic management, and policy.
Across studies, sample sizes range from under 100 to ~5400, and methods include descriptive statistics, regression, SEM/PLS-SEM, EFA/CFA, ANOVA/chi-square, NLP, and mixed methods, reflecting methodological diversity tailored to sectoral questions.
Methodology
A descriptive-prescriptive hybrid pipeline processes the survey data to uncover segments and predict group membership:
1) Exploratory Data Analysis: assess missing values and data integrity, visualize distributions, and cross-tabulate demographics (education, age, gender, race/ethnicity, religion, ideology, income) against key attitudes (technology impact, concern vs. excitement, algorithm fairness).
2) Clustering: remove identifier/string columns (e.g., QKEY, SMALG3_W99, F_REG, F_INTFREQ), standardize features, determine K via quality metrics (silhouette), and run K-means.
3) Dimensionality reduction: PCA for 2D/3D visualization of cluster separation.
4) Cluster validation and interpretation: one- and two-way ANOVA across clusters to test mean differences, Tukey's HSD for post-hoc comparisons, and within-cluster correlations to understand feature interrelations.
5) Prediction: train a Random Forest classifier to predict cluster labels for new respondents; split the data 70% train, 15% test, 15% out-of-sample validation; evaluate via accuracy, precision/recall/F1, and AUC.
Algorithmic background for K-means, ANOVA, and classification is provided, with emphasis on avoiding overfitting and ensuring generalization. Data source: PRC survey fielded Nov 1–7, 2021; 11,492 panelists were invited and 10,260 completed the survey (89% completion rate), with an overall panel response rate of ~3% and a margin of error of ±1.6 points for the full sample. The analysis focuses on a technological subset of 5,153 records with 126 variables covering technology impact, concern/excitement, algorithmic fairness, discrimination, facial recognition, social media, driverless cars, and demographics. Data quality checks indicated no missing values or duplicates in the working subset.
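The clustering steps above (standardization, silhouette-based choice of K, K-means, PCA projection) can be sketched with scikit-learn. This is a minimal illustration, not the paper's code: synthetic blobs of unequal size stand in for the 126 encoded survey variables, so the resulting scores are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Synthetic stand-in for the encoded survey features: three blobs of
# unequal size emulate distinct respondent segments.
X = np.vstack([
    rng.normal(0, 1, (200, 10)),
    rng.normal(5, 1, (60, 10)),
    rng.normal(-5, 1, (20, 10)),
])

# Step 2: standardize features before distance-based clustering
X_std = StandardScaler().fit_transform(X)

# Choose K by silhouette score (the paper reports ~0.828 at K=3)
best_k, best_score = None, -1.0
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_std)
    score = silhouette_score(X_std, labels)
    if score > best_score:
        best_k, best_score = k, score

# Final K-means fit with the selected K
kmeans = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X_std)

# Step 3: PCA to 2D for visual inspection of cluster separation
coords = PCA(n_components=2).fit_transform(X_std)
print(best_k, round(best_score, 3))
```

On real survey data the identifier columns would be dropped first and the silhouette curve inspected rather than maximized blindly, since ordinal encodings can inflate apparent separation.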
Key Findings
Sample and descriptive insights:
• From 10,260 completed PRC surveys, 5,153 technology-related records were analyzed across 126 variables.
• Technology impact (TECH1_W99): mostly positive 2,627; equally positive and negative 2,036; mostly negative 482.
• Concern vs. excitement (CNCEXC_W99): equally concerned and excited 2,267; more concerned than excited 1,981; more excited than concerned 889.
• Algorithm fairness (ALGFAIR_W99): not sure 2,046; not possible 1,580; possible 1,480.
• Political affiliation (F_PARTYSUM_FINAL): Dem/Lean Dem 2,589; Rep/Lean Rep 2,445; other/refused 119.
• Income (F_INC_SDT1): $100k+ 1,427; under $30k 812; others spread across tiers; 244 refused.
• Ideology (F_IDEO): Moderate 1,853; Conservative 1,344; Liberal 913; Very conservative 541; Very liberal 424.
• Age (F_AGECAT): 30–49 1,642; 65+ 1,527; 50–64 1,509; 18–29 457.
• Gender: women 2,840; men 2,270; other/non-binary 31; refused 12.
• Race/ethnicity: White 4,072; Black 436; Mixed 186; Asian 180; Other 177; refused 102.
Thematic toplines from PRC items:
• Facial recognition by police monitoring crowds: 46% good for society, 27% bad, 27% unsure.
• Social media algorithms to detect misinformation: 38% good, 31% bad.
• Concern/excitement balance on AI in daily life: 45% equally both, 37% more concerned, 18% more excited.
• Top concerns: job loss and privacy/surveillance.
Cross-tabulation patterns:
• Education: College+ respondents more likely to see tech as mostly positive (1,527 positive), yet more uncertain and skeptical about algorithm fairness (Not sure 976; Not possible 839).
• Age: younger respondents (18–29) are more positive and more excited; concern increases with age, with the 50–64 and 65+ groups more often more concerned than excited; skepticism about algorithmic fairness persists across all ages.
• Gender: women slightly more likely to view tech impact as mostly negative and to be more concerned than excited; more women than men say fair algorithms are not possible.
• Race/ethnicity: Asian respondents most likely to view tech as mostly positive; Black respondents more often more concerned than excited; White respondents show high skepticism about algorithmic fairness.
• Religion/ideology: Atheists/Agnostics more positive; Conservatives and Very Conservatives more skeptical about algorithmic fairness; Liberals/Very Liberals more positive about tech's impact.
• Income: higher-income respondents more positive and more excited; lower-income respondents more concerned; skepticism about algorithmic fairness is present across tiers and, by count, strongest in middle/upper tiers.
Clustering and validation:
• K-means with 3 clusters yielded strong separation (silhouette ≈ 0.828). Distribution: Cluster 1 = 5,069; Cluster 0 = 72; Cluster 2 = 12.
• Profiles:
– Cluster 1 (Baseline/Skeptics): lowest means across features; less engaged or less intense opinions.
– Cluster 0 (Moderates): mid-level means; balanced views; favor measured regulation.
– Cluster 2 (Intensives): highest means across features (e.g., TECH1_W99, CNCEXC_W99, ALGFAIR_W99, DISCRIM1_a_W99, FACEREC3_c_W99); strong engagement with pronounced concern or enthusiasm.
• Key differentiating features with large between-cluster variance: DCARS11_a/b/c (driverless car standards/conditions), FACEREC6_a/b (standards setting for facial recognition), FACEREC3_c (racial bias concerns), SMALG4_d and SMALG12 (social media misinformation handling), and POSNEGHE_e (human enhancement attitudes).
• ANOVA across clusters showed significant mean differences for most features (p < 0.05), supporting distinct cluster separation.
Selected inferential tests:
• Gender vs. DISCRIM1_a_W99: F ≈ 6.017, p ≈ 0.000439 (significant differences in perceived discrimination across genders).
• Ideology (F_IDEO) vs. TECH1_W99: F ≈ 15.674, p ≈ 9.43e-13 (significant ideological differences in tech impact views). Tukey's HSD indicates a gradient from Very Conservative to Very Liberal toward more positive tech views.
• Two-way analyses of CNCEXC_W99 with income and ideology: F ≈ 4.95 (p ≈ 3.63e-24) for income bands with ideology; F ≈ 13.36 (p ≈ 1.13e-31) for tiered income with ideology (significant joint effects).
Predictive modeling:
• Random Forest cluster prediction: test-set accuracy 99.48%, F1 0.994, AUC 0.9998; out-of-sample accuracy 99.09%, F1 0.989, AUC 0.9988. Alternative models (linear, decision tree) underperformed (F1 ≈ 0.87–0.89).
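The predictive step (Random Forest on cluster labels with a 70/15/15 train/test/validation split, scored by accuracy, F1, and AUC) can be sketched as follows. This is a hedged illustration on synthetic stand-in data, not the paper's code; class sizes and scores here are illustrative, not the reported results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

rng = np.random.default_rng(1)
# Synthetic stand-in: features plus cluster labels (the paper predicts
# K-means cluster membership from the encoded survey variables).
X = np.vstack([
    rng.normal(0, 1, (500, 12)),
    rng.normal(4, 1, (300, 12)),
    rng.normal(-4, 1, (200, 12)),
])
y = np.repeat([0, 1, 2], [500, 300, 200])

# 70% train, then split the remainder evenly: 15% test, 15% validation
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)
X_test, X_val, y_test, y_val = train_test_split(
    X_rest, y_rest, test_size=0.50, stratify=y_rest, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Evaluate on the held-out test set and the out-of-sample validation set
acc = accuracy_score(y_test, clf.predict(X_test))
f1 = f1_score(y_test, clf.predict(X_test), average="weighted")
auc = roc_auc_score(y_test, clf.predict_proba(X_test), multi_class="ovr")
val_acc = accuracy_score(y_val, clf.predict(X_val))
print(round(acc, 3), round(f1, 3), round(auc, 4), round(val_acc, 3))
```

Stratifying both splits preserves the heavily imbalanced cluster proportions (5,069 / 72 / 12 in the paper), which matters because the smallest cluster would otherwise risk vanishing from the test or validation sets.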
Discussion
Findings address the research question by revealing distinct public segments with different intensities of engagement and concern regarding AI in social media, facial recognition, and driverless cars, and by demonstrating that cluster membership can be accurately predicted for new respondents. The Baseline group tends to be less engaged and more skeptical; Moderates hold balanced views favoring measured governance; Intensives display strong enthusiasm and/or concern, especially about fairness, discrimination, and regulation. Demographic patterns clarify which populations tend to be more positive (younger, higher income, Atheist/Agnostic, Liberal) and which are more concerned (older, lower income, women, Black respondents, Conservatives). Education correlates with both more positive tech impact views and greater skepticism about algorithmic fairness, implying higher awareness of ethical risks. Significant ANOVA results confirm ideology and gender differences in perceptions, and joint effects of income and ideology on concern vs excitement. These insights are relevant for policy, governance, and communication: for example, addressing job displacement and privacy can mitigate concerns in more cautious segments, while transparency and fairness measures can build trust across all groups. The predictive component enables proactive tailoring of outreach and policy interventions to audience profiles, potentially improving AI acceptance and responsible adoption.
Conclusion
The paper contributes a descriptive-predictive pipeline for large survey analysis that integrates EDA, cross-tabulation, K-means clustering, PCA visualization, ANOVA with post-hoc tests, and Random Forest prediction of cluster membership. Applied to a 5,153-respondent PRC dataset, it uncovers three distinct respondent profiles and quantifies demographic and attitudinal determinants of AI acceptance and concern. Practical implications include: 1) targeted communication tailored to cluster profiles; 2) policy development prioritizing privacy, fairness, and job transition protections; 3) education and outreach programs scaled to engagement levels; 4) market segmentation for AI products emphasizing privacy features or usability depending on segment. Future research directions: expand beyond a U.S.-only sample to improve generalizability; periodically repeat the survey to monitor trends; evaluate additional classifiers (e.g., gradient boosting) and model interpretability tools; refine feature engineering for mixed data; and deepen causal inference on determinants of AI acceptance.
Limitations
The study analyzes a U.S.-only survey subset, which may limit generalizability to other cultural and regulatory contexts. While the U.S. is a leading AI market, global attitudes may differ; future studies should incorporate multi-country samples. As with any self-reported survey, response and sampling biases may influence results, though PRC’s methodology and preprocessing mitigate some risks. Model-wise, Random Forests can overfit; performance was validated on test and out-of-sample splits, but external validation with new cohorts is warranted. Finally, clustering interpretations rely on encoded survey responses and assumed ordinal directions; alternative encodings or mixed-type clustering could yield different segmentations.