Computer Science

"I Don't Know If We're Doing Good. I Don't Know If We're Doing Bad": Investigating How Practitioners Scope, Motivate, and Conduct Privacy Work When Developing AI Products

H. Lee, L. Gao, et al.

Practitioners building consumer AI often treat privacy as protection against known intrusions, rely on compliance, and remain unaware of AI-specific harms, which leads to rigid, demotivated privacy work and inadequate tools. The authors interviewed 35 industry AI practitioners and highlight the urgent need for better awareness, motivation, and tooling to address AI-exacerbated privacy risks.

Introduction

Privacy is a core principle of human-centered AI, yet little is known about how industry practitioners who design consumer-facing AI technologies define and scope privacy, what motivates or inhibits their privacy work, and which methods, tools, and resources they use. Prior work highlights a gap between principles and practice in human-centered AI. Because AI technologies pose unique privacy harms and AI development pipelines differ from traditional software engineering, the privacy challenges in developing consumer-facing AI products are expected to differ from those in traditional software development. Guided by the Security and Privacy Acceptance Framework (SPAF), which identifies three barriers (awareness, motivation, and ability), the authors pose three research questions: RQ1: How well do AI practitioners’ definitions of privacy work reflect awareness of AI-exacerbated privacy threats? RQ2: What motivates and inhibits privacy work for consumer-facing AI products? RQ3: What constitutes privacy work for AI practitioners and what affects their ability to do this work? To investigate, the authors conducted semi-structured interviews with N=35 industry practitioners across 25 companies who worked on consumer-facing AI products (i.e., products that train on or infer from end-user data).

Literature Review

Human-Centered AI: Prior research shows AI can produce intrusive or unjust outcomes in high-stakes contexts, motivating efforts to create socially responsible, trustworthy, and safe interactive AI systems. Privacy is frequently cited in ethical AI guidelines, yet a principle–practice gap persists. Efforts to bridge this gap include modeling practitioner difficulties, publishing educational repositories, introducing regulations, and creating artifacts such as checklists, guidelines, data statements, and dataset nutrition labels. However, privacy in AI practice is under-modeled and often treated narrowly (data protection, consent, technical guarantees like differential privacy), and practitioners lack comprehensive, practice-oriented support.

Privacy in software engineering: Studies of developer attitudes via surveys, interviews, and public forums show privacy is often a secondary concern due to costs, misconceptions, knowledge gaps, and a lack of clear guidelines and regulations. Existing tools include issue detection and agendas, workbooks, and guidelines; related work investigates the usability of these interventions (e.g., static analysis notifications). SPAF identifies awareness, motivation, and ability barriers to adopting privacy and security best practices. The authors extend this literature to consumer AI products, which introduce unique privacy risks and differ in their design and deployment pipelines, offering the first in-depth insights into how practitioners define privacy, what motivates or inhibits their work, and how they operationalize privacy.

Methodology

Study design: Semi-structured interviews focused on practitioners’ experiences with privacy work for a specific consumer-facing AI product. Consumer-facing AI products were defined as those that train on or make inferences from end-user data.

Sample: 31 interview sessions (30 individual, 1 group) with 35 practitioners from 25 companies (20 large technology companies and five startups). The one group interview (P20–P24) lasted 90 minutes; individual sessions lasted 40–60 minutes. Interviews were conducted remotely; compensation was $100 USD per participant.

Languages and conduct: Interviews were conducted in English (n=23) and Chinese (n=8). The first author conducted 27/31 interviews; other authors conducted two English and two Chinese interviews. When possible, a second interviewer joined to take notes and ask follow-up questions. Consent and IRB approvals were obtained (CMU and Georgia Tech IRBs). Audio and video were recorded with consent; recordings were transcribed, de-identified, and deleted after transcription.

Recruitment: The authors recruited practitioners with experience designing or developing consumer-facing AI who had participated in privacy discussions about such products. Recruitment channels were professional networks and social media (9/35), alumni networks (18/35), and direct contacts (8/35).

Participants: Roles included researcher (n=16), software engineer (n=13), and designer (n=13), among others. Ages ranged from 23 to 48 (M=31.82, SD=6.71); 16 participants were male, 14 female, and 5 undisclosed. Common AI technologies: recommender systems (n=14), conversational AI/chatbots (n=10), NLP tools (n=10), predictive analytics (n=10). Common domains: healthcare (n=8), general-purpose ML tools (n=8), media/entertainment (n=7).

Interview protocol: Questions were aligned to the SPAF barriers. For RQ1, the authors asked how privacy was defined and scoped for the product, then analyzed these definitions against AI-exacerbated harms from prior literature (e.g., memorization leaks, membership inference). For RQ2, they asked about motivations and inhibitors and synthesized the factors affecting motivation. For RQ3, they asked about actions, tools, artifacts, resources, challenges, and envisioned helpful tools, then analyzed ability barriers.

Data analysis: Iterative open coding. The first author coded ten transcripts and developed a codebook with three co-authors. A second coder was trained; both coded six interviews independently with full agreement, then split the remaining interviews, meeting regularly to review and resolve disagreements. All authors met regularly to discuss emerging themes. The finalized codebook is provided in the appendix.
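The sample figures reported above can be cross-checked with simple arithmetic. The sketch below is only a reader's sanity check, not part of the study; the variable names are illustrative, and it assumes that the group session had five participants (P20–P24), that languages are counted per session, and that recruitment channels and gender are counted per participant.

```python
# Figures copied from the Methodology summary; names are illustrative only.
sessions = {"individual": 30, "group": 1}
group_size = 5  # assumption: P20-P24 are the five group-interview participants
companies = {"large_tech": 20, "startup": 5}
languages = {"English": 23, "Chinese": 8}            # counted per session
interviewers = {"first_author": 27, "others": 2 + 2}  # counted per session
recruitment = {"networks_and_social": 9, "alumni": 18, "direct": 8}
gender = {"male": 16, "female": 14, "undisclosed": 5}

total_sessions = sum(sessions.values())
total_participants = sessions["individual"] + sessions["group"] * group_size

# Each check compares a breakdown against the total it should add up to.
checks = {
    "sessions": total_sessions == 31,
    "participants": total_participants == 35,
    "companies": sum(companies.values()) == 25,
    "languages": sum(languages.values()) == total_sessions,
    "interviewers": sum(interviewers.values()) == total_sessions,
    "recruitment": sum(recruitment.values()) == total_participants,
    "gender": sum(gender.values()) == total_participants,
}

for name, ok in checks.items():
    print(f"{name}: {'consistent' if ok else 'MISMATCH'}")
```

Role counts are left out on purpose: the listed roles (16, 13, 13, "among others") already exceed 35, which suggests that participants could hold more than one role.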

Key Findings

RQ1 (Awareness): Practitioners viewed privacy as protecting users against predefined intrusions in data collection/processing, largely generic and not AI-specific. Mapped to Solove’s taxonomy, counts of concerns were:

Surveillance (3/35): AI-enabled monitoring and large-scale data collection create surveillance infrastructure; tension between utility and intrusiveness.
Identification (10/35): Presence of PII in ML pipelines and AI’s inferential capabilities increase re-identification risks; mitigations included manual de-identification and aggregate analysis.
Exclusion (4/35): Limited user awareness/agency about AI use of personal data; some teams offered deletion upon request or at end of service.
Secondary use (2/35): Reuse of data to train new models without consent; some participants emphasized strict purpose limitation.
Insecurity (14/35): Poor operational security leading to leaks/unauthorized access; emphasis on access controls, secure storage, and retention policies.

Participants rarely mentioned AI-specific threats highlighted in prior literature (e.g., memorization leaks, membership inference, transfer learning risks), indicating limited awareness of AI-exacerbated threats.
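To make the RQ1 tallies easier to compare, the counts can be expressed as shares of the 35 participants. This is a minimal sketch using only the figures listed above; the names `rq1_counts` and `share` are illustrative. It assumes a participant could raise more than one concern, so the shares are not expected to sum to 100%.

```python
N = 35  # total participants

# Counts per Solove category, as listed in the RQ1 findings.
rq1_counts = {
    "Surveillance": 3,
    "Identification": 10,
    "Exclusion": 4,
    "Secondary use": 2,
    "Insecurity": 14,
}

def share(count: int, n: int = N) -> float:
    """Return count as a percentage of n, rounded to one decimal place."""
    return round(100 * count / n, 1)

# Sort from most to least frequently mentioned concern.
shares = {
    concern: share(count)
    for concern, count in sorted(
        rq1_counts.items(), key=lambda item: item[1], reverse=True
    )
}

for concern, pct in shares.items():
    print(f"{concern:<15} {rq1_counts[concern]:>2}/{N}  {pct:>5.1f}%")
```

Read this way, even the most common concern (insecurity) was raised by well under half of the participants, and all five categories are generic data-handling concerns rather than AI-specific ones.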

RQ2 (Motivation vs. Inhibition): Motivators:

Alignment with business interests (9/35): Privacy as competitive differentiator; addressing client concerns.
Social responsibility (5/35): Personal/organizational desire to build responsible AI; avoiding unethical applications.
Compliance with regulation/policy (19/35): GDPR, CCPA, Canadian frameworks; external privacy reviews as catalysts.

Inhibitors:
Rigid compliance requirements (6/35): Privacy equated to minimum compliance; inhibited broader, human-centered approaches.
Incentives (7/35): Advocacy adds effort and may slow promotions; speed-focused performance metrics deprioritize privacy.
Power (3/35): Organizational structures left individual contributors feeling powerless.
Privacy education (6/35): Low company-wide visibility and limited training led to misunderstandings and deprioritization.
External ownership of privacy (5/35): Privacy relegated to legal/privacy teams, reducing engagement by developers/designers.
Opportunity costs/trade-offs: Most cited inhibitor. Specific trade-offs included:
- Functionality/UX prioritized over privacy (9/35).
- Business objectives (e.g., advertising insights) dependent on tracking (7/35).
- Innovation slowed by conservative privacy stances (3/35).
- Model performance trade-offs (7/35): Coarse-grained data (3/35) or less data (4/35) degraded performance.
- Additional engineering costs and bottlenecks: Complex pipelines, extended reviews and approvals (7/35).

Overall, practitioners exhibited low motivation beyond compliance; inhibitors outweighed motivators.

RQ3 (Ability): Tools and resources were generally neither product-specific nor AI-specific, which limited practitioners’ ability to address AI-specific privacy risks. Reported supports:

Privacy training (18/35): Mandatory, generic company-wide training; participants desired AI/product-specific education.
Design references (9/35) and privacy/legal consultations (11/35): On-demand internal documentation and expert reviews; helpful but reliant on practitioners’ ability to know what to ask.
Developer tools (3/35): Automated code audits (e.g., Azure DevOps flags), prompts about user data usage.
Privacy checklists/forms (3/35): Risk assessment and third-party policy reviews.

Expressed needs: Product- and AI-specific guidance, standards, and checklists (e.g., to certify data exclusions like location/email; end-to-end QA alignment).

Ability barriers included:
Lacking a holistic view of complex, multi-model data pipelines (4/35), making downstream privacy implications hard to assess.
Lacking clear guidance (9/35): Regulations are complex; AI compliance practices are nascent; blanket requirements insufficient for AI contexts; difficulty evaluating effectiveness; industry practices opaque.

One participant’s quote exemplifies the uncertainty: “I don’t know if we’re doing good. I don’t know if we’re doing bad... I’d have no clue.”

Discussion

Findings indicate limited awareness of AI-exacerbated threats, low motivation beyond compliance due to inhibitors, and constrained ability stemming from generic tools and opaque practices. To address SPAF barriers, the authors propose:

Awareness: AI-specific privacy education campaigns mapping AI capabilities/requirements to privacy risks; structured red-teaming/simulated attacks to surface downstream harms and intrusions.
Motivation: Pro-social design via shared repositories of AI privacy best practices with before/after exemplars and social proof; transformational educational games in training to make abstract privacy benefits concrete and salient.
Ability: AI-specific privacy developer tools, checklists, value cards, and impact assessments tailored to product contexts, bridging process-based compliance and last-mile guidance.

Integrative, human-centered design processes should jointly address awareness, motivation, and ability by exploring utility vs. intrusiveness trade-offs early (e.g., storyboards, low-fidelity prototypes), engaging stakeholders to weigh privacy against model performance and product objectives, and making compliance fluid and process-based.

Conclusion

Through interviews with 35 industry practitioners building consumer-facing AI products, the study models how practitioners define and scope privacy work, what motivates or inhibits it, and what affects their ability to perform it. Practitioners showed limited awareness of AI-specific privacy risks, faced more inhibitors than motivators beyond minimum compliance, and relied on non-AI-specific tools. While compliance helped prioritize privacy, a compliance-centered approach often inhibited formative, human-centered design explorations. The authors call for turnkey, AI-specific design tools and artifacts that address SPAF’s awareness, motivation, and ability barriers, enabling practitioners to better recognize, value, and mitigate privacy harms introduced or exacerbated by AI.

Limitations

Qualitative findings based on participants’ experiences are not representative of all AI practitioners or industry contexts. Participants were not primed on AI-specific intrusions or practices, potentially affecting responses. Recruiting practitioners with privacy experience may skew toward higher awareness. The sample largely represents North American and European companies due to recruitment strategies. Institutional privacy practices can be intentionally opaque; while participants came from diverse organizations and products, specifics cannot be disclosed. Interview languages included English and Chinese; bilingual coders analyzed transcripts without translation.
