BALANCING COMPLIANCE, CREATIVITY, AND ENGAGEMENT: EVALUATING THE IMPACT OF NSFW GOVERNANCE ON USER BEHAVIORS FOR GEN AI PLATFORMS

Computer Science


J. Cao, K. Zhao, et al.

Generative AI's new NSFW guardrails can reshape both creativity and commerce. Using a natural experiment on a major text-to-image platform, this study finds stricter NSFW governance caused a 14% relative drop in subscriptions, reduced NSFW output without harming content diversity, and triggered a short-lived surge in NSFW attempts. Findings also show mobile users adapted less than PC users. Research conducted by Jisu Cao, Keran Zhao, Xiaowei Liu, Che-Wei Liu, and Jiang Duan.

Introduction
The paper investigates how stricter NSFW (Not Safe for Work) governance on generative AI (Gen AI) platforms influences user behavior and platform outcomes in a co-creative, prompt-based environment. Unlike traditional social media, Gen AI content is co-produced by users and algorithms, complicating accountability and requiring proactive, system-level filtering rather than community-based moderation. Using Technology Affordance Theory (TAT), the authors frame stricter NSFW governance as a reduction in perceived affordances (action possibilities). They develop a three-stage framework—affordance existence, perception, and actualization—to explain user responses. The study focuses on three research questions: RQ1 examines how stricter NSFW governance affects user subscriptions (economic engagement); RQ2 examines effects on user compliance (reductions in NSFW prompts, improvements in appropriateness) and content diversity (linguistic novelty/entropy in prompts); and RQ3 examines temporal dynamics in compliance—short-run reactions versus long-run adaptation—after exposure to blocked outputs. The work is motivated by the need to balance safety, creativity, and business sustainability in Gen AI platforms and to extend TAT to co-creative systems where governance is embedded in the creation process.
Literature Review
The literature review situates the study within three streams: (1) Ethical and responsible use of Gen AI: While Gen AI enhances productivity and creativity, it raises concerns about misuse (e.g., NSFW, copyright, learning harms). Calls for safeguards and governance are increasing, yet user behavioral responses under ethical constraints remain underexplored. (2) Content moderation: Prior work contrasts human and algorithmic approaches. Algorithmic moderation scales but may reduce transparency and participation and induce gaming or behavioral shifts. Gen AI platforms differ by embedding proactive, real-time filtering during content creation, creating new tensions between safety and user autonomy. (3) Technology Affordance Theory (TAT): Affordances are relational possibilities for action arising from interactions between user goals and IT artifacts. Affordance perception is shaped by functional, symbolic, and external cues; users then actualize affordances by adapting or circumventing constraints over time. The authors extend TAT to co-creative systems and propose a three-stage framework (existence–perception–actualization) to theorize responses to tightened NSFW governance.
Methodology
Research context: A leading text-to-image Gen AI platform with a freemium model. New users get four free credits; upon exhaustion, they can start a 7-day free trial by linking a payment method (subscription), after which automatic billing occurs unless canceled (paid subscription). The platform initially used an open-source NSFW detector for post-generation filtering; on March 29, 2024, it silently added an external, higher-accuracy NSFW filter, creating a natural experiment. Blocking notifications informed users that sensitive content could not be displayed.
Dataset: Users who registered between March 1 and April 25, 2024, and generated at least once. Two samples: (a) full sample for RQ1/H1 (N=88,427 users, pre vs post), and (b) retained users for RQ2–RQ3/H2–H3 (1,998 users registered pre-policy and active post-policy; 34,575 user-prompt observations).
Measures:
- Dependent variables: Subscription (binary: linked payment after free credits); Paid Subscription (binary: charged after trial); Prompt Violation (binary keyword match against platform-compiled NSFW dictionary); Neutral Score (0–1 probability from a deep learning NSFW image detector indicating appropriateness); Content Diversity (Shannon entropy of prompt text).
- Treatment definition for DiD (H1, H2): Users are treated if any of their initial four-credit prompts contained NSFW terms; Post equals 1 on/after March 29, 2024. Pre-policy blocking rate was 1.7%, rising to 6.0% post-policy.
- H3 immediacy and accumulation: Lag Block (previous image blocked, binary) for immediate reaction; Lag Cum Block (average user-level block rate up to prior event) for long-run adaptation.
Controls: Prompt Length (log), Mobile (vs PC), Registration Type: Account Attached (linked to external identity), Referral Source (organic vs promotional), Platform Engagement (usage intensity for H2/H3).
Identification and models: Difference-in-differences.
- H1 (subscriptions): linear probability models with controls, location and week fixed effects; probit used in robustness.
- H2 (compliance/diversity): user fixed-effects DiD at prompt level with week fixed effects.
- H3 (dynamics): panel regressions with user and week fixed effects regressing Prompt Violation or Neutral Score on Lag Block and Lag Cum Block, plus controls.
Robustness: Event-study parallel trends, propensity score matching DiD, probit specifications, placebo pseudo-treatments, and alternative measures (e.g., Prompt Toxicity via Perspective API; Prompt Length and Image Aesthetic Score for diversity; alternative treatment via toxicity threshold).
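To make the difference-in-differences logic concrete, here is a minimal sketch of the basic 2x2 estimate computed from group means. It is not the authors' specification, which adds controls and fixed effects; the function name `did_estimate` and the toy rows are hypothetical and not taken from the study's data.

```python
from statistics import mean


def did_estimate(rows):
    """Return the 2x2 difference-in-differences estimate.

    Each row is (treated, post, outcome), with treated and post in {0, 1}.
    Without controls, this equals the Treatment x Post coefficient of a
    linear model with treated, post, and their interaction.
    """
    def cell_mean(t, p):
        return mean(y for (tr, po, y) in rows if tr == t and po == p)

    treated_change = cell_mean(1, 1) - cell_mean(1, 0)
    control_change = cell_mean(0, 1) - cell_mean(0, 0)
    return treated_change - control_change


# Hypothetical toy data (illustration only, not from the study).
rows = [
    (1, 0, 1), (1, 0, 0), (1, 0, 0), (1, 0, 0),  # treated, pre:  mean 0.25
    (1, 1, 0), (1, 1, 0), (1, 1, 0), (1, 1, 0),  # treated, post: mean 0.00
    (0, 0, 1), (0, 0, 0), (0, 0, 0), (0, 0, 0),  # control, pre:  mean 0.25
    (0, 1, 1), (0, 1, 0), (0, 1, 0), (0, 1, 0),  # control, post: mean 0.25
]
estimate = did_estimate(rows)
print(estimate)  # -0.25
```

The control group's change is subtracted so that platform-wide trends around the policy date are not attributed to the filter itself.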
Key Findings
- Subscription impact (H1/RQ1): Stricter NSFW governance reduced subscriptions among users with NSFW intent. Treatment × Post reduced Subscription by 0.007 (from 5.0% to 4.3%, a 14% relative decline; p<0.01) and Paid Subscription by 0.006 (p<0.05).
- Compliance (H2a/RQ2): Among retained users, Treatment × Post decreased Prompt Violation by 0.180 (p<0.01) and increased Neutral Score by 0.064 (p<0.05), indicating improved appropriateness.
- Content diversity (H2b/RQ2): No significant effect on Content Diversity (β = -0.040, p>0.1), suggesting creative diversity was not measurably reduced.
- Short-run vs long-run dynamics (H3/RQ3): Immediate reaction after a block increased Prompt Violation (Lag Block β = 0.250, p<0.01) and decreased Neutral Score (β = -0.027, p<0.01), indicating boundary testing. Accumulated exposure reduced violations (Lag Cum Block β = -0.621, p<0.01) and increased Neutral Score (β = 0.400, p<0.01), evidencing internalization and routinized compliance.
- Heterogeneity (post-hoc): Mobile users exhibited weaker compliance (Treatment × Post × Mobile: Prompt Violation β = 0.330, p<0.01; Neutral Score β = -0.155, p<0.05). Linked-account users exhibited stronger compliance (Treatment × Post × Registration Type: Account Attached: Prompt Violation β = -0.368, p<0.05; Neutral Score β = 0.148, p<0.1).
- Manipulation check: Blocking rate rose from 1.7% pre-policy to 6.0% post-policy, consistent with tighter governance.
Robustness checks (parallel trends, PSM, probit, placebo, alternative measures and treatment definitions) supported all main conclusions.
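The 14% relative decline is the absolute effect divided by the baseline rate, and the manipulation check can be read the same way. A small sketch of that arithmetic, using only the figures reported above (the helper name `relative_change` is illustrative):

```python
def relative_change(baseline, absolute_change):
    """Express an absolute change as a fraction of the baseline rate."""
    return absolute_change / baseline


# Subscription: 5.0% baseline, Treatment x Post effect of -0.007.
subscription_rel = relative_change(0.050, -0.007)
print(f"{subscription_rel:.0%}")  # -14%

# Blocking rate: 1.7% pre-policy vs 6.0% post-policy.
block_ratio = 0.060 / 0.017
print(round(block_ratio, 2))  # 3.53
```

In other words, a 0.7 percentage-point drop is small in absolute terms but sizable relative to a 5.0% baseline, while the blocking rate rose roughly 3.5-fold.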
Discussion
The findings address the research questions by showing that stricter NSFW governance, conceptualized as a reduction in perceived affordances, leads some NSFW-oriented users to disengage economically (lower subscriptions), while others adapt their behavior to comply. Users who remain reduce NSFW prompting and generate more appropriate images without a detectable loss in prompt diversity, suggesting that proactive, embedded moderation can curtail harmful content without necessarily constraining creative range. Temporal analyses reveal a two-phase process of affordance actualization: short-run boundary testing after blocks is followed by long-run internalization and routinized compliance as cumulative exposure grows. Context matters: compliance responses are weaker on mobile (possibly due to session context and salience of constraints) and stronger when accounts are linked to external identities (heightened reputational salience). The study extends TAT to co-creative Gen AI systems, evidencing the existence–perception–actualization progression and the role of functional and symbolic cues in shaping behavior over time.
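The short-run and long-run signals behind this two-phase reading can be built from each user's block history. The sketch below is one plausible construction of Lag Block and Lag Cum Block, assuming that "up to prior event" includes the prior event and that a user's first generation has no history; the paper's exact handling is not stated here, and the function name is hypothetical.

```python
def lag_block_features(blocked):
    """Build (lag_block, lag_cum_block) for each event of one user.

    blocked: chronological list of 0/1 flags, 1 if that image was blocked.
    lag_block: whether the immediately preceding image was blocked.
    lag_cum_block: share of all earlier images that were blocked.
    The first event has no history, so both values are None (an assumption).
    """
    features = []
    blocked_so_far = 0
    for i, flag in enumerate(blocked):
        if i == 0:
            features.append((None, None))
        else:
            features.append((blocked[i - 1], blocked_so_far / i))
        blocked_so_far += flag
    return features


# Hypothetical history: second and third images blocked.
feats = lag_block_features([0, 1, 1, 0])
for row in feats:
    print(row)
```

For the fourth event in this toy history, the previous image was blocked (Lag Block = 1) and two of the three earlier images were blocked (Lag Cum Block = 2/3), so the two variables can separate an immediate reaction from accumulated exposure.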
Conclusion
This study quantifies the trade-offs of stricter NSFW governance on a large Gen AI platform. Stricter filtering decreases subscription rates among NSFW-oriented users but increases compliance among those who remain, without significantly reducing content diversity. Behavior unfolds over time: immediate boundary testing gives way to long-term adaptation as users internalize constraints. The work contributes by (1) documenting the economic costs and compliance benefits of proactive NSFW governance, (2) showing that embedded moderation can improve safety without harming diversity, (3) advancing a sociotechnical view of ethical AI where outcomes are co-produced by users, feedback, and design, and (4) extending TAT to co-creative systems via a three-stage framework. Managerially, platforms should calibrate governance to balance safety and engagement, provide constructive but non-gameable feedback to guide prompt reformulation, and incorporate contextual signals (device, identity linkage) into governance design. Future research should explore demographic heterogeneity, cross-cultural contexts, and multilingual settings to generalize and refine governance strategies.
Limitations
- Lack of demographic attributes (e.g., age, gender, education, occupation) limits analysis of heterogeneous responses across user groups.
- Geographic scope limited to users in an English-speaking country; cultural and regulatory differences elsewhere (e.g., EU, parts of Asia) may yield different responses.
- Proprietary constraints limit disclosure of exact filtering algorithms and some operational details.
- Content diversity measured via Shannon entropy may not capture all facets of creativity; alternative creativity measures were used in robustness but remain proxies.
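To illustrate the last limitation, here is a minimal sketch of Shannon entropy over a prompt's words. The tokenization (lowercased whitespace split) and the log base (2) are assumptions; the summary does not say how the authors tokenized prompts.

```python
import math
from collections import Counter


def shannon_entropy(prompt):
    """Shannon entropy (in bits) of the word distribution in a prompt."""
    tokens = prompt.lower().split()
    if not tokens:
        return 0.0
    total = len(tokens)
    counts = Counter(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


print(shannon_entropy("a cat a cat"))          # 1.0
print(shannon_entropy("red fox blue sky"))     # 2.0
print(shannon_entropy("green tree tall hill")) # 2.0
```

The last two prompts describe different scenes yet receive the same score, because entropy depends only on how evenly words are repeated, not on what they mean. This is one reason the measure is a proxy for creativity rather than a direct measurement.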