
Winners and losers of generative AI: Early Evidence of Shifts in Freelancer Demand
O. Teutloff, J. Einsiedler, et al.
This paper, authored by Ole Teutloff, Johanna Einsiedler, Otto Kässi, Fabian Braesemann, Pamela Mishkin, and R. Maria del Rio-Chanona, shows how ChatGPT is reshaping the freelancing landscape: demand is shifting away from substitutable skills such as writing and translation, while demand for specialized expertise in areas like machine learning is rising sharply.
~3 min • Beginner • English
Introduction
The paper investigates how the release of ChatGPT affected labor demand in online freelancing markets across tasks that large language models (LLMs) can substitute or complement. Motivated by mixed evidence on LLMs reducing demand for tasks like writing/translation while boosting productivity in complementary settings, the authors ask whether ChatGPT’s launch led to observable shifts in demand at a fine-grained skill level. They highlight two challenges: evolving platform skill taxonomies and the infeasibility of manually labeling millions of postings for AI exposure. Addressing these, they cluster postings into skill groups and classify each as substitutable, complementary, or unaffected by LLMs, then estimate causal effects via difference-in-differences around the November 30, 2022 ChatGPT launch. The study aims to clarify heterogeneous impacts across skills, project durations, and worker experience levels, informing debates on AI’s displacement vs. augmentation effects.
Literature Review
The authors situate their study within research on automation risk and AI’s labor impacts. Prior work using O*NET tasks and patents estimated varying automation risks (Frey & Osborne, 2017; Arntz et al., 2016), with later arguments that AI would redefine tasks within occupations (Brynjolfsson & Mitchell, 2017) and that high-skill tasks may be more amenable to AI (Webb, 2019). Empirical studies link AI exposure to employment growth in Europe (Albanesi et al., 2023) and show changing demand dynamics from technological innovations (Autor et al., 2024). In online platforms, AI/ML skills earn a premium (Stephany & Teutloff, 2024), and designers adapt to image-generation AI by moving to complex tasks (Lysyakov & Viswanathan, 2023). For LLMs specifically, exposure is widespread (Eloundou et al., 2023); experiments show productivity gains, especially for less experienced workers (Noy & Zhang, 2023; Brynjolfsson et al., 2023; Peng et al., 2023). Observational platform studies document demand reductions in automation-prone jobs (Demirci et al., 2023) and heterogeneous supply-side effects (Hui et al., 2023; Liu et al., 2023; Qiao et al., 2023). The authors argue that existing studies often use coarse skill groupings and emphasize substitution over complementarity, motivating their fine-grained, exposure-labeled analysis.
Methodology
Data: The study uses several million job postings from a major global online freelancing platform (kept anonymous), tracked via the Online Labour Index API and covering daily new postings from January 2021 to September 2023. Each posting carries platform-defined skill tags (a median of six per posting) plus metadata on expected project duration (under 3 weeks, 3–9 weeks, 9–18 weeks, 18–52 weeks) and the desired worker experience level (novice, intermediate, veteran). Employer-declared budgets are available for a subset of postings. The platform serves clients and workers worldwide.
Clustering skill demand (BERTopic pipeline):
- Text construction: For each job, skill tags are concatenated as short text (e.g., “Expert in ghostwriting, writing, ebook, creative writing, english”).
- Embeddings: Sentence-transformer all-MiniLM-L6-v2 (384-dim) creates job-level embeddings.
- Dimensionality reduction: UMAP reduces dimensionality to address high-dimensional distance concentration and noise.
- Clustering: HDBSCAN (min cluster size ≈1000) identifies semantically coherent clusters; outlier reduction is applied. Robustness checks include multiple runs and K-means alternatives. Initial result: 286 clusters.
- Labeling clusters: Vectorization plus c-TF-IDF (class-based TF-IDF) selects representative terms to name each cluster (naming aids interpretation and does not affect treatment assignment). A minimal code sketch of the pipeline follows this list.
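The sketch below assumes the bertopic, sentence-transformers, umap-learn, and hdbscan packages; the embedding model and minimum cluster size follow the paper, while every other setting is illustrative.

```python
# Minimal sketch of the clustering pipeline described above. The embedding model
# and min_cluster_size follow the paper; all other settings are assumptions.
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer
from umap import UMAP
from hdbscan import HDBSCAN

# Each document is the concatenated skill-tag string of one job posting;
# in the paper this is several million postings, not the toy list below.
docs = [
    "Expert in ghostwriting, writing, ebook, creative writing, english",
    "Expert in machine learning, python, data science, tensorflow",
]

embedding_model = SentenceTransformer("all-MiniLM-L6-v2")             # 384-dim sentence embeddings
umap_model = UMAP(n_neighbors=15, n_components=5, metric="cosine")    # dimensionality reduction
hdbscan_model = HDBSCAN(min_cluster_size=1000, prediction_data=True)  # density-based clustering

topic_model = BERTopic(
    embedding_model=embedding_model,
    umap_model=umap_model,
    hdbscan_model=hdbscan_model,
)
topics, probs = topic_model.fit_transform(docs)

# Outlier reduction: reassign postings labeled -1 (noise) to the nearest cluster.
topics = topic_model.reduce_outliers(docs, topics)

# c-TF-IDF representative terms per cluster, used to name the clusters.
print(topic_model.get_topic_info().head())
```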
AI exposure labeling (substitutable, complementary, unaffected):
- Definitions follow Eloundou et al. (2023) and Qiao et al. (2023), constrained to LLM capabilities as of Nov 2022.
- Manual labeling: Two authors independently label all clusters, forming a 55-cluster high-confidence set (17 substitution, 14 complementarity, 24 unaffected).
- Prompt engineering with GPT-4o: Few-shot and chain-of-thought prompts are tuned on the high-confidence set; the best prompt reaches 93% accuracy. Applying it to all clusters initially yields 31 substitution, 164 complementarity, and 91 unaffected clusters (a sketch of this classification step follows the list).
- Consistency checks: Manual review of GPT rationales, hierarchical cluster coherence checks, and merging of small or renamed clusters yields a final stable set of 116 clusters: 12 substitutable, 59 complementary, 45 unaffected.
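The prompt wording, few-shot setup, and API settings below are assumptions for illustration; the paper's actual tuned prompt is not reproduced here.

```python
# Hedged sketch of LLM-assisted exposure labeling via the OpenAI chat completions
# API. Prompt text, function names, and parameters are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You classify freelance skill clusters by their exposure to large language "
    "models as of November 2022. Answer with exactly one word: "
    "substitutable, complementary, or unaffected."
)

def classify_cluster(representative_terms: list[str]) -> str:
    """Return the exposure label for a cluster described by its c-TF-IDF terms."""
    response = client.chat.completions.create(
        model="gpt-4o",  # model named in this summary; settings here are assumptions
        temperature=0,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "Cluster terms: " + ", ".join(representative_terms)},
        ],
    )
    return response.choices[0].message.content.strip().lower()

print(classify_cluster(["ghostwriting", "ebook", "creative writing"]))  # e.g. "substitutable"
```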
Empirical strategy (Difference-in-Differences):
- Outcome: Weekly log count of new job postings per cluster.
- Treatment timing: After dummy equals 1 from the week of Nov 30, 2022 (ChatGPT public release).
- Groups: Treated groups are substitutable and complementary clusters; unaffected clusters serve as control.
- Main specification: log(Postings_it) is regressed on group dummies, After, and the interaction terms Complementary×After and Substitutable×After. Variants add controls for within-cell shares of project durations and experience levels, as well as week and cluster fixed effects. Standard errors are clustered at the skill-cluster level (the implied specification is written out after this list).
- Event-study pre-trend checks confirm parallel trends before treatment for both treated groups relative to the control group (except one anomalous pre-period week, addressed in robustness checks).
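Written out, the main specification described above takes roughly the following form (the notation is ours, not copied from the paper):

```latex
% DiD specification implied by the bullet points above; notation is illustrative.
\log(\mathrm{Postings}_{it}) \;=\; \alpha_i + \gamma_t
  \;+\; \beta_1\,\bigl(\mathrm{Complementary}_i \times \mathrm{After}_t\bigr)
  \;+\; \beta_2\,\bigl(\mathrm{Substitutable}_i \times \mathrm{After}_t\bigr)
  \;+\; X_{it}'\delta \;+\; \varepsilon_{it}
```

Here α_i and γ_t denote cluster and week fixed effects in the fixed-effects variants (the baseline instead includes the group dummies and the After indicator directly), X_it collects the duration- and experience-share controls, and β_2 captures the headline substitution effect.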
Heterogeneity and robustness:
- Subsamples by expected project duration and desired experience level (novice/intermediate/veteran).
- Cluster-level treatment effects estimated from a model that interacts Treated×After with cluster indicators, allowing cluster-specific effects.
- Robustness: a placebo treatment date (May 30, 2022), exclusion of graphic-design-related control clusters (to account for image-generating AI), randomization inference given the relatively small number of treated clusters (sketched below), and analyses of employer budgets and applicants per job to distinguish demand from supply shifts.
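As one example from this toolkit, the sketch below runs a simplified randomization-inference check: the substitutable label is reassigned at random across clusters and the DiD interaction is re-estimated to form a placebo distribution. The column names (cluster_id, log_postings, after, substitutable) and the stripped-down two-group formula are assumptions, not the paper's variables.

```python
# Hedged sketch of randomization inference for a DiD with few treated clusters.
# Column names and the simplified specification are assumptions, not the paper's.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def did_interaction(df: pd.DataFrame) -> float:
    """Estimate the Substitutable x After coefficient on weekly log postings."""
    fit = smf.ols("log_postings ~ substitutable * after", data=df).fit()
    return fit.params["substitutable:after"]

def randomization_p_value(df: pd.DataFrame, n_perm: int = 1000, seed: int = 0) -> float:
    """Share of placebo reassignments with an effect at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    observed = did_interaction(df)
    clusters = df["cluster_id"].unique()
    n_treated = int(df.groupby("cluster_id")["substitutable"].first().sum())
    placebo_effects = []
    for _ in range(n_perm):
        fake_treated = rng.choice(clusters, size=n_treated, replace=False)
        permuted = df.assign(substitutable=df["cluster_id"].isin(fake_treated).astype(int))
        placebo_effects.append(did_interaction(permuted))
    return float(np.mean(np.abs(placebo_effects) >= abs(observed)))
```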
Interpretation aids:
- Employer budgets (log mean) as willingness-to-pay proxy.
- Applicants per posting, to assess whether increased competition reflects fewer postings (a demand contraction) rather than more applicants (a supply expansion). A short sketch of both checks follows this list.
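The sketch below re-runs the same DiD with these two outcomes, assuming a weekly cluster-level panel; the file name and column names are hypothetical.

```python
# Hedged sketch of the interpretation aids: the same DiD re-estimated with the log
# mean employer budget and applicants per posting as outcomes. File and column
# names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

panel = pd.read_csv("weekly_cluster_panel.csv")  # hypothetical weekly cluster-level panel

for outcome in ["log_mean_budget", "applicants_per_posting"]:
    fit = smf.ols(
        f"{outcome} ~ complementary * after + substitutable * after",
        data=panel,
    ).fit(cov_type="cluster", cov_kwds={"groups": panel["cluster_id"]})
    # Interactions near zero for budgets but positive for applicants per posting
    # would match the paper's demand-side interpretation.
    print(outcome, fit.params.filter(like=":after"))
```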
Key Findings
- Overall DiD effects: Substitutable clusters experienced roughly a 24–25% decline in job postings relative to unaffected clusters after ChatGPT’s launch (interaction ≈ −0.28 log points; exp(−0.28) − 1 ≈ −0.24). Complementary clusters show no significant aggregate change relative to unaffected clusters.
- Absolute changes: Postings in unaffected clusters rose by roughly 17% after ChatGPT, while postings in substitutable clusters fell by about 7% in absolute terms, reflecting the roughly −24% relative effect set against the +17% growth in the unaffected control group.
- Project duration: The decline in substitutable demand is concentrated in short-term projects (≤3 weeks). No significant differences for longer durations.
- Worker experience: Within complementary clusters, demand for novice freelancers declined significantly post-ChatGPT; effects for substitutable clusters are broadly similar across experience levels.
- Cluster-level heterogeneity:
• Largest declines among writing/translation: “About us” page writing (−59%), real estate content writing (−52%), Western European languages translation (−23%), content writing for blogs (−20%).
• Strong increases in complementary tech clusters: AI-powered chatbots (+179%, nearly tripled), and machine learning programming (about +24%).
- Demand vs. supply: Employer-reported budgets showed no significant change post-ChatGPT across groups, while applicants per posting increased (≈+0.20 log points) in treated groups, consistent with fewer postings rather than increased applicant supply—supporting a demand-side contraction in substitutable work.
- Descriptive context: Popular complementary skills (e.g., web development, JavaScript, HTML/CSS, PHP) saw growth; unaffected visual/design skills (e.g., graphic design, Photoshop) remained stable given Nov 2022 LLM capabilities.
Discussion
The findings indicate that ChatGPT’s release shifted freelance demand away from tasks readily substitutable by LLMs—especially short-term writing and translation gigs—while leaving overall complementary demand unchanged on average but heterogeneous across clusters. This addresses the core question of whether LLMs act as substitutes or complements in real markets: they substitute for routine text-based tasks, contracting demand there, and complement specialized technical work, where some clusters—such as AI chatbot development and machine learning—expand markedly. The observed decline in novice demand within complementary clusters suggests firms may internalize AI-enabled productivity gains, reducing reliance on entry-level freelancers. Stable budgets alongside higher applicants per posting reinforce that the observed shifts reflect reduced demand for certain freelance tasks rather than supply expansions or wage cuts alone. Together, results underscore the importance of task-level granularity and context in evaluating AI’s labor impacts, revealing both displacement and new demand creation consistent with creative destruction.
Conclusion
The study contributes a fine-grained, exposure-labeled analysis of online freelance labor demand around ChatGPT’s launch. Using BERTopic clustering and LLM-assisted labeling, it distinguishes substitutable, complementary, and unaffected skill clusters and estimates causal effects via difference-in-differences. Key contributions are documented declines in substitutable tasks (notably short-term writing and translation) and heterogeneous complementary outcomes, including sizable growth in AI-related development. Policy and managerial implications include prioritizing reskilling and enabling complementary AI use, particularly toward specialized technical capabilities. Future research directions include: examining firm adoption timelines and organizational changes; linking platform dynamics to traditional labor markets; measuring long-run general equilibrium effects and wage outcomes; modeling task fragmentation to understand shifts between in-house and outsourced work; and exploring whether reduced-demand tasks are performed more efficiently, transformed, or discontinued.
Limitations
- Data and platform dynamics: Occasional data collection gaps and evolving platform taxonomies (skill renaming) introduce noise; mitigated via hierarchical cluster merging and robustness checks.
- Labeling exposure: Reliance on GPT-4o and prompt engineering for exposure classification, although validated against a high-confidence human-labeled set (93% accuracy) and manual checks of the model’s reasoning.
- Identification: Difference-in-differences assumes parallel trends and that unaffected clusters form a valid control; potential unobserved platform policy/marketing changes after ChatGPT could bias estimates, though pre-trend checks and web traffic stability reduce concern.
- Scope: Short-run, partial equilibrium analysis of postings (demand proxy) without comprehensive wage data; applicant identities/quality not observed.
- External AI shocks: Control group may include categories affected by non-LLM generative AI (e.g., image generation); robustness tests excluding graphic-design-related clusters show similar results but residual confounding cannot be fully ruled out.