Introduction
Occupational gender stereotypes, the belief that certain jobs are more suitable for men or women, have long-standing implications for career choices, opportunities, and economic outcomes. Research on gender stereotypes in the accounting profession has yielded mixed results: some studies suggest a male-dominated field, while others point to a female majority. This study introduces a novel dimension by investigating whether AI, specifically Large Language Models (LLMs), perpetuates these stereotypes. The research question is: "Does AI perpetuate stereotypes within the accounting profession, and if so, in what manner?" Job titles were chosen as the focus because they encapsulate the identity and expectations of a profession, often serving as the first point of reference in career evaluations and carrying cultural and historical connotations. The study focuses on LLMs because they are the foundational layer of AI language processing, trained on extensive datasets that often reflect societal biases. By analyzing how LLMs classify accounting job titles by gender, the study aims to illuminate implicit associations between roles and gender, offering insights into both AI technology and societal perceptions. The study contributes new knowledge to gender studies, AI ethics, and occupational psychology, ultimately aiming to foster more inclusive and unbiased environments.
Literature Review
Prior research indicates that LLMs are susceptible to perpetuating gender bias and stereotypes. Studies have shown LLMs exhibiting biased assumptions about gender roles that diverge from statistical data; openly discriminating based on gender in rankings; and displaying implicit biases in generated narratives, often portraying female characters in terms of physical attributes and male characters in terms of intellectual qualities. Comparative studies of LLMs have highlighted significant gender biases in generated content, with variations between models. Some models show implicit bias (associating men and women with different professional titles), while others display explicit bias (overtly promoting the idea that women should prioritize marriage over career). Research has explored the underlying causes, including biased training data, algorithms, and fine-tuning processes. However, existing studies primarily focus on general gender contexts and may not directly apply to the specific context of the accounting profession, with its specialized terminology and evolving job titles. This study addresses this gap by specifically investigating LLM-perpetuated gender stereotypes within accounting.
Methodology
This study employs a "Toy Choice" experimental approach, adapted from research on children's behavior and toy preferences. Three widely downloaded LLMs from Hugging Face (referred to as Model 1, Model 2, and Model 3 to preserve anonymity) were selected for their zero-shot classification capabilities. Fifty-three job titles from the Association of Chartered Certified Accountants (ACCA) website were used as inputs. Each LLM was tasked with classifying each job title into one of four gender categories: female, male, other (non-binary or gender non-conforming), and unknown. The experiment was replicated on different days using different computers (a notebook and a desktop) but the same cloud platform (Google Colab) to ensure consistency, and the label order was changed in the second run to test for ordering effects. The results, showing the number of job titles classified into each category for each model, were organized into a contingency table, and a Chi-squared test for independence was used to assess whether the three models' classification results differed significantly (the 'unknown' category was removed due to zero counts across all models). To further investigate salary disparities linked to gender classification, salary ranges for consistently classified job titles (those categorized the same way by all three LLMs) were analyzed using an independent samples t-test.
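For concreteness, the classification step can be sketched with the Hugging Face transformers zero-shot pipeline. This is a minimal illustration, not the study's exact setup: the model name (facebook/bart-large-mnli) and the sample job titles below are assumptions, since the study anonymizes its three models and uses all 53 ACCA titles.

```python
# Minimal sketch of the zero-shot gender-classification step.
# Assumption: "facebook/bart-large-mnli" stands in for the anonymized models.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# The four candidate labels used in the study.
labels = ["female", "male", "other (non-binary or gender non-conforming)", "unknown"]

# Illustrative subset of the 53 ACCA job titles.
job_titles = ["Chief Financial Officer", "Assistant Accountant", "Financial Analyst"]

for title in job_titles:
    result = classifier(title, candidate_labels=labels)
    # The pipeline returns candidate labels sorted by score, highest first.
    print(f"{title} -> {result['labels'][0]} (score {result['scores'][0]:.2f})")
```

Rerunning the loop with the entries of `labels` in a different order mirrors the study's check for ordering effects.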
Key Findings
The study's key findings reveal significant variations in gender classification among the three LLMs. Model 1 showed a strong preference for male labels, Model 2 for female labels, and Model 3 a more balanced distribution, with a notable number assigned to the "other" category. The Chi-squared test revealed significant differences (p<0.01) in the distributions of gender classifications across the three models, with Cramer's V indicating a moderate to strong association between the models and the assigned categories. Ten of the 53 job titles were consistently classified by all three models: six into the female category (primarily entry-to-mid-level, operational, and specialized roles like Financial Analyst, Internal Audit Manager, Assistant Accountant) and four into the male category (higher-level, strategic, and leadership roles like Chief Financial Officer, Head of Finance, Senior Internal Auditor). Analysis of salary ranges for these consistently classified job titles showed a significant difference (p ≈ 1.45 × 10⁻⁷⁹) between the two groups, with the average salary for the "male" group being 1.74 times that of the "female" group. This salary disparity aligns with the seniority differences observed between the two groups. The variation in LLM classifications reflects the influence of training data, which may perpetuate historical gender imbalances associated with specific job roles. Country-specific differences in gender representation in the workforce also play a significant role.
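The statistical pipeline behind these findings can be sketched as follows. This is a hedged illustration only: the contingency counts and salary figures below are made-up placeholders, not the study's data, and the study's t-test variant (pooled vs. Welch) is not specified.

```python
# Sketch of the analysis: Chi-squared test, Cramer's V, and the salary t-test.
# All numbers here are illustrative placeholders, not the study's data.
import numpy as np
from scipy.stats import chi2_contingency, ttest_ind

# Rows: Models 1-3; columns: female, male, other.
# (The 'unknown' column is dropped: zero counts for all models.)
table = np.array([
    [10, 35, 8],   # Model 1: male-leaning
    [33, 12, 8],   # Model 2: female-leaning
    [18, 17, 18],  # Model 3: more balanced
])

chi2, p, dof, expected = chi2_contingency(table)

# Cramer's V quantifies the strength of the model-label association.
n = table.sum()
v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
print(f"chi2={chi2:.2f}, p={p:.3g}, Cramer's V={v:.2f}")

# Independent samples t-test on salaries of consistently classified titles.
female_group = np.array([48_000, 52_000, 55_000, 60_000, 62_000, 65_000])
male_group = np.array([95_000, 105_000, 110_000, 120_000])
t, p_salary = ttest_ind(male_group, female_group)
print(f"t={t:.2f}, p={p_salary:.3g}, ratio={male_group.mean() / female_group.mean():.2f}")
```

With real salary-range data for the ten consistently classified titles, the same two calls reproduce the reported tests; the ratio of group means corresponds to the 1.74× figure.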
Discussion
The findings demonstrate that LLMs exhibit biases in assigning gender to accounting job titles, reflecting biases in their training data and mirroring societal and cultural stereotypes within the accounting profession. These biases extend beyond the accounting sector and have broader implications for AI applications, particularly in hiring and recruitment. While some LLM-based AI systems attempt to reduce bias, ingrained stereotypes remain challenging to eliminate completely. Conversations with ChatGPT illustrate how stereotypes can influence LLM responses, even with attempts at neutrality. The perpetuation of gender stereotypes in LLMs can lead to systemic discrimination and amplify existing societal inequalities. Data augmentation or synthetic data generation using AI-generated content (AIGC) to train AI models can exacerbate the bandwagon effect, reinforcing existing stereotypes in the training data and leading to a cascading effect across AI systems.
Conclusion
This research reveals that LLMs exhibit biases in gender classification of accounting job titles, reflecting biases in their training data and leading to salary discrepancies. This reaffirms the presence of gender stereotypes in AI and highlights their specific manifestation within the accounting context. The study's contributions include the empirical investigation of LLM gender labeling, statistical validation of observed patterns, and the implications for economic outcomes. Future research should explore model architectures and data augmentation techniques to further investigate how LLMs perpetuate gender stereotypes and develop more ethical and unbiased AI applications.
Limitations
The study's focus on three specific LLMs limits the generalizability of the findings to other models. The use of job titles from a single professional body (ACCA) might not fully represent the diverse range of accounting roles globally. Reliance on publicly available salary data may not capture the full complexity of compensation structures within the accounting industry.