Stylistic and linguistic variations in compliments: an empirical analysis of children's gender schema development with machine learning algorithms

X. Liao and Y. Zhang

This study by Xinyu Liao and Yanhui Zhang investigates how children's gender schemas influence their language styles in compliments. Through oral discourse tasks involving 25 Mandarin-speaking children, the research reveals that girls demonstrate a heightened awareness of gendered language, particularly through their less positive imitated compliments.

Introduction

Compliments are common conversational practices used to convey positive sentiments and build interpersonal relationships. They function as positive politeness strategies but can carry multifaceted meanings and be face-threatening depending on context and cultural norms. Compliment use provides a lens into sociocultural values of a speech community and is influenced by factors such as gender, social distance, status, and region. Prior work on gender and compliments has largely focused on adults, often finding women pay and receive more compliments and that topics differ, though some findings challenge earlier generalizations. There is limited research on pre-adolescent children's complimenting language and its relation to gender schema development. To address these gaps, this study investigates children's development of gender schema through linguistic variations in compliments under two conditions: normal speech and imitation of opposite-sex speech. Twenty-five Mandarin-speaking children (aged 9–12) from Ningbo, China, completed oral discourse completion tasks across 12 compliment scenarios spanning appearance, possession, and ability/performance. Analyses target three linguistic levels: lexical (word use and lexical richness), discourse-pragmatic (intensifiers and affective sentence-final particles), and discourse-semantic (sentiment polarity via machine learning). Research questions: (1) To what extent do pre-adolescent boys and girls differ in the linguistic features of their compliments? (2) To what extent do pre-adolescent boys and girls differ in the sentiment polarity (positivity and negativity) of their compliments? (3) To what extent do boys' and girls' imitated compliments of the opposite sex differ from the supposedly normal compliments?

Literature Review

Gender schema theory posits that children internalize gendered expectations, forming schematic mental representations that guide behavior and cognition. Early language development often shows girls' advantages in lexical production and expressive vocabulary, and children acquire sociolinguistic variants early (e.g., pitch differences). Caregivers may use different styles with boys vs. girls, and children present gender-typical styles (e.g., more intensifiers and collaborative styles by girls). Children can also adjust styles to interlocutors and recognize gendered meanings of speech styles. Compliment research has largely examined pragmatic strategies and responses across languages and cultures, with methods including ethnographic observation, recordings, role-play, recall, and discourse completion tasks (DCTs). DCTs, while less authentic than naturalistic data, elicit metapragmatic knowledge and allow controlled manipulation of variables; thus they are suitable for probing socially appropriate expectations and beliefs, and for quantifiable analysis. Classic work showed formulaic patterns in English compliments (e.g., NP BE ADJ), with frequent adjectives like good, pretty, beautiful. Gender and compliments research often pursues a difference approach, reporting that women produce and receive more compliments and focus more on appearance; yet constructivist approaches caution against reinforcing stereotypes. Gender remains embedded in language use across levels. The present study looks beyond adult data to pre-adolescent children's complimenting, focusing on internal linguistic features (lexical, pragmatic markers) and how these reflect gender schema about self and others (including imitated opposite-sex styles).

Methodology

Participants: 25 Mandarin-speaking children (15 boys, 10 girls), ages 9–12, from a primary school in Ningbo, China; institutional consent was obtained.
Design and tasks: Oral discourse completion tasks (ODCTs) elicited immediate, scenario-based compliments. Twelve scenarios covered appearance, possession, and ability/performance (e.g., hairstyle, skirt, Barbie doll, basketball match, ballet show, pencil box, teddy bear, dance, football match, jacket). Visual prompts were provided. Each child produced compliments in two styles: their normal speech and an imitated style of the opposite gender.
Data preparation: Elicited speech was segmented into words using SegmentAnt, because written Chinese does not separate words with spaces. Segmentation was manually checked by native speakers. Four corpora were compiled by gender and style: boys' normal, girls' normal, boys imitating feminine, and girls imitating masculine. Corpus sizes (types/tokens): boys' normal (319/1396), girls' normal (310/1195), boys' imitated feminine (260/1196), girls' imitated masculine (241/832).
Lexical analysis: Word frequency lists were generated with AntConc to compare functional and content word distributions. Frequencies were normalized per 1000 tokens to control for text length. Lexical richness (LR) was measured using D (an iterated TTR measure shown to be effective for Mandarin). Linear mixed-effects regression (lme4 in R) modeled D with speaking style as a fixed effect and speaker as a random effect.
Discourse-pragmatic analysis: Two pragmatic markers were examined: intensifiers (e.g., zhen 'really', kezhen 'so', hen 'very', hao 'good', ting 'very', tebie 'especially', zhenshi 'really', zheme 'so') and affective sentence-final particles (ASFPs: ya, eh, a, ne, yo, la). Relative frequencies per thousand words were computed for each child. Linear mixed-effects regressions assessed the effects of gender and style, with speaker as a random effect.
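The frequency normalization and the type/token counts described above can be illustrated with a short Python sketch. It uses only the corpus sizes reported in the text. The function names and the example utterance are hypothetical, and the raw type-token ratio (TTR) shown here is not the D measure used in the study; it is included only to show why a length-corrected measure is useful.

```python
# Corpus sizes (types, tokens) as reported in the text.
CORPORA = {
    "boys_normal": (319, 1396),
    "girls_normal": (310, 1195),
    "boys_imitated_feminine": (260, 1196),
    "girls_imitated_masculine": (241, 832),
}

# Pragmatic markers listed in the text (romanized forms).
INTENSIFIERS = {"zhen", "kezhen", "hen", "hao", "ting", "tebie", "zhenshi", "zheme"}
ASFPS = {"ya", "eh", "a", "ne", "yo", "la"}


def per_thousand(count, total_tokens):
    """Normalize a raw count to a rate per 1000 tokens."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return 1000.0 * count / total_tokens


def type_token_ratio(types, tokens):
    """Raw TTR. Unlike D, it is sensitive to corpus length."""
    return types / tokens


def marker_rate(tokens, markers):
    """Rate per 1000 tokens of any marker in a segmented utterance.

    Simple string matching is an assumption of this sketch: it cannot
    tell an intensifier use of a word from its other uses.
    """
    hits = sum(1 for token in tokens if token in markers)
    return per_thousand(hits, len(tokens))


if __name__ == "__main__":
    for name, (types, tokens) in CORPORA.items():
        print(f"{name}: TTR = {type_token_ratio(types, tokens):.3f}")

    # Hypothetical segmented utterance, not taken from the study's data.
    example = ["ni", "de", "faxing", "zhen", "haokan", "a"]
    print(f"intensifiers per 1000: {marker_rate(example, INTENSIFIERS):.1f}")
    print(f"ASFPs per 1000: {marker_rate(example, ASFPS):.1f}")
```

With these figures, the smallest corpus (girls' imitated masculine) has the highest raw TTR, which is consistent with raw TTR tending to fall as texts get longer. This is one reason to prefer a measure such as D when corpora differ in size.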
Sentiment analysis: Supervised machine learning with logistic regression was used to predict sentiment polarity (positive vs. negative) of compliments. Steps: manually label 50% of data; preprocess by removing punctuation and stop words and performing word segmentation; vectorize texts; train logistic regression classifier; apply to remaining data to obtain probabilities of positivity/negativity (accuracy reported at 95%). Probabilities were transformed into sentiment scores from -5 (strongly negative) to +5 (strongly positive) where higher scores indicate higher positivity. Mixed-effects regression modeled sentiment scores by speaking style with speaker as random effect.
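The text does not state how classifier probabilities were converted to the -5 to +5 scale. As a minimal sketch, the snippet below assumes a simple linear rescaling of the predicted probability of the positive class. Both the formula and the function names are assumptions made for illustration, not the authors' documented procedure.

```python
def sentiment_score(p_positive):
    """Map a probability of positivity in [0, 1] to a score in [-5, +5].

    Assumed linear mapping: p = 0 gives -5, p = 0.5 gives 0, p = 1 gives +5.
    """
    if not 0.0 <= p_positive <= 1.0:
        raise ValueError("p_positive must be between 0 and 1")
    return 10.0 * p_positive - 5.0


def implied_probability(score):
    """Inverse of the assumed mapping: recover the probability from a score."""
    if not -5.0 <= score <= 5.0:
        raise ValueError("score must be between -5 and +5")
    return (score + 5.0) / 10.0


if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0):
        print(p, sentiment_score(p))
```

Under this assumed mapping, higher scores correspond directly to higher predicted probability of a positive compliment, which matches the interpretation given in the text.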

Key Findings

Lexical choices: Strong formulaicity across groups. Top words and adjectives were highly similar across corpora. Common adjectives: 好看 (good-looking), 可爱 (adorable), 漂亮 (pretty/beautiful), 好 (good). This aligns with prior English compliment formulae.
Lexical richness (D): Girls' normal speech exhibited significantly higher LR than boys' normal, boys' imitated feminine, and girls' imitated masculine styles (linear mixed-effects: girls' normal style estimate +8.99, t=2.04, p=0.04). No significant LR change was reported between normal and imitated styles, suggesting that children do not use LR as a salient gender index in imitation tasks.
Intensifiers: Girls used significantly more intensifiers than boys in both their normal and imitated masculine speech (mixed-effects results: Girls' normal style estimate +3.83, t=2.36, p=0.02; Girls' imitated masculine style estimate +3.64, t=2.1, p=0.04). Boys showed no significant difference between their normal and imitated feminine styles (t=-0.4, p=0.69). This indicates habitual higher intensifier use by girls and limited meta-awareness among boys of intensifiers as a feminine index.
Affective sentence-final particles (ASFPs): Significant gendered style-shifting. Boys' normal style had significantly lower ASFP frequency (estimate -1.46, t=-2.2, p=0.04). Girls' imitated masculine style also showed significantly lower ASFP use (estimate -2.69, t=-2.47, p=0.02). Boys increased ASFPs when imitating girls; girls decreased ASFPs when imitating boys, suggesting children overtly associate ASFPs with femininity and adjust when performing gendered styles.
Sentiment polarity: Overall highly positive across groups (mean sentiment scores: boys' normal 4.84, SD 0.85; boys' imitated feminine 4.88, SD 0.35; girls' normal 4.65, SD 1.29; girls' imitated masculine 3.8, SD 2.75). Mixed-effects modeling showed girls' imitated masculine compliments had significantly lower sentiment scores than baseline (t=-3.86, p<0.001), with notable variability. Some girls produced overtly negative utterances when imitating boys (e.g., “Nothing special”; “Such a self-flattering girl”). Logistic regression classifier reported 95% accuracy.

Discussion

Findings indicate that pre-adolescent children exhibit gender-differentiated language styles, with discourse-pragmatic features (especially ASFPs) being more salient to gender schema than lexical features. Children appear to recognize ASFPs as feminine and style-shift accordingly in imitation, whereas intensifiers show a more static, habitual pattern—girls use more regardless of condition, and boys do not strategically increase them to index femininity. Lexical choices are largely formulaic across groups and less implicated in gender performance; girls, however, show higher lexical richness in their normal compliments, aligning with broader observations of girls' expressive vocabulary advantages. The sentiment results suggest that some girls hold stereotypes of boys as less prosocial or more pugnacious in complimenting, leading to lower positivity when imitating boys. This mismatches boys' own (elicited) positive compliments in ODCTs, potentially reflecting methodological constraints—boys may produce normative compliments in elicited tasks that differ from natural interaction. Overall, results support extensions of Gender Schema Theory to linguistic style: children develop mental representations of gendered others' speech and deploy salient pragmatic features when performing gendered identities. The study also raises the possibility of age effects and delexicalization differences influencing intensifier vs. ASFP usage, and underscores the need to consider broader sociocontextual factors in intensifier variation.

Conclusion

The study extends compliment research by integrating ODCTs with role-playing imitation to probe children's gender schema through multi-level linguistic analysis. Gender variations emerged across levels, with the strongest social salience at the discourse-pragmatic level (ASFPs) where clear style-shifting occurred in imitation. Lexical richness was higher in girls' normal speech but did not function as a gender index in imitation. Sentiment analysis showed generally positive compliments, but girls lowered positivity when imitating boys, reflecting stereotypes of masculine speech as less supportive. These findings demonstrate that children assign different salience weights to linguistic features in performing and perceiving gender. The work contributes to understanding how gendered language practices and socialization manifest even in routine speech acts like compliments and demonstrates the utility of combining controlled elicitation with machine learning for discourse-semantic analysis.

Limitations

Methodological elicitation: ODCTs and imitation tasks may not capture authentic compliment behavior in natural interactions; children may produce socially appropriate rather than typical everyday speech.
Sample size and scope: 25 participants and 585 compliments constitute a relatively small corpus, limiting generalizability and fine-grained subgroup analyses.
Age range: Restricted to ages 9–12; developmental trajectories across broader pre-adolescence not examined.
Potential confounds: Unmeasured socioeconomic, institutional, and contextual factors may influence intensifier/ASFP usage; degree of delexicalization hypothesized but not empirically isolated.
Cultural and topic controls: While scenarios were controlled, cultural expectations about topics (e.g., appearance) could bias results; cross-cultural generalizability is unknown.
Future work should incorporate naturalistic conversational data (e.g., ethnographic recordings), expand participant demographics and age ranges, and model additional sociolinguistic factors.
