Math items about real-world content lower test-scores of students from families with low socioeconomic status
M. Muskens, W. E. Frankenhuis, et al.
The study investigates whether the real-world content embedded in math word problems (e.g., money, food, social interactions) systematically biases performance for students from low socioeconomic status (SES) backgrounds compared to their peers with similar underlying math ability. SES is operationalized via the number of books at home, a commonly used proxy in cross-national education research. Prior work shows persistent SES gaps in math achievement and highlights how non-mathematical features (e.g., language complexity) can differentially hinder low-SES students. Drawing on the hidden talents approach, the authors initially hypothesized that ecologically relevant content—more closely aligned with challenges common in low-SES contexts—would improve low-SES students’ performance relative to their own average across items. They contrast this expectation with alternative perspectives predicting decrements: (a) scarcity/attention capture, where salient valued resources (money, food) and socially salient content may narrow attention and reduce cognitive control; (b) difficulties transferring from informal, concrete contexts to formal mathematical algorithms, especially when examples are highly salient; and (c) stereotype threat, whereby cues linked to stigmatized identities may depress performance. The central research question is whether items about money, food, and social interaction differentially affect low-SES students’ item-by-item test performance in large-scale international math assessments.
The paper reviews evidence that SES strongly predicts academic performance and that test items can include irrelevant demands that introduce bias. It summarizes research on language complexity in math tests disproportionately affecting low-SES students. From anthropology and cultural psychology, it notes that individuals in low-SES contexts can demonstrate sophisticated problem-solving in ecologically valid, real-world settings but may underperform in formal testing contexts. The hidden talents approach posits that adversity can shape certain cognitive abilities such that ecologically relevant materials sometimes improve performance among adversity-exposed youth; several studies show performance gains on executive function tasks when using ecologically relevant stimuli. However, the literature is mixed, and alternative accounts warn that salient real-world content may capture attention under scarcity, impede transfer from informal to formal knowledge, or trigger stereotype threat. The review motivates testing whether specific content domains (money, food, social interaction) used frequently in standardized math tests differentially affect low-SES performance.
Design and data: Secondary analyses of TIMSS released items from 2007 and 2011 across 58 countries, including students in grades 4 (average age ~9.5) and 8 (average age ~13.5). Total student-level dataset N = 5,501,165. Item-level analyses included 161 math items (across grades 4 and 8, 2007 and 2011).
SES measure: Number of books at home (five categories: 0–10; 11–25; 26–100; 101–200; >200), coded 1–5 (low–high). Planned parental education proxy was dropped due to high and selective missingness (especially in the lowest SES group).
Item content classification: Items were coded as low-SES ecologically relevant if they involved (1) money, (2) food, or (3) social interaction (e.g., competition, working together). All remaining items were considered neutral (other word problems or purely mathematical notation). Two independent coders achieved 82% agreement; disagreements were resolved to full agreement.
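As a minimal sketch of how an inter-coder agreement figure like the one above can be computed, the Python snippet below calculates simple percent agreement between two coders. The function name, the label strings, and the five example items are hypothetical illustrations, not the study's coding sheets.

```python
def percent_agreement(codes_a, codes_b):
    """Share of items on which two coders assigned the same label."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Both coders must rate the same set of items.")
    if not codes_a:
        raise ValueError("At least one coded item is required.")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


# Hypothetical labels for five items (not real TIMSS codes).
coder_a = ["money", "neutral", "food", "social", "neutral"]
coder_b = ["money", "neutral", "neutral", "social", "neutral"]

agreement = percent_agreement(coder_a, coder_b)  # 4 of 5 labels match
print(f"Agreement: {agreement:.0%}")
```

Percent agreement does not correct for agreement expected by chance; the summary reports only the raw agreement rate, so that is what the sketch computes.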
Linguistic and structural features: Items were coded for general linguistic complexity (total word count, number of different words, total characters, characters without spaces, average syllables per word, sentence count, average sentence length), academic word use (Academic Word List), and mathematical language (quantitative and spatial language). Additional item features included whether it was a word problem vs mathematical notation, item type (multiple choice vs open response), context domain (e.g., number, geometry, data), and cognitive domain (Knowing, Applying, Reasoning). Country fixed effects were included.
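The surface-level complexity measures listed above can be illustrated with a short Python sketch. It covers only word count, distinct words, character counts, sentence count, and average sentence length; syllable counts and Academic Word List matching are omitted because they need external word lists. The tokenization rules and the example item are assumptions made for illustration, not the coding procedure used in the study.

```python
import re


def linguistic_features(text):
    """Compute simple surface features of an item's text.

    Assumption: words are runs of letters, digits, or apostrophes, and
    sentences end at '.', '!' or '?'. Decimal numbers such as "3.5"
    would therefore be split, so real items may need a better tokenizer.
    """
    words = re.findall(r"[A-Za-z0-9']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    n_sentences = len(sentences)
    return {
        "word_count": n_words,
        "distinct_words": len({w.lower() for w in words}),
        "characters": len(text),
        "characters_no_spaces": len("".join(text.split())),
        "sentence_count": n_sentences,
        "avg_sentence_length": n_words / n_sentences if n_sentences else 0.0,
    }


# Hypothetical word problem, not a released TIMSS item.
item_text = "Tom has 3 apples. He buys 2 more apples. How many apples does Tom have?"
features = linguistic_features(item_text)
print(features)
```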
Preregistration and pilots: A preregistered plan (OSF: https://osf.io/9eqkp/) specified data source, definitions, and DIF analyses. Initial preregistered pilot analyses on smaller subsets (including 1999 and 2003 TIMSS replications) unexpectedly indicated worse performance for low-SES students on ecologically relevant content; these pilots are reported in the Supplementary Information. Additional main analyses (mixed logistic regressions and linear models) beyond preregistration were conducted.
Student-level analyses: Mixed logistic regression with item response (1=correct, 0=incorrect) as the dependent variable. Predictors: SES (1–5, dummy-coded), content relevance (1=low-SES relevant, 0=neutral) as a within-subjects factor, and the SES × relevance interaction. Between-subjects covariates included SES and each student’s average test score. Item-level covariates included: word problem indicator, item type, context domain (dummies), cognitive domain (Knowing, Applying, Reasoning), linguistic features (as above), quantitative language, spatial language, and country dummies. Analyses were run separately for grade 4 and grade 8. Additional models replaced the composite relevance dummy with separate indicators for money, food, and social interaction.
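One way to write the model described above is sketched below in LaTeX. The notation is an assumption introduced here for illustration: $Y_{ij}$ is the response of student $i$ to item $j$, $\mathrm{SES}_{s,i}$ are dummies for SES categories 1 to 4 (with the highest category as reference), $R_j$ is the content relevance indicator, $\bar{Y}_i$ is the student's average test score, $\mathbf{x}_j$ collects the item-level covariates, $\mathbf{c}_i$ the country dummies, and $u_i$ is a student-level random intercept. The summary does not state the random-effects structure, so the random intercept in particular is a guess.

```latex
\operatorname{logit}\Pr(Y_{ij}=1)
  = \beta_0
  + \sum_{s=1}^{4} \beta_s\,\mathrm{SES}_{s,i}
  + \beta_R\,R_j
  + \sum_{s=1}^{4} \beta_{sR}\,\mathrm{SES}_{s,i}\,R_j
  + \beta_A\,\bar{Y}_i
  + \boldsymbol{\gamma}^{\top}\mathbf{x}_j
  + \boldsymbol{\delta}^{\top}\mathbf{c}_i
  + u_i,
\qquad u_i \sim \mathcal{N}(0,\sigma_u^2)
```

Under this parameterization, $\exp(\beta_{1R})$ is the odds ratio for the lowest SES group on low-SES relevant items relative to the highest SES group, which is the kind of quantity reported as Exp(B) for the SES × relevance interaction.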
Item-level DIF analyses: Differential Item Functioning (DIF) was assessed using Mantel–Haenszel (MH) and logistic regression (LR) methods. Because the MH method compares two groups, SES was dichotomized (lowest SES=1 vs highest SES=5). For each item, the MH odds (low vs high SES) and their significance (uniform DIF) were computed after matching students on overall test score (percentage of released items correct). To examine whether ecologically relevant content is associated with a DIF disadvantage for low-SES students, linear regression models predicted MH odds from content relevance (and from the specific content categories: money, food, social interaction), controlling for item features that differed between groups (bolded variables in Table 3), including linguistic complexity, item type, context and cognitive domains, and language features.
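The MH step can be sketched in Python as follows. This is a generic implementation of the Mantel–Haenszel common odds ratio for a single item, stratified by matching score, and is not the authors' code; the function names, the record layout, and the counts are hypothetical. Values below 1 indicate that the low-SES group has lower odds of a correct response than the high-SES group at the same overall score.

```python
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """MH common odds ratio (low-SES vs high-SES) for one item.

    records: iterable of (group, matching_score, correct) tuples, where
    group is "low" or "high" and correct is a bool. Each distinct
    matching_score forms one stratum.
    """
    # Per stratum: counts of [correct, incorrect] for each group.
    strata = defaultdict(lambda: {"low": [0, 0], "high": [0, 0]})
    for group, score, correct in records:
        strata[score][group][0 if correct else 1] += 1

    numerator = denominator = 0.0
    for cells in strata.values():
        a, b = cells["low"]    # low-SES correct, low-SES incorrect
        c, d = cells["high"]   # high-SES correct, high-SES incorrect
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("Odds ratio undefined: no informative strata.")
    return numerator / denominator


def expand(group, score, n_correct, n_incorrect):
    """Helper that turns cell counts into individual response records."""
    return ([(group, score, True)] * n_correct
            + [(group, score, False)] * n_incorrect)


# Hypothetical counts in two score strata (not TIMSS data).
records = (
    expand("low", 1, 2, 2) + expand("high", 1, 3, 1)
    + expand("low", 2, 3, 1) + expand("high", 2, 3, 1)
)
mh_or = mantel_haenszel_odds_ratio(records)
print(f"MH odds ratio (low vs high SES): {mh_or:.3f}")
```

In this toy example only the first stratum shows a gap between groups, and the pooled estimate is 5/9, roughly 0.56. A significance test for uniform DIF (the MH chi-square) is not included in the sketch.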
Student-level results (N≈2,779,383 for grade 4; N≈2,721,782 for grade 8):
- Significant SES × content relevance interactions in both grades. Compared to high-SES students with the same average math performance, students from the lowest SES group had:
  - Grade 4: 18% lower odds of a correct response on low-SES relevant items (Exp(B)=0.82; 95% CI [0.80, 0.85]; p<0.001).
  - Grade 8: 16% lower odds (Exp(B)=0.84; 95% CI [0.84, 0.88]; p<0.001).
- By content type: The disadvantage was driven by items about money and social interaction in both grades; food-related content showed a disadvantage in grade 4 but not in grade 8.
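
The "18% lower odds" and "16% lower odds" figures follow directly from the reported Exp(B) values. The short Python sketch below shows the arithmetic, and also shows how an odds ratio maps onto a probability for a given baseline. The baseline probability of 0.5 is a hypothetical value chosen only for illustration; the summary does not report baseline probabilities.

```python
import math


def percent_change_in_odds(odds_ratio):
    """Percentage change in odds implied by an odds ratio (Exp(B))."""
    if odds_ratio <= 0:
        raise ValueError("An odds ratio must be positive.")
    return (odds_ratio - 1.0) * 100.0


def apply_odds_ratio(baseline_probability, odds_ratio):
    """Probability obtained after multiplying the baseline odds by odds_ratio."""
    if not 0 < baseline_probability < 1:
        raise ValueError("Baseline probability must be strictly between 0 and 1.")
    odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Exp(B) values as reported in the summary.
for grade, exp_b in (("Grade 4", 0.82), ("Grade 8", 0.84)):
    change = percent_change_in_odds(exp_b)
    log_odds = math.log(exp_b)  # the coefficient B on the logit scale
    p_new = apply_odds_ratio(0.5, exp_b)  # 0.5 is a hypothetical baseline
    print(f"{grade}: {change:.0f}% change in odds, B = {log_odds:.3f}, "
          f"P(correct) from 0.50 to {p_new:.3f}")
```

Note that a change in odds is not the same as a change in probability: how much the probability of a correct response shifts depends on the baseline.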
Item-level DIF (161 items):
- Items with low-SES ecologically relevant content showed significantly more DIF to the disadvantage of low-SES students than neutral items.
- Mean MH odds for low-SES students (vs high-SES; matched on overall score):
  - Low-SES relevant content: 0.91 (lower than predicted by ability; disadvantage).
  - Neutral word problems: 1.02.
  - Mathematical notation only: 1.06.
- After controlling for item features in linear regression, low-SES relevant content predicted lower MH odds (b = −0.09, t(160) = −2.55, p = 0.012), with a medium effect size (Cohen’s d ≈ 0.70).
- By specific content: Money and social interaction were significantly associated with lower odds for low-SES students; food content showed a significant disadvantage compared to mathematical notation but not compared to other neutral word problems.
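
The item-level regression step can be illustrated with a stripped-down Python sketch: an ordinary least squares fit of MH odds on a single 0/1 relevance indicator. Unlike the analysis summarized above, it includes no item-feature covariates, so the slope is simply the difference between group means. The four MH values are hypothetical and are not the study's item-level estimates.

```python
from statistics import mean


def simple_ols(x, y):
    """Intercept and slope of a one-predictor least squares fit."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2.")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("The predictor must vary across items.")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical items: 1 = low-SES relevant content, 0 = neutral.
relevant = [1, 1, 0, 0]
mh_odds = [0.90, 0.92, 1.00, 1.04]

intercept, slope = simple_ols(relevant, mh_odds)
print(f"Mean MH odds for neutral items: {intercept:.2f}")
print(f"Difference for relevant items (b): {slope:.2f}")
```

With a single dummy predictor, the intercept is the mean MH odds of the neutral items and the slope is the gap between relevant and neutral items. Adding the item-feature controls would require a multiple regression.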
Additional notes:
- For the highest SES group (reference), low-SES relevant items tended to be easier on average (estimates >1), highlighting the differential nature of the bias.
- The SES-gradient in interaction coefficients was monotonic, with progressively larger disadvantages for lower SES categories.
Contrary to the hidden talents expectation that ecologically relevant content would support low-SES students, the study finds that items involving money, food, and social interactions reduce their likelihood of success relative to neutral items, even after adjusting for linguistic complexity and item/cognitive domains. The results align more closely with accounts of attention capture under scarcity (where salient resource-related or socially salient cues narrow attention and tax cognitive control), difficulties in transferring from informal, concrete contexts to formal mathematical procedures (especially when examples are highly salient), and potentially stereotype threat effects. The authors suggest that ecologically relevant content may differentially affect distinct phases of mathematical problem solving: it could aid initial understanding but hinder calculation/execution due to salience-driven distraction or reliance on informal associations. These findings imply that ostensibly real-life contexts can introduce unintended bias in standardized math assessments, disadvantaging low-SES students beyond their measured ability and thus undermining fairness. The work motivates targeted interventions at the teacher and test-design levels to mitigate bias while acknowledging that excluding such content entirely is neither feasible nor desirable given curricular goals.
This study demonstrates that ecologically relevant math item content—specifically money, food, and social interaction—introduces systematic bias against low-SES students in large-scale international assessments. Despite initial hypotheses grounded in the hidden talents framework, the evidence indicates significant performance decrements for low-SES students on such items relative to neutral content, robust to controls for linguistic and item characteristics. The findings highlight the need to identify and reduce content-driven biases in standardized testing, inform the design of fairer assessments, and stimulate debate on evaluating students by growth versus absolute scores. Because money- and socially themed content is integral to curricula and life skills, the authors propose interventions (e.g., additional test-taking guidance, careful framing or instructions, minimizing triggering content) rather than outright removal. Future research should experimentally test mechanisms (attention capture, transfer difficulties, stereotype threat), identify additional biased versus neutral content, and assess generalization across subjects and countries.
- Observational design: Potential unmeasured differences between items with and without ecologically relevant content may partially explain effects despite extensive covariate controls.
- SES measurement: Reliance on number of books at home; the planned parental education proxy was dropped due to high and selective missingness. The books measure captures cultural capital and only moderately correlates with income/occupation.
- DIF matching criterion: The overall test score used for matching included the ecologically relevant items, potentially underestimating true ability for low-SES students and thus underestimating bias.
- Item classification, while double-coded with 82% agreement, may still involve residual subjectivity.
- Possible construct differences: Money-related items may tap somewhat distinct skill sets compared to other items.
- Cross-national heterogeneity was not analyzed; effects may vary by country policies/practices.

