Education
The pregnancy drop: How teaching evaluations penalize pregnant faculty
R. M. Olabisi
Prior work documents substantial negative impacts of motherhood on women’s careers in STEM, often termed the motherhood penalty, maternal wall, baby penalty/tax, and leaky pipeline. Mothers are less likely to be hired or promoted, receive lower pay, and are judged more harshly on competence and commitment than both women without children and men. Pregnancy introduces additional forms of discrimination, both hostile and ostensibly benign, with disproportionate effects on women of colour. Intersectionality theory holds that overlapping identities (e.g., race, gender, pregnancy) produce unique and compounded forms of bias, suggesting that pregnant women, particularly women of colour, may face distinctive barriers. Student evaluations of teaching (SETs), widely used and heavily weighted in academic personnel decisions, are known to reflect student biases rather than teaching effectiveness, with documented disadvantages for women, people of colour, and instructors in quantitative fields. Yet little research has isolated pregnancy-specific bias in SETs, and none has focused on engineering. This study asks whether faculty receive lower teaching evaluations when pregnant than when not pregnant, how such effects vary by discipline and instructor demographics (especially race/ethnicity), and whether student demographics differentially shape evaluations when students believe an instructor is pregnant. To address these questions, the study combines an analysis of women faculty’s self-reported evaluations while pregnant and while not pregnant with a student video experiment that manipulates an instructor’s perceived pregnancy status.
The literature identifies multiple, intersecting biases affecting women in academia. Studies demonstrate a motherhood penalty in hiring, pay, and promotion, with mothers held to higher performance standards and perceived as less competent and less committed, while fatherhood often benefits men’s careers. Pregnancy elicits both hostile and benevolent discrimination (e.g., demotions framed as care), and women of colour file a disproportionate share of pregnancy discrimination claims. Intersectionality scholarship argues that the experiences of women of colour cannot be inferred from additive models of gender and race alone. SETs have been critiqued for capturing student bias more than teaching effectiveness, disadvantaging women, women of colour, and instructors in quantitative/STEM fields. Prior SET research typically uses between-group comparisons (e.g., by gender or discipline) and rarely examines pregnancy specifically; studies that include engineering faculty are lacking. One relevant experiment found that misperceived instructor gender alone altered SETs. This study extends the literature by directly comparing the same women’s evaluations when pregnant and when not pregnant, and by experimentally manipulating perceived pregnancy to isolate student-driven bias, with attention to disciplinary context and intersectional identities.
Design: The study had two complementary components, both conducted under IRB approval (Rutgers University). (1) Faculty survey of lived experiences: An anonymous online questionnaire collected self-reported SET outcomes from women who had taught university courses both while pregnant and while not pregnant and had received student evaluations in both periods. The instrument included 32 Likert-scale items (1=strongly disagree to 5=strongly agree) covering student treatment, student characteristics, evaluation scores, instructor characteristics, and pregnancy symptoms (e.g., severity, weight change). Participants were recruited by convenience sampling through emails and social media posts directed at women in academia. Inclusion required SETs from both conditions; graduate students were not excluded. Of 103 respondents, 50 submitted complete, eligible surveys, which were analyzed; the other 53 were excluded. Respondents spanned humanities (6%), medicine (12%), engineering (32%), and sciences (50%; life/physical/earth and social sciences), with education (6%) analyzed separately. Career stage at the time of pregnancy was postdoc (6%), graduate student (10%), non-tenure-track faculty (16%), assistant professor (54%), or associate professor (14%). (2) Student video experiment: In a junior-level biomedical engineering course, students volunteered for a 5-minute compensated evaluation study (“$5 for 5 minutes”), proctored by a graduate student after informed consent was obtained; the faculty instructor left the room. Participants viewed a pre-recorded 5-minute instructional video, featuring an African American actress as the instructor, on a topic in which the students were novices. Two otherwise-identical abbreviated 7-item SET forms were randomly distributed: one stated that the instructor was pregnant (the word “pregnant” appeared on the form), and the other made no mention of pregnancy. Forms were alternated by student sex to balance the number of male and female recipients of each version; other demographics (e.g., race/ethnicity) were distributed at random. Silence was maintained during the session, and surveys were anonymous.
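The summary says the two forms were distributed at random and alternated by student sex, but does not spell out the procedure. A minimal sketch of one way to do this, assuming forms are handed out in arrival order and alternated within each sex stratum starting from a randomly chosen form; the function names and the form labels `pregnant`/`control` are hypothetical.

```python
import random
from collections import Counter

FORMS = ("pregnant", "control")  # hypothetical labels for the two SET form versions


def assign_forms(student_sexes, seed=None):
    """Assign a form to each student, alternating forms within each sex stratum.

    `student_sexes` lists each student's sex in the order forms are handed out.
    Each stratum starts from a randomly chosen form, then alternates.
    """
    rng = random.Random(seed)
    next_form = {}  # stratum -> index into FORMS for that stratum's next student
    assignments = []
    for sex in student_sexes:
        if sex not in next_form:
            next_form[sex] = rng.randrange(len(FORMS))
        i = next_form[sex]
        assignments.append(FORMS[i])
        next_form[sex] = (i + 1) % len(FORMS)
    return assignments


def tally(student_sexes, assignments):
    """Count how many students of each sex received each form."""
    return Counter(zip(student_sexes, assignments))


if __name__ == "__main__":
    # Hypothetical arrival order of students, for illustration only.
    sexes = ["F", "M", "M", "F", "F", "M", "F", "M", "M"]
    forms = assign_forms(sexes, seed=1)
    for (sex, form), n in sorted(tally(sexes, forms).items()):
        print(sex, form, n)
```

Alternating within strata keeps the two versions balanced to within one form per sex, while characteristics that are not used for stratification, such as race/ethnicity, end up distributed by chance.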
A total of 83 complete student surveys were analyzed (demographics: 31% white, 46% Asian, 12% underrepresented minority, 8% Middle Eastern, 3% other; 52% female). Measures: The primary outcomes were ratings of instructor effectiveness and course quality; in the student experiment, video quality was also rated. Predictors and covariates included discipline, instructor race (white women vs women of colour), symptom severity, weight-gain category, student knowledge of the instructor’s pregnancy/illness, and class composition (gender ratio; level/age mix). Statistical analysis: Analyses were run in SPSS v28. Within-instructor comparisons (pregnant vs not pregnant) in the faculty data used paired one-tailed t-tests, and comparisons between student experiment groups used independent one-tailed t-tests. Stepwise logistic regression (generalized linear model) estimated the odds of lower teaching-effectiveness and course-quality ratings when pregnant, including main effects and interactions; analyses were stratified by race and by symptom/weight category. The significance threshold was p<0.05.
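The paired and independent one-tailed t-tests named above are standard procedures that SPSS computes directly. As a stdlib-only sketch of what those tests compute, the code below derives the t-statistic, the degrees of freedom, and a one-tailed p-value, integrating the Student's t density numerically with Simpson's rule instead of calling a statistics library. The function names are hypothetical, the independent-samples version assumes pooled (equal) variances, which the summary does not specify, and the example ratings are invented for illustration, not study data.

```python
import math
from statistics import mean, stdev


def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T >= t) for Student's t with `df` degrees of freedom.

    Integrates the t density over [0, |t|] with Simpson's rule (`steps` must be even),
    then uses the symmetry of the distribution around 0.
    """
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    b = abs(t)
    h = b / steps
    total = pdf(0.0) + pdf(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    area = total * h / 3   # P(0 <= T <= |t|)
    upper = 0.5 - area     # P(T >= |t|)
    return upper if t >= 0 else 1 - upper


def paired_t_one_tailed(not_pregnant, pregnant):
    """Paired t-test; H1: scores are lower when pregnant (mean difference > 0)."""
    diffs = [a - b for a, b in zip(not_pregnant, pregnant)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1, t_sf(t, n - 1)


def independent_t_one_tailed(control, pregnant):
    """Pooled-variance two-sample t-test; t > 0 when the 'pregnant' group rates higher.

    The one-tailed p-value tests H1: mean(pregnant) > mean(control).
    """
    n1, n2 = len(control), len(pregnant)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * stdev(control) ** 2 + (n2 - 1) * stdev(pregnant) ** 2) / df
    t = (mean(pregnant) - mean(control)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df, t_sf(t, df)


if __name__ == "__main__":
    # Hypothetical 1-5 ratings for illustration only; not data from the study.
    print(paired_t_one_tailed([4.5, 4.2, 4.0, 4.4], [4.1, 4.2, 3.6, 4.3]))
    print(independent_t_one_tailed([3.0, 3.5, 3.2], [3.8, 4.0, 3.6]))
    # One-tailed p-value for a reported paired t(47) = 1.7
    print(round(t_sf(1.7, 47), 3))
```

With this sign convention, a drop gives a positive t in the paired test (as in the faculty results) and a negative t in the independent test (as for Middle Eastern students in the experiment); for a hypothesized decrease in the independent design, the one-tailed p-value is `t_sf(-t, df)`. SPSS reports both the equal-variance and the Welch (unequal-variance) versions of the independent test.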
Faculty survey (lived experiences):
• Overall, across all fields, instructor effectiveness ratings decreased when women taught while pregnant: 4.29 ± 0.06 (not pregnant) to 4.14 ± 0.12 (pregnant); paired t(47)=1.7, p=0.047.
• Greater physical impact of pregnancy was associated with larger drops: severe symptoms, 4.50 ± 0.09 to 3.90 ± 0.42 (paired t(9)=1.8, p=0.051); weight gain >15 lbs, 4.38 ± 0.26 to 4.11 ± 0.78 (paired t(39)=2.2, p=0.018).
• By field: STEM overall declined from 4.34 ± 0.25 to 4.05 ± 0.72 (paired t(39)=2.5, p=0.009). Humanities: no change (4.75 ± 0.25 to 4.75 ± 0.25). Medicine: 4.63 ± 0.13 to 4.58 ± 0.15 (t(5)=-0.4, p=0.34). Engineering: 4.22 ± 0.29 to 3.84 ± 0.52 (t(15)=1.7, p=0.055). Education: 4.25 ± 0.13 to 2.50 ± 1.50 (t(2)=1.4, p=0.19). Life/physical/earth sciences: unchanged (4.45 ± 0.12 to 4.45 ± 0.12; p=0.5).
• By instructor race, across all fields: women of colour declined from 4.33 ± 0.35 to 3.78 ± 1.27 (t(17)=2.4, p=0.01); white women went from 4.41 ± 0.16 to 4.37 ± 0.19 (t(28)=0.5, p=0.31).
• Engineering (plus education) stratified by race: women of colour dropped significantly, from 4.167 ± 0.46 to 2.83 ± 1.36 (t(5)=2.3, p=0.03); white women went from 4.25 ± 0.20 to 4.13 ± 0.18 (t(11)=0.9, p=0.19).
• Logistic regression (GLM): when pregnant, women in engineering or education had higher odds of lower scores for both teaching effectiveness (OR=6.140, p=0.012; 95% CI [1.481, 25.465]) and course quality (OR=7.139, p=0.008; 95% CI [1.805, 34.026]). Pregnant women of colour in engineering or education had odds of lower teaching evaluation or course quality scores 26.577 times those of pregnant white women in other fields; if they additionally experienced severe symptoms or weight gain >15 lbs, OR=17.333.
• No significant effects were found for instructor age, age at pregnancy, time elapsed since pregnancy, teaching experience (years), rank, class size, teaching the same course before/after pregnancy, term, or institution type.
• Student-related moderators in the faculty data: drops were significant when students knew the instructor was pregnant (4.38 ± 0.20 to 4.20 ± 0.59; t(41)=2.1, p=0.02). Male and female students alike penalized pregnancy by ~0.25 points, but male students gave lower baseline (nonpregnant) scores, so the penalty was more noticeable in classes with more male students. Classes with an equal gender ratio had the smallest drop (4.43 ± 0.10 to 4.30 ± 0.28; t(14)=1.2, p=0.13). First-year students imposed larger penalties (4.32 ± 0.22 to 4.12 ± 0.39; t(16)=1.9, p=0.03) than graduate students (4.42 ± 0.27 to 4.29 ± 0.61; t(11)=0.8, p=0.19). Mixed-level classes showed the largest, though nonsignificant, drop (4.40 ± 0.23 to 4.10 ± 0.94; t(19)=1.4, p=0.09). Course quality drops were significant when students did not know the instructor was unwell (4.41 ± 0.22 to 4.17 ± 0.51; t(38)=2.6, p=0.007) but not when they knew (4.39 ± 0.04 to 4.17 ± 0.75; t(8)=0.9, p=0.18).
• Qualitative comments: in fields other than engineering/education, comments were largely positive (though sometimes biased in framing), whereas engineering/education comments included negative and hostile remarks (e.g., “Don’t ever be pregnant”), and some students reported instructors to deans for perceived rudeness.
Student video experiment (values run from the form without pregnancy to the form stating pregnancy):
• For most student groups, instructor effectiveness ratings were higher when students believed the instructor was pregnant, but they were lower for white male students (3.40 ± 0.49 to 2.80 ± 1.20; t(6)=-1.1, p=0.31), Middle Eastern students (4.25 ± 0.25 to 2.67 ± 0.33; t(4)=-3.8, p=0.009), and students with low prior interest (3.62 ± 0.59 to 3.16 ± 2.56; t(6)=-0.65, p=0.27). The increases were significant for Asian students (3.25 ± 0.62 to 3.78 ± 0.73; t(36)=2.0, p=0.02) and underrepresented minority (URM) students (3.33 ± 0.27 to 4.50 ± 0.33; t(6)=3.3, p=0.008).
• Video quality ratings showed smaller gaps than instructor effectiveness ratings for white male, Asian, and URM students, and a larger penalty for Middle Eastern students (4.00 ± 0.67 to 2.33 ± 0.33; t(5)=-3.2, p=0.01); low-interest students’ video quality ratings were also lower (3.50 ± 0.72 to 3.20 ± 0.94; t(16)=-0.6, p=0.26).
Overall interpretation: Teaching while pregnant is associated with lower evaluations, especially for women of colour and in engineering/education. Student gender and ethnicity moderate the effect of perceived pregnancy: male students, white male students, and Middle Eastern students were more likely to penalize perceived pregnancy, whereas Asian and URM students sometimes gave higher ratings.
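The logistic-regression results for engineering/education faculty are reported as odds ratios with 95% confidence intervals. Because an odds ratio is exp(β) for a log-odds coefficient β, and a Wald interval is exp(β ± 1.96·SE), the coefficient, its standard error, and the Wald z-statistic can be backed out from the reported numbers as a consistency check. The sketch below assumes symmetric Wald intervals on the log-odds scale (the summary does not say how the intervals were computed) and a two-sided Wald p-value; `wald_from_or_ci` is a hypothetical helper, not part of SPSS.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def wald_from_or_ci(odds_ratio, lower, upper, z_crit=Z_975):
    """Recover log-odds statistics from an odds ratio and its 95% Wald CI.

    Assumes the interval is exp(beta +/- z_crit * se), i.e. symmetric on the log scale.
    """
    beta = math.log(odds_ratio)                              # log-odds coefficient
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)  # standard error of beta
    z = beta / se                                            # Wald z-statistic
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))           # equals 2 * (1 - Phi(|z|))
    midpoint = math.sqrt(lower * upper)                      # geometric midpoint of the CI
    return {"beta": beta, "se": se, "z": z, "p": p_two_sided, "midpoint": midpoint}


if __name__ == "__main__":
    # Teaching-effectiveness OR for engineering/education faculty, as reported above.
    result = wald_from_or_ci(6.140, 1.481, 25.465)
    for key, value in result.items():
        print(f"{key}: {value:.4f}")
```

Applied to the teaching-effectiveness estimate, this gives z ≈ 2.50 and a two-sided p ≈ 0.012, in line with the reported p-value, and the interval's geometric midpoint is ≈ 6.14. A geometric midpoint far from the reported OR would indicate that the interval was not computed as a symmetric Wald interval.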
The study provides convergent evidence, from both women’s lived evaluation histories and an experimental manipulation, that pregnancy triggers evaluation penalties, with intersectional amplification for women of colour and disciplinary amplification in engineering and education. These findings align with broader literature showing gendered and racialized bias in SETs and extend it by isolating pregnancy as a salient cue that shapes student judgments. The engineering/education disadvantage suggests that cultural and compositional factors within disciplines with fewer women may heighten maternal wall effects. Student demographics matter: male students, and particularly the white male and Middle Eastern subgroups in the experiment, gave lower ratings when they believed the instructor was pregnant, whereas Asian and URM students tended to rate higher, suggesting differing value systems and norms regarding working motherhood. Greater physical impact of pregnancy (symptoms, weight gain) further exacerbated penalties, potentially reflecting bias against more visibly pregnant instructors or weight-based stigma. The absence of penalties in the humanities, medicine, and life/physical/earth sciences suggests that field-specific climates and expectations can buffer or exacerbate pregnancy bias. Given academic personnel committees’ heavy reliance on SETs, unadjusted use of these metrics risks propagating cumulative disadvantage against pregnant faculty, particularly women of colour, across hiring, reappointment, tenure, and promotion.
This study is the first to directly compare the same women’s teaching evaluations when pregnant versus not pregnant across fields and to experimentally test student reactions to perceived pregnancy, revealing a pregnancy-specific evaluation penalty concentrated among women of colour and in engineering/education. The work contributes to the understanding of intersectional bias and field-specific climates affecting women’s academic careers and underscores the risks of relying on SETs in personnel decisions. Practical implications include: (1) tenure and promotion committees should explicitly account for pregnancy-related bias when interpreting teaching evaluations; (2) institutions could allow instructors to exclude SETs collected during pregnancy (as was done for emergency remote teaching during COVID-19); and (3) institutions should consider supplementing or replacing SETs with less biased measures (e.g., peer observations). Future research should replicate these findings with larger, multi-institutional samples, extend the analysis beyond engineering to other male-dominated fields, assess longitudinal career impacts, disentangle pregnancy visibility from weight stigma, and test interventions that mitigate student bias.
Social desirability bias may have led participants in the student experiment to give politically correct answers, possibly inflating positive ratings when pregnancy was signaled. The different wording of the “video quality” and “instructor effectiveness” items may have shifted students’ focus toward technical aspects of the video rather than course quality. Stronger penalties associated with greater weight gain could reflect bias against more visibly pregnant instructors, or weight (anti-fat) bias, rather than pregnancy per se. Small subgroup sizes, both in faculty strata (e.g., education) and in student demographic groups, limit statistical power and precision and may yield unstable odds ratios. Convenience sampling limits generalizability. The datasets are not publicly available because of reidentification risk in small demographic cells.

