Agriculture
Context-dependent agricultural intensification pathways to increase rice production in India
H. S. Nayak, A. J. McDonald, et al.
India’s rice sector, transformed by Green Revolution technologies, still faces rising aggregate demand and evolving sustainability challenges. Sustainable intensification—raising yields without compromising environmental and economic outcomes—is crucial given rice’s importance for water resources, greenhouse gas emissions, and smallholder livelihoods. Conventional recommendations, often extrapolated from controlled experimental stations, inadequately reflect heterogeneous soils, hydrology, and management practices across smallholder landscapes. Complementary approaches include farmer research networks and observational yield gap analyses, but prior studies often rely on small surveys, crop models, remote sensing, or expert judgment and rarely capture field-level heterogeneity. Advances in interpretable machine learning enable identification of field-specific constraints and ex-ante scenario analysis. The study’s objectives are: (i) quantify the nature and contributors to attainable rice yield gaps across seven Indian states, and (ii) assess how analytics-based solutions can support sustainable intensification in a case study of Bihar and adjacent districts of Eastern Uttar Pradesh (Eastern India). The approach aggregates a large landscape crop assessment survey (LCAS) and applies random forests, SHAP interpretation, and scenario analysis to identify context-dependent pathways to narrow yield gaps while addressing multiple sustainability goals.
The paper situates its contribution within several strands of prior work: (1) yield gap analysis for sustainable intensification of cereal systems, which has often relied on limited sample surveys, crop models, remote sensing, or expert judgment; (2) farmer research networks as an alternative to station-based trials to capture context-dependent performance of innovations; (3) recent large-n, farmer field-based analyses (e.g., prior work by Nayak et al. focusing on population-level yield gaps in Northwest India without field-level heterogeneity); and (4) interpretable machine learning methods (e.g., SHAP) enabling local (field-specific) attribution of yield constraints and ex-ante scenario evaluation. The literature indicates persistent heterogeneity in smallholder systems and the need for methodologies that capture interactions among management factors (fertility, irrigation, timing) and biophysical conditions to guide targeted recommendations.
Ethics: The study was reviewed by CIMMYT’s Research Ethics Committee (IREC.2019.06); verbal consent was obtained from participants.
Data and study area: Landscape-scale crop assessment surveys (LCAS) covered seven rice-producing Indian states during 2017–2019 monsoon seasons: Eastern Uttar Pradesh and Bihar (n=10,714 field-year combinations), Odisha (n=747), Jharkhand (n=717), Chhattisgarh (n=1,099), West Bengal (n=1,363), Andhra Pradesh (n=1,046). Data were collected via digital tools on agronomic practices and biophysical attributes for each farm’s largest rice field. Farmer-reported yields were validated in a subsample via 2×2 m crop cuts. Weather variables (sowing to harvest) were appended from NASA POWER gridded daily data.
Attainable yield gap definition: For each state, the attainable yield was defined as the mean yield of the top 10% of fields. The attainable yield gap (Yga) for other fields equals the difference between this attainable yield and observed yields.
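The definition above can be sketched in a few lines. This is a minimal illustration, not the authors' code (their analysis was done in R): the function names and the example yields are made up, and flooring the gap at zero for the top-decile fields themselves is an assumption.

```python
def attainable_yield(yields, top_share=0.10):
    """Mean yield of the top `top_share` fraction of fields (at least one field)."""
    ranked = sorted(yields, reverse=True)
    n_top = max(1, round(top_share * len(ranked)))
    top = ranked[:n_top]
    return sum(top) / len(top)


def attainable_yield_gaps(yields, top_share=0.10):
    """Attainable yield minus each observed yield.

    Assumption: gaps are floored at 0 so that fields already above the
    attainable yield do not get a negative gap.
    """
    ya = attainable_yield(yields, top_share)
    return [max(0.0, ya - y) for y in yields]


# Made-up yields (t/ha) for ten fields in one hypothetical state.
example_yields = [2.8, 3.1, 3.4, 3.6, 3.9, 4.2, 4.5, 4.9, 5.3, 6.0]
ya = attainable_yield(example_yields)
gaps = attainable_yield_gaps(example_yields)
```

In the study this calculation is applied separately within each state, so the benchmark reflects what the best-performing local fields already achieve.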
State-level modeling and ICE analysis: State-specific random forest (RF) models (trained and tuned per Nayak et al.) identified key yield determinants. Overfitting was limited by requiring ≥50 observations per terminal node. Permutation-based feature importance ranked variables. Individual conditional expectation (ICE) analyses were performed for the top two management constraints. For the most important variable (e.g., N rate), yield was predicted across observed ranges (10 kg N ha−1 steps) for each field; the difference between predicted yield at reported rate and the maximum predicted yield across the vector yielded Yg1. The corresponding optimal value (e.g., Nyield max) was then fixed and ICE performed for the second variable to obtain Yg2. Yg1+Yg2 indicated potential gap closure due to addressing top management constraints.
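The sequential ICE procedure can be illustrated as follows. This is a sketch under stated assumptions: `toy_predict` is a hypothetical stand-in for a fitted random forest, the field values and grid ranges are invented, and only the logic (sweep the first variable, fix it at its best value, then sweep the second) follows the description above.

```python
def ice_gap(predict, field, variable, grid):
    """Return (gap, best_value): the predicted gain from moving `variable`
    from its reported value to the grid value with the highest prediction."""
    baseline = predict(field)
    best_value, best_pred = field[variable], baseline
    for value in grid:
        pred = predict({**field, variable: value})
        if pred > best_pred:
            best_value, best_pred = value, pred
    return best_pred - baseline, best_value


def toy_predict(field):
    """HYPOTHETICAL response surface, not a trained model: yield (t/ha)
    rises with N up to 180 kg/ha and with irrigations up to 5."""
    n = min(field["n_rate"], 180)
    irrigations = min(field["irrigations"], 5)
    return 2.0 + 0.01 * n + 0.2 * irrigations


field = {"n_rate": 100, "irrigations": 2}  # made-up reported practice

# Step 1: sweep N in 10 kg/ha steps over an assumed range to get Yg1.
yg1, n_best = ice_gap(toy_predict, field, "n_rate", range(0, 251, 10))

# Step 2: fix N at its best value, then sweep the second variable to get Yg2.
field_at_n_best = {**field, "n_rate": n_best}
yg2, irrigations_best = ice_gap(toy_predict, field_at_n_best, "irrigations", range(0, 9))

total_gap_closure = yg1 + yg2
```

Because the second sweep is conditioned on the optimum of the first, Yg2 captures the gain from the second constraint once the first has been relieved, which is why the two values can be added.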
Eastern India case study: A pooled RF model was fit for Bihar and Eastern Uttar Pradesh. SHAP (via iml in R) provided local (field-level) attribution of management and biophysical factors to predicted yields. Numeric variables were min-max scaled; categorical variables were ordinally encoded and scaled. Hotspot analysis (Getis-Ord Gi*, 10 km fixed distance) mapped spatial clusters of consistently negative vs. positive SHAP values for the top constraints (N, irrigation). Fields were clustered into four groups based on SHAP signs for irrigation (I) and N: I′N′ (neither limiting), IN′ (irrigation limiting only), IN (both limiting), I′N (N limiting only).
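The four-way grouping by SHAP sign can be sketched as below. The SHAP values are invented for illustration, the labels use an ASCII apostrophe in place of the prime symbol, and treating a SHAP value of exactly zero as "not limiting" is an assumption.

```python
from collections import Counter


def colimitation_cluster(shap_irrigation, shap_n):
    """Label a field by the sign of its SHAP values; a negative value is
    read as that input limiting predicted yield on this field."""
    irrigation_limiting = shap_irrigation < 0
    n_limiting = shap_n < 0
    if irrigation_limiting and n_limiting:
        return "IN"      # both limiting
    if irrigation_limiting:
        return "IN'"     # irrigation limiting only
    if n_limiting:
        return "I'N"     # N limiting only
    return "I'N'"        # neither limiting


# Made-up (irrigation SHAP, N SHAP) pairs for five hypothetical fields.
shap_pairs = [(-0.3, -0.2), (-0.1, 0.4), (0.2, -0.5), (0.3, 0.1), (0.0, 0.0)]
labels = [colimitation_cluster(s_irr, s_n) for s_irr, s_n in shap_pairs]
shares = Counter(labels)
```

Counting labels per district in this way gives the cluster shares that the targeted scenarios rely on.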
Scenario analysis (Eastern India): Four scenarios evaluated production, input use, and profitability impacts relative to current practice:
- Scenario 1: Blanket 125 kg N ha−1 (state recommendation) on all fields.
- Scenario 2: Blanket 180 kg N ha−1 (analytics-derived non-limiting N rate from partial dependence) on all fields.
- Scenario 3: Targeted N=180 kg ha−1 only on fields with negative SHAP for N (I′N and IN clusters; 47% of fields).
- Scenario 4: Targeted co-limitation resolution on fields with negative SHAP for both N and irrigation (IN cluster; 20% of fields): set N=180 kg ha−1 and number of irrigations=5.
For Scenarios 1–2, additional production was computed by multiplying predicted yield gains by total rice area per district. For Scenarios 3–4, district-level aggregation considered the share of fields per cluster; additional N use, added irrigations (costed at US$20 per irrigation), and subsidized N cost (US$0.14 kg−1) were used to compute partial net returns based on the 2018 minimum support price for rice.
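The partial-budget and aggregation arithmetic can be sketched as follows. The N and irrigation costs come from the text; the rice price is a hypothetical placeholder because the 2018 minimum support price is not stated here, and the yield gain, rice area, and function names are likewise invented for illustration.

```python
N_COST_USD_PER_KG = 0.14      # subsidized N cost (from the text)
IRRIGATION_COST_USD = 20.0    # cost per irrigation event (from the text)
RICE_PRICE_USD_PER_T = 250.0  # HYPOTHETICAL placeholder for the 2018 minimum support price


def partial_net_return(yield_gain_t_ha, added_n_kg_ha, added_irrigations,
                       rice_price_usd_per_t=RICE_PRICE_USD_PER_T):
    """Added revenue minus added input costs, in US$ per hectare."""
    added_revenue = yield_gain_t_ha * rice_price_usd_per_t
    added_cost = (added_n_kg_ha * N_COST_USD_PER_KG
                  + added_irrigations * IRRIGATION_COST_USD)
    return added_revenue - added_cost


def district_additional_production(mean_gain_t_ha, rice_area_ha, share_of_fields=1.0):
    """Extra rice (t) in a district; use share_of_fields < 1 for targeted scenarios."""
    return mean_gain_t_ha * rice_area_ha * share_of_fields


# Made-up field: 100 -> 180 kg N/ha, 3 -> 5 irrigations, 0.5 t/ha predicted gain.
net_return = partial_net_return(0.5, 180 - 100, 5 - 3)

# Made-up district: 100,000 ha of rice, 20% of fields targeted.
extra_rice = district_additional_production(0.5, 100_000, share_of_fields=0.20)
```

Only costs that change between scenarios enter the calculation, which is what makes this a partial rather than a full net return.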
Software: Analyses conducted in R 4.2.3 using dplyr, caret, ranger, iml, geodata, terra, tidyverse, ggpubr, data.table, gridExtra; hotspot analysis in ArcGIS Pro 2.9.0.
- State yields and gaps: Average rice yields ranged from 3.3 t ha−1 (Jharkhand) to 5.5 t ha−1 (Andhra Pradesh). Attainable yields ranged from 5.1 t ha−1 (Jharkhand) to 7.7 t ha−1 (Andhra Pradesh). Median attainable yield gaps were 1.7 t ha−1 (West Bengal) to 2.4 t ha−1 (Chhattisgarh); Bihar & Eastern UP, Jharkhand, and Odisha had median gaps ~1.9 t ha−1, indicating substantial scope for intensification with existing practices.
- Model performance: State RF models explained 29% (Odisha) to 52% (Andhra Pradesh) of yield variation.
- Principal constraints by state (ICE analysis):
  • Bihar & Eastern UP and Odisha: N fertilizer rate and number of irrigations were top constraints; average anticipated gains ~0.5 and 0.2 t ha−1 respectively, with much higher gains in the most responsive quartile (0.8–2.0 t ha−1 in Bihar & Eastern UP; ~0.6 t ha−1 in Odisha).
  • West Bengal: K fertilizer emerged as most important.
  • Jharkhand: Variety and N fertilizer rate.
  • Chhattisgarh: Insufficient N and P; average gain 0.36 t ha−1; top quartile gains 0.5–2.0 t ha−1.
  • Andhra Pradesh: Biophysical factors dominated; limited combined gain (N and sowing time) of ~0.27 t ha−1.
- Eastern India SHAP analysis: Management practices with largest influence: number of irrigations, total N, total P, and Zn rates. Biophysical drivers: cumulative solar radiation and maximum temperature (higher values associated with higher predicted yields). Later sowing dates had negative SHAP values (yield-reducing).
- Spatial patterns (hotspots): N limitation opportunities concentrated in southeast and north-central Eastern India; irrigation limitation pronounced in the southeast and northern half; southern/southwestern areas less water-limited.
- Co-limitation clusters (10,714 field-years): 35% I′N′ (neither limiting), 35% IN′ (irrigation limiting only), 20% IN (both limiting), 10% I′N (N limiting only).
- Scenario outcomes (Eastern India):
  • Scenario 1 (125 kg N ha−1 blanket): +0.15 million tons rice; −0.016 million tons total N vs. current practice; heterogeneous field impacts.
  • Scenario 2 (180 kg N ha−1 blanket): +0.58 million tons rice; +0.22 million tons N use; average district yield gain 0.15 t ha−1 (~3.5%); profit +US$26 ha−1.
  • Scenario 3 (target N to the 47% of fields with negative N SHAP): +0.41 million tons rice; +0.13 million tons N; 21% higher NUE vs. Scenario 2; average yield gain 0.31 t ha−1; profit +US$60 ha−1.
  • Scenario 4 (target N+irrigation to the 20% of fields that are co-limited): +0.56 million tons rice; +0.08 million tons N; +2.33 million irrigation events per season (on average ~17% of the water safely available for future use, varying by district); average yield gain 0.68 t ha−1 (~19%); profit +US$90 ha−1.
Overall, targeted, analytics-led interventions substantially improved yield, profitability, and N use efficiency compared to blanket recommendations.
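As a rough consistency check on the scenario outcomes above, the added rice per unit of added N can be compared across scenarios. This is only an illustration using the rounded figures reported here; the study's own NUE definition may differ, and Scenario 1 is left out because its N use falls rather than rises.

```python
# Added rice and added N (million tons) as reported for each scenario.
scenarios = {
    "Scenario 2": {"rice_mt": 0.58, "n_mt": 0.22},  # blanket 180 kg N/ha
    "Scenario 3": {"rice_mt": 0.41, "n_mt": 0.13},  # targeted N
    "Scenario 4": {"rice_mt": 0.56, "n_mt": 0.08},  # targeted N + irrigation
}


def rice_per_unit_n(outcome):
    """Tons of additional rice per ton of additional N applied."""
    return outcome["rice_mt"] / outcome["n_mt"]


ratios = {name: rice_per_unit_n(outcome) for name, outcome in scenarios.items()}

# Relative advantage of targeted N (Scenario 3) over blanket N (Scenario 2).
relative_gain_s3_vs_s2 = ratios["Scenario 3"] / ratios["Scenario 2"] - 1

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} t rice per t N")
print(f"Scenario 3 vs. Scenario 2: {relative_gain_s3_vs_s2:.1%}")
```

With these rounded inputs, Scenario 3 comes out roughly 20% ahead of Scenario 2, close to the reported 21%. The Scenario 4 ratio is not a pure N effect, since that scenario also adds irrigation.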
The study demonstrates that attainable yield gaps of ~1.7–2.4 t ha−1 persist across major Indian rice states, indicating considerable potential for intensification using practices already present in farmers’ fields. By combining large-n observational data with interpretable machine learning, field-specific constraints can be identified, revealing that the main contributors to yield gaps differ by region and often involve N management and irrigation, alongside K, P, Zn, variety choice, and sowing dates. In Eastern India, SHAP-based local attribution and spatial hotspot analyses enabled delineation of where interventions would be most impactful. Ex-ante scenario evaluation showed that analytics-informed blanket recommendations (Scenario 2) outperform legacy blanket advice (Scenario 1), but targeting fields based on predicted responsiveness (Scenarios 3 and 4) yields larger per-field gains in yield and profitability with better input use efficiency. Such targeting addresses key barriers to adoption by offering more tangible, lower-risk benefits to farmers, allows better allocation of scarce extension and innovation resources, and improves environmental performance through higher N use efficiency with potential greenhouse gas co-benefits. Thus, analytics-led targeting directly addresses the research objectives by identifying context-dependent pathways for sustainable intensification and demonstrating their advantages over one-size-fits-all strategies.
This work develops and applies an analytics-based framework—combining large-scale farmer surveys, random forests, SHAP interpretation, spatial hotspot analysis, and ex-ante scenario modeling—to quantify attainable rice yield gaps across Indian states and identify where and how to narrow them sustainably. Key contributions include: (i) state-level diagnoses of dominant constraints; (ii) field-level attribution of yield drivers in Eastern India; and (iii) demonstration that targeted interventions, especially addressing N and irrigation co-limitations, can substantially improve yields, profitability, and N use efficiency relative to blanket recommendations. Future work should: (1) enhance models with richer biophysical and management data to capture more variance; (2) integrate emerging technologies and on-farm experimentation for validation and feedback; (3) operationalize targeting via simple, actionable rules of thumb (e.g., irrigations <4 and N <118 kg ha−1 as indicators of co-limitation) and decision-support tools; and (4) address socioeconomic and infrastructural bottlenecks to adoption to accelerate sustainable intensification at scale.
- Model explanatory power is moderate (state RF models explained 29–52% of yield variation), indicating unobserved or unmodeled factors and the need for richer datasets.
- Scenario analyses are ex-ante and assume practice adoption and consistent responses; real-world outcomes may vary across seasons and contexts.
- Emerging technologies and novel practices were not explicitly represented; on-farm research and validation remain essential.
- Spatial heterogeneity and district-level variability imply that generalized recommendations may not transfer without local calibration.
- Reliance on observational data may entail confounding not fully addressed by the modeling approach.

