The delayed and combinatorial response of online public opinion to the real world: An inquiry into news texts during the COVID-19 era

Social Work

Y. Du, H. Cheng, et al.

Discover how online public opinion reacts over time to the COVID-19 pandemic in this fascinating study conducted by Yamin Du, Huanhuan Cheng, Qing Liu, and Song Tan. Using advanced techniques like natural language processing and machine learning, the research unveils surprising patterns of public sentiment that lag behind actual pandemic events.

Introduction
The study challenges the common assumption that online public opinion responds immediately to real-world events. Using the COVID-19 period as context, it investigates whether public opinion responses are delayed and combinatorial, reflecting higher-order network characteristics. The research questions are: (1) Does online public opinion exhibit a temporal lag in responding to pandemic intensity? (2) Do multiple preceding events jointly shape public opinion, and does a single public opinion signal reflect the composite effect of multiple prior events? The purpose is to enhance the validity of online opinion-based social surveys and to contribute theoretically to understanding how public opinion reflects reality, particularly amid the COVID-19 pandemic, when online platforms became central to gauging societal responses.
Literature Review
Prior work recognizes online public opinion as a rich data source that can reflect societal dynamics (Schoen et al., 2013; Bollen et al., 2011; Stockmann & Luo, 2017). Many studies during COVID-19 leveraged online data to examine vaccine attitudes (Hu et al., 2021), retweetability and dissemination (Mahdikhani, 2022; Arbane et al., 2023; J. Liu et al., 2022), evolving emotions such as fear (Shi et al., 2022), and shifts in policy and economic preferences (Ferragina & Zola, 2022; Wei et al., 2023). However, most implicitly assume near-real-time responses, underestimating nonlinearities and higher-order dynamics (Scala & Delmastro, 2023). Theoretical perspectives on public opinion highlight its collective-yet-individual tension (Glynn & Huge, 2008; Lippmann, 2017) and its role in governance and political processes (Chase & Mulvenon, 2002; Neuberger et al., 2015). Recent advances in higher-order networks show that multi-party interactions and temporality shape complex diffusion (Nie et al., 2022; 2023; Wang et al., 2024). This study addresses the literature gap by explicitly modeling lagged, combinatorial, and higher-order features in online opinion responses.
Methodology
Data: Online public opinion data comprise 26,127 news/self-media posts and comments from six high-follower Weitoutiao general media creators (five veteran self-media, one mainstream news outlet; each >3 million followers; average 6.039 million), from Jan 1, 2020 to Dec 31, 2022. COVID-19 intensity data (daily new cases and new deaths in China) are from WHO, covering 36 months and 1,094 effective days.

Public opinion construction (LDA): After text cleaning (invalid character removal, Chinese word segmentation, stop-word removal), TF-IDF vectors are built. LDA topic modeling is run with candidate topic numbers K=8–25. Topic inheritance across K is assessed via cosine similarity and an inheritance threshold of 0.1 to observe topic evolution and stability. Perplexity suggests convergence around 8–13 topics; a final model with 16 topics is chosen based on topic coherence, uniqueness, and stability. Five topics (T7, T8, T13, T15, T16) relate to COVID-19: Epidemic Prevention & Social Impact; Epidemic, Policies & Economic Impact; Global Epidemic Situation; Epidemic Data Statistics & Prevention; Treatment of COVID-19. Daily public opinion is a 16-dim probability vector summing to 1; the COVID-19 opinion index aggregates the weights of the five COVID-related topics.

Lagged Cross-Correlation Test (LCCT): To examine delayed relationships, opinion and COVID-19 intensity time series are aggregated mainly weekly (also validated at 10-day and monthly granularities). Pearson cross-correlations Rτ between COVID-19 intensity (new cases, new deaths) and lagged public opinion indices are computed across multiple lags τ. Significant intervals indicate lagged responses and potential multiple-response (waveform) patterns.
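The index construction and the lagged correlation described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' code: the function names (covid_opinion_index, lagged_cross_correlation), the 0-based topic positions, and the toy series are all assumptions made for the example, and the significance testing behind the reported p-values is left out.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant series
    return sxy / math.sqrt(sxx * syy)


def covid_opinion_index(topic_weights, covid_topics):
    """Sum the weights of the COVID-related topics for each period.

    topic_weights: one topic-probability vector per period.
    covid_topics: 0-based positions of the COVID-related topics (hypothetical).
    """
    return [sum(row[t] for t in covid_topics) for row in topic_weights]


def lagged_cross_correlation(intensity, opinion, max_lag):
    """Return {tau: R_tau}, pairing intensity at t with opinion at t + tau."""
    if len(intensity) != len(opinion):
        raise ValueError("series must have the same length")
    if max_lag > len(intensity) - 2:
        raise ValueError("max_lag leaves fewer than 2 overlapping points")
    result = {}
    for tau in range(max_lag + 1):
        x = intensity[: len(intensity) - tau]  # drop the last tau periods
        y = opinion[tau:]                      # drop the first tau periods
        result[tau] = pearson(x, y)
    return result


# Toy weekly series: the opinion index echoes intensity two weeks later.
weekly_cases = [0, 1, 4, 9, 3, 1, 0, 2, 6, 2, 0, 1]
weekly_opinion = [5, 5] + weekly_cases[:-2]
for tau, r in lagged_cross_correlation(weekly_cases, weekly_opinion, 4).items():
    print(f"lag {tau}: R = {r:.4f}")
```

Each lag shortens the overlapping window by one period, so correlations at long lags rest on fewer observations than those near lag 0.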
Machine learning causal inference: To assess differences in sensitivity across time (years) and drivers (new cases vs new deaths), binary treatment setups compare groups (e.g., 2020 control vs 2021 treatment) using CausalML-style estimation of conditional average treatment effects. Base learners include XGBoost, Multilayer Perceptron, LGBM Regressor, XGB Regressor, and Random Forest Regressor. Average effects across models are reported in standard deviations of series. Analyses focus on the summed COVID-19 opinion index; topic-level sensitivity checks (e.g., T7, T16) are in appendices.
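The binary-treatment comparison can be illustrated with a small sketch. The summary does not say which meta-learner was used, so the T-learner structure below is an assumption, and a one-variable least-squares line stands in for the tree and neural-network base learners named above. The function names and toy numbers are hypothetical.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a prediction function."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has zero variance in this group")
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def t_learner_effect_in_sd(x, y, treated):
    """Average treatment effect from a T-learner, in SD units of y.

    x: covariate (e.g., COVID-19 intensity); y: opinion index;
    treated: 1 for the treatment group (e.g., 2021), 0 for control (e.g., 2020).
    """
    x0 = [xi for xi, w in zip(x, treated) if not w]
    y0 = [yi for yi, w in zip(y, treated) if not w]
    x1 = [xi for xi, w in zip(x, treated) if w]
    y1 = [yi for yi, w in zip(y, treated) if w]
    mu0 = fit_line(x0, y0)  # outcome model fitted on the control group only
    mu1 = fit_line(x1, y1)  # outcome model fitted on the treatment group only
    cate = [mu1(xi) - mu0(xi) for xi in x]  # conditional effect at each x
    sd = statistics.pstdev(y)
    if sd == 0:
        raise ValueError("y has zero variance")
    return statistics.fmean(cate) / sd


# Toy data: the treated group responds one unit lower at every intensity level.
x = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0]
treated = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1 + 2 * xi for xi in x[:4]] + [2 * xi for xi in x[4:]]
print(t_learner_effect_in_sd(x, y, treated))
```

Averaging this quantity over several base learners, as the study reports doing, would mean repeating the fit with each learner in place of fit_line and taking the mean of the resulting effects.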
Key Findings
- LDA topic modeling stabilized at 16 topics; five were COVID-19-related. The proportion of COVID-19 topics stabilized around 25% once topics exceeded 25.
- Clear lagged and combinatorial responses: significant cross-correlations emerged at multiple lags, indicating non-immediate, multi-interval effects.
- Examples (weekly aggregation; significance: * p<0.05, ** p<0.01, *** p<0.001, **** p<0.0001):
  - Topic T7 (Epidemic Prevention & Social Impact) vs new cases: weak correlations at near-zero lags, but strong positive correlations at longer lags: lag 16 R=0.5997****, 17 R=0.6056****, 18 R=0.5535****, 27 R=0.5939****, 28 R=0.5928****, 29 R=0.5841****, 30 R=0.5299****, with significance persisting at lags in the 30s (weeks).
  - Topic T8 (Epidemic, Policies & Economic Impact) vs new cases: positive correlations across many lags including the near term (e.g., lag 0 R=0.2294**, 1 R=0.2019*, 2 R=0.2113**), with continued significance at longer lags (e.g., lag 21 R=0.4213****; peaks around lags 23–26 with R between 0.6195**** and 0.6915****).
  - Topic T13 (Global Epidemic Situation) vs new deaths: delayed correlations at lags 11–16 (e.g., lag 11 R=0.3070***; lag 12 R=0.2073*), and pronounced sustained significance from lags ~18 to the mid-30s (e.g., lag 26 R=0.4014****; 27 R=0.5287****; 28 R=0.5235****; 33 R=0.5007****; 34 R=0.5839****; 35 R=0.5920****; 36 R=0.5877****).
  - Summed COVID-19 topics vs new cases: the cumulative index shows rising significance beginning around lag 9 (R=0.2239**) and strong effects through lags 10–17 (e.g., 10 R=0.3185***; 11 R=0.4357****; 12 R=0.5178****; 13 R=0.5492****; 15 R=0.4830****; 16 R=0.4027****) and later intervals (e.g., 22 R=0.3732****; 23 R=0.4952****; 24 R=0.4731****; 25–27 R=0.4661**** to 0.4166****).
- Robustness: Significant intervals observed weekly largely replicate at 10-day and monthly aggregations, reducing the likelihood of statistical coincidence.
- Sensitivity differences (Table 5; values in SD units, averaged across ML models):
  - Across years (Panel A): Public opinion sensitivity to new cases decreased from 2020 to 2021 (−0.411 SD), then increased from 2021 to 2022 (+0.238 SD). The overall 2020 vs 2022 difference was −0.293 SD. For new deaths, year-to-year differences were smaller and trended slightly downward overall (2020 vs 2022: −0.114 SD).
  - New cases vs new deaths within years (Panel B): In 2020, public opinion was more sensitive to new cases (mean difference −0.209 SD, indicating that deaths elicited a weaker response than cases). In 2021 and over 2020–2022 combined, public opinion was relatively more sensitive to new deaths (2021 mean +0.304 SD; 2020–2022 mean +0.291 SD).
- Overall: Online opinion responses form a waveform-like structure with multiple lags and cumulative effects; sensitivity varies by topic, driver (cases vs deaths), and year/stage of the pandemic.
Discussion
The findings directly address the research questions by demonstrating that online public opinion responds to real-world pandemic intensity with (1) significant temporal lags and (2) combinatorial, multi-interval effects, whereby one real-world stimulus can elicit multiple subsequent opinion responses and a single opinion signal reflects multiple prior stimuli. This produces waveform-like patterns across topics. The heterogeneity across topics and between drivers (new cases vs deaths) highlights higher-order, nonlinear network characteristics in opinion formation. These patterns align with higher-order interaction theories in complex networks, where group-level and temporal structures shape diffusion and response pathways. Practically, recognizing these lags and combinatorial effects is essential for designing reliable online opinion-based surveys and for policy and communication strategies that anticipate delayed and cumulative public reactions.
Conclusion
The study constructs a COVID-19-era public opinion representation using LDA on Weitoutiao news/self-media text and links it to WHO COVID-19 intensity data. It reveals that online public opinion exhibits delayed, combinatorial, and waveform-like responses to real-world events, with sensitivity varying across pandemic stages and between drivers (new cases vs deaths). These findings enrich theory on public opinion dynamics, highlighting higher-order network interactions and nonlinearity. Practically, they caution researchers and decision-makers to account for lag structures and cumulative effects when interpreting online opinion. Future research should: (1) validate these patterns in other crises (e.g., climate, economic shocks, human rights issues), (2) explore neural-network and complex-network models for forecasting opinion evolution, and (3) examine analogous networked sentiment–market interactions in finance.
Limitations
- Context and data scope: Results stem from COVID-19 in China using six high-follower Weitoutiao general media sources; representativeness and media bias may limit generalizability to other platforms, countries, or event types.
- LDA modeling constraints: Topic number selection relies on heuristics; outcomes depend on preprocessing quality; topic labeling involves researcher interpretation.
- LCCT limitations: Cross-correlation is sensitive to data quality and series length; correlation does not prove causation, even when standard time-series assumptions hold.
- ML causal inference: Black-box models reduce interpretability; results can be sensitive to sample size, data quality, and algorithm choices, though multiple learners were used to mitigate instability.