Introduction
Existing research on online public opinion often assumes that it reflects real-world events immediately. This study challenges that assumption by investigating the delayed and combined nature of online public opinion responses, focusing on the COVID-19 pandemic in China. The research acknowledges the inherent complexity of public opinion, defined as the aggregation of individual opinions reflecting collective sentiment. The internet, particularly platforms like Weitoutiao, provides a rich data source for studying this phenomenon. Previous studies of online public opinion during the pandemic focused on aspects such as vaccine acceptance, tweet retweetability, public fear, and economic impacts, but often neglected the temporal delays and combinatory effects in opinion formation. This study addresses the gap by examining two things: the delayed temporal response of public opinion to COVID-19 intensity, and the collective influence of multiple preceding events on online public opinion. Its practical significance lies in improving the quality of public opinion surveys, which directly inform real-world decisions based on public sentiment. By accounting for temporal lag and combinatory dynamics, researchers can obtain more accurate and reliable public opinion distributions.
Literature Review
The introduction extensively reviews existing literature on public opinion, its evolution in the digital age, and its relevance to various fields. It highlights studies focusing on sentiment analysis, opinion mining, and the use of online data to understand public reactions to real-world events, particularly during the COVID-19 pandemic. The literature review underscores the limitations of previous studies, which often assumed immediate feedback of public opinion on real-world events, thus underestimating the complex nonlinear response dynamics. The authors emphasize the need for exploring the higher-order characteristics of online public opinion responses for accurate social surveys.
Methodology
The study employs a three-stage research framework. Stage one constructs a public opinion index using Latent Dirichlet Allocation (LDA) topic modeling. The text data consists of news and self-media posts published on Weitoutiao, a Chinese social media platform, between January 1, 2020, and December 31, 2022, by six key information creators with over 3 million followers each. COVID-19 intensity data comes from the World Health Organization. Preprocessing includes cleaning, Chinese word segmentation, and stop word removal. TF-IDF vectors are then created, and LDA is used to identify COVID-19 related topics. The optimal number of topics (16) is determined by analyzing perplexity values, and cosine similarity is used to assess topic inheritance relationships in the LDA model. Stage two uses the Lagged Cross-Correlation Test (LCCT) to measure significant intervals between COVID-19 intensity (new cases and new deaths) and the constructed public opinion index. The LCCT computes the correlation at a range of lags to identify delayed responses. The analysis is repeated at weekly, ten-day, and monthly intervals to check the robustness of the findings. Stage three applies machine learning causal inference methods (XGBoost, MLP, LGBM Regressor, XGB Regressor, and Random Forest Regressor) using the Causal ML package to quantify differences in the sensitivity of public opinion to COVID-19 intensity across time periods (2020, 2021, 2022). Causal inference uses a binary treatment that splits observations into control and treatment groups, and causal intensities are compared across years and input features (new cases, new deaths).
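The lagged cross-correlation idea in stage two can be illustrated with a minimal sketch. This is not the study's code: the function name `lagged_cross_correlation`, the synthetic series, and the built-in 3-period delay are all assumptions made for illustration, and the significance testing that the LCCT performs is omitted.

```python
import numpy as np

def lagged_cross_correlation(x, y, max_lag):
    """Pearson correlation between x[t] and y[t + lag] for lag = 0..max_lag.

    Hypothetical helper: x plays the role of COVID-19 intensity,
    y the role of the public opinion index.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D series of equal length")
    results = {}
    for lag in range(max_lag + 1):
        # Pair intensity at time t with opinion at time t + lag.
        x_part = x[: len(x) - lag]
        y_part = y[lag:]
        results[lag] = float(np.corrcoef(x_part, y_part)[0, 1])
    return results

# Synthetic data (assumption): opinion follows intensity with a 3-period delay.
rng = np.random.default_rng(0)
intensity = rng.normal(size=200)
opinion = np.roll(intensity, 3) + 0.1 * rng.normal(size=200)

correlations = lagged_cross_correlation(intensity, opinion, max_lag=8)
# The lag with the largest absolute correlation is the candidate delay.
best_lag = max(correlations, key=lambda k: abs(correlations[k]))
```

On real data, each lag's correlation would also need a significance check before being read as a delayed response, and the same procedure would be rerun on weekly, ten-day, and monthly aggregates.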
Key Findings
The study found that online public opinion's response to COVID-19 intensity is not immediate but exhibits a long-term lag. The analysis revealed a complex network pattern: a single COVID-19 data point may influence multiple delayed public opinion responses, and a single public opinion response may be the composite effect of multiple preceding COVID-19 data points. This many-to-many relationship produces a waveform structure in public opinion responses. The LCCT analysis, performed at weekly, ten-day, and monthly granularities, confirmed the presence of significant delayed correlations. Machine learning causal inference showed that the sensitivity of public opinion to COVID-19 intensity varied by year. Sensitivity to new cases was highest in 2020, the first year of the pandemic, and declined in subsequent years. Sensitivity to new deaths varied less across years but also trended downward. In relative terms, public opinion was more sensitive to new cases in 2020 and shifted toward greater sensitivity to new deaths in 2021 and 2022. The findings highlight the heterogeneity in the impact of COVID-19 intensity on different aspects of public opinion and the differing effects of new cases versus new deaths.
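The delayed, composite pattern described above can be pictured with a toy linear model in which opinion at each time step is a weighted sum of earlier intensity values. This is only an illustrative assumption, not the study's model: the weights in `lag_weights` and the function `composite_response` are hypothetical.

```python
import numpy as np

# Hypothetical lag weights: response strength 0, 1, 2, and 3 periods after an event.
lag_weights = np.array([0.0, 0.5, 1.0, 0.5])

def composite_response(intensity, weights):
    """Opinion at time t as a weighted sum of intensity at t, t-1, ..., t-k."""
    intensity = np.asarray(intensity, dtype=float)
    # Full convolution, truncated to the length of the input series.
    return np.convolve(intensity, weights)[: len(intensity)]

# One event spreads over several later periods (one-to-many).
single_event = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
single_response = composite_response(single_event, lag_weights)

# Two events overlap, so some later values combine both (many-to-one).
two_events = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
combined_response = composite_response(two_events, lag_weights)
```

In this sketch the response at index 3 of `combined_response` mixes the tail of the first event with the onset of the second, which is the kind of overlap that makes an immediate, one-to-one reading of opinion data misleading.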
Discussion
The findings support the hypothesis that online public opinion does not respond immediately to real-world events but exhibits delayed and composite responses. The identified waveform structure and varying sensitivities across different time periods underscore the complexity of this relationship. The study's findings align with recent research on higher-order network interactions, which demonstrates the significance of complex multi-party interactions in various systems. The delayed and combinatorial responses observed in online public opinion are consistent with the characteristics of higher-order interactions. The research has implications for improving the accuracy of public opinion surveys and emphasizes the limitations of traditional models that assume immediate and linear responses.
Conclusion
This study demonstrates that online public opinion responses to real-world events like the COVID-19 pandemic are characterized by delays, combinatory effects, and waveform structures. This complexity necessitates a re-evaluation of methodologies in online public opinion research, advocating for an awareness of temporal lags and composite influences. Future research could explore similar patterns in other crisis contexts, develop neural network-based prediction models for public opinion, and investigate the network structures of investor sentiment in relation to financial markets.
Limitations
The study's limitations include the specific context of COVID-19, the potential bias from using data from six large media sources, and the inherent limitations of LDA topic modeling (topic number selection, text preprocessing quality, and subjective topic interpretation). Furthermore, the limitations of machine learning causal inference models (black-box nature, model complexity, and computational resource requirements) are acknowledged. The limitations of the Lagged Cross-Correlation Test (dependence on data quality and time series length) are also discussed.