Introduction
The COVID-19 pandemic triggered the most significant global economic disruption of the 21st century. Travel restrictions, supply chain disruptions, and business closures posed immense risks across industries. Governments provided substantial financial support to mitigate the downturn, but effective policy responses require reliable, real-time data on the pandemic's economic consequences. Existing approaches have limitations. Macroeconomic research relies on past economic shocks and simulations, while computational social science draws on high-frequency data such as stock prices or news articles. Macroeconomic indicators often lag and offer only an aggregated view, whereas high-frequency data can be noisy, prone to herd behavior, and detached from the conditions businesses actually face. Moreover, many metrics lack the industry-level granularity needed for targeted interventions. This study addresses these limitations with a novel data source: the institutional language of 10-K reports filed with the US Securities and Exchange Commission (SEC). These annual reports are mandatory for many companies and contain a risk factors section, which gives them a forward-looking, risk-sensitive perspective; prior research indicates that they have predictive value for firm performance. Text-based data of this kind therefore offers direct insight into how businesses perceive risk. The paper evaluates the proposed CoRisk-Index by comparing it with existing metrics, highlighting its ability to anticipate unemployment developments and its correlation with established risk indicators such as the CBOE Volatility Index (VIX). Only one other study has attempted to quantify pandemic-related risk perceptions, and it did so at the firm level; this paper instead focuses on the industry level, leveraging the standardized format and legally binding nature of 10-K reports for automated text mining.
Literature Review
The paper reviews two main approaches to studying the economic impact of COVID-19. Traditional macroeconomic research models the pandemic's impact with metrics based on past economic shocks and simulations. Computational social science leverages alternative data sources such as stock market data, news articles, website content, search queries, and trade statistics. Both approaches have shortcomings: macroeconomic indicators often lag significantly and offer only an aggregated picture of the crisis, whereas high-frequency sources like stock prices are susceptible to noise and herd behavior and might not reflect the actual economic conditions companies face. The authors identify a gap in existing studies, which focus either on macroeconomic indicators or on high-frequency data and lack a granular, industry-level analysis of business risk perceptions. They fill this gap with SEC 10-K filings, whose forward-looking risk assessments offer a reliable and granular view of industry-specific risk perceptions.
Methodology
The CoRisk-Index is constructed by applying data-mining techniques to all 10-K reports filed with the SEC since January 30, 2020, covering companies that represent more than a third of the US workforce. The index combines two measures: (a) the number of 'corona' keywords in each report and (b) the average text negativity of sentences mentioning 'corona' within each industry. Text negativity is calculated with the Loughran and McDonald (2011) sentiment dictionary. The geometric mean of these two measures forms the CoRisk-Index for each industry. To understand the context of the reported risks, the researchers apply natural language processing (NLP) and topic modeling (Latent Dirichlet Allocation, LDA) to identify the topics surrounding COVID-19-related risk factors. The LDA model is used in an exploratory way to surface candidate topics, which are then refined with a dictionary-based approach that combines the algorithmic findings with expert knowledge in economics to ensure robustness and comparability. The researchers analyze eight industries that are well represented in the SEC filings, mapping the Standard Industrial Classification (SIC) codes used by the SEC to the North American Industry Classification System (NAICS) to improve comparability with existing economic data. Daily CoRisk-Index values are smoothed with a 14-day moving average. The effectiveness and generalizability of the index are evaluated with Granger causality tests based on vector autoregression (VAR) models, which examine the relationships between the CoRisk-Index, unemployment rates (overall and industry-specific), the S&P Global 1200 index, and the VIX. The stationarity of the time series is confirmed with the KPSS test. The historical robustness of the approach is assessed by comparing text negativity in 10-K reports with macroeconomic variables (GDP growth and unemployment) from 2000 to 2018.
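The two-part construction of the index can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the keyword pattern (words starting with "corona" or "covid"), the naive sentence splitter, and the ten-word `NEGATIVE_WORDS` set are hypothetical stand-ins (the actual Loughran and McDonald negative word list is far larger and is not reproduced here), and all function names are illustrative. It also assumes that, for one industry, measure (a) is the mean keyword count per report and measure (b) is the mean share of negative words across keyword sentences, so the index is sqrt(a * b).

```python
import math
import re

# HYPOTHETICAL stand-in for the Loughran and McDonald (2011) negative word list;
# the real dictionary is much larger and is not reproduced here.
NEGATIVE_WORDS = {
    "adverse", "adversely", "closures", "decline", "disruption",
    "disruptions", "impairment", "loss", "losses", "uncertainty",
}

# ASSUMED keyword pattern: any word starting with "corona" or "covid".
CORONA_PATTERN = re.compile(r"\b(?:corona|covid)\w*", re.IGNORECASE)


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def negativity(sentence: str) -> float:
    """Share of alphabetic tokens in the sentence that are negative words."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    if not tokens:
        return 0.0
    return sum(t in NEGATIVE_WORDS for t in tokens) / len(tokens)


def industry_corisk(reports: list[str]) -> float:
    """Geometric mean of (a) the mean keyword count per report and
    (b) the mean negativity of keyword sentences, for one industry."""
    if not reports:
        return 0.0
    mean_count = sum(len(CORONA_PATTERN.findall(r)) for r in reports) / len(reports)
    keyword_sentences = [
        s for r in reports for s in split_sentences(r) if CORONA_PATTERN.search(s)
    ]
    if not keyword_sentences:
        return 0.0
    mean_negativity = sum(negativity(s) for s in keyword_sentences) / len(keyword_sentences)
    return math.sqrt(mean_count * mean_negativity)


# Hypothetical risk-section snippets from two filings in one industry.
reports = [
    "The coronavirus outbreak may cause disruptions. Sales grew.",
    "COVID-19 closures could adversely affect results.",
]
print(industry_corisk(reports))  # ≈ 0.5
```

Each sample report contains one keyword (a = 1.0), and the two keyword sentences have negativity 1/6 and 2/6 (b = 0.25), so the index is approximately 0.5. Using a geometric mean means an industry scores high only if it both mentions the pandemic often and discusses it in negative terms.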
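A 14-day moving average of daily index values can be computed in a single pass with a running sum. The sketch below assumes a trailing window that averages over whatever observations are available during the first 13 days; the summary does not say whether the authors used a trailing or centered window, or how they handled the start of the series. The function name and sample values are hypothetical.

```python
from collections import deque


def trailing_moving_average(values: list[float], window: int = 14) -> list[float]:
    """Trailing mean of the last `window` values (ASSUMED trailing window;
    the first window - 1 outputs average over fewer observations)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque()
    total = 0.0
    smoothed = []
    for value in values:
        recent.append(value)
        total += value
        if len(recent) > window:
            total -= recent.popleft()  # drop the value that left the window
        smoothed.append(total / len(recent))
    return smoothed


daily_index = [float(day) for day in range(1, 21)]  # hypothetical daily values
print(trailing_moving_average(daily_index)[-1])  # 13.5 (mean of days 7 to 20)
```

The running sum keeps each step constant-time regardless of window length; for very long series, periodically recomputing the sum would guard against floating-point drift.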
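Before VAR models are fitted, the KPSS test checks the null hypothesis that a series is stationary around a constant level. The function below is a hand-written sketch of the level-stationarity statistic with a Bartlett-weighted (Newey-West) long-run variance; the lag count is left to the caller, the function name is hypothetical, and the summary does not state which KPSS variant or lag choice the authors used. Statistics above the standard 5% critical value of 0.463 reject stationarity. In practice a library implementation such as `statsmodels.tsa.stattools.kpss` would normally be used.

```python
def kpss_level_statistic(series: list[float], lags: int) -> float:
    """KPSS statistic for H0: the series is stationary around a constant level."""
    n = len(series)
    if n < 2 or not 0 <= lags < n:
        raise ValueError("need at least 2 observations and 0 <= lags < len(series)")
    mean = sum(series) / n
    resid = [y - mean for y in series]

    # Sum of squared partial sums of the demeaned series.
    partial, sum_sq_partial = 0.0, 0.0
    for e in resid:
        partial += e
        sum_sq_partial += partial * partial

    # Newey-West long-run variance with Bartlett weights 1 - j / (lags + 1).
    long_run_var = sum(e * e for e in resid) / n
    for j in range(1, lags + 1):
        gamma_j = sum(resid[t] * resid[t - j] for t in range(j, n)) / n
        long_run_var += 2.0 * (1.0 - j / (lags + 1)) * gamma_j
    if long_run_var <= 0.0:
        raise ValueError("long-run variance is not positive (e.g. a constant series)")

    return sum_sq_partial / (n * n * long_run_var)


CRITICAL_5PCT = 0.463  # standard 5% critical value for the level-stationarity test

trend = [float(t) for t in range(200)]           # clearly non-stationary
alternating = [(-1.0) ** t for t in range(200)]  # mean-reverting
print(kpss_level_statistic(trend, lags=4) > CRITICAL_5PCT)        # True
print(kpss_level_statistic(alternating, lags=4) > CRITICAL_5PCT)  # False
```

A linear trend produces a statistic far above the critical value, while the alternating series stays close to zero, so only the latter would be treated as stationary.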
The methodology is also shown to generalize beyond the COVID-19 pandemic through an analysis of the text negativity of sentences mentioning 'china' in SEC filings during the 2018 US-China trade war.
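Because only the keyword changes, a sentence-level negativity measure can be tracked over time for any term. The sketch below groups sentences that mention a keyword by filing year and averages their share of negative words. The five-word `NEGATIVE_WORDS` set, the sample filings, and the function name are hypothetical placeholders, not the Loughran and McDonald dictionary, real 10-K text, or the authors' code.

```python
import re
from collections import defaultdict

# HYPOTHETICAL placeholder for a negative-sentiment word list.
NEGATIVE_WORDS = {"adverse", "adversely", "decline", "loss", "uncertainty"}


def keyword_negativity_by_year(filings, keyword):
    """Mean negative-word share of sentences mentioning `keyword`, grouped by year.

    `filings` is an iterable of (year, text) pairs.
    """
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    scores = defaultdict(list)
    for year, text in filings:
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if not pattern.search(sentence):
                continue
            tokens = re.findall(r"[a-z]+", sentence.lower())
            if tokens:
                scores[year].append(sum(t in NEGATIVE_WORDS for t in tokens) / len(tokens))
    return {year: sum(v) / len(v) for year, v in sorted(scores.items())}


filings = [
    (2017, "We source components in China. Demand rose."),
    (2018, "Tariffs on China imports could adversely affect margins."),
]
print(keyword_negativity_by_year(filings, "china"))  # {2017: 0.0, 2018: 0.125}
```

The whole-word match means "china" does not fire on longer words that merely contain it, at the cost of missing inflected forms unless the pattern is widened.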
Key Findings
The CoRisk-Index effectively captures the evolution of industry-specific risk perceptions related to COVID-19. The number of 'corona' keywords in 10-K reports began rising before the first wave of infections in the US and then followed an oscillating pattern that reflects successive waves of economic concern. Text negativity likewise spiked before the stock market crash in February 2020, peaking just before the most severe market losses. The index reveals significant differences in the timing and magnitude of risk awareness across industries: manufacturing, wholesale & retail, and professional & business services showed an early and steep increase, while leisure & hospitality, finance, and transportation & utilities exhibited a later but steady rise. The topic-specific NLP analysis shows that industries were concerned about different things. Travel restrictions affected the transportation & utilities sector most strongly, supply chain disruptions weighed on manufacturing, and the demand shock hit wholesale & retail most severely. The financial implications of the crisis were perceived differently across industries: the finance sector showed early concern, whereas the mining sector's focus on financial implications was more pronounced during the second wave of the pandemic. Governmental aid also affected industries differently. The Granger causality tests show that the CoRisk-Index significantly predicts unemployment rates and correlates with the VIX and the S&P Global 1200 index, demonstrating its predictive power. The historical analysis (2000-2018) confirms the correlation between text negativity in 10-K reports and macroeconomic indicators, supporting the validity of the approach. Finally, the analysis of the 2018 US-China trade war shows that the methodology generalizes to other economic events.
Discussion
The CoRisk-Index fills a crucial gap in economic data, providing granular, industry-specific insights into business risk perceptions that are not captured by traditional macroeconomic indicators or high-frequency data sources. The index's ability to anticipate unemployment and correlate with market volatility demonstrates its effectiveness. The granular, topic-specific data highlight the heterogeneous impact of the pandemic on different industries, supporting the need for targeted policy interventions. The index can inform policymakers on the timing and nature of economic support packages tailored to the specific needs of different sectors. For example, the CoRisk data could help identify industries facing immediate supply chain disruptions or those needing support for remote work infrastructure. The CoRisk-Index also assists in understanding the sequence of policy interventions, identifying when various industries most need support. The findings underscore the importance of using alternative data sources and methods for understanding economic crises in real-time.
Conclusion
The CoRisk-Index offers a valuable new tool for analyzing industry-specific risk perceptions during economic crises. Its ability to predict unemployment, correlate with market volatility, and provide granular insights into industry-specific concerns makes it a powerful addition to economic forecasting and policymaking. Future research could expand the index's geographical scope by developing comparable datasets for other countries and exploring alternative text-based data sources to enrich the analysis and broaden the understanding of various aspects of the pandemic's impact on businesses. Further research may involve exploring more sophisticated NLP techniques to improve the accuracy and granularity of the findings. The development of predictive models for economic shocks incorporating the CoRisk-Index could also provide valuable insights and help improve the efficiency and effectiveness of economic support schemes during crises.
Limitations
The study is limited to US companies that file 10-K reports with the SEC. This restricts the geographical scope, and the analysis relies on self-reported data, which may be influenced by factors such as stock price volatility or strategic risk disclosure choices. The topic modeling approach depends on the selection of keywords, which involves some subjectivity, although this is mitigated by combining unsupervised methods with human expert knowledge. Because the index focuses on specific keywords, it does not capture every nuance of the pandemic's impact on firms. The accuracy of the sentiment analysis is also bounded by the limitations of sentiment dictionaries. While the paper addresses these limitations and includes a historical validation, further research is needed to refine the methodology and expand the dataset.