Introduction
Accurate box office revenue prediction is critical for mitigating market risks and enhancing competitiveness within the film industry. While prior research has established a correlation between eWOM (electronic word-of-mouth) variables such as volume and valence and product sales, the influence of contextual factors like review helpfulness remains under-explored. This study addresses this gap by examining the moderating role of review and reviewer helpfulness on the predictive power of eWOM for box office revenue. Data on the Korean movie market, collected from the Naver Movies website, provide the empirical context. The study applies several business intelligence (BI) methods, including widely used statistical and non-statistical machine learning techniques, to compare prediction accuracy across subsamples defined by review/reviewer helpfulness. This approach offers an efficient way to optimize predictive measures and addresses a gap in the existing literature regarding the impact of review helpfulness on box office revenue prediction in international markets, particularly outside of the US.
Literature Review
Existing literature demonstrates a strong relationship between eWOM variables and product sales. eWOM influences consumer purchase decisions, repurchase intentions, and perceived risk. Variables like valence and volume consistently affect sales across diverse product categories. However, predicting box office revenue presents unique challenges due to the nonlinear effects of various factors, including eWOM and movie-specific characteristics. While previous studies have examined eWOM's impact on movie performance and the relationship between eWOM and review helpfulness, the predictive power of eWOM, moderated by review and reviewer helpfulness, remains largely unstudied. This study fills this research gap by employing machine learning methods to analyze the predictive performance of eWOM across different helpfulness subsamples. Prior research has explored various statistical and machine learning methods for box office prediction, including regression models, support vector regressions, neural networks, and Bayesian networks. This study builds on these approaches by specifically investigating the moderating effect of review helpfulness, a crucial aspect of review quality that influences consumer reliance on online reviews and subsequently impacts sales. The volatility of box office revenue necessitates a sampling strategy that enhances forecasting accuracy, and the study hypothesizes that higher review helpfulness will improve prediction performance.
Methodology
The study collected data from Naver Movies, the movie service of Naver, the most popular Korean portal site, focusing on movies released between January 2014 and May 2016 (N=1798). Data included box office revenue (weekly for three weeks post-release), six eWOM variables (average number of reviews, average rating, average review extremity, average review length, average number of emotional reviews, average number of positive reviews), and seven movie-related control variables (star power, awards, sequels, release timing, genre, nationality). The box office revenue was discretized into a binary variable (top 20% vs. rest). The study employed four BI methods: random forest, boosted decision trees, k-nearest neighbor, and discriminant analysis. These methods were chosen to provide a comprehensive comparison of both widely used statistical and non-statistical learning techniques, allowing for an efficient assessment of prediction performance. For random forest and boosted decision trees, multiple models were trained and combined, using different tree structures or resampling to improve predictions. The k-nearest neighbor method identified the most similar movie in the training set to predict the class of the target movie, and discriminant analysis generated classification scores to assign each movie to a class. The study conducted a subsample analysis, creating high and low helpfulness groups based on the average helpfulness of reviews and reviewers. Multiple regression analysis was performed to determine adjusted R-squared values for each subsample, comparing the explanatory power of eWOM across high and low helpfulness groups. To ensure a fair comparison of prediction performance, the sample size for high and low helpfulness groups was standardized to the minimum size across all three time periods (weeks 1, 2, and 3). N-fold cross-validation was implemented to assess the stability of the results. Prediction error was measured using classification error, averaged across multiple validation samples.
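The pipeline described above can be illustrated with a minimal sketch in plain Python. It is not the study's implementation: the function names, the median split for helpfulness, the choice of k = 5 neighbors, and the use of 5 folds are all assumptions made for illustration, and only the k-nearest neighbor classifier is shown.

```python
import random
from collections import Counter


def top_share_labels(revenues, share=0.20):
    """Label a movie 1 if its revenue is in the top `share`, else 0."""
    n_top = max(1, round(len(revenues) * share))
    cutoff = sorted(revenues, reverse=True)[n_top - 1]
    return [1 if r >= cutoff else 0 for r in revenues]


def helpfulness_subsamples(helpfulness):
    """Return (high, low) index lists of equal size.

    Assumption: the split point is the median; the summary does not
    state the threshold the study used.
    """
    order = sorted(range(len(helpfulness)), key=lambda i: helpfulness[i])
    half = len(order) // 2
    return order[len(order) - half:], order[:half]


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k training rows closest to x."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cv_classification_error(X, y, n_folds=5, k=5, seed=0):
    """Mean share of misclassified movies across n_folds held-out folds."""
    if len(X) < n_folds:
        raise ValueError("need at least one observation per fold")
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        wrong = sum(knn_predict(train_X, train_y, X[i], k) != y[i]
                    for i in held_out)
        errors.append(wrong / len(held_out))
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Synthetic stand-in data; the real study used Naver Movies records.
    rng = random.Random(1)
    volume = [rng.uniform(0, 100) for _ in range(200)]       # review volume
    revenue = [v * 10 + rng.gauss(0, 50) for v in volume]    # toy revenue
    helpful = [rng.random() for _ in range(200)]             # avg helpfulness
    labels = top_share_labels(revenue)
    for name, group in zip(("high", "low"), helpfulness_subsamples(helpful)):
        X = [[volume[i]] for i in group]
        y = [labels[i] for i in group]
        print(name, cv_classification_error(X, y))
```

Comparing the two printed error rates mirrors the study's comparison of high and low helpfulness subsamples. In this toy data helpfulness is unrelated to revenue, so no systematic difference should be expected.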
Key Findings
Multiple regression analysis revealed higher adjusted R-squared values for high review/reviewer helpfulness subsamples compared to low helpfulness subsamples, indicating stronger explanatory power of eWOM in high helpfulness groups. The explanatory power of eWOM decreased over time, but the difference remained significant between high and low helpfulness groups. Review volume consistently showed a positive effect on revenue across all subsamples, while the effects of average review rating and extremity were less consistent, varying across time periods and helpfulness groups. The average review extremity had a negative impact on first-week revenue for both high and low helpfulness groups, suggesting that less extreme reviews may be more helpful for experience goods like movies. Machine learning results (Tables 9 and 10) generally confirmed the regression findings. The average prediction errors were significantly lower in the high review/reviewer helpfulness subsamples for most scenarios (except for some specific combinations of methods and time periods). This supports the hypotheses that higher review and reviewer helpfulness enhance the predictive power of eWOM for box office revenue. The findings are robust, holding across different machine learning algorithms and despite variations in sample sizes.
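Because the high and low helpfulness subsamples are compared on adjusted rather than raw R-squared, it may help to see how the adjustment works. The sketch below uses the standard textbook formula; the function name and the example numbers are hypothetical and are not taken from the study's tables.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / dof


if __name__ == "__main__":
    # Hypothetical values: the same raw R^2 is penalized more heavily
    # when the subsample is small relative to the number of predictors.
    print(adjusted_r_squared(0.50, n_obs=101, n_predictors=10))
    print(adjusted_r_squared(0.50, n_obs=31, n_predictors=10))
```

The penalty for the number of predictors is one reason adjusted R-squared is a reasonable yardstick when the same set of eWOM and control variables is fitted on subsamples of different sizes.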
Discussion
The findings provide strong evidence for the moderating role of review and reviewer helpfulness in predicting box office revenue using eWOM. The consistently positive effect of review volume confirms its importance as an eWOM variable, even in the context of the Korean movie market. The nuances observed in the effects of review rating and extremity underscore the complexity of consumer response to online reviews. The improved prediction accuracy in high helpfulness subsamples highlights the significance of review quality in influencing the predictive power of eWOM. The use of diverse machine learning techniques strengthens the reliability of the results, suggesting that the observed effects are not limited to a specific methodological approach. The application to the Korean movie market extends the existing literature on eWOM's influence on movie performance beyond the primarily US-focused studies.
Conclusion
This study demonstrates that review and reviewer helpfulness significantly moderate the relationship between eWOM and box office revenue prediction. Movies with higher review helpfulness yield more accurate box office predictions. Future research could explore additional eWOM variables (e.g., concept count, writing styles), investigate interaction effects in various contexts (including other international markets), apply the model to other products, test different parameter settings, and incorporate additional movie-related variables (e.g., MPAA ratings, screen count, production budget). Using additional prediction performance metrics would further enhance the robustness of the findings.
Limitations
The study covers only the Korean movie market, which may limit the generalizability of the findings to other cultural contexts. The specific selection of machine learning algorithms and parameter settings may also affect the results. While the study controls for several movie-related factors, additional variables could be included for a more comprehensive analysis. Discretizing the dependent variable (box office revenue) into a binary outcome could lead to information loss; a continuous dependent variable might provide further insights.