Transportation
Bayesian estimation of mixed multinomial logit models: Advances and simulation-based evaluations
P. Bansal, R. Krueger, et al.
The paper studies scalable Bayesian estimation for mixed multinomial logit (MMNL) models, widely used to analyze individual discrete choices. Traditional Bayesian inference via MCMC yields full posterior distributions but faces key scalability bottlenecks: long computation times, storage of many posterior samples, and the need for convergence diagnostics. Variational Bayes (VB) offers a deterministic, optimization-based alternative that approximates the posterior, promising substantial computational gains. However, prior VB work for MMNL has been limited to models with only individual-specific random parameters, without fixed (invariant) parameters, and has not comprehensively assessed finite-sample properties or compared against maximum simulated likelihood estimation (MSLE). This paper asks: (1) Can VB be extended to MMNL specifications that include both fixed and random utility parameters? (2) How do the extended VB variants compare to MCMC and MSLE in estimation time, parameter recovery, and predictive accuracy? Addressing these questions is important because fixed effects and fixed covariate interactions are common in practice, and practitioners need reliable, fast methods for large datasets.
Prior work has established VB as substantially faster than MCMC for MMNL with negligible loss in predictive accuracy (e.g., Braun and McAuliffe, 2010; Depraetere and Vandebroek, 2017; Tan, 2017). These studies considered only random-coefficient specifications and typically evaluated predictive accuracy but not finite-sample parameter recovery. Updating strategies for nonconjugate factors—quasi-Newton (QN) versus nonconjugate variational message passing (NCVMP)—have been studied separately, with no direct comparison. Comparisons to MSLE, the predominant frequentist approach for MMNL, were also lacking. Methodologically, several techniques address the intractable expectation of the log-sum-exp (E-LSE) term in the MNL kernel: analytical approximations (e.g., Delta method), simulation-based approximations (Quasi–Monte Carlo, QMC), and alternative variational lower bounds via modified Jensen’s inequality (MJI). Earlier findings suggest modified Jensen’s bounds can be tighter than classical Jensen’s bounds, but performance relative to other approximations in MMNL remains mixed. Additionally, alternative Bayesian samplers (e.g., NUTS/HMC) can improve mixing but may be computationally prohibitive for large MMNLs. This study situates itself by extending VB methods to incorporate fixed parameters and by benchmarking against MCMC and MSLE on parameter recovery and prediction.
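For context on the bound-based approaches mentioned above, the classical variational treatment of the E-LSE term applies Jensen's inequality to the concave logarithm; under a Gaussian variational factor (with V_ntj having marginal mean μ_ntj and variance σ²_ntj under q) it has the closed form below. This is the standard textbook bound, stated here for orientation; the modified Jensen's inequality studied in the paper tightens it with additional auxiliary parameters.

```latex
\mathbb{E}_q\!\left[\log \sum_{j} \exp(V_{ntj})\right]
\;\le\; \log \sum_{j} \mathbb{E}_q\!\left[\exp(V_{ntj})\right]
\;=\; \log \sum_{j} \exp\!\left(\mu_{ntj} + \tfrac{1}{2}\sigma_{ntj}^{2}\right)
```

Because the E-LSE enters the log-likelihood with a negative sign, an upper bound on it yields a tractable lower bound on the ELBO.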
Model: A fully Bayesian MMNL is specified in which utility is linear-in-parameters with fixed (α) and random (β_n) taste parameters: U_ntj = X_ntj^F α + X_ntj^R β_n + ε_ntj, with ε_ntj ~ Gumbel(0,1). Individual random coefficients follow β_n ~ N(ζ, Ω). Priors: α ~ N(λ_0, Ξ_0), ζ ~ N(μ_0, Σ_0), and Ω follows Huang's half-t prior for covariance matrices via an inverse-Wishart–Gamma construction, which provides weakly informative behavior preferable to the inverse-Wishart prior in MMNL. Posterior inference targets θ = {α, ζ, Ω, a_k, β_1:N}. The joint distribution is intractable because of the multinomial logit kernel, so the posterior is approximated as follows.

MCMC baseline: A blocked Gibbs sampler with Metropolis steps for α and β_1:N (the Allenby–Train procedure) is implemented. ζ and Ω have conjugate updates; α and β_n are updated via random-walk Metropolis with tuned step sizes. Practical considerations include vectorization and thinning; the experiments use two chains of 100k iterations each, with burn-in and thinning.

Variational Bayes: A mean-field variational family factorizes as q(θ) = q(α) q(ζ) q(Ω) Π_k q(a_k) Π_n q(β_n). Conjugate factors have optimal forms: q*(ζ) is Normal, q*(Ω) is inverse-Wishart, and q*(a_k) is Gamma. The nonconjugate factors q(α) and q(β_n) are assumed Normal, with parameters updated by maximizing the ELBO. The main difficulty is the intractable expectation of the log-sum-exp (E-LSE) term in the MNL likelihood.

E-LSE handling: Three strategies are used. (1) The Delta (Δ) method: a second-order Taylor expansion around the current means yields an analytical approximation to E[g_nt(Γ_n)]. (2) QMC simulation: E-LSE is approximated with quasi-random points transformed by the Cholesky factors of Σ_α and Σ_β. (3) An alternative lower bound via the modified Jensen's inequality (MJI), which introduces auxiliary variational parameters α_ntj to bound E-LSE.

Updating strategies for nonconjugate factors: (a) Quasi-Newton (QN) maximizes the ELBO with respect to μ and Σ of α and β_n using supplied gradients, maintaining positive-definiteness via a Cholesky parameterization. (b) NCVMP provides fixed-point updates for μ and Σ of α and β_n using expectations and gradients of the joint log-density within the ELBO; NCVMP is computationally cheaper per iteration but is not guaranteed to monotonically increase the ELBO. Variants evaluated: VB-QN-Δ, VB-QN-QMC, VB-QN-MJI, VB-NCVMP-Δ, VB-NCVMP-MJI. NCVMP-QMC is not pursued owing to numerical instability and the difficulty of ensuring positive-definite covariance updates.

Simulation study: Semi-synthetic data are generated from a real stated preference dataset (German alternative-fuel vehicle choices). Four scenarios vary by whether fixed parameters are included and by the correlation among random coefficients (low vs. high). Sample sizes N ∈ {500, 2000} and T ∈ {5, 10} choice occasions are examined, with 20 replications each. Performance metrics: parameter recovery via RMSE for α (when applicable), ζ, Ω, and β_1:N, and out-of-sample predictive accuracy via total variation distance (TVD) between the true and estimated predictive distributions. MSLE predictions are integrated over an approximate asymptotic distribution of the parameter estimates; MCMC and VB integrate over the posterior/variational distributions. Implementation uses vectorized Python code; MSLE employs MLHS draws; VB-QN-QMC uses 64 quasi-random draws; VB stopping follows Tan (2017) with a relative-change criterion.
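The Δ-method and simulation-based approximations of the E-LSE term can be illustrated with a minimal sketch. This is not the paper's implementation: the utilities, covariance, and draw counts below are hypothetical, and plain pseudo-random draws stand in for the quasi-random (QMC) points the paper uses. For the log-sum-exp function, the gradient at the mean is the softmax p and the Hessian is diag(p) − p pᵀ, which gives the second-order Taylor term in closed form.

```python
import numpy as np

def lse(v):
    """Numerically stable log-sum-exp of a utility vector."""
    m = v.max()
    return m + np.log(np.exp(v - m).sum())

def delta_else(mu, Sigma):
    """Delta-method (second-order Taylor) approximation of E[lse(V)], V ~ N(mu, Sigma)."""
    p = np.exp(mu - lse(mu))            # softmax at the mean = gradient of lse
    H = np.diag(p) - np.outer(p, p)     # Hessian of lse at the mean
    return lse(mu) + 0.5 * np.trace(H @ Sigma)

def sim_else(mu, L, z):
    """Simulation approximation of E[lse(V)] using V = mu + L z, L the Cholesky factor."""
    vs = mu + z @ L.T
    return np.mean([lse(v) for v in vs])

rng = np.random.default_rng(0)
mu = np.array([0.5, -0.2, 0.1])          # hypothetical systematic utilities
A = 0.3 * rng.normal(size=(3, 3))
Sigma = A @ A.T                          # hypothetical utility covariance
L = np.linalg.cholesky(Sigma)
z = rng.normal(size=(4096, 3))           # pseudo-random stand-in for QMC points
print(delta_else(mu, Sigma), sim_else(mu, L, z))
```

With a zero covariance both approximations collapse to lse(μ); as the covariance grows, the two estimates diverge according to the quality of the second-order expansion, which is the trade-off the paper's Δ-versus-QMC comparison probes.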
- VB methods extended to handle both fixed and random parameters in MMNL perform comparably to MCMC and MSLE in parameter recovery and predictive accuracy, except for methods relying on the modified Jensen’s inequality (MJI) lower bound.
- VB-NCVMP-Δ consistently offers the best trade-off between speed and accuracy: in the experiments, it is roughly 1.7× to 16× faster than MCMC and MSLE while achieving similar RMSEs and TVD.
- Recovery of mean random coefficients (ζ) and individual-level parameters (β_1:N) is similar across all methods. For large samples (e.g., N=2000, T=10), RMSE(ζ) typically lies around 0.018–0.029 and RMSE(β_1:N) around 0.695–0.726 across scenarios and methods.
- Recovery of the covariance matrix Ω is acceptable and similar across methods using the Δ or QMC approximations; MJI-based VB variants (VB-QN-MJI, VB-NCVMP-MJI) display noticeably worse Ω recovery (larger RMSE for Ω elements) and consequently worse predictive TVD.
- Predictive accuracy (TVD) is comparable among MSLE, MCMC, and VB variants using Δ or QMC across scenarios. MJI-based VB shows higher TVD, consistent with poorer Ω recovery.
- As N and T increase, all methods improve in RMSE and TVD, numerically supporting consistency claims for mean-field VB.
- QN updates are substantially slower than NCVMP in this application; contrary to some earlier reports, QN-based VB was not faster than the authors' efficient MCMC implementation.
- Illustrative numbers: In scenario 1 (N=2000, T=10), VB-NCVMP-Δ achieves RMSE(ζ)=0.0190, RMSE(β_1:N)=0.7198, and TVD≈0.1367 while being much faster than MCMC (RMSE(ζ)=0.0181, TVD≈0.1345). MJI-based variants in similar settings show higher RMSE(Ω) and higher TVD (e.g., VB-NCVMP-MJI TVD≈0.1494).
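The TVD figures quoted in the findings above can be made concrete with a short sketch. The paper reports TVD between true and estimated predictive choice distributions; the averaging over held-out choice situations shown here is an assumed reporting convention, not taken from the paper.

```python
import numpy as np

def tvd(p, q):
    """Total variation distance between two discrete choice distributions."""
    return 0.5 * np.abs(np.asarray(p, float) - np.asarray(q, float)).sum()

def mean_tvd(P_true, P_est):
    """TVD averaged over choice situations (rows = situations, columns = alternatives);
    the averaging convention is an assumption for illustration."""
    return 0.5 * np.abs(P_true - P_est).sum(axis=1).mean()
```

For example, two identical distributions give a TVD of 0, and two distributions with disjoint support give the maximum of 1, so the reported values around 0.13–0.15 indicate moderately close predictive distributions.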
The study demonstrates that variational Bayes can be effectively extended to MMNL models with both fixed and random parameters, a specification common in practice (e.g., ASCs, socio-demographic covariates, interactions). Among VB variants, using a Delta-method approximation for the intractable E-LSE combined with NCVMP updates delivers substantial computational gains with minimal loss in accuracy relative to gold-standard MCMC and widely-used MSLE. The inferior performance of MJI-based lower bounds suggests that a tighter analytical bound is not necessarily preferable in practice for MMNL; accurate recovery of Ω is critical for predictive performance, and MJI underperforms on this dimension. Across all methods, increased data (larger N and/or T) improves estimation, reinforcing the practical viability and consistency of mean-field VB for these models. Practitioners can therefore consider VB-NCVMP-Δ as a reliable, fast estimator for MMNL, especially in large-scale applications where MCMC and MSLE may be computationally burdensome.
Contributions: (1) Extension of several VB methods to MMNL with both fixed and random utility parameters. (2) Comprehensive simulation-based benchmarking of VB against MCMC and MSLE on estimation time, parameter recovery, and predictive accuracy. Findings indicate VB-NCVMP-Δ provides fast, scalable, and accurate inference, often an order of magnitude faster than MCMC/MSLE, with negligible compromises in accuracy; methods based on MJI underperform. Future research directions: Extend VB to more flexible mixing distributions (parametric, nonparametric, semiparametric) and to willingness-to-pay space; develop VB for extended discrete choice models (e.g., integrated choice and latent variable); explore online and stochastic variational inference for large streaming datasets; investigate alternative divergences (α- and f-divergences) and structured variational families to better capture posterior dependencies; consider Markov chain variational inference to blend MCMC accuracy with VB speed; improve analytical/simulation-based approximations of the E-LSE; and broaden comparisons with frequentist approximations such as MACML.
- The VB methods and experiments focus on normal mixing distributions and preference-space utility; other distributions or WTP-space specifications are not studied here.
- VB under KL and mean-field factorization can underestimate posterior variances and miss dependencies; although performance is strong, variance calibration remains a known limitation.
- MJI-based variational lower bounds underperform in Ω recovery and prediction; while informative, they are not recommended based on results here.
- The evaluation uses semi-synthetic data derived from a specific stated choice context; generalization to other domains, attribute structures, and larger dimensionalities should be further validated.
- Some potentially useful variants (e.g., NCVMP with QMC) were not included due to numerical instability and positive-definiteness issues.
- Comparisons with alternative frequentist approaches like MACML for MMNP are out of scope; Stan/NUTS was deemed too computationally expensive for the studied sample sizes.