
Decoding consumer purchase decisions: exploring the predictive power of EEG features in online shopping environments using machine learning
Z. Xu and S. Liu
This study by Zhiwei Xu and Siqi Liu tests whether machine learning models trained on EEG features can predict consumer purchase decisions in a real online shopping setting. The best model, an SVM with a Gaussian kernel, reached 87.1% accuracy, and the most informative features came from frontal and occipital regions.
Introduction
The study asks what factors drive a consumer’s purchase decision when viewing a product page, noting limitations of traditional questionnaires such as social desirability, conscious control over subconscious responses, and the mere measurement effect. To address these, the authors leverage EEG to capture neural activity and machine learning to model complex, high-dimensional patterns. They aim to predict purchase versus non-purchase decisions in a realistic online shopping context, moving beyond lab simulations to enhance ecological validity and provide practical neuromarketing insights.
Literature Review
Theoretical background situates EEG-based emotion and decision recognition within supervised and unsupervised machine learning frameworks. EEG data are high-dimensional, noisy, and non-stationary, necessitating robust algorithms (e.g., SVM, RF, KNN, neural networks). Common EEG features include PSD, entropy, prefrontal asymmetry, and connectivity. The paper focuses on: (1) Power spectral density (PSD), which maps power across delta, theta, alpha, beta, gamma bands and is effective for differentiating emotional and cognitive states; and (2) Prefrontal asymmetry index (PAI), reflecting differential frontal activation associated with approach–withdrawal tendencies and predictive of purchase inclination. Prior neuromarketing work often used lab simulations; this study extends it via a field experiment capturing real transactions to improve ecological validity.
Methodology
Design: Field experiment simulating authentic online shopping decisions. Participants navigated an online platform to decide on five products previously saved to their carts; real purchases and refusals occurred, with EEG recorded continuously. Actions (purchase vs. non-purchase) were event-marked.
Participants: 73 recruited undergraduates; after screening for financial homogeneity, 66 right-handed participants (19–24 years; mean 20.6 ± 1.4) remained. Informed consent obtained; ethics approved (Hubei University, 2024042201).
Stimuli/Trials: 328 product page views across six categories (clothing/footwear, daily necessities, cosmetics, food/snacks, electronics, sports). Average browsing time 38.5 s. Each participant required to purchase at least one item and refuse at least one.
EEG acquisition: Emotiv EPOC+ (14 channels + 2 mastoids), 256 Hz. Electrode sites: AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, AF4. Wireless recording via EmotivPro; channel quality threshold 95%.
Preprocessing: EEGLAB/MATLAB. Bandpass 1–100 Hz; notch 48–52 Hz. Segmenting into 2-s epochs. Manual rejection/interpolation of bad segments. ICA to remove blink/eye/muscle artifacts. Data segmented into baseline (eyes open, relaxed) and experimental periods per item. For each subject: baseline, purchase-decision, refusal-decision datasets created.
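As a rough illustration of the filtering and epoching steps, the following Python sketch uses SciPy rather than the EEGLAB/MATLAB toolchain the authors report. The Butterworth filter orders, the zero-phase filtering, and the function name `preprocess` are assumptions made for illustration; manual rejection, interpolation, and ICA are not shown.

```python
import numpy as np
from scipy import signal

FS = 256  # sampling rate in Hz, as reported for the Emotiv EPOC+


def preprocess(raw, fs=FS, epoch_sec=2.0):
    """Filter and epoch one continuous recording.

    raw: array of shape (n_channels, n_samples).
    Returns an array of shape (n_epochs, n_channels, samples_per_epoch).
    """
    # 1-100 Hz bandpass (filter order 4 is an assumption, not from the paper)
    bandpass = signal.butter(4, [1.0, 100.0], btype="bandpass", fs=fs, output="sos")
    # 48-52 Hz band-stop to suppress mains interference (order is an assumption)
    notch = signal.butter(2, [48.0, 52.0], btype="bandstop", fs=fs, output="sos")

    # Zero-phase filtering along the time axis
    x = signal.sosfiltfilt(bandpass, raw, axis=-1)
    x = signal.sosfiltfilt(notch, x, axis=-1)

    # Cut into non-overlapping 2-s epochs, dropping any incomplete tail
    n = int(epoch_sec * fs)
    n_epochs = x.shape[-1] // n
    x = x[:, : n_epochs * n]
    return x.reshape(x.shape[0], n_epochs, n).transpose(1, 0, 2)
```

A 10.5-s recording from 14 channels would yield five epochs of 512 samples each, with the final half second discarded.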
Feature extraction: PSD computed via Fourier transform for 14 channels across bands: Delta (1–3 Hz), Theta (4–7 Hz), Alpha1 (8–10 Hz), Alpha2 (11–13 Hz), Beta1 (14–18 Hz), Beta2 (19–30 Hz), Gamma (31–80 Hz). PAI computed using frontal pairs (e.g., AF3–AF4, F7–F8, F3–F4) across bands; differential relative to baseline defined as ΔPAI = (log(AF3/AF4)_exp − log(AF3/AF4)_base) / (log(AF3/AF4)_exp + log(AF3/AF4)_base) (representative for AF3–AF4). Final feature set combined 98 PSD features (across channels/bands) and 21 PAI features, totaling 119 features. Data arranged as samples × features.
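The feature construction described above can be sketched in Python with NumPy. This is an illustrative reading of the summary, not the authors' code: it assumes the PSD is a plain FFT periodogram averaged over epochs, that log(AF3/AF4) means the log-ratio of band powers at the two electrodes, and that the PSD features are taken from the experimental period. All function names are hypothetical.

```python
import numpy as np

FS = 256
BANDS = {"delta": (1, 3), "theta": (4, 7), "alpha1": (8, 10), "alpha2": (11, 13),
         "beta1": (14, 18), "beta2": (19, 30), "gamma": (31, 80)}
CHANNELS = ["AF3", "F7", "F3", "FC5", "T7", "P7", "O1",
            "O2", "P8", "T8", "FC6", "F4", "F8", "AF4"]
PAIRS = [("AF3", "AF4"), ("F7", "F8"), ("F3", "F4")]


def band_power(epochs, fs=FS):
    """epochs: (n_epochs, n_channels, n_samples) -> (n_channels, n_bands)."""
    n = epochs.shape[-1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(epochs, axis=-1)) ** 2 / (fs * n)
    psd = psd.mean(axis=0)  # average the periodograms over epochs
    return np.stack([psd[:, (freqs >= lo) & (freqs <= hi)].mean(axis=1)
                     for lo, hi in BANDS.values()], axis=1)


def pai(power):
    """Log-ratio of left to right band power -> (n_pairs, n_bands)."""
    idx = {name: i for i, name in enumerate(CHANNELS)}
    return np.stack([np.log(power[idx[left]] / power[idx[right]])
                     for left, right in PAIRS])


def delta_pai(pai_exp, pai_base):
    """Baseline-relative PAI as written in the summary.

    Note: the denominator can approach zero when the two log-ratios have
    opposite sign and similar size, so values may be unstable.
    """
    return (pai_exp - pai_base) / (pai_exp + pai_base)


def feature_vector(exp_epochs, base_epochs):
    """98 PSD features (14 channels x 7 bands) + 21 PAI features (3 pairs x 7 bands)."""
    p_exp = band_power(exp_epochs)
    d = delta_pai(pai(p_exp), pai(band_power(base_epochs)))
    return np.concatenate([p_exp.ravel(), d.ravel()])
```

Stacking one such 119-element vector per decision gives the samples × features matrix mentioned above.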
Classifiers: Traditional ML included Random Forest (grid-search tree depth 3–15; 50–200 trees), KNN (distance metrics: Euclidean, Manhattan, Weighted, Chebyshev; optimal K tuned), and SVM (linear and Gaussian kernels) using libsvm. Dimensionality reduction/feature selection: correlation-based feature selection (CFS), T-value, F-score, PCA, and recursive feature elimination (RFE). Evaluation used Leave-One-Out Cross-Validation (LOOCV). Statistical significance via 1000-label permutation tests; metrics: accuracy, sensitivity, specificity, AUC. Consensus features noted across LOOCV folds.
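One of the reported combinations (F-score selection, Gaussian-kernel SVM, LOOCV, label permutation test) can be sketched with scikit-learn, whose `SVC` is built on libsvm. This is a minimal sketch under assumptions: the number of selected features `k`, the SVM hyperparameters, the standardization step, and the function names are illustrative and not taken from the paper. Feature selection sits inside the pipeline so that it is refit on each training fold.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def loocv_accuracy(X, y, k=12):
    """LOOCV accuracy of scaling + F-score selection + RBF SVM (k is an assumption)."""
    model = make_pipeline(
        StandardScaler(),
        SelectKBest(f_classif, k=k),          # F-score ranking, refit per fold
        SVC(kernel="rbf", C=1.0, gamma="scale"),
    )
    pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
    return float(np.mean(pred == y))


def permutation_p_value(X, y, n_perm=1000, seed=0, k=12):
    """Compare the observed accuracy with accuracies under shuffled labels."""
    rng = np.random.default_rng(seed)
    observed = loocv_accuracy(X, y, k=k)
    null = np.array([loocv_accuracy(X, rng.permutation(y), k=k)
                     for _ in range(n_perm)])
    # Add-one correction so the estimated p-value is never exactly zero
    p = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p
```

A model whose accuracy is no better than what shuffled labels produce gets a large p-value here, however high the accuracy looks on its own.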
Shallow Neural Network (SNN): After selecting the top 10% ranked features from traditional ML, a shallow NN was trained (MATLAB 2021, Deep Learning Toolbox). Architecture: input layer; two fully connected layers (64 and 32 ReLU units); softmax output (2 classes). Training with Adam optimizer: 50 epochs, mini-batch size 10, initial LR 0.001; shuffling each epoch. Performance assessed with 5-fold cross-validation; mean metrics reported.
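The reported architecture and training settings map fairly directly onto scikit-learn's `MLPClassifier`, shown here as a stand-in for the MATLAB Deep Learning Toolbox model the authors used. Two differences are worth flagging: for a binary problem scikit-learn uses a single logistic output unit rather than a two-unit softmax (the two are mathematically equivalent), and the standardization step and the function name `shallow_net_cv` are assumptions.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def shallow_net_cv(X, y, seed=0):
    """5-fold CV accuracy of a 64-32 ReLU network trained with Adam."""
    net = MLPClassifier(
        hidden_layer_sizes=(64, 32),   # two fully connected hidden layers
        activation="relu",
        solver="adam",
        batch_size=10,                 # mini-batch size
        learning_rate_init=0.001,      # initial learning rate
        max_iter=50,                   # number of epochs for the adam solver
        shuffle=True,                  # reshuffle samples every epoch
        random_state=seed,
    )
    model = make_pipeline(StandardScaler(), net)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    return cross_val_score(model, X, y, cv=cv, scoring="accuracy")
```

`X` would hold only the pre-selected top-ranked features. With 50 epochs scikit-learn may emit a convergence warning, which is expected when training is capped at a fixed epoch count.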
Key Findings
- Overall feasibility: EEG features (PSD + PAI; 119 total) can predict purchase vs. non-purchase decisions with accuracies above 70% in the traditional ML models that passed permutation testing.
- Best-performing model: SVM with Gaussian (RBF) kernel achieved the top accuracies across feature selection methods, up to 87.1%.
• T-value + Gaussian SVM: Accuracy 0.871; Specificity 0.848; Sensitivity 0.939; AUC 0.873; P=0.02.
• F-score + Gaussian SVM: Accuracy 0.871; Specificity 0.849; Sensitivity 0.939; AUC 0.874; P=0.01.
• Correlation + Gaussian SVM: Accuracy 0.848; Specificity 0.773; Sensitivity 0.955; AUC 0.827; P<0.01.
• RFE + Gaussian SVM: Accuracy 0.841; Specificity 0.849; Sensitivity 0.833; AUC 0.849; P=0.03.
• Linear SVMs performed poorly (accuracies ~0.326–0.402; high P-values), highlighting the importance of modeling nonlinearity.
- Random Forest: Best configuration depth=15, trees=50: Accuracy 0.705; Specificity 0.606; Sensitivity 0.652; AUC 0.629; P<0.01.
- KNN: Many high-accuracy settings failed permutation tests (suggesting overfitting). Two weighted-distance configurations passed:
• T-value + Weighted: Accuracy 0.788; Spec 0.803; Sens 0.773; P<0.01.
• Correlation + Weighted: Accuracy 0.773; Spec 0.833; Sens 0.712; P<0.01.
• PCA-based KNN and the other distance metrics often showed high apparent accuracy (up to 0.962) but failed the permutation test (P values between roughly 0.70 and 0.99).
- SNN (5-fold CV): Mean Accuracy 66.84%; Sensitivity 66.92%; Specificity 66.70% (fold-wise accuracy range ~44.44%–84.62%), indicating moderate performance given data size.
- Neuroscientific insights: Key discriminative features arise from frontal (prefrontal asymmetry; PAI AF3–AF4, F7–F8) and occipital regions (PSD O1, O2), implicating approach–withdrawal motivation and visual processing/attention in purchase decisions.
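For readers less familiar with the metrics listed above, the short Python sketch below shows how accuracy, sensitivity, and specificity follow from predicted and true labels. It treats purchase as the positive class (label 1), which is an assumption, and the labels are toy values, not data from the study.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, and specificity; assumes both classes are present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # share of true purchases detected
        "specificity": tn / (tn + fp),   # share of true refusals detected
    }


# Toy labels for illustration only (1 = purchase, 0 = non-purchase)
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
# accuracy 0.625, sensitivity 0.75, specificity 0.5
```

With balanced classes, as in this toy example, accuracy equals the mean of sensitivity and specificity; with unbalanced classes it is their prevalence-weighted mean.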
Discussion
Findings show that nonlinear classifiers, especially SVM with Gaussian kernels, capture complex relationships in EEG features essential for predicting purchase decisions, outperforming linear SVM as well as Random Forest, KNN, and the shallow neural network. KNN only generalized under specific feature selection and distance choices; many configurations overfit. Random Forest achieved statistically significant but lower accuracy. The consistent prominence of frontal PAI features indicates prefrontal involvement in approach–withdrawal motivation and affective-cognitive integration during purchasing. Occipital PSD features reflect visual processing and attention allocation central to evaluating online product pages. The moderate performance of the shallow neural network underscores the constraints of limited sample size and the risk of overfitting in more complex architectures. Practically, integrating EEG-derived PSD and PAI with robust ML can improve preference and purchase prediction beyond self-reports, offering guidance for page design and ad selection. Methodologically, permutation testing proved crucial to guard against overfitting, as several superficially high-accuracy models failed significance tests.
Conclusion
EEG-derived PSD and PAI features enable accurate prediction of online purchase decisions, with Gaussian-kernel SVMs achieving up to 87.1% accuracy. Discriminative signals concentrate in frontal (prefrontal asymmetry) and occipital (visual/attention) regions, advancing understanding of the neural substrates of consumer choice. The study emphasizes selecting appropriate nonlinear models and rigorous validation to avoid overfitting. These insights can inform neuromarketing strategies and the design of more engaging shopping experiences. Future work should expand feature sets and data volume, employ higher-density EEG and deep learning where feasible, and further probe how specific marketing stimuli and brand-related factors modulate neural decision processes.
Limitations
- Feature scope: Did not include brain network connectivity, microstates, or nonlinear dynamics due to equipment and processing constraints.
- Hardware constraints: 14-channel EEG may insufficiently capture complex cortical and subcortical processes, potentially limiting generalizability and depth of inference.
- Data size: 328 trials adequate for traditional ML but limited for deep learning; risk of overfitting in more complex models.
- Deep learning: CNNs and other deep architectures were not applied due to insufficient data and computational resources.
- Future directions: Use higher-density EEG, larger datasets, and deep learning to capture richer spatiotemporal patterns; examine effects of brand appeal and diverse marketing stimuli; address class imbalance via synthetic augmentation, class weighting, or advanced resampling.