An exploratory look at supermarket shopping paths

J. S. Larson, E. T. Bradlow, et al.

This research by Jeffrey S. Larson, Eric T. Bradlow, and Peter S. Fader analyzes individual grocery shopper paths recorded with RFID technology. It identifies 14 canonical shopping patterns that challenge traditional beliefs about how consumers navigate stores.
Introduction

The paper addresses fundamental questions about how shoppers actually travel within a supermarket: whether they traverse every aisle or move directly between areas, how much they use the store perimeter (“racetrack”), and whether shoppers follow a dominant pattern or display heterogeneity. Despite common assumptions about up-and-down aisle shopping, there has been little empirical study of real in-store path behavior. Leveraging a novel RFID-based dataset that tracks shopping cart locations every 5 seconds, the authors aim to conduct exploratory analyses to identify typical in-store travel behaviors. The goals are to summarize and cluster paths into canonical types while respecting physical store constraints, and to use the findings to stimulate future, more formal research into shopping behavior.

Literature Review

The authors position their work relative to prior approaches that summarize behavioral curves using principal components analysis (e.g., Bradlow, 2002; Jones & Rice, 1992), noting their aim is clustering into shopper types rather than explaining variance. They also connect to earlier, sporadic research on in-store spatial behavior: Farley and Ring (1966) modeled zone-to-zone transitions; Mackay and Olshavsky (1975) and Park, Iyer, and Smith (1989) examined perceptions, store knowledge, and time constraints. Underhill’s Why We Buy (1999) provides anthropological observations but limited analytical depth. Related environmental psychology and spatial movement research (e.g., Batty, 2003; Winkel & Sasanoff, 1966; Lynch & Rivkin, 1959) explores pedestrian flows and cognitive mapping. Unlike these, the present study focuses on complete shopper paths with high-frequency positional data and introduces a clustering method that enforces spatial feasibility of canonical paths.

Methodology

Data: Sorensen Associates' PathTracker system tracked RFID tags affixed to every cart in a western U.S. supermarket. Each tag emitted a signal every 5 seconds, and the signals were triangulated to (x, y) coordinates ("blinks"), yielding complete cart paths that include the checkout endpoint. Initial biases due to electromagnetic variation were corrected via calibration. Paths longer than 2 hours (e.g., abandoned carts) were excluded.

Sample: From ~27,000 paths (lengths 25–1500 blinks; mean 205 blinks ~16+ min; median 166 blinks ~13+ min), a systematic sample of 9000 was drawn (every 3rd path from a random start). After cleaning, 8751 paths remained.

Ragged arrays: To compare paths of different lengths, each path was resampled to 100 percentile locations. Locations 1 and 100 are the actual start and end of the path; the intermediate locations fall at equally spaced percentiles of cumulative distance traveled.

Spatial constraints and clustering: Standard k-means on raw coordinates would yield infeasible centroids (paths crossing shelves or inaccessible zones). The authors therefore implement k-medoids clustering (Kaufman & Rousseeuw, 1990) adapted to spatial constraints. The procedure starts with randomly selected observed paths as medoids; assigns each path to the nearest medoid by Euclidean distance across the 100 aligned percentile coordinates; computes the naïve k-means centroid (pointwise mean) of each cluster; and then selects, within each cluster, the observed path closest to this centroid (in sum of squared Euclidean distances across blinks) as the new medoid. Because medoids are actual paths, they are always feasible. Euclidean distance is used because it correlates highly (.90–.99) with network travel distance and is simpler to compute and interpret in this application.

Model selection: For each time group (below), cluster solutions were evaluated across multiple values of K using scree plots of within-cluster error and the Krzanowski–Lai (KL) statistic, KL(k) = |DIFF(k) / DIFF(k+1)|, where DIFF(k) = (k−1)^{2/p} W_{k−1} − k^{2/p} W_k, p = 200 (x and y at 100 locations), and W_k is the within-cluster error of the k-cluster solution. Each K solution was run from 20 random starts to mitigate local minima.

Time stratification: Because clustering across all paths was dominated by path length, the 8751 paths were split by duration into three groups of nearly equal size: Low (2–10 min; n=2917), Medium (10–17 min; n=2916), and High (17–~120 min; n=2918). Clustering was performed separately within each group.

Zone profiling: For interpretability, each path was summarized by its percent of travel in six mutually exclusive zones: Racetrack (perimeter thoroughfare), Aisles, Produce, Convenience Store (C-Store), Checkout, and Extremity (outer shelving). These profiles aid interpretation but are not used for clustering, to avoid losing sequence and precise location information.

Cross-validation: The clustering pipeline (model selection and medoids) was replicated on an additional one-third sample to assess the stability of K and of the canonical path types. Algorithmic details are provided in an appendix.
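
The resampling and constrained clustering steps described above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation: the function names (resample_path, constrained_kmedoids, best_of_starts) are hypothetical, linear interpolation between blinks is an assumption, and squared Euclidean distance over the 200 stacked coordinates is used for both assignment and medoid selection, which is one plausible reading of the summary.

```python
import numpy as np


def resample_path(blinks, n_points=100):
    """Resample a path of (x, y) blinks to n_points locations equally
    spaced in cumulative travel distance. First and last blinks are kept.
    Linear interpolation between blinks is an assumption of this sketch."""
    xy = np.asarray(blinks, dtype=float)
    steps = np.linalg.norm(np.diff(xy, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(steps)])
    if cum[-1] == 0.0:  # cart never moved: repeat its single location
        return np.repeat(xy[:1], n_points, axis=0)
    targets = np.linspace(0.0, cum[-1], n_points)
    return np.column_stack([np.interp(targets, cum, xy[:, 0]),
                            np.interp(targets, cum, xy[:, 1])])


def _sq_dist_to_medoids(flat, medoids):
    """Squared Euclidean distance from every path to every medoid."""
    return ((flat[:, None, :] - flat[medoids][None, :, :]) ** 2).sum(axis=2)


def constrained_kmedoids(paths, k, n_iter=50, rng=None):
    """paths: array of shape (n, 100, 2). Returns (medoids, labels, W),
    where medoids are indices of observed paths and W is the total
    within-cluster squared error."""
    rng = np.random.default_rng(rng)
    n = paths.shape[0]
    flat = paths.reshape(n, -1)  # each path becomes a 200-vector
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        labels = _sq_dist_to_medoids(flat, medoids).argmin(axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:  # keep the old medoid for an empty cluster
                continue
            centroid = flat[members].mean(axis=0)  # naive k-means centroid
            nearest = ((flat[members] - centroid) ** 2).sum(axis=1).argmin()
            new_medoids[c] = members[nearest]  # snap to an observed path
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    d2 = _sq_dist_to_medoids(flat, medoids)
    labels = d2.argmin(axis=1)
    return medoids, labels, d2[np.arange(n), labels].sum()


def best_of_starts(paths, k, n_starts=20, seed=0):
    """Run several random starts and keep the lowest-error solution."""
    runs = [constrained_kmedoids(paths, k, rng=seed + s) for s in range(n_starts)]
    return min(runs, key=lambda run: run[2])
```

Because every medoid is an index into the observed paths, paths[medoids[c]] is always a route some cart actually took, which is the feasibility property the method relies on.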

Key Findings
  • Canonical path types: Time-based clustering yielded 14 canonical medoid paths: Low (2 clusters), Medium (4), High (8). These medoids are actual observed paths, ensuring feasibility and interpretability.
  • Low-duration trips (2–10 min; n=2917): K=2 was selected. One cluster follows a default start path along the racetrack to the right of the produce/office zone; the other breaks this default, heading directly to target aisles/areas. Cluster sizes: 1772 and 1145. Profiles differ notably in Racetrack and Produce usage; no difference in total path length within this time band.
  • Medium-duration trips (10–17 min; n=2916): K=4 was selected. All four medoids follow the default start and continue on the racetrack, but differ in racetrack vs aisle emphasis and timing:
      • Cluster 1 (n=732) and Cluster 3 (n=541) show more racetrack shopping; Cluster 1 covers more of the racetrack, while Cluster 3 spends more time within a smaller racetrack segment.
      • Cluster 2 (n=1032) is aisle-dominated, using the racetrack to reach target aisles quickly (little Produce time).
      • Cluster 4 (n=611) shows extended time in Checkout (potentially a slow cashier, socializing, or impulse shopping).
      • Clusters 2 and 4 both traverse Produce quickly.
  • High-duration trips (17–~120 min; n=2918): K=8 was selected. High heterogeneity was observed:
      • Clusters 4 and 5 are aisle-dominant but focus on specific subsets of aisles (not all 12), refuting the myth of complete aisle-by-aisle traversal.
      • Cluster 3 heavily uses the Convenience Store zone (which includes a Chinese takeout counter), explaining long durations with limited exploration elsewhere.
      • Clusters 1 and 6 use the racetrack as a shopping venue with quick aisle excursions (entering and exiting on the same side), highlighting the importance of end-caps.
      • Cluster 2 spends much of its time in specific racetrack segments (not the whole perimeter), with a high Produce proportion; profiling alone would overstate racetrack coverage without sequence-aware path analysis.
      • Clusters 7 and 8 exhibit notable backtracking (non-forward progress), unlike the other clusters, which move monotonically toward checkout.
  • Myth-busting insights:
      • Systematic up-and-down aisle shopping is rare. Shoppers typically visit select aisles; full-aisle traversals occur but are less common than quick excursions.
      • The racetrack is central: it is often used as a home base for movement and shopping, not merely for transitions between aisles.
      • End-cap relevance: racetrack-with-excursions patterns imply elevated exposure to end-of-aisle merchandising.
  • Data/statistics context:
      • Original corpus ~27,000 paths; analytic sample of 8751 paths after cleaning a systematic sample of 9000.
      • Path lengths: 25–1500 blinks; mean 205 blinks (~16 min), median 166 blinks (~13 min).
      • Time-group splits: Low 2–10 min (n=2917); Medium 10–17 min (n=2916); High 17–~120 min (n=2918).
      • Euclidean and network distances are highly correlated (.90–.99), justifying Euclidean distance for clustering.
  • Cross-validation: Replication on an additional third of data recovered the same K per time group (2, 4, 8) and highly similar canonical medoids and zone-profile patterns, indicating stability.
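
The K values reported above (2, 4, and 8) were chosen with scree plots and the KL statistic defined in the Methodology. A minimal Python sketch of that statistic follows, assuming the standard Krzanowski–Lai form in which k is raised to the power 2/p. The function name kl_statistic is hypothetical, and the within-cluster errors in the example are made-up numbers for illustration only, not values from the paper.

```python
def kl_statistic(W, p=200):
    """Krzanowski-Lai statistic.

    W maps each number of clusters k to its within-cluster error W_k.
    Returns {k: KL(k)} for every k whose neighbors k-1 and k+1 are in W.
    p is the number of coordinates per path (x and y at 100 locations).
    """
    def diff(k):
        # DIFF(k) = (k-1)^(2/p) * W_{k-1} - k^(2/p) * W_k
        return (k - 1) ** (2.0 / p) * W[k - 1] - k ** (2.0 / p) * W[k]

    return {k: abs(diff(k) / diff(k + 1))
            for k in sorted(W) if (k - 1) in W and (k + 1) in W}


# Illustrative (invented) errors: a large drop from 1 to 2 clusters,
# then only small improvements afterwards.
example_W = {1: 1000.0, 2: 600.0, 3: 560.0, 4: 530.0, 5: 505.0}
kl = kl_statistic(example_W)
best_k = max(kl, key=kl.get)  # the k with the largest KL value
```

A large KL(k) means that going from k−1 to k clusters reduces the error far more than going from k to k+1, so the k with the largest value is a natural candidate; in the invented example that is k = 2.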
Discussion

The study demonstrates that actual in-store travel patterns are heterogeneous and often contravene common assumptions. By enforcing spatial feasibility through k-medoids and aligning paths by percentiles, the clusters reveal how and where shoppers move, not just how much time they spend in areas. Findings show the racetrack’s centrality for both movement and shopping, selective aisle usage, and patterns like quick aisle excursions and occasional backtracking. These insights answer the motivating questions: shoppers do not typically traverse every aisle; they rely on the perimeter to navigate and shop; and there is no single dominant pattern—multiple canonical types exist, shaped by time in store. From a managerial perspective, the results suggest reassessing product placement (especially end-caps and aisle ends), considering racetrack merchandising given its exposure, and tailoring strategies to trip types (e.g., short trips emphasizing perimeter/convenience zones). The approach provides a framework for diagnosing store utilization and informing layout and merchandising decisions.

Conclusion

This paper introduces a novel application of k-medoids clustering to RFID-derived in-store path data, addressing spatial constraints and path-length heterogeneity. By stratifying trips by time and selecting clusters via scree and KL criteria, the authors identify 14 canonical path types that summarize supermarket travel behavior. The analysis dispels myths about aisle-by-aisle shopping, underscores the racetrack’s role, and highlights selective aisle engagement and end-cap importance. Cross-validation confirms stability. Future research directions include: linking travel patterns to purchases and promotion exposure; developing formal dynamic models of blink-to-blink movement with state dependence and heterogeneity; expanding to different store formats and layouts; and integrating environmental/psychological drivers of movement.

Limitations
  • Measurement proxy: Paths track carts, not shoppers directly; when carts are stationary, shopper location is inferred only generally.
  • Temporal granularity: 5-second sampling can create apparent shelf crossings when rounding corners; path percentile standardization removes absolute timing within the path (addressed partially by time-group stratification).
  • Single-store context: Results derive from one western U.S. supermarket; generalization requires caution and replication across layouts/formats.
  • Sample selection: Analytic subset (8751) from ~27,000 paths; paths >2 hours excluded (e.g., abandoned carts). Some hardware malfunctions led to deletions.
  • Interpretive limits: Zone profiles can mislead without sequence; causes of behaviors (e.g., long checkout time) cannot be inferred from movement alone; no direct linkage to purchase or promotion exposure in this analysis.
  • Algorithmic: k-medoids yields local minima; mitigated via multiple random starts but not guaranteed globally optimal.
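
The interpretive limit on zone profiles noted above can be made concrete with a small Python sketch. It assumes each of a path's 100 percentile locations has already been labeled with one of the six zones; the labeling itself, the function name zone_profile, and the two example sequences are hypothetical and are not data from the study.

```python
from collections import Counter

ZONES = ("Racetrack", "Aisles", "Produce", "C-Store", "Checkout", "Extremity")


def zone_profile(zone_sequence):
    """Share of a path's locations that fall in each of the six zones."""
    counts = Counter(zone_sequence)
    n = len(zone_sequence)
    return {zone: counts.get(zone, 0) / n for zone in ZONES}


# A path that moves steadily toward checkout ...
forward = (["Produce"] * 20 + ["Racetrack"] * 40
           + ["Aisles"] * 30 + ["Checkout"] * 10)

# ... and one that returns to Produce and the racetrack midway through.
backtracking = (["Racetrack"] * 20 + ["Aisles"] * 15 + ["Produce"] * 20
                + ["Racetrack"] * 20 + ["Aisles"] * 15 + ["Checkout"] * 10)

same_profile = zone_profile(forward) == zone_profile(backtracking)  # True
```

Both sequences spend 40% of their locations on the racetrack, 30% in aisles, 20% in Produce, and 10% at checkout, so their profiles are identical even though the routes differ. This is why the profiles serve only as an interpretive aid and the clustering works on the ordered coordinates.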