The forecasting system that governed inventory decisions across this retailer's 1,800-store network had been built in 2011. At the time, it was a state-of-the-art statistical ensemble using Holt-Winters exponential smoothing with seasonal decomposition and manual override layers managed by regional merchandising teams. It worked well for a decade.
By 2024, the world the system was built for no longer existed. Same-day delivery had compressed consumer patience for out-of-stock events from days to hours. Social media demand spikes could move a product from average velocity to viral in 48 hours, and the old system had no mechanism to detect, let alone respond to, this signal. The retailer operated 14 private label brands alongside national brands, and the old system had separate model instances for each brand that had never been unified into a consistent architecture.
The measurable consequences were significant. Overstock averaged 31% above optimal across the network, representing $840M in excess inventory carrying cost annually. Out-of-stock events during peak demand periods (holidays, back-to-school, weather events) generated an estimated $200M in lost sales annually. Markdown losses from excess inventory clearance were running at $160M per year. The combined opportunity cost of maintaining the legacy system was approximately $1.2B per year.
The reason the system had survived this long was organizational, not technical. The legacy forecasting vendor had deep relationships with the merchandise planning and supply chain leadership teams. The previous CEO had made a strategic decision to treat inventory optimization as an operational capability, not a competitive differentiator, and had consistently deprioritized investment in forecast modernization. That leadership had since changed, and the new CDO arrived with a mandate to close the capability gap.
The fundamental challenge of demand forecasting at this scale is that 2.4 million SKUs across 1,800 stores means 4.32 billion store-SKU pairs requiring daily replenishment signals. No model architecture can forecast each pair independently at that scale. The practical question is how to build a model hierarchy that captures the patterns at the appropriate level of granularity without losing the signal that exists at the individual SKU-store level.
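The scale argument is simple arithmetic, and it can be checked directly. The short sketch below uses only figures quoted in this case study (the SKU and store counts, plus the 2,400 category models and 180 cluster models of the architecture we built); the variable names are illustrative.

```python
# Figures quoted in this case study; variable names are illustrative.
SKUS = 2_400_000
STORES = 1_800
CATEGORY_MODELS = 2_400   # national-level models
CLUSTER_MODELS = 180      # regional cluster models

# Every SKU in every store needs its own daily replenishment signal.
store_sku_pairs = SKUS * STORES

# The hierarchy trains models only at the two upper levels.
trained_models = CATEGORY_MODELS + CLUSTER_MODELS

print(f"{store_sku_pairs:,} store-SKU pairs")
print(f"{trained_models:,} trained models")
```

The gap between the two numbers is the reason for a hierarchy: a few thousand trained models have to supply signals for billions of pairs.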
The second challenge was data heterogeneity. The retailer's data infrastructure had been built through 22 years of acquisitions, resulting in 7 distinct ERP systems, 4 different point-of-sale platforms, and 3 separate inventory management systems. A significant component of this engagement was data harmonization before any model development could begin.
The third challenge was the transition risk. The legacy system was generating $42 billion in annual purchasing decisions. A forecasting error that created either a significant overstock position or a supply shortage during peak season would cost far more than the engagement fee. The transition to the new system required a parallel operation period and a carefully staged cutover that maintained the business's ability to revert to the legacy system if problems emerged.
Finally, the retailer had 340 regional merchandising buyers who had spent years developing manual override intuitions on top of the legacy system. These buyers had legitimate expertise about local market dynamics, vendor relationship dynamics, and promotional sensitivities that the old statistical model systematically ignored. The new system needed to incorporate this human expertise rather than fight it.
We designed a three-level hierarchical forecasting architecture. At the national level, we trained LightGBM models for each of 2,400 product categories, incorporating macroeconomic signals, national promotional calendars, and social media demand trend detection. At the regional level, we trained location-cluster models grouping stores by demographic similarity, local market density, and historical demand pattern similarity, producing 180 regional cluster models. At the individual SKU-store level, we used model-generated base forecasts from the regional clusters, adjusted by real-time signals including recent velocity trends, local weather, and store-specific inventory position.
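To make the flow between the three levels concrete, here is a minimal sketch of how a national category forecast could be pushed down to a single store-SKU pair. It is an illustration under stated assumptions, not the production logic: the proportional (top-down) allocation, the multiplicative form of the real-time adjustments, and all names (`RealTimeSignals`, `store_sku_forecast`, the share parameters) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RealTimeSignals:
    """Hypothetical multiplicative adjustments; 1.0 means no change."""
    velocity_trend: float = 1.0      # recent sales velocity vs. baseline
    weather: float = 1.0             # local weather effect
    inventory_position: float = 1.0  # store-specific stock position


def store_sku_forecast(
    category_forecast: float,
    cluster_share: float,
    store_sku_share: float,
    signals: RealTimeSignals,
) -> float:
    """Disaggregate a national category forecast to one store-SKU pair.

    Assumption: levels combine by proportional allocation and real-time
    signals apply as multipliers. The case study does not specify how the
    production system combines them.
    """
    if not 0.0 <= cluster_share <= 1.0 or not 0.0 <= store_sku_share <= 1.0:
        raise ValueError("shares must lie in [0, 1]")
    base = category_forecast * cluster_share * store_sku_share
    adjustment = (
        signals.velocity_trend * signals.weather * signals.inventory_position
    )
    return max(0.0, base * adjustment)


# Example: 100,000 units nationally, 2% to this cluster, 0.5% of the
# cluster to this store-SKU pair, with two real-time adjustments.
forecast = store_sku_forecast(
    category_forecast=100_000.0,
    cluster_share=0.02,
    store_sku_share=0.005,
    signals=RealTimeSignals(velocity_trend=1.2, weather=0.9),
)
print(round(forecast, 2))
```

The point of the structure is that the expensive learning happens at the category and cluster levels, while the store-SKU level is a cheap calculation that can run daily for every pair.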
The social media signal integration was one of the most technically differentiated components. We built a real-time NLP pipeline processing mentions across TikTok, Instagram, and Pinterest to detect emerging demand trends for product categories before they surfaced in sales velocity data. This pipeline produced a "trend acceleration signal" that was fed as a feature to the national-level category models, giving the system early warning of viral demand events, on average 4 days before the legacy system could detect the same events.
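One simple way to express a trend acceleration signal is as the change in growth rate of mention volume between consecutive time windows. The sketch below assumes the NLP stage has already produced daily mention counts per product category; the function name, the window length, and the exact formula are our illustrative assumptions, not a description of the production feature.

```python
def trend_acceleration(daily_mentions: list[float], window: int = 3) -> float:
    """Change in relative growth between the two most recent windows.

    Splits the last 3 * window days into oldest, middle, and recent
    windows, then compares growth (middle -> recent) with growth
    (oldest -> middle). Positive values mean mentions are speeding up.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(daily_mentions) < 3 * window:
        raise ValueError("need at least 3 * window days of mention counts")

    oldest = sum(daily_mentions[-3 * window:-2 * window])
    middle = sum(daily_mentions[-2 * window:-window])
    recent = sum(daily_mentions[-window:])

    # Floor the denominators at 1 mention to avoid division by zero.
    prior_growth = (middle - oldest) / max(oldest, 1.0)
    recent_growth = (recent - middle) / max(middle, 1.0)
    return recent_growth - prior_growth


steady = trend_acceleration([100, 100, 100, 100, 100, 100, 100, 100, 100])
viral = trend_acceleration([100, 100, 100, 110, 110, 110, 300, 300, 300])
print(f"steady category: {steady:.2f}")
print(f"viral category:  {viral:.2f}")
```

A category with flat mention volume scores zero, while one whose mentions jump sharply in the latest window scores well above it, which is the kind of separation a downstream model can use as a feature.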
Six weeks of the 18-week engagement were dedicated entirely to data harmonization. This included building a canonical product hierarchy mapping 2.4 million SKUs across the 7 ERP systems to a unified product taxonomy, reconciling historical sales records across the 4 POS platforms, and building a data quality monitoring pipeline that flagged anomalies in incoming sales data before they contaminated model training. This was unglamorous work. It was also the work that made everything else possible.
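As an illustration of the data quality monitoring step, the sketch below flags anomalous daily sales values with a modified z-score based on the median and the median absolute deviation (MAD), which is less distorted by the outliers it is trying to find than a mean-based score. The choice of this particular statistic, the 3.5 threshold, and the function name are assumptions for the example; the case study does not specify which checks the production pipeline ran.

```python
from statistics import median


def flag_anomalies(daily_units: list[float], threshold: float = 3.5) -> list[int]:
    """Return indices of values whose modified z-score exceeds threshold.

    Modified z-score = 0.6745 * (x - median) / MAD. If MAD is zero (more
    than half the values are identical) the score is undefined, so this
    sketch flags nothing rather than guessing.
    """
    if not daily_units:
        return []
    med = median(daily_units)
    mad = median(abs(x - med) for x in daily_units)
    if mad == 0:
        return []
    return [
        i
        for i, x in enumerate(daily_units)
        if abs(0.6745 * (x - med) / mad) > threshold
    ]


# Hypothetical daily unit sales for one store-SKU pair; day 6 looks like
# a data-entry or unit-of-measure error rather than real demand.
sales = [20, 22, 19, 21, 23, 20, 400, 18, 22, 21]
flagged = flag_anomalies(sales)
print(flagged)
```

Flagged records would be held back for review instead of flowing straight into model training.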
The 340 regional merchandising buyers represented institutional knowledge that no dataset fully captured. Local weather patterns, regional event calendars, specific vendor reliability patterns, and local competitive dynamics all informed buyer decisions in ways that were not systematically recorded. We designed an override interface that allowed buyers to apply adjustments to the model's forecasts, and critically, we tracked override outcomes to create a feedback loop that incorporated validated buyer intuitions into future model training.
After 6 months of operation, we analyzed the override data. Buyer overrides that increased the model forecast were accurate 71% of the time. Buyer overrides that decreased the model forecast were accurate 58% of the time. This data allowed us to build a "buyer confidence weighting" into the architecture, giving more weight to buyer increases (particularly in product categories where individual buyers had demonstrated consistent accuracy) than to buyer decreases.
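A confidence weighting of this kind can be sketched as a blend between the model forecast and the buyer's override, where the blend weight depends on the direction of the override. The example below uses the observed accuracy rates (71% and 58%) directly as weights; that mapping, the optional per-buyer category accuracy, and the function name are illustrative assumptions rather than the production formula.

```python
from typing import Optional

# Observed override accuracy from the 6-month analysis.
INCREASE_ACCURACY = 0.71
DECREASE_ACCURACY = 0.58


def blended_forecast(
    model_forecast: float,
    buyer_forecast: float,
    buyer_category_accuracy: Optional[float] = None,
) -> float:
    """Move the model forecast toward the buyer's override.

    Assumption: the weight equals historical accuracy for the override
    direction, unless a buyer-specific accuracy for this product category
    is supplied, in which case that is used instead.
    """
    if buyer_forecast == model_forecast:
        return model_forecast
    if buyer_category_accuracy is not None:
        weight = buyer_category_accuracy
    elif buyer_forecast > model_forecast:
        weight = INCREASE_ACCURACY
    else:
        weight = DECREASE_ACCURACY
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return model_forecast + weight * (buyer_forecast - model_forecast)


raised = blended_forecast(model_forecast=100.0, buyer_forecast=120.0)
lowered = blended_forecast(model_forecast=100.0, buyer_forecast=80.0)
print(round(raised, 1), round(lowered, 1))
```

With these weights, a 20-unit increase moves the forecast further than a 20-unit decrease, which mirrors the asymmetry in the override data.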
The new system ran in parallel with the legacy system for 4 weeks before any live inventory decisions were assigned to it. During parallel operation, we tracked the divergence between the two systems' forecasts and manually reviewed the top 200 divergent cases each week with the merchandising and supply chain teams. This review process identified 12 edge cases where the new system's forecasts were systematically biased for specific product categories and allowed us to correct those biases before live cutover.
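The weekly review list can be produced with a straightforward ranking. The sketch below assumes both systems' forecasts are available as dictionaries keyed by (store, SKU) and ranks pairs by relative absolute difference; the divergence metric, the data layout, and the function name are assumptions for illustration, since the case study does not define how divergence was measured.

```python
import heapq

ForecastKey = tuple[str, str]  # (store_id, sku_id)


def top_divergent(
    new: dict[ForecastKey, float],
    legacy: dict[ForecastKey, float],
    n: int = 200,
) -> list[tuple[ForecastKey, float]]:
    """Return the n store-SKU pairs where the two systems disagree most.

    Divergence is |new - legacy| relative to the legacy forecast, with the
    denominator floored at 1 unit so near-zero forecasts do not dominate.
    Only pairs forecast by both systems are compared.
    """
    shared = new.keys() & legacy.keys()

    def divergence(key: ForecastKey) -> float:
        return abs(new[key] - legacy[key]) / max(abs(legacy[key]), 1.0)

    ranked = heapq.nlargest(n, shared, key=divergence)
    return [(key, divergence(key)) for key in ranked]


# Hypothetical forecasts for three store-SKU pairs.
new_system = {("s1", "a"): 150.0, ("s1", "b"): 100.0, ("s2", "a"): 10.0}
legacy_system = {("s1", "a"): 100.0, ("s1", "b"): 98.0, ("s2", "a"): 40.0}

review_list = top_divergent(new_system, legacy_system, n=2)
for key, score in review_list:
    print(key, round(score, 2))
```

Using `heapq.nlargest` avoids sorting the full set of pairs when only the top 200 are needed.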
The cutover was staged by product category over 6 weeks, starting with the lowest-risk categories (stable velocity, long shelf life, high inventory flexibility) and progressing to the highest-risk categories (perishable, seasonal, promotional-dependent). The legacy system was not decommissioned until the new system had proven consistent performance across all category types over a full 4-week period.
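The staging logic can be sketched as a sort by risk followed by an assignment to weekly waves. In the example below, the additive risk score, the even split across waves, and the category names are hypothetical; only the risk attributes themselves (perishable, seasonal, promotional-dependent) and the 6-week default come from the engagement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    name: str
    perishable: bool
    seasonal: bool
    promo_dependent: bool


def risk_score(category: Category) -> int:
    """Assumed scoring: one point per risk attribute (0 = lowest risk)."""
    return (
        int(category.perishable)
        + int(category.seasonal)
        + int(category.promo_dependent)
    )


def cutover_waves(categories: list[Category], n_waves: int = 6) -> list[list[str]]:
    """Assign categories to waves, lowest risk first, in near-equal groups."""
    if n_waves < 1:
        raise ValueError("n_waves must be at least 1")
    ordered = sorted(categories, key=risk_score)  # stable sort keeps ties in input order
    waves: list[list[str]] = [[] for _ in range(n_waves)]
    for i, category in enumerate(ordered):
        waves[i * n_waves // len(ordered)].append(category.name)
    return waves


# Hypothetical categories, split into 3 waves to keep the example short.
catalog = [
    Category("fresh produce", perishable=True, seasonal=True, promo_dependent=True),
    Category("canned goods", perishable=False, seasonal=False, promo_dependent=False),
    Category("holiday decor", perishable=False, seasonal=True, promo_dependent=True),
    Category("batteries", perishable=False, seasonal=False, promo_dependent=False),
    Category("snacks", perishable=False, seasonal=False, promo_dependent=True),
    Category("garden tools", perishable=False, seasonal=True, promo_dependent=False),
]
waves = cutover_waves(catalog, n_waves=3)
for week, names in enumerate(waves, start=1):
    print(week, names)
```

Ordering the waves this way means that any problem surfacing early in the cutover affects the categories where a forecast error is cheapest to absorb.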
Unified product taxonomy across 7 ERP systems. Historical sales reconciliation across 4 POS platforms. Data quality monitoring pipeline. Three-level forecasting architecture design and buy-in from merchandising leadership.
National category models (2,400 LightGBM models) trained. Regional cluster models (180 clusters) built. Social media demand trend NLP pipeline deployed. Buyer override interface designed and validated with pilot buyer group of 24.
New system runs alongside legacy system with no live inventory decisions. Top 200 divergence cases reviewed weekly with merchandising and supply chain teams. 12 category-level biases identified and corrected. Performance validated across full SKU range.
6-week staged cutover by product risk category. Low-risk categories live in week 14. High-risk seasonal categories live in week 18. Legacy system maintained in standby throughout. Full buyer override training completed across all 340 buyers.
"We had known for years that our forecasting system was holding us back. What we did not know was how to replace something that touched $42 billion in annual purchasing decisions without catastrophic transition risk. The staged cutover approach was elegant. At no point were we exposed to a failure we could not recover from. The $140M revenue impact in the first year exceeded our most optimistic business case projections by 40%."
Most retail forecasting systems were designed before social media demand signals, same-day delivery expectations, and the data infrastructure to process them at scale. Tell us about your current capability and we will show you where the gap is costing you the most.