The Perils of Third Party Data in Forecasting
Many have touted the inclusion of data from outside the firm as a way to improve forecast accuracy. The idea is that incorporating more information to explain the demand signal is bound to produce better forecasts. In principle this is true, and it has logical appeal. In reality, the inclusion of third-party data is fraught with hazard.
Third-party data includes macroeconomic, weather, digital, mobility, marketing, and consumer data, among others. There is no shortage of it, and a myriad of vendors are ready to sell it. The first challenge is integrating that data with your own at the appropriate levels of hierarchy and aggregation.
But this is not where the danger lies. The main issue with outside data is that the forecasts of those data themselves may have a high degree of error. For nearly all forecasting methodologies, for every data element that is included in a forecasting model, future values of those elements need to be provided. To the extent that those future values themselves are inaccurate, their inaccuracy feeds into the main forecasting model. Forecast errors in the inputs can lead to forecast errors in the metric you are trying to forecast. Opening up a model to outside data provides an inlet to potentially make the forecasts worse.
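A small simulation can make this concrete. The sketch below is illustrative only: the linear demand model, its coefficients, the noise levels, and the names `demand_mae` and `baseline_mae` are all assumptions chosen for the example, not taken from any client data. It forecasts demand from an outside driver and shows how the demand error grows as the driver's own forecast gets noisier, compared with a baseline that ignores the driver entirely.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is repeatable


def mae(actuals, forecasts):
    """Mean absolute error between two equal-length sequences."""
    return statistics.fmean(abs(a - f) for a, f in zip(actuals, forecasts))


# Hypothetical demand model (an assumption for this sketch):
# demand = 100 + 2.0 * driver + noise
INTERCEPT, SLOPE = 100.0, 2.0
driver_actual = [50 + random.gauss(0, 5) for _ in range(200)]
demand_actual = [INTERCEPT + SLOPE * x + random.gauss(0, 3) for x in driver_actual]


def demand_mae(driver_forecast_sd):
    """Demand MAE when the driver's future values are forecast with the given error."""
    driver_forecast = [x + random.gauss(0, driver_forecast_sd) for x in driver_actual]
    demand_forecast = [INTERCEPT + SLOPE * x for x in driver_forecast]
    return mae(demand_actual, demand_forecast)


# Baseline with no outside data: forecast every period with the average demand.
average_demand = statistics.fmean(demand_actual)
baseline_mae = mae(demand_actual, [average_demand] * len(demand_actual))

print("no-driver baseline MAE:", round(baseline_mae, 2))
for sd in (0, 2, 5, 10):
    print("driver forecast error sd", sd, "-> demand MAE", round(demand_mae(sd), 2))
```

With a perfect driver forecast the model beats the baseline comfortably. Under these assumed numbers, once the driver forecast error is large enough, the model with the outside input does worse than the one without it.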
We’ve seen this before. During the recession in 2009, one client subscribed to one of the major providers of macroeconomic data. They noticed that their own forecasts were getting worse, and attributed a large portion of the degradation to the forecasts they were getting from this vendor. The vendor’s forecasting model was too slow to pick up on changes in the economy and was over-forecasting. For a number of months this client built in a bias factor to adjust the inputs.
And even with accurate forecasts, it is worth considering what you are paying for and how much of it actually gets used. We know of another company that subscribed to the same service. They would receive about 4600 data elements monthly. They had implemented a highly automated forecasting system that would automatically select the best variables to fit models for thousands of time series. Of those 4600 variables, only 48 were ever selected, and of those, only 8 were used consistently.
Does all of this mean there is no value in third-party data? No. It means you need to assess its value: how much does it improve accuracy (add value) relative to the cost of obtaining it? This can be done through a Forecast Value Added (FVA) analysis.
FVA is applied to a step or participant in the forecasting process. It is usually focused on human touchpoints. Should this person be modifying the forecast? If they do, does it make it better or worse? And by how much? This same principle can be applied to a data input. Does inclusion of this input make the forecast better or worse?
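Applied to a data input, the comparison might look like the sketch below. It assumes mean absolute error as the accuracy metric (other metrics would work the same way) and uses made-up numbers; the function name `forecast_value_added` and the sample forecasts are hypothetical. A positive result means the input improved the forecast, a negative one means it made it worse.

```python
def mean_absolute_error(actuals, forecasts):
    """Average absolute difference between actuals and forecasts."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)


def forecast_value_added(actuals, without_input, with_input):
    """Error reduction from adding the input (positive = better, negative = worse)."""
    return (mean_absolute_error(actuals, without_input)
            - mean_absolute_error(actuals, with_input))


# Made-up numbers for illustration only.
actuals = [100, 110, 120, 130]
without_input = [105, 105, 125, 125]    # model with no outside data
with_good_input = [102, 112, 118, 128]  # model with a helpful input
with_bad_input = [120, 90, 140, 110]    # model fed poorly forecast input values

print(forecast_value_added(actuals, without_input, with_good_input))
print(forecast_value_added(actuals, without_input, with_bad_input))
```

The resulting number can then be weighed against what the data costs to buy and integrate.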
A well-managed forecasting system measures and monitors its performance. There are two considerations when using third-party data. First, have an FVA-like system to measure the value delivered (improved accuracy, reduced bias) versus the cost to obtain and incorporate the data. Second, be prepared to be comfortable with very simple models. Time after time, forecasting studies and competitions have shown that simpler models, such as seasonal weighted moving averages, can beat out more complex ones. These use no outside inputs at all, only the history of the series itself. This is sometimes hard for people to swallow, but if you are exclusively focused on better forecasts, go with the approach that gets you there, regardless of complexity, even if that means omitting the outside data.
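For reference, here is one simple way a seasonal weighted moving average could be written. This is a minimal sketch of one common variant, not a prescribed method: it forecasts the next period as a weighted average of the same season in prior cycles. The function name, the weights, and the quarterly sample history are all assumptions for illustration.

```python
def seasonal_weighted_moving_average(history, season_length, weights):
    """Forecast the next period from the same season in previous cycles.

    weights[0] applies to the most recent cycle, weights[1] to the one
    before it, and so on. Weights are normalized by their sum.
    """
    if len(history) < season_length * len(weights):
        raise ValueError("not enough history for the requested number of cycles")
    # history[-season_length] is the same season one cycle back,
    # history[-2 * season_length] is two cycles back, and so on.
    same_season = [history[-season_length * k] for k in range(1, len(weights) + 1)]
    return sum(w * v for w, v in zip(weights, same_season)) / sum(weights)


# Two years of made-up quarterly demand; forecast the next first quarter.
quarterly_history = [10, 20, 30, 40, 12, 22, 32, 42]
next_quarter = seasonal_weighted_moving_average(
    quarterly_history, season_length=4, weights=[0.75, 0.25]
)
print(next_quarter)
```

A baseline like this costs nothing to feed, which makes it a natural yardstick for any model that relies on purchased data.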