September 2022

Our work with Adobe Research on time-series prediction bore fruit: the paper was accepted to the ACM International Conference on Information and Knowledge Management (CIKM), to be held October 17-21.

Mustafa Abdallah (Purdue); Ryan Rossi, Kanak Mahadik, Sungchul Kim, Handong Zhao (Adobe Research); and Saurabh Bagchi (Purdue), “AutoForecast: Automatic Time-Series Forecasting Model Selection,” in the 31st ACM International Conference on Information and Knowledge Management (CIKM), pp. 1–10, October 17-21, 2022, Atlanta, GA.

Acceptance rate: 274/1175 (23.2%)

This paper is accompanied by a large corpus of time-series datasets, including a trace dataset of Adobe’s cloud computing usage over two weeks. [ Repo ]

Mustafa Abdallah, PhD student (now faculty at IUPUI), the lead author of this work

Accurate time-series forecasting at scale is critical for a wide range of industrial domains such as cloud computing, supply chain, energy, and finance. Most current time-series forecasting solutions are built by experts and require significant manual effort in model construction, feature engineering, and hyper-parameter tuning. Hence, they do not scale to generate high-quality forecasts for a wide variety of applications. Moreover, no learning scheme is uniformly better than all others across all problem instances. For example, we find empirically that no single forecasting model triumphs in more than 0.7% of the datasets in our two training testbeds comprising 625 time series, i.e., there is no unique model that works well on all datasets. A naïve approach would be, given a new dataset, to evaluate the performance of thousands of available models on it and select the best forecasting model for the problem at hand. However, this approach is practically infeasible due to the untenable time burden for every new problem.
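To make the time burden concrete, here is a minimal sketch of that naïve exhaustive approach; the model classes and function names are hypothetical, and real candidates (ARIMA, deep forecasters, etc.) would each take minutes to hours to fit:

```python
# Hypothetical sketch of the naive exhaustive approach: fit every candidate
# model on a new series and keep the one with the lowest hold-out error.
# Cost grows linearly with the number of candidate models, which is what
# makes this infeasible when thousands of models are in play.
import numpy as np

def naive_exhaustive_selection(series, candidate_models, horizon=10):
    """Fit each candidate on the head of the series, score it on the tail."""
    train, test = series[:-horizon], series[-horizon:]
    errors = {}
    for name, make_model in candidate_models.items():
        model = make_model()
        model.fit(train)
        forecast = model.predict(horizon)
        errors[name] = float(np.mean((forecast - test) ** 2))  # hold-out MSE
    best = min(errors, key=errors.get)
    return best, errors

# Two toy candidates standing in for real forecasting models
class LastValue:
    def fit(self, y): self.last = y[-1]
    def predict(self, h): return np.full(h, self.last)

class GlobalMean:
    def fit(self, y): self.mu = y.mean()
    def predict(self, h): return np.full(h, self.mu)

series = np.sin(np.linspace(0, 8 * np.pi, 200)) + 5.0
best, errors = naive_exhaustive_selection(
    series, {"last_value": LastValue, "mean": GlobalMean})
```

With thousands of candidates, the loop body — a full training run per model — must be repeated for every new dataset, which is exactly the cost the paper's meta-learning approach avoids.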

In this work, we formulate the problem of automatic and fast selection of the best time-series forecasting model as a meta-learning problem. Our solution avoids the infeasible burden of first training each of the models and then evaluating each one to select the best model for a new unseen time-series dataset, or even a new time window within a non-stationary dataset. A practically important desideratum for any solution to this problem is that once the meta-learner L is trained in an offline manner on a large corpus of time-series data, we can use it to quickly infer the best forecasting model. This quick-inference requirement makes the new problem challenging to solve, yet practically important. Our meta-learner L is trained on the models’ performances on historical datasets and the time-series meta-features of these datasets.
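The offline/online split can be sketched as follows. This is an illustrative toy, not the paper's architecture: the meta-features and the 1-nearest-neighbor lookup are stand-ins I chose for brevity, where a real meta-learner would use a richer feature set and a learned performance predictor:

```python
# Hedged sketch of the meta-learning idea: offline, record each model's error
# on historical datasets along with those datasets' meta-features; online,
# locate the most similar historical dataset in meta-feature space and reuse
# its best-performing model -- no per-model training on the new series.
import numpy as np

def meta_features(y):
    """Illustrative meta-features: mean, std, lag-1 autocorrelation, slope."""
    lag1 = np.corrcoef(y[:-1], y[1:])[0, 1]
    slope = np.polyfit(np.arange(len(y)), y, 1)[0]
    return np.array([y.mean(), y.std(), lag1, slope])

class MetaLearner:
    def fit(self, feature_rows, error_rows):
        # feature_rows: (n_datasets, n_meta_features)
        # error_rows:   (n_datasets, n_models) -- errors from the offline phase
        self.X = np.asarray(feature_rows)
        self.E = np.asarray(error_rows)
        return self

    def select(self, y):
        """Return the index of the model predicted best for a new series."""
        f = meta_features(y)
        nearest = np.argmin(np.linalg.norm(self.X - f, axis=1))  # 1-NN lookup
        return int(np.argmin(self.E[nearest]))  # best model on that neighbor

# Offline phase: two toy "historical" datasets, two candidate models.
# Hypothetical error table: model 0 wins on trending data, model 1 on noise.
hist = [np.linspace(0, 1, 100), np.random.default_rng(0).normal(size=100)]
errors = [[0.1, 0.9], [0.8, 0.2]]
learner = MetaLearner().fit([meta_features(y) for y in hist], errors)

# Online phase: a single cheap lookup instead of fitting every model.
choice = learner.select(np.linspace(2, 3, 100))
```

The point of the design is that the expensive part (fitting every model on every dataset) happens once, offline; at inference time only the meta-features of the new series must be computed.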