GIFT-Eval: A Benchmark for General Time Series Forecasting Model Evaluation
Abstract
The development of time series foundation models has been constrained by the absence of comprehensive benchmarks. This paper introduces the General TIme Series ForecasTing Model Evaluation, GIFT-Eval, a pioneering benchmark specifically designed to address this gap. GIFT-Eval encompasses 28 datasets with over 144,000 time series and 157 million observations, spanning seven domains and featuring a diverse range of frequencies, numbers of variates, and prediction lengths from short- to long-term forecasts. Our benchmark facilitates the effective pretraining and evaluation of foundation models. We present a detailed analysis of 12 baseline models, including statistical, deep learning, and foundation models, and further provide a fine-grained analysis of each model across the different characteristics of our benchmark. We hope that the insights gleaned from this analysis, along with access to this new standard zero-shot time series forecasting benchmark, will guide future development of time series forecasting foundation models.