Authors:
(1) Vijay Ekambaram, IBM Research;
(2) Arindam Jati, IBM Research;
(3) Nam H. Nguyen, IBM Research;
(4) Pankaj Dayama, IBM Research;
(5) Chandra Reddy, IBM Research;
(6) Wesley M. Gifford, IBM Research;
(7) Jayant Kalagnanam, IBM Research.
Editor's note: this is part 2 of 5 of a study detailing the development of a tiny, fast AI model that delivers excellent accuracy. Read the rest below.
TTM follows a multi-level architecture consisting of four key components (see Figure 1(a)): (1) the TTM Backbone is assembled from building blocks derived from the efficient TSMixer architecture [Ekambaram et al., 2023]. TSMixer is built from simple MLP blocks that mix features within patches, across patches, and across channels, surpassing existing transformer-based TS approaches with minimal computational requirements. Since TSMixer is not designed to handle multi-resolution data, we introduce several novel enhancements to it, as explained later. (2) The TTM Decoder follows the same backbone architecture but is considerably smaller, approximately 10-20% of the size of the backbone. (3) The Forecast Head is a linear head that produces the forecast output. (4) The optional Exogenous Mixer fuses exogenous data into the model's forecasting process. This multi-level refactoring is required to dynamically change the behavior of the various components based on the workflow type, as explained in Section 3. In addition to these primary components, the model also includes a preprocessing component, explained next.
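To make the component layout concrete, the sketch below wires a TSMixer-style backbone, a much smaller decoder, and a linear forecast head in the order described above. This is a minimal illustrative sketch, not the authors' implementation: the class names (`MixerBlock`, `TTMSketch`), the hyperparameters, the single-channel input, and the omission of the exogenous mixer and the multi-resolution enhancements are all assumptions made here for brevity.

```python
import torch
import torch.nn as nn


class MixerBlock(nn.Module):
    """TSMixer-style block: MLPs mix across the patch axis and the feature axis."""

    def __init__(self, num_patches: int, d_model: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.patch_mlp = nn.Sequential(  # mixes information across patches
            nn.Linear(num_patches, num_patches), nn.GELU(), nn.Linear(num_patches, num_patches)
        )
        self.norm2 = nn.LayerNorm(d_model)
        self.feature_mlp = nn.Sequential(  # mixes features within each patch
            nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model)
        )

    def forward(self, x):
        # x: (batch, num_patches, d_model)
        y = self.norm1(x).transpose(1, 2)           # (batch, d_model, num_patches)
        x = x + self.patch_mlp(y).transpose(1, 2)   # residual patch mixing
        x = x + self.feature_mlp(self.norm2(x))     # residual feature mixing
        return x


class TTMSketch(nn.Module):
    """Backbone -> thin decoder -> linear forecast head, mirroring the layout in Figure 1(a)."""

    def __init__(self, context_len=512, patch_len=64, d_model=128,
                 backbone_layers=8, decoder_layers=1, forecast_len=96):
        super().__init__()
        self.patch_len = patch_len
        num_patches = context_len // patch_len
        self.patch_embed = nn.Linear(patch_len, d_model)
        # (1) Backbone: the large stack of mixer blocks.
        self.backbone = nn.Sequential(*[MixerBlock(num_patches, d_model)
                                        for _ in range(backbone_layers)])
        # (2) Decoder: same block type, roughly 10-20% of the backbone's depth.
        self.decoder = nn.Sequential(*[MixerBlock(num_patches, d_model)
                                       for _ in range(decoder_layers)])
        # (3) Forecast head: a single linear projection to the forecast horizon.
        self.head = nn.Linear(num_patches * d_model, forecast_len)
        # (4) The optional exogenous mixer is omitted in this sketch.

    def forward(self, x):
        # x: (batch, context_len) history for a single channel.
        patches = x.unfold(1, self.patch_len, self.patch_len)  # (batch, num_patches, patch_len)
        z = self.patch_embed(patches)                           # (batch, num_patches, d_model)
        z = self.decoder(self.backbone(z))
        return self.head(z.flatten(1))                          # (batch, forecast_len)


model = TTMSketch()
history = torch.randn(4, 512)
print(model(history).shape)  # torch.Size([4, 96])
```

With one decoder block against eight backbone blocks, the decoder holds roughly 12% of the backbone's parameters, consistent with the 10-20% range stated above; in the full model, the backbone, decoder, and head are refactored separately so their behavior can change per workflow.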
This paper is available on arxiv under CC BY-NC-ND 4.0 DEED license.