Authors:
(1) Vijay Ekambaram, IBM Research;
(2) Arindam Jati, IBM Research;
(3) Nam H. Nguyen, IBM Research;
(4) Pankaj Dayama, IBM Research;
(5) Chandra Reddy, IBM Research;
(6) Wesley M. Gifford, IBM Research;
(7) Jayant Kalagnanam, IBM Research.
Editor's note: this is part 2 of 5 of a study detailing the development of a tiny, fast AI model that delivers excellent accuracy. Read the rest below.
TTM follows a multi-level architecture consisting of four key components (see Figure 1(a)): (1) the TTM Backbone is assembled from building blocks derived from the efficient TSMixer architecture [Ekambaram et al., 2023]. TSMixer is built from simple MLP blocks that mix features within patches, across patches, and across channels, surpassing existing transformer-based TS approaches with minimal computational requirements. Since TSMixer is not designed to handle multi-resolution data, we introduce several novel enhancements to it, as explained later. (2) The TTM Decoder follows the same backbone architecture but is considerably smaller, approximately 10-20% of the size of the backbone. (3) The Forecast Head is a linear head that produces the forecast output. (4) The optional Exogenous Mixer fuses exogenous data into the model's forecasting process. This multi-level refactoring is required to dynamically change the behavior of the various components based on the workflow type, as explained in Section 3. In addition to these primary components, the model also includes a preprocessing component, explained next.
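To make the component layout concrete, the sketch below wires a TSMixer-style backbone, a much smaller decoder, and a linear forecast head in the order described above. This is a minimal illustrative sketch, not the authors' implementation: the class names (`MixerBlock`, `TTMSketch`), the hyperparameters, the single-channel input, and the omission of the exogenous mixer and the multi-resolution enhancements are all assumptions made here for brevity.

```python
import torch
import torch.nn as nn


class MixerBlock(nn.Module):
    """TSMixer-style block: MLPs mix across the patch axis and the feature axis."""

    def __init__(self, num_patches: int, d_model: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.patch_mlp = nn.Sequential(  # mixes information across patches
            nn.Linear(num_patches, num_patches), nn.GELU(), nn.Linear(num_patches, num_patches)
        )
        self.norm2 = nn.LayerNorm(d_model)
        self.feature_mlp = nn.Sequential(  # mixes features within each patch
            nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model)
        )

    def forward(self, x):
        # x: (batch, num_patches, d_model)
        y = self.norm1(x).transpose(1, 2)           # (batch, d_model, num_patches)
        x = x + self.patch_mlp(y).transpose(1, 2)   # residual patch mixing
        x = x + self.feature_mlp(self.norm2(x))     # residual feature mixing
        return x


class TTMSketch(nn.Module):
    """Backbone -> thin decoder -> linear forecast head, mirroring the layout in Figure 1(a)."""

    def __init__(self, context_len=512, patch_len=64, d_model=128,
                 backbone_layers=8, decoder_layers=1, forecast_len=96):
        super().__init__()
        self.patch_len = patch_len
        num_patches = context_len // patch_len
        self.patch_embed = nn.Linear(patch_len, d_model)
        # (1) Backbone: the large stack of mixer blocks.
        self.backbone = nn.Sequential(*[MixerBlock(num_patches, d_model)
                                        for _ in range(backbone_layers)])
        # (2) Decoder: same block type, roughly 10-20% of the backbone's depth.
        self.decoder = nn.Sequential(*[MixerBlock(num_patches, d_model)
                                       for _ in range(decoder_layers)])
        # (3) Forecast head: a single linear projection to the forecast horizon.
        self.head = nn.Linear(num_patches * d_model, forecast_len)
        # (4) The optional exogenous mixer is omitted in this sketch.

    def forward(self, x):
        # x: (batch, context_len) history for a single channel.
        patches = x.unfold(1, self.patch_len, self.patch_len)  # (batch, num_patches, patch_len)
        z = self.patch_embed(patches)                           # (batch, num_patches, d_model)
        z = self.decoder(self.backbone(z))
        return self.head(z.flatten(1))                          # (batch, forecast_len)


model = TTMSketch()
history = torch.randn(4, 512)
print(model(history).shape)  # torch.Size([4, 96])
```

With one decoder block against eight backbone blocks, the decoder holds roughly 12% of the backbone's parameters, consistent with the 10-20% range stated above; in the full model, the backbone, decoder, and head are refactored separately so their behavior can change per workflow.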
This paper is available on arxiv under CC BY-NC-ND 4.0 DEED license.