HyperTransformer: Conclusion and References

In this paper we propose a new few-shot learning approach that allows us to decouple the complexity of the task space from the complexity of individual tasks.
This paper is available on arXiv under a CC 4.0 license.


Andrey Zhmoginov, Google Research

Mark Sandler, Google Research

Max Vladymyrov, Google Research


In this work, we proposed the HyperTransformer (HT), a novel transformer-based model that generates all weights of a CNN model directly from a few-shot support set. This approach allows us to use a high-capacity model for encoding task-dependent variations in the weights of a smaller model. We demonstrated that, by generating the last logits layer alone, the transformer-based weight generator beats or matches the performance of multiple traditional learning methods on several few-shot benchmarks. More importantly, we showed that HT can be straightforwardly extended to handle unlabeled samples that might be present in the support set, and our experiments demonstrate a considerable few-shot performance improvement in the presence of unlabeled data. Finally, we explored the impact of the transformer-encoded model diversity in CNN models of different sizes. We used HT to generate some or all convolutional kernels and biases and showed that, for sufficiently small models, adjusting all model parameters further improves their few-shot learning performance.
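To make the idea of generating the logits layer from a support set concrete, the following is a toy sketch rather than the HT architecture itself. It replaces the trained transformer with a single fixed attention step in which each class attends to the support samples that share its label, and it assumes the sample embeddings have already been computed. All names (`generate_logits_weights`, `classify`, `sharpness`) are hypothetical and chosen for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def generate_logits_weights(support_embeddings, support_labels,
                            num_classes, sharpness=10.0):
    """Toy stand-in for a weight generator (NOT the trained transformer).

    For each class c, one attention step pools the support embeddings:
    the score of sample i is high when its label equals c, so the
    generated weight row is close to the mean embedding of class c.
    """
    dim = len(support_embeddings[0])
    weights = []
    for c in range(num_classes):
        scores = [sharpness if y == c else 0.0 for y in support_labels]
        attn = softmax(scores)
        row = [sum(a * e[d] for a, e in zip(attn, support_embeddings))
               for d in range(dim)]
        weights.append(row)
    return weights


def classify(query_embedding, weights):
    """Apply the generated logits layer and return the arg-max class."""
    logits = [sum(w * x for w, x in zip(row, query_embedding))
              for row in weights]
    return max(range(len(logits)), key=logits.__getitem__)


# A 2-way, 2-shot support set with hand-made 2-D embeddings (assumed data).
support_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
support_labels = [0, 0, 1, 1]

weights = generate_logits_weights(support_embeddings, support_labels,
                                  num_classes=2)
print(classify([0.8, 0.2], weights))
print(classify([0.2, 0.8], weights))
```

In this sketch the task-specific information lives entirely in the generated `weights`, while the embedding function stays fixed across tasks. In HT the generator is a learned transformer and can also produce convolutional kernels and biases, which this toy example does not attempt.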


