Table of Links
3 End-to-End Adaptive Local Learning
3.1 Loss-Driven Mixture-of-Experts
3.2 Synchronized Learning via Adaptive Weight
4 Debiasing Experiments and 4.1 Experimental Setup
4.3 Ablation Study
4.4 Effect of the Adaptive Weight Module and 4.5 Hyper-parameter Study
6 Conclusion, Acknowledgements, and References
4.4 Effect of the Adaptive Weight Module
Last, we turn our attention to the effect of the adaptive weight module, studying how it synchronizes the learning paces of different users. We run TALL on the ML1M dataset and present the average weights for the five subgroups with the gap window (#gap = 40) in Figure 3. It can be observed that the adaptive weight module assigns weights dynamically to different types of users to synchronize their learning paces. Initially, mainstream users receive higher weights because they are easier to learn and have a higher performance upper bound than niche users. Then, when mainstream users reach their peak, the model shifts its attention to niche users, who are more difficult to learn, gradually increasing the weights for "low", "med-low", and "medium" users until the end of the training procedure. In contrast, "med-high" and "high" users, which are approaching convergence, need a slower learning pace to avoid overfitting, leading to a decrease in their weights. Figure 3 illuminates the effectiveness and dynamic nature of the proposed adaptive weight module in synchronizing the learning procedures of different types of users.
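The behavior described above can be sketched in code. The snippet below is a minimal, illustrative implementation of a loss-driven adaptive weighting scheme with a gap window: each subgroup's weight is nudged up when its loss is still high and barely improving (slow learners such as niche users) and nudged down as it approaches convergence. The class name, the exact update rule, and the role of `alpha` as a smoothing factor are assumptions for illustration, not the paper's exact formulation.

```python
from collections import deque


class AdaptiveWeights:
    """Illustrative sketch of a loss-driven adaptive weight module.

    Tracks each subgroup's loss over a gap window and adjusts its weight:
    groups that are still learning slowly (high loss, little progress)
    receive larger weights; near-converged groups receive smaller ones.
    """

    def __init__(self, groups, gap=40, alpha=0.5):
        self.gap = gap                      # size of the gap window
        self.alpha = alpha                  # smoothing strength (hypothetical role)
        self.history = {g: deque(maxlen=gap) for g in groups}
        self.weights = {g: 1.0 for g in groups}

    def update(self, losses):
        """losses: dict mapping subgroup name -> current average loss."""
        for g, loss in losses.items():
            hist = self.history[g]
            if len(hist) == self.gap:
                # Improvement over the gap window: groups improving quickly
                # (or already converged at a low loss) get a small target,
                # while stagnant high-loss groups get a large one.
                progress = hist[0] - loss
                target = max(loss - progress, 1e-8)
                self.weights[g] = (1 - self.alpha) * self.weights[g] + self.alpha * target
            hist.append(loss)
        # Normalize so the weights average to 1 across subgroups.
        mean_w = sum(self.weights.values()) / len(self.weights)
        for g in self.weights:
            self.weights[g] /= mean_w
        return dict(self.weights)
```

With a fast-improving "mainstream" group and a stagnant high-loss "niche" group, the niche weight grows relative to the mainstream weight as training proceeds, mirroring the trend in Figure 3.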
4.5 Hyper-parameter Study
Additionally, we have conducted a comprehensive hyper-parameter study investigating the impacts of three hyper-parameters in TALL: (1) the gap window in the adaptive weight module; (2) α in the adaptive weight module; and (3) the number of experts. The complete results are in https://github.com/JP-25/end-To-end-Adaptive-Local-Leanring-TALL-/blob/main/Hyperparameter_Study.pdf.
Authors:
(1) Jinhao Pan [0009-0006-1574-6376], Texas A&M University, College Station, TX, USA;
(2) Ziwei Zhu [0000-0002-3990-4774], George Mason University, Fairfax, VA, USA;
(3) Jianling Wang [0000-0001-9916-0976], Texas A&M University, College Station, TX, USA;
(4) Allen Lin [0000-0003-0980-4323], Texas A&M University, College Station, TX, USA;
(5) James Caverlee [0000-0001-8350-8528], Texas A&M University, College Station, TX, USA.