
The Science of "Cherry" Parameters: Why Some LLM Weights Matter More


Too Long; Didn't Read

A tiny fraction of LLM parameters—called "cherry" parameters—play a crucial role in model accuracy. Learn how identifying and preserving them can improve AI efficiency.


Authors:

(1) Wanyun Cui, Shanghai University of Finance and Economics, with equal contribution;

(2) Qianle Wang, Shanghai University of Finance and Economics, with equal contribution.

Abstract and 1 Introduction

2 Related Work

3 Quantifying the Impact of Parameters on Model Performance & 4 Unified Mixed-Precision Training

5 Prevalence of Parameter Heterogeneity in LLMs

6 Quantization Experiments and 6.1 Implementation Details

6.2 Effect of Base LLM Quantization

6.3 Effect of Chat LLM Quantization

6.4 Comparison of Parameter Selection Criteria, Conclusion, & References

5. Prevalence of Parameter Heterogeneity in LLMs

While Figure 1 showcases the heterogeneity of selected parameter matrices in different LLMs, it is crucial to investigate whether this phenomenon is pervasive across the hundreds of parameter matrices within each LLM. In this section, we conduct a comprehensive analysis of parameter heterogeneity from a macro perspective.


To quantify the degree of heterogeneity in a parameter matrix, we introduce the heterogeneity score of the matrix. It is motivated by the observation in Figure 1 that a small subset of parameters has a far higher impact than even the largest impact found among the remaining majority. We define the heterogeneity score as the ratio of the mean impact of the top 1% of parameters to the maximum impact of the bottom 99% of parameters, as shown in Equation (4). A higher heterogeneity score indicates a more pronounced disparity in parameter importance within the matrix.
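
Written out from the verbal definition above, the score takes the following form. The symbols are assumed here for illustration and may differ from the paper's notation: $h_i$ is the impact of parameter $i$, $n$ is the number of parameters in the matrix, $T$ is the set of indices of the 1% of parameters with the largest impact, and $B$ is the set of the remaining 99%.

```latex
\text{Heterogeneity score}
  = \frac{\dfrac{1}{|T|} \sum_{i \in T} h_i}{\max_{j \in B} h_j},
\qquad |T| \approx 0.01\,n, \quad B = \{1, \dots, n\} \setminus T
```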



For comparison, we also include the heterogeneity scores based on the magnitude of parameters, a commonly used measure of parameter importance [11]. The magnitude-based heterogeneity score is calculated using Equation (5).
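
Both scores share the same ratio and differ only in the quantity being ranked, so a single helper can illustrate them. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `heterogeneity_score`, the flat list input, and the rounding rule for the size of the top 1% are all hypothetical choices made for this example, and the per-parameter impacts are assumed to be already computed.

```python
def heterogeneity_score(values, top_fraction=0.01):
    """Mean of the top `top_fraction` of values divided by the max of the rest.

    `values` is a flat sequence of non-negative numbers: per-parameter impacts
    for the impact-based score, or absolute weights for the magnitude-based one.
    """
    ranked = sorted(values, reverse=True)
    # Size of the "top" group; at least one element (assumed rounding rule).
    k = max(1, round(len(ranked) * top_fraction))
    if k >= len(ranked):
        raise ValueError("need more values than the size of the top group")
    top, rest = ranked[:k], ranked[k:]
    largest_of_rest = rest[0]  # list is sorted in descending order
    if largest_of_rest <= 0:
        raise ValueError("maximum of the remaining values must be positive")
    return (sum(top) / k) / largest_of_rest


# Toy data (made up): 198 ordinary parameters and 2 high-impact ones.
impacts = [1.0] * 198 + [50.0, 30.0]
impact_score = heterogeneity_score(impacts)  # mean(50, 30) / 1.0

# Magnitude-based variant: rank by absolute weight instead of impact.
weights = [-0.5] * 99 + [1.0]
magnitude_score = heterogeneity_score([abs(w) for w in weights])
```

In this toy data the impact-based score (40.0) is much larger than the magnitude-based one (2.0), which is the kind of gap the comparison is meant to expose. The numbers themselves are illustrative only.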



To provide a comprehensive view of parameter heterogeneity across different matrices, we plot the scatter distribution of heterogeneity scores for all parameter matrices of each model in Figure 2. The figure shows that parameter matrices across different LLMs exhibit high impact-based heterogeneity scores, especially when compared with the scores based on parameter magnitude. This finding strongly suggests that parameter heterogeneity is not an isolated occurrence but rather a widespread phenomenon in LLMs.


The pervasiveness of parameter heterogeneity highlights the need for quantization strategies that can effectively handle the disparate importance of parameters, ensuring that the cherry parameters are preserved with higher precision while allowing for more aggressive quantization of the less influential normal parameters.


Figure 2: Scatter distribution of heterogeneity scores for different parameter matrices in LLMs. Each point represents a parameter matrix, with the x-axis indicating the matrix index and the y-axis showing the heterogeneity score.


This paper is available on arXiv under the CC BY 4.0 DEED license.