Table of Links
3 Methodology and 3.1 Causal language model as a classification model
3.4 Model development and training
4 Experiments and 4.1 Android function calls
4.2 Extension to Vehicle, Yelp, and DoorDash function sets
4.3 Full and partial training datasets and 4.4 Full training and LoRA training
4.5 Parallel and nested function call and 4.6 Weighted loss function for special tokens
5 Discussion and future works and References
Appendix
4.5 Parallel and nested function call
The benchmark tests above are intended for single function calls. To enable parallel and nested function calls, we need to prepare 4K data points for each API so that accuracy can reach the same level as the single function call.
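To illustrate the distinction, a parallel call produces several independent function calls in one response, while a nested call uses one function's output as another's argument. The sketch below uses hypothetical token names and function signatures purely for illustration; it is not the paper's actual output format.

```python
# Hypothetical target formats (token and function names are illustrative only).

# Single function call:
single = "<fn_take_photo>(mode='portrait')<fn_end>"

# Parallel function calls: two independent calls in one response.
parallel = (
    "<fn_take_photo>(mode='portrait')<fn_end>"
    "<fn_set_timer>(minutes=5)<fn_end>"
)

# Nested function call: one call's result feeds another call's argument.
nested = "<fn_send_message>(to=<fn_get_contact>(name='Alice')<fn_end>, body='Hi')<fn_end>"
```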
4.6 Weighted loss function for special tokens
A distinctive aspect of our approach involves incorporating numerous special tokens into the tokenizer and expanding the language model’s head. The loss function is defined as follows:
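In standard token-level cross-entropy form, with $y_{t,v}$ the one-hot target and $p_{t,v}$ the model's predicted probability (notation introduced here for clarity):

\mathcal{L} = -\sum_{t=1}^{T} \sum_{v=1}^{V} y_{t,v} \log p_{t,v},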
where T represents the sequence length, and V denotes the vocabulary size.
Because the newly introduced special function tokens, along with the distinct token marking the end of a function call, are absent from the Gemma-2B pretraining dataset, we confront an imbalanced-token challenge during model training. To address this, we adopt a weighted cross-entropy loss as a surrogate loss to improve convergence:
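A weighted variant consistent with this description, sketched here with $w_v$ denoting the per-vocabulary-item weight (1 for ordinary tokens and a larger value for the added special tokens), is:

\mathcal{L}_{\text{weighted}} = -\sum_{t=1}^{T} \sum_{v=1}^{V} w_v \, y_{t,v} \log p_{t,v}.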
In our configuration, non-special tokens are assigned a weight of 1, while special tokens receive elevated weights. Early-stage training experiments indicate that increasing the special-token weight can expedite convergence. Figure 6 shows the validation loss, computed with Equation (3), for training runs with different surrogate losses. Our findings suggest that employing a surrogate training loss early in training aids convergence. Nonetheless, experiments reveal no performance disparity in the fine-tuned model, nor significant differences in wall-clock time. Therefore, an equal-weighted token loss is recommended when only a small number of function tokens are introduced. The model evaluated in our benchmark tests was trained with equal token weights.
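As a minimal sketch of how such a weighted surrogate loss can be implemented, assuming a PyTorch training loop and illustrative values for the vocabulary size, the number of added special tokens, and the elevated weight (none of these values are taken from the paper):

```python
import torch
import torch.nn.functional as F

# Illustrative sizes only: the actual vocabulary size, number of added
# special tokens, and weight value are assumptions, not the paper's values.
BASE_VOCAB = 256_000                       # assumed base Gemma-2B vocabulary size
NUM_SPECIAL = 21                           # assumed number of added special tokens
VOCAB_SIZE = BASE_VOCAB + NUM_SPECIAL

# Weight 1 for ordinary tokens, an elevated weight for the new special tokens.
token_weights = torch.ones(VOCAB_SIZE)
token_weights[BASE_VOCAB:] = 10.0          # elevated weight (illustrative value)

def weighted_causal_lm_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Next-token weighted cross-entropy for a causal language model.

    logits: (batch, seq_len, vocab_size); labels: (batch, seq_len)
    """
    shift_logits = logits[:, :-1, :].contiguous()   # predict token t+1 from tokens <= t
    shift_labels = labels[:, 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, VOCAB_SIZE),
        shift_labels.view(-1),
        weight=token_weights.to(shift_logits.device),
        ignore_index=-100,                  # standard padding/ignore index
    )
```

Setting every entry of the weight vector to 1 recovers the standard equal-weighted loss used for the benchmarked model.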
This paper is available on arXiv under the CC BY-NC-SA 4.0 DEED license.
Authors:
(1) Wei Chen, Stanford University, equal contribution, corresponding author ({weichen6}@stanford.edu);
(2) Zhiyuan Li, Stanford University, corresponding author ({zhiyuan8}@stanford.edu).