Optimizing Function Calling Models: The Role of Dataset Size and LoRA Fine-tuning

by Language Models (dot tech), April 8th, 2025
Too Long; Didn't Read

LoRA plays a crucial role in our framework, particularly when integrating the Octopus model across multiple applications to ensure smooth computation.


Abstract and 1. Introduction

2 Related works

3 Methodology and 3.1 Causal language model as a classification model

3.2 Functional token

3.3 Dataset collection

3.4 Model development and training

4 Experiments and 4.1 Android function calls

4.2 Extension to Vehicle, Yelp, and DoorDash function sets

4.3 Full and partial training datasets and 4.4 Full training and LoRA training

4.5 Parallel and nested function call and 4.6 Weighted loss function for special tokens

5 Discussion and future works and References


Appendix

A.1 Android function examples

A.2 Vehicle function examples

4.3 Full and partial training datasets

The Octopus model demonstrates exceptional performance when 1,000 data points are sampled for each API during training. However, when training on a new set of functions, cost efficiency becomes a consideration, since a training dataset must be generated. In our analysis, generating 1,000 data points for a single API incurs a cost of 0.0224 USD, which represents the investment required to train an Octopus-0 model for one specific function. By evaluating the Octopus-0, Octopus-2, and Octopus-3 models, we find that sampling only 100 data points per API can still achieve an accuracy of 98.095%, as depicted in Figure (4). Therefore, for individuals seeking to train their own Octopus model using our framework, we recommend a dataset size ranging from 100 to 1,000 data points per API.
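As a minimal sketch of the cost trade-off discussed above, the Python snippet below extrapolates the paper's reported figure of 0.0224 USD per 1,000 data points for one API to a full function set. The helper function and the example of 20 APIs are illustrative assumptions, not part of the released framework.

```python
# Back-of-the-envelope cost estimate for generating function-calling training data.
# Assumes the reported cost of 0.0224 USD per 1,000 data points for a single API;
# the function name and the 20-API example below are illustrative assumptions.

COST_PER_1000_POINTS_USD = 0.0224


def dataset_generation_cost(num_apis: int, points_per_api: int) -> float:
    """Estimate the USD cost of generating a training set for `num_apis` functions."""
    return num_apis * points_per_api * (COST_PER_1000_POINTS_USD / 1000)


if __name__ == "__main__":
    # Recommended range from the paper: 100 to 1,000 data points per API.
    for points in (100, 1000):
        cost = dataset_generation_cost(num_apis=20, points_per_api=points)
        print(f"{points:>5} points/API for 20 APIs: ~${cost:.4f}")
```

Sweeping the sample size between the recommended 100 and 1,000 points per API in this way makes the cost-versus-accuracy trade-off explicit before committing to data generation.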

4.4 Full training and LoRA training

LoRA plays a crucial role in our framework, particularly when integrating the Octopus model across multiple applications, where it ensures smooth computation. Instead of deploying a fully fine-tuned model for each API set, we train separate LoRA adapters tailored to the specific function setup of each app. As Figure (4) illustrates, switching to LoRA training results in a minor accuracy decrease. Nonetheless, accuracy remains high enough for production deployment.
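The sketch below shows how per-app LoRA adapters of this kind can be set up with Hugging Face PEFT. The base checkpoint, rank, alpha, dropout, and target modules are illustrative assumptions, not the exact configuration used for the Octopus models.

```python
# Minimal LoRA fine-tuning setup with Hugging Face transformers + PEFT.
# All hyperparameters and the base checkpoint are assumptions for illustration only.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model_id = "google/gemma-2b"  # assumed base checkpoint, not confirmed by the paper

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id)

lora_config = LoraConfig(
    r=16,                     # low-rank adapter dimension (assumed)
    lora_alpha=32,            # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Wrap the base model so only the small adapter weights are trained;
# one adapter per app's function set can then be swapped in at inference time.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Because each adapter is small relative to the base model, one shared base checkpoint can serve many applications, with the app-specific function-calling behavior loaded as a lightweight adapter.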


Table 1: Configuration of the four different Octopus models.


This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.

Authors:

(1) Wei Chen, Stanford University, with equal contribution and a corresponding author {weichen6}@stanford.edu;

(2) Zhiyuan Li, Stanford University and a corresponding author {zhiyuan8}@stanford.edu.

