PromptCompressor: Enjoy 25% Savings on GPT API Without Losing Quality!

Written by juliojung | Published 2023/10/11
Tech Story Tags: prompt-engineering | reinforcement-learning | promptcompressor | reducing-chatgpt-prompt-text | compressed-prompt | maximizing-chatgpt-tokens | how-to-minimize-chatgpt-costs | promptcompressor-performance


Introducing PromptCompressor, a web service that offers a cost-effective way to use language models like ChatGPT. PromptCompressor employs a deep learning model trained through reinforcement learning to reduce the length of the original prompts by approximately 20-30% while maintaining their performance.

Getting Started with PromptCompressor

Enter your prompts into the text box at promptcompressor.com. Copy the compressed prompt and input it into ChatGPT.

The compressed prompts may contain grammatical errors but still work effectively; sometimes they even outperform the originals.
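
If you call the GPT API programmatically instead of pasting into the ChatGPT UI, a compressed prompt drops in the same way. Here is a minimal sketch using the openai Python package (the pre-1.0 ChatCompletion interface; the API key and the prompt text are placeholders, not part of PromptCompressor itself):

```python
# Minimal sketch: send a compressed prompt to the GPT API.
# Assumes the openai package's pre-1.0 interface; replace the key
# and the prompt with your own values.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# A prompt as copied from promptcompressor.com (illustrative example).
compressed_prompt = (
    "Instruction: Summarize key points of text.\n"
    "Input: <your text here>"
)

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": compressed_prompt}],
)
print(response["choices"][0]["message"]["content"])
```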

PromptCompressor has been trained on the following prompt format. For stable performance, please write your prompts according to this format:


Instruction: <Write the instruction here. For example, 'Summarize the key points of the text.'>

Input: <Here, enter the sentence or data to be processed according to the instruction.>
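
For example, a filled-in prompt in this format might look like this (the input text is just an illustration):

Instruction: Summarize the key points of the text.

Input: PromptCompressor is a web service that reduces the length of GPT prompts by approximately 20-30% while maintaining their performance.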


Maximize Your Tokens, Minimize Your Costs

If you are providing a service using the GPT API or using it personally, you might have encountered issues with API costs or the constraints of the context window. The current pricing policy and maximum context size of GPT are as follows:

| Model | Context | Input | Output |
| --- | --- | --- | --- |
| GPT-3.5-turbo | 4K | $0.0015 / 1K tokens | $0.002 / 1K tokens |
| GPT-3.5-turbo | 16K | $0.003 / 1K tokens | $0.004 / 1K tokens |
| GPT-4 | 8K | $0.03 / 1K tokens | $0.06 / 1K tokens |
| GPT-4 | 32K | $0.06 / 1K tokens | $0.12 / 1K tokens |

In simple terms:

  • Tokens are the smallest units that need to be processed when analyzing or generating text. In English, a word, symbol, or space can be a token. Typically, 1,000 tokens can represent about 750 words.
  • Context is the maximum number of tokens that an AI model can process at once. For example, a 4K context means the model can process up to 4,000 tokens at a time.
  • The total cost is the number of input tokens times the input rate, plus the number of output tokens times the output rate.

Therefore, PromptCompressor is best suited to tasks like summarization, prediction, classification, and question answering, where the input tokens outnumber the output tokens.
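
As a back-of-the-envelope illustration using the GPT-3.5-turbo 4K rates from the table above (the token counts below are invented for the example, and a 25% input compression is assumed):

```python
# Rough cost estimate for one GPT-3.5-turbo (4K) call, using the rates above.
# Token counts are invented for illustration; ~25% input compression assumed.
INPUT_RATE = 0.0015 / 1000   # $ per input token
OUTPUT_RATE = 0.002 / 1000   # $ per output token

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Total cost = input tokens * input rate + output tokens * output rate."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# An input-heavy task, e.g. summarization: long input, short output.
original = call_cost(input_tokens=3000, output_tokens=300)
compressed = call_cost(input_tokens=int(3000 * 0.75), output_tokens=300)

print(f"original:   ${original:.4f}")                  # $0.0051
print(f"compressed: ${compressed:.4f}")                # $0.0040
print(f"savings:    {1 - compressed / original:.1%}")  # 22.1%
```

Because output tokens are unaffected, the overall saving approaches the full compression rate as the input grows relative to the output.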

How Does It Work?

Humans can understand a sentence even when some words are missing, such as articles ("a", "the") or prepositions ("in", "of"). GPT, trained on human language, behaves similarly. We used this observation to train a model that removes the tokens with the least impact on GPT's output, shortening the prompt while preserving its meaning and context.
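
The real model is trained with reinforcement learning to decide which tokens to drop; the toy sketch below only illustrates the intuition with a hard-coded list of function words, and is not PromptCompressor itself:

```python
# Toy illustration of the intuition only: drop low-information function words.
# PromptCompressor's actual model learns which tokens to remove via RL;
# this hard-coded word list is a stand-in for that learned policy.
LOW_IMPACT = {"a", "an", "the", "in", "of", "to", "for", "on", "at"}

def naive_compress(prompt: str) -> str:
    """Remove common articles/prepositions while keeping word order."""
    return " ".join(w for w in prompt.split() if w.lower() not in LOW_IMPACT)

print(naive_compress("Summarize the key points of the text in a short paragraph."))
# -> "Summarize key points text short paragraph."
```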

Evaluating the Performance

We ran experiments with over 100,000 instructions against the model Llama-2-7b-chat-hf and confirmed that compressed prompts retain about 95% of the original performance while shortening prompts by roughly a quarter on average.

PromptCompressor compresses each prompt in a context-aware way, so the compression rate varies by task. For instance, a prompt that contains hardly any unnecessary tokens may not be compressed at all.

What’s Next

Currently, PromptCompressor is limited to 507 input tokens due to the constraints of the backbone model. We aim to raise this limit and are continuing research and development to improve the service.

Contact

If you have questions or feedback about the service, please contact [email protected]. Your opinions will be a great help in improving the service.

