GPT-4-level function calling model, but with 3X faster response time and 10X lower cost
Empower-functions is a model that delivers GPT-4-level function-calling capabilities, focused on real-world use cases such as multi-turn and parallel calling, with a 3x faster response time and a 10x lower cost. Check out our doctor appointment booking bot live demo!
The full potential of Large Language Models (LLMs) is realized not only through conversation but also through integration with external APIs, which lets them perform actions such as verifying identity against internal systems, booking appointments, and processing checkouts. The ability to call functions is therefore critical to a wide range of real-world use cases, including workflow automation and support agent tasks.
Currently, the predominant solution is OpenAI's models, which force a trade-off: GPT-4 offers high response quality but its significant latency and high cost limit where it can be applied, while GPT-3.5 is faster and more affordable but more likely to generate inaccurate responses. Few alternatives strike a better balance by offering higher response quality than GPT-3.5 with much better performance than GPT-4. While the emergence of open-source software (OSS) models broadens the possibilities, none of the current major providers, such as Fireworks, Anyscale, or Together AI, adequately address real-world use cases: their models generally underperform in multi-turn interactions, and few support parallel calling.
Empower-functions is an LLM developed by empower.dev, focusing on the real-world function calling use case.
The screenshot below shows how the empower-functions model handles a complex, multi-turn conversation that requires multiple function calls. For a more hands-on experience, please try our live demo.
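To make the flow concrete, here is a hedged sketch of that kind of exchange expressed as OpenAI-style chat messages; the function name search_doctors and its arguments are hypothetical, chosen only to illustrate function calls interleaved with conversation:

```python
# A sketch of a multi-turn exchange with an interleaved function call,
# in the OpenAI chat-message format. search_doctors is a hypothetical
# function invented for this illustration.
messages = [
    {"role": "user", "content": "I'd like to see a dermatologist next week."},
    {   # the model decides a function call is needed and emits one
        "role": "assistant",
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "search_doctors",
                "arguments": '{"specialty": "dermatology", "week_offset": 1}',
            },
        }],
    },
    {   # the application executes the function and returns the result
        "role": "tool",
        "tool_call_id": "call_1",
        "content": '[{"doctor": "Dr. Lee", "slot": "2024-05-21T10:00"}]',
    },
    {   # the model resumes the conversation using the result
        "role": "assistant",
        "content": "Dr. Lee has an opening Tuesday at 10am. Shall I book it?",
    },
    {"role": "user", "content": "Yes, please."},
    # ...a second function call to actually book the slot would follow here
]
```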
Under the hood, the empower-functions model is fine-tuned from Mixtral-8x7B-Instruct. We specifically collected data and tailored the model to support multi-turn conversations and to automatically determine whether to trigger functions. These efforts ensure strong performance in real-world use cases, which typically involve multi-turn conversations interleaved with function calls. Leveraging our proprietary inference engine, we have reduced TTFT (time to first token) latency to under 400ms, a substantial improvement over GPT-4's roughly one-second latency. We offer the model at a price point of $1.5 per million tokens.
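For readers curious what "deciding whether to trigger a function" looks like in practice, below is a sketch of a tool definition in the OpenAI function-calling schema; the book_appointment function and its parameters are hypothetical. Given the conversation so far plus definitions like this one, the model either replies in plain text or emits a call with JSON arguments:

```python
# Hypothetical tool definition in the OpenAI function-calling schema.
tools = [{
    "type": "function",
    "function": {
        "name": "book_appointment",
        "description": "Book an appointment with a doctor at a given time slot.",
        "parameters": {
            "type": "object",
            "properties": {
                "doctor": {"type": "string", "description": "Doctor's name"},
                "slot": {"type": "string", "description": "ISO 8601 time slot"},
            },
            "required": ["doctor", "slot"],
        },
    },
}]
```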
To comprehensively assess the model's response quality, we benchmarked it across three datasets (all of which can be found here).
In the benchmark, we compared the model against other function-calling offerings, including GPT-4, GPT-3.5, Fireworks' firefunction, Together AI, and Anyscale. For Together AI and Anyscale, we used mistralai/Mixtral-8x7B-Instruct-v0.1, as it represents their best offering. empower-functions consistently delivers superior performance in all scenarios, especially on the multi-turn and parallel-calling datasets, which are closest to real-world use cases.
The model is generally available on our platform today. You can experiment with our live demo for a hands-on experience with the model in a real-world use case. To use it in your project, simply sign up for an account and obtain an API key. We also provide free credits to get you started; see our quick start guide.
The completion API we provide is fully compatible with the OpenAI API, allowing you to use the empower-functions model as a drop-in replacement. More details can be found in our function calling documentation.
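As a minimal sketch of that drop-in usage with the official openai Python SDK (the base URL and model identifier below are assumptions for illustration; see the function calling documentation for the authoritative values):

```python
from openai import OpenAI

# Assumed base URL and model identifier for illustration; check the
# function calling documentation for the authoritative values.
client = OpenAI(
    base_url="https://app.empower.dev/v1",
    api_key="YOUR_EMPOWER_API_KEY",
)

# A compact tool definition in the same shape as the sketch earlier;
# book_appointment remains a hypothetical function.
tools = [{
    "type": "function",
    "function": {
        "name": "book_appointment",
        "description": "Book an appointment with a doctor at a given time slot.",
        "parameters": {
            "type": "object",
            "properties": {
                "doctor": {"type": "string"},
                "slot": {"type": "string", "description": "ISO 8601 time slot"},
            },
            "required": ["doctor", "slot"],
        },
    },
}]

response = client.chat.completions.create(
    model="empower-functions",  # assumed model identifier
    messages=[{"role": "user", "content": "Book Dr. Lee for Tuesday at 10am."}],
    tools=tools,
    tool_choice="auto",  # let the model decide whether to call the function
)

message = response.choices[0].message
if message.tool_calls:
    # The model chose to call a function; arguments arrive as a JSON string.
    call = message.tool_calls[0].function
    print(call.name, call.arguments)
else:
    print(message.content)
```

Because the request and response shapes match the OpenAI API, existing function-calling code should only need the base URL, API key, and model name changed.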