TL;DR: Proxis is the first dedicated platform for LLM distillation and serving, unlocking production-ready models at 1/10th the cost.
The Problem
- Fine-tuning frontier models like GPT-4 on proprietary data is expensive and locks customers into a single external closed-source model provider.
- These large models cost 100x the per-token price of smaller Llama 3.1 models and run nearly 10x slower, making them impractical to deploy at scale.
- Closed-source model lock-in means customers can’t tweak or tune their own model, or deploy on-prem for sensitive data.
The Solution
Model Distillation:
- In model distillation, a large teacher model effectively fine-tunes a smaller student model without the need for labeled or structured datasets. This ‘condenses’ a large model down to the compute cost of a small one.
- Distillation yields near-frontier model quality at small-model efficiency: roughly 5x the speed at 1/10th the cost.
- With the release of Llama 3.1 405B in late July, open-source frontier distillation became available to the public for the first time.
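The teacher–student objective described above can be sketched in a few lines. This is a minimal illustration (not Proxis's implementation): the student is trained to match the teacher's temperature-softened output distribution, so no labeled data is required. Function and variable names here are illustrative.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the teacher's softened distribution (the
    'soft targets') and the student's -- the core distillation objective.
    Scaled by T^2, following the standard formulation, so gradient
    magnitudes stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)  # soft targets from teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)))
    return float(kl * temperature**2)

# When the student matches the teacher exactly, the loss is zero;
# training drives the student toward that point on unlabeled inputs.
teacher = np.array([2.0, 0.5, -1.0])
print(distillation_loss(teacher, teacher))                    # ~0.0
print(distillation_loss(np.zeros(3), teacher) > 0.0)          # True
```

A higher temperature softens the teacher's distribution, exposing the relative probabilities it assigns to "wrong" classes; this extra signal is what lets a small student learn more per example than it would from hard labels alone.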
Growth & Market
- Llama model downloads have grown 10x in the last year and are approaching 350 million downloads to date. There were 20 million downloads in the last month alone, making Llama the leading open source model family.
- Llama usage by token volume across major cloud service providers more than doubled in the three months from May through July 2024, when Llama 3.1 was released.
- Monthly usage (token volume) of Llama grew 10x from January to July 2024 for some of the largest cloud service providers.
Source: Meta
The Team
Jackson (CTO, on the left) optimized the Gemini model at Google for efficient deployment at massive scale. Liam (CEO, on the right) built zero-to-one systems as a software engineer.
The Ask
Give it a go! Sign up for our waitlist here to access Proxis-hosted Llama models at a lower cost than current offerings.