The developer platform for building production-worthy Large Language Model applications
TL;DR Vellum (W23) is a developer platform for building production-worthy applications on LLMs like @OpenAI’s GPT-3 or @Anthropic’s Claude. Use Vellum to save hours on prompt engineering, iterate on prompts in production confidently, and continuously fine-tune for better results (we helped one customer save 94% of their LLM costs through fine-tuning!). Request early access here.
—
Hi everyone,
Akash, Noa, and Sidd here. We worked together at Dover (YC S19) for 2+ years, where we built production use cases of Large Language Models (LLMs). Noa and Sidd are MIT engineers who previously worked on DataRobot’s MLOps team and Quora’s ML Platform team, respectively.
We decided to work on Vellum after realizing that, while MLOps tooling for traditional ML has matured rapidly, companies using LLMs have none of it available. Engineering teams spend many hours building custom internal tooling to tame LLMs, taking time away from building their core product.
We’ve seen companies building production applications with Large Language Models run into challenges at every stage of development.
We have three products, each aimed at solving the problems we identified when building production LLM apps ourselves.
Create one “sandbox” per LLM use case. In each, you can try as many prompt variants as you like across as many test cases as you wish. These prompt variants may differ in their text, underlying model, model parameters (e.g. “temperature”), and even LLM provider! Each run is saved as a history item and has a permanent URL you can use to share with teammates and track results.
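For illustration, here’s a rough sketch of the kind of ad-hoc comparison loop a sandbox replaces: trying a few prompt variants (different text and temperatures) across a few test cases and recording every run. The model name, prompt variants, and test cases below are placeholders, and it uses the pre-1.0 `openai` Python SDK.

```python
import openai

openai.api_key = "YOUR_API_KEY"

# Illustrative prompt variants -- in practice these could also differ by model or provider.
prompt_variants = [
    {"name": "v1-terse",    "template": "Summarize in one sentence: {input}", "temperature": 0.2},
    {"name": "v2-friendly", "template": "Explain this to a customer in plain language: {input}", "temperature": 0.7},
]

# Illustrative test cases.
test_cases = [
    "Our SLA guarantees 99.9% uptime measured monthly.",
    "Refunds are processed within 5 business days of approval.",
]

runs = []
for variant in prompt_variants:
    for case in test_cases:
        response = openai.Completion.create(
            model="text-davinci-003",
            prompt=variant["template"].format(input=case),
            temperature=variant["temperature"],
            max_tokens=128,
        )
        # Persist every run so results can be compared and shared later.
        runs.append({
            "variant": variant["name"],
            "test_case": case,
            "output": response.choices[0].text.strip(),
        })

for run in runs:
    print(run["variant"], "|", run["test_case"][:40], "->", run["output"])
```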
Once you’ve settled on a prompt you like, you “deploy” it through Vellum. Vellum acts as a high-reliability, low-latency proxy layer between you and LLM providers. Every request is captured and persisted in one place, providing observability into your model’s performance. Because the API interface doesn’t change, you can update your prompt (and even the underlying LLM provider!) without making any code changes. Previously made requests can be replayed against proposed changes to gain confidence prior to updating a deployment. All updates are version-controlled and you can revert to prior versions at any time.
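To make the idea concrete, here’s a hypothetical sketch of what calling a deployed prompt through a proxy layer could look like. This is not Vellum’s actual API: the endpoint, payload shape, and deployment name are made up. The point is that application code references a deployment, not a specific prompt or provider, so both can change without touching your code.

```python
import requests

resp = requests.post(
    "https://api.example.com/v1/generate",            # hypothetical proxy endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "deployment": "support-email-summarizer",     # hypothetical deployment name
        "inputs": {"email_body": "Hi, my invoice looks wrong..."},
    },
    timeout=30,
)
resp.raise_for_status()

# Under these assumptions, the response might carry the completion plus a request ID
# that the proxy layer also persists for observability and later replay.
print(resp.json())
```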
Once you’ve collected data in production for some time through Vellum Manage, these input/output pairs (a minimum of ~100, though it depends on the use case) can be used to fine-tune your own proprietary models for better quality, lower cost, or lower latency. Your data and your models become a powerful competitive moat. Vellum periodically runs model evaluation jobs in the background to see if we can find a different model that works even better for your use case. If one is identified, you can swap models under the hood – no code changes needed!
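As a rough sketch, assuming the captured production requests can be exported as input/output pairs, turning them into the prompt/completion JSONL format that GPT-3-style fine-tuning expects might look like this (the example pairs and file name are illustrative):

```python
import json

# Illustrative logged pairs -- in practice, at least ~100, per the guidance above.
logged_pairs = [
    {"input": "Summarize: Our SLA guarantees 99.9% uptime.", "output": "We promise 99.9% uptime."},
    {"input": "Summarize: Refunds take 5 business days.",    "output": "Refunds arrive within 5 business days."},
]

# Write one JSON object per line in the prompt/completion format used for fine-tuning.
with open("fine_tune_dataset.jsonl", "w") as f:
    for pair in logged_pairs:
        f.write(json.dumps({"prompt": pair["input"], "completion": pair["output"]}) + "\n")
```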
P.S. If you’re curious, we recently published a blog post where you can learn more about the benefits of fine-tuning and how to do it. Check it out here.