Weavel automates prompt and LLM engineering, delivering best prompts and algorithms 50x times faster. We do this by using LLMs and search algorithms to replace the manual trial-and-error of prompt engineering, but more efficiently - it is 3 times faster than the leading open source project.
Andrew is co-founder and CEO of Weavel. Andrew previously led product + engineering teams for two years, and has experience in building various applications in AI, VR, web, and mobile. Prior to that, Andrew studied electrical & computer engineering at Seoul National University before taking leave of absence in his junior year to focus on building things.
HyunJie is co-founder of Weavel, specializing in growth and data analytics. She studied data science and media at UC Berkeley and has experience working at Liner, Chartmetric, and DevRev, where she focused on marketing and data analysis.
Jun is co-founder of Weavel. Jun previously worked as researcher at NLP-focused AI lab and has experience in building various applications with LLM. Prior to that, Jun studied computer science & engineering at Seoul National University.
You’re an engineer of an LLM app, trying to get the prompts just right. Every time you type something in, the output changes—so you tweak a word here and there, and it changes again. Sometimes the outputs looks better, sometimes not. But you’re never sure. Hours go by, all spent on prompt engineering.
Getting the outputs you want can feel like an endless game of trial and error. And you’re not alone. Over the past few weeks, we’ve talked to over 100 YC companies, and a lot of them are facing the same challenges:
We solve the problem with one simple formula:
good input + right guidance = better prompts
Today, we launch Ape, your first AI Prompt Engineer. Inspired by DSPy, Reflexion, Expel and other research papers, Ape iteratively improves your prompts. Here’s how Ape works:
1️⃣ Log your inputs and outputs to Weavel (with a single line of code!)
2️⃣ Let Ape filter the logs into datasets.
3️⃣ Ape then generates evaluation code and uses LLMs as judges for complex tasks.
4️⃣ As more production data is added, Ape continues refining and improving prompt performance.
Create a Dataset
Change just one line of code to start logging LLM calls with the Weavel Python SDK. The SDK supports sync/async OpenAI chat completions and OpenAI structured outputs.
You can also import existing data or manually create a dataset.
Create a Prompt
Write a prompt that corresponds to your dataset. You can add an existing prompt as the base version, or if you prefer, create a blank prompt and provide a brief description for Ape to create a prompt from scratch.
Optimize Prompts
To optimize your prompt using Ape, fill in the necessary information (e.g. JSON schema as you want) and then run the optimization process. An enhanced version of your prompt will be created and available soon.
Ta-da! It’s that easy. Ape outperforms with a remarkable 94.5% score on the GSM8K benchmark, surpassing Vanilla (54.5%), CoT (87.5%) and DSPy (90.0%). With Ape, you can optimize the prompt engineering process, saving tons of time and cost while increasing performance.
Ape is open source. Check out our repository on GitHub. (We’d appreciate a star 🌟)
From left to right: Jun, Andrew, HyunJie, and Toby — together we’re building Weavel.
Andrew and Jun built 10+ LLM-based products, open-sourced a prompt engineering platform, and co-authored a paper at a NeurIPS workshop last year. HyunJie worked on data analytics and optimization at Chartmetric and DevRev, and focused on growth marketing at Liner. Then Toby joined, a full-stack engineer who worked at several early stage teams, shipping 5+ products.