Coval - Simulation & Evaluation for AI Agents

Ship AI agents faster with a simulation and evaluation platform for chat and voice assistants.

Brooke Hopkins

6 months ago

Teams are racing to market with AI agents, but slow manual testing is holding them back. Engineers spend hours manually evaluating agents and playing whack-a-mole, only to discover that fixing one issue introduces another.

At Coval, we build automated simulation and evaluation for AI agents. Our approach is inspired by the autonomous vehicle industry and is designed to boost test coverage, speed up development, and validate consistent performance.

We have a waitlist, but YC companies go first! Grab some time here: https://bit.ly/coval-demo

Our Story

Hey! I’m Brooke, the founder of Coval. 👋

Before starting Coval, I led the evaluation job infrastructure team at Waymo. I coded the first versions of our dataset storage and other foundational simulation systems, and my team built all of the dev tools for launching and running evals.

Through my conversations with hundreds of engineering teams at startups and enterprises, I've seen that AI agents (models that operate independently and handle complex tasks) face challenges similar to those in self-driving.

In the early days, autonomous vehicle companies relied heavily on manual evaluation, testing self-driving cars on racetracks and city streets (remember when autonomous cars still had safety drivers?). However, as startups scaled, a significant shift happened: we moved towards simulating every code change in a “virtual” environment, using the vast amounts of data we collected. The new approach dramatically improved vehicle behavior, leading to hundreds of autonomous cars zipping around San Francisco streets today!

This story mirrors what's happening today with AI agents across industries. Teams are coming up with promising prototypes but often hit a wall when it comes to reliability.

As we build for a future where AI agents execute much of our work, from sending emails to prescribing medication, the risks posed by untested systems could severely throttle progress.

At Waymo, I developed tools that tested each code modification made by engineers, ensuring that every change improved the Waymo Driver's performance. I believe this methodical approach was key in helping our team address edge cases and maintain peak performance, and it ultimately cemented Waymo's status as a leader in the autonomous vehicle space.

Now, at Coval, we’re taking these proven strategies and adapting them in a completely new way to speed up the development of AI agents. Our goal is to help engineers build agent experiences that genuinely work for users in the real world.

Automated simulation and evaluation are critical to trusting agents with impactful tasks across industries.
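
To make the idea of checking every change in simulation a bit more concrete, here is a minimal sketch in Python. It is an illustration only and not Coval's product or API: the names `Scenario`, `run_suite`, and `find_regressions`, the toy agents, and the substring-based pass criterion are all hypothetical assumptions chosen to keep the example small. It replays the same scripted scenarios against a baseline agent and a changed agent, then reports scenarios that used to pass and now fail.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: an "agent" is modeled as a plain function from a user
# message to a reply. Real chat or voice agents are far richer.
Agent = Callable[[str], str]


@dataclass
class Scenario:
    """One simulated user interaction with a deliberately simple pass check."""
    name: str
    user_message: str
    must_contain: str  # hypothetical criterion: case-insensitive substring


def run_suite(agent: Agent, scenarios: List[Scenario]) -> Dict[str, bool]:
    """Run every scenario against the agent and record pass/fail by name."""
    results = {}
    for scenario in scenarios:
        reply = agent(scenario.user_message)
        results[scenario.name] = scenario.must_contain.lower() in reply.lower()
    return results


def find_regressions(baseline: Dict[str, bool],
                     candidate: Dict[str, bool]) -> List[str]:
    """Return scenarios that passed on the baseline but fail on the candidate."""
    return sorted(
        name for name, passed in baseline.items()
        if passed and not candidate.get(name, False)
    )


SCENARIOS = [
    Scenario("refund_request", "I want a refund for my order", "refund"),
    Scenario("store_hours", "What time do you open?", "9am"),
]


def baseline_agent(message: str) -> str:
    """Toy agent before the change: handles refunds and opening hours."""
    if "refund" in message.lower():
        return "I can help you start a refund."
    return "We open at 9am every day."


def candidate_agent(message: str) -> str:
    """Toy agent after a change that improved hours but broke refunds."""
    if "open" in message.lower():
        return "We open at 9am on weekdays and 10am on weekends."
    return "Sorry, I did not understand that."


if __name__ == "__main__":
    before = run_suite(baseline_agent, SCENARIOS)
    after = run_suite(candidate_agent, SCENARIOS)
    print("Regressions:", find_regressions(before, after))
```

In this toy setup, the change to the opening-hours answer quietly breaks the refund flow, which is the whack-a-mole pattern described earlier. Running the same scenarios on every change surfaces that kind of regression before users see it.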

Working With Us

Building agents, or know someone who is? Let’s talk! Grab time with me here for a quick intro, or message me at brooke@coval.dev.