
The Leading LLM Evaluation Platform

Confident AI allows companies of all sizes to benchmark, safeguard, and improve LLM applications, with best-in-class metrics and guardrails powered by DeepEval. Built by the creators of DeepEval (4.3k stars, >400k monthly downloads), Confident AI offers battle-tested, open-source evaluation algorithms alongside the infrastructure teams need to stay confident in their LLM systems.
Confident AI
Founded: 2024
Team Size: 2
Location: San Francisco
Group Partner: Tom Blomfield

Active Founders

Jeffrey Ip, CEO & Cofounder

Creator of DeepEval, the open-source LLM evaluation framework, which he grew to over 400k monthly downloads and counting. Previously SWE @ Google, Microsoft.

Kritin Vongthongsri, Co-Founder

Building the #1 all-in-one LLM Evaluation Platform & empowering teams to red-team and safeguard LLM apps. Previously AI/ML @ Princeton, researching autonomous driving systems.

Selected answers from Confident AI's original YC application for the W25 Batch

Describe what your company does in 50 characters or less.

LLM Evaluation Platform for LLM Practitioners

What is your company going to make? Please describe your product and what it does or will do.

We are building an open-source LLM evaluation framework (DeepEval) for LLM practitioners to unit-test LLM applications. When used in conjunction with our evaluation platform (Confident AI), we provide insights into the best parameters (e.g. model, prompt template) to use, a centralized place for teams to collaborate on evaluation datasets, and real-time performance tracking for LLM applications in production.

Without Confident AI, companies would have to build their own framework to automate LLM testing in CI/CD to prevent unnoticed breaking changes, would have no visibility into which parameters give the best-performing results, would pass evaluation datasets between teams over email or Slack to discuss failing test cases, would be unable to pinpoint how LLM performance relates to top-line business KPIs, and would have to hire expert human evaluators to review sampled LLM responses in production.
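
To illustrate the unit-testing workflow described above, here is a minimal sketch of what a DeepEval test might look like. The metric choice, threshold, and example inputs are illustrative assumptions, and exact class names and signatures may differ between DeepEval versions.

    from deepeval import assert_test
    from deepeval.metrics import AnswerRelevancyMetric
    from deepeval.test_case import LLMTestCase

    def test_answer_relevancy():
        # Illustrative threshold: the test fails if relevancy scores below 0.7
        metric = AnswerRelevancyMetric(threshold=0.7)
        test_case = LLMTestCase(
            input="What if these shoes don't fit?",  # hypothetical user query
            actual_output="We offer a 30-day full refund at no extra cost.",  # hypothetical LLM output
        )
        # Fails the test run if the LLM output does not meet the metric threshold
        assert_test(test_case, [metric])

A test file like this can be wired into a CI/CD pipeline (for example, run via pytest or DeepEval's CLI), so that a regression in answer quality surfaces as a failing test rather than an unnoticed breaking change.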