The monitoring and learning layer for long-running agents

BentoLabs is the monitoring and learning layer for long-running agents. We detect when agents silently fail or drift from the user's goal, system prompt, or tool contracts, show affected users and root cause, and suggest the prompt, skill, or harness fix. As more teams deploy agents, keeping them reliable in production becomes mission-critical. Bento sits directly in the production loop and gives teams the operational leverage required to scale agent ecosystems without scaling human firefighting alongside them. The result is a system that turns opaque agents into agents that can be monitored, debugged, and improved continuously. The founders learned this problem at Emergent (YC S24), where they built and operated production coding agents used by 5M+ users. Abhinav was hire #1 and helped Emergent hit SWE-Bench #1 and scale from $0 to $100M ARR in just 8 months. Kaushik was hire #2, led full-stack engineering at Emergent, and was key to building the infrastructure that made production agents reliable, observable, and debuggable. Bento's self-learning engine has also lifted ARC-AGI-3 (internal) by 2.6x and Terminal-Bench 2.0 (internal) from 42.2% to 52.4% pass@1 with the same model, tools, and budget.

Active Founders

Kaushik ASP

Founder

Engineer, founder, and builder. I've taken products from 0→1 across AI, SaaS, and analytics. Most recently, I was a Founding Engineer at Emergent Labs (YC S24), one of the fastest-growing AI startups in India, where I helped scale the product from launch to $100M ARR in under 8 months. Before that, I co-founded Sporty (metaverse) and TheProductArtists, and led engineering at Paz Care. I've been writing code professionally since 2017.

Abhinav Soni

Founder

Previously hire #1 at Emergent (YC S24). Abhinav led the Agents team, helping scale from $0 to $100M ARR in just 8 months and hit #1 on SWE-Bench, twice. Unofficially, he was called ‘the agent whisperer’. He built BentoLabs after realizing that, with agents, whatever he couldn't see wouldn't get fixed, and that the current monitoring tools were not delivering the value they promised. So he built the layer that actually finds silent failures, fixes them, and closes the loop.

Company Launches

BentoLabs AI: Monitoring and Learning layer for long-running agents

TL;DR: BentoLabs AI is the monitoring and learning layer for long-running agents. We detect when agents silently fail or drift from the user's goal, system prompt, or tool contracts, show affected users and root cause, and suggest the prompt, skill, or harness fix. Run 101 is measurably better than run 1.

https://youtu.be/nu6Oir3OGSM

Hello everyone, we're Abhinav and Kaushik, co-founders of BentoLabs AI.

We spent two years at Emergent (YC S24) building and operating production coding agents used by 5M+ users, helping scale from $0 to $100M ARR and topping SWE-Bench twice. We realised that as more teams deploy agents, keeping them reliable, observable, debuggable, and continuously improving in production becomes mission-critical. Most teams don't have a system for it. They have engineers.

The Problem:

Most production agents fail silently. You have 10,000 traces a day and zero visibility into reasoning drift until a support ticket pops up. That's the easier half! The harder problem is that nothing your agent figures out on one run carries into the next. An agent might solve a complex edge case on run 47, but because nothing carries forward, it burns your budget rediscovering the same fix on run 48.

This makes production teams spend hours reading logs in one tab just to manually patch prompts in another, on a loop that never closes.

The Solution:

BentoLabs AI: the monitoring and learning layer for long-running agents.

Monitoring: BentoLabs finds every failure instance across your production traces, classifies it, and tracks it before the support ticket pops up.

Learning: BentoLabs captures what your agent figures out on every run and makes sure it carries into the next one. Recurring failures get fixed once and stay fixed. Hard-won solutions become reusable. Run 101 starts from everything runs 1 through 100 learned, not from zero.

The Proof:

Terminal-Bench 2.0 (Internal Run): We validated our recursive learning engine on Terminal-Bench 2.0, one of the most demanding agentic-shell benchmarks in the field today. The official Claude Sonnet 4.5 baseline scores 42.2% pass@1; our agent scored 52.4% with the same agent, same model, and same budget. That is a +10.2 percentage-point lift, statistically significant (p < 0.05), with 13 tasks showing wins ≥ +20 pp and only 3 showing losses. (Deep Dive)

ARC-AGI-3 (Internal Run): To see our engine in action, we took on ARC-AGI-3 (25 interactive puzzle games, the hardest agent benchmark we could find). Frontier models score 0.2–0.3% out of the box. With our engine, 3 games the agent had never solved across ~30 prior runs were cracked for the first time. (Deep Dive)

Why we're building BentoLabs

For the last 2 years at Emergent, we were the engineers who were the fix. We spent thousands of hours staring at traces and watching teams burn their best engineers' time on patching prompts, correcting tool definitions, and experimenting with different skills, hoping something would stick. We are building BentoLabs AI to move past this repetitive cycle.

We're already working with teams at unicorn scale. Every conversation with a team running agents in production confirms the same thing: the problem is universal. BentoLabs gives them the operational leverage to scale their agent ecosystems without scaling human firefighting alongside them.

If your engineers are doing the logs-and-patches rotation or your agents keep hitting the same failures, let's have a chat, or email us at abhinav@bentolabs.ai or kaushik@bentolabs.ai.
