World Model for Browser Agents
Browser agents are broken—and whoever fixes them will shape the next decade of software.
Today, even the best browser agents from labs like OpenAI, Anthropic, and Google fail on more than 80% of real-world tasks, often taking three times as long as humans to complete simple actions. Foundry is addressing this by building the first robust simulator, RL training environment, and evaluation platform designed specifically for browser agents. Historically, simulation environments and standardized benchmarks were critical in advancing self-driving cars (e.g., Waymo Sim, KITTI) and LLMs (e.g., HELM, MMLU). We're applying this proven approach to browser automation, enabling accurate benchmarking, rapid iteration, and real-world reliability.
For example, OpenAI could use Foundry to build a faithful replica of DoorDash's website, enabling them to run millions of ordering tests without ever touching real-world complexities like CAPTCHAs, payments, or anti-bot measures. This approach lets them pinpoint exactly why agents fail, iterate rapidly, and improve agents far faster. Our mission is simple but ambitious: transform browser agents from unstable research projects into robust solutions enterprises can trust.
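To make that workflow concrete, here is a minimal sketch in TypeScript of what running an agent task against a simulated site and scoring it against ground truth could look like. The SimulatedSite, TaskSpec, and runTrial names are hypothetical illustrations of the idea, not Foundry's actual API.

```typescript
// Hypothetical types: a simulated site exposes a sandboxed URL plus hooks to
// inspect backend state (e.g. whether an order was actually placed). These
// names are illustrative assumptions, not Foundry's real interface.
interface SimulatedSite {
  url: string;                                   // sandboxed replica, no real payments or CAPTCHAs
  reset(): Promise<void>;                        // restore a clean starting state between runs
  getState(): Promise<Record<string, unknown>>;  // ground-truth backend state for scoring
}

interface TaskSpec {
  id: string;
  instruction: string;                               // natural-language goal given to the agent
  check(state: Record<string, unknown>): boolean;    // did the agent actually accomplish it?
}

interface TrialResult {
  taskId: string;
  success: boolean;
  durationMs: number;
}

// Placeholder for whatever agent is under test (hosted model, in-house policy, ...).
type Agent = (instruction: string, startUrl: string) => Promise<void>;

async function runTrial(agent: Agent, site: SimulatedSite, task: TaskSpec): Promise<TrialResult> {
  await site.reset();
  const start = Date.now();
  let success = false;
  try {
    await agent(task.instruction, site.url);
    // Score against ground-truth state rather than screenshots or self-reports.
    success = task.check(await site.getState());
  } catch {
    success = false; // crashes and timeouts count as failures
  }
  return { taskId: task.id, success, durationMs: Date.now() - start };
}
```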
We’re a technically rigorous team of ML practitioners from Scale AI, committed to impactful engineering and groundbreaking products. If you're an exceptional engineer eager to tackle real-world challenges at the cutting edge of AI, ML, and RL, Foundry is your next opportunity.
70% of software runs in browsers, but APIs cover only a fraction of web interactions. Reliable browser agents represent an enormous automation opportunity capable of reshaping industries. By solving reliability, Foundry positions itself at the center of this transformative shift.
As a Founding Fullstack Engineer, you'll build critical systems and user experiences powering Foundry’s web simulation and evaluation platform. You'll collaborate closely with ML and RL specialists, influencing key technical decisions and directly shaping our product’s future.
We’re building the first end-to-end evaluation and training platform for web agents. Our system enables teams to test, benchmark, and optimize browser automation models at scale.
By combining synthetic user simulations, automated evaluations, and large-scale benchmarking, we help teams build more reliable web agents that handle real-world environments with confidence.
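As a rough illustration of the benchmarking side, per-trial results like those in the earlier sketch could be rolled up into suite-level metrics such as success rate and median completion time. Again, these types and helpers are assumptions made for illustration, not a real interface.

```typescript
// Aggregate per-trial results (mirroring the TrialResult shape from the sketch
// above) into the kind of suite-level metrics a benchmark report might show.
interface TrialResult {
  taskId: string;
  success: boolean;
  durationMs: number;
}

interface SuiteReport {
  trials: number;
  successRate: number;       // fraction of trials where the ground-truth check passed
  medianDurationMs: number;  // typical time to completion across trials
}

function summarize(results: TrialResult[]): SuiteReport {
  const durations = results.map(r => r.durationMs).sort((a, b) => a - b);
  const mid = Math.floor(durations.length / 2);
  const median =
    durations.length === 0 ? 0 :
    durations.length % 2 === 1 ? durations[mid] : (durations[mid - 1] + durations[mid]) / 2;
  return {
    trials: results.length,
    successRate: results.length === 0 ? 0 : results.filter(r => r.success).length / results.length,
    medianDurationMs: median,
  };
}
```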