World Model for Browser Agents
Browser agents are broken—and whoever fixes them will shape the next decade of software.
Today, even the best browser agents from labs like OpenAI, Anthropic, and Google fail on more than 80% of real-world tasks, often taking three times as long as humans to complete simple actions. Foundry is addressing this by building the first robust simulator, RL training environment, and evaluation platform designed specifically for browser agents. Historically, simulation environments and standardized benchmarks were critical in advancing self-driving cars (e.g., Waymo Sim, KITTI) and LLMs (e.g., HELM, MMLU). We're applying this proven approach to browser automation, enabling accurate benchmarking, rapid iteration, and real-world reliability.
For example, OpenAI could use Foundry to build a faithful replica of DoorDash's website, enabling them to run millions of ordering tests without ever touching real-world complexities like CAPTCHAs, payments, or anti-bot measures. This approach lets them pinpoint exactly why agents fail, iterate rapidly, and improve agents far faster. Our mission is simple but ambitious: transform browser agents from unstable research projects into robust solutions enterprises can trust.
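To make that workflow concrete, here is a minimal sketch in TypeScript of what running an agent task against a simulated site and scoring it against ground truth could look like. The SimulatedSite, TaskSpec, and runTrial names are hypothetical illustrations of the idea, not Foundry's actual API.

```typescript
// Hypothetical types: a simulated site exposes a sandboxed URL plus hooks to
// inspect backend state (e.g. whether an order was actually placed). These
// names are illustrative assumptions, not Foundry's real interface.
interface SimulatedSite {
  url: string;                                   // sandboxed replica, no real payments or CAPTCHAs
  reset(): Promise<void>;                        // restore a clean starting state between runs
  getState(): Promise<Record<string, unknown>>;  // ground-truth backend state for scoring
}

interface TaskSpec {
  id: string;
  instruction: string;                               // natural-language goal given to the agent
  check(state: Record<string, unknown>): boolean;    // did the agent actually accomplish it?
}

interface TrialResult {
  taskId: string;
  success: boolean;
  durationMs: number;
}

// Placeholder for whatever agent is under test (hosted model, in-house policy, ...).
type Agent = (instruction: string, startUrl: string) => Promise<void>;

async function runTrial(agent: Agent, site: SimulatedSite, task: TaskSpec): Promise<TrialResult> {
  await site.reset();
  const start = Date.now();
  let success = false;
  try {
    await agent(task.instruction, site.url);
    // Score against ground-truth state rather than screenshots or self-reports.
    success = task.check(await site.getState());
  } catch {
    success = false; // crashes and timeouts count as failures
  }
  return { taskId: task.id, success, durationMs: Date.now() - start };
}
```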
We’re a technically rigorous team of ML practitioners from Scale AI, committed to impactful engineering and groundbreaking products. If you're an exceptional engineer eager to tackle real-world challenges at the cutting edge of AI, ML, and RL, Foundry is your next opportunity.
70% of software runs in browsers, but APIs cover only a fraction of web interactions. Reliable browser agents represent an enormous automation opportunity capable of reshaping industries. By solving reliability, Foundry positions itself at the center of this transformative shift.
As a Founding Fullstack Engineer, you'll build critical systems and user experiences powering Foundry’s web simulation and evaluation platform. You'll collaborate closely with ML and RL specialists, influencing key technical decisions and directly shaping our product’s future.
We’re building the first end-to-end evaluation and training platform for web agents. Our system enables teams to test, benchmark, and optimize browser automation models at scale.
By combining synthetic user simulations, automated evaluations, and large-scale benchmarking, we help teams build more reliable web agents that handle real-world environments with confidence.
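As a rough illustration of the benchmarking side, per-trial results like those in the earlier sketch could be rolled up into suite-level metrics such as success rate and median completion time. Again, these types and helpers are assumptions made for illustration, not a real interface.

```typescript
// Aggregate per-trial results (mirroring the TrialResult shape from the sketch
// above) into the kind of suite-level metrics a benchmark report might show.
interface TrialResult {
  taskId: string;
  success: boolean;
  durationMs: number;
}

interface SuiteReport {
  trials: number;
  successRate: number;       // fraction of trials where the ground-truth check passed
  medianDurationMs: number;  // typical time to completion across trials
}

function summarize(results: TrialResult[]): SuiteReport {
  const durations = results.map(r => r.durationMs).sort((a, b) => a - b);
  const mid = Math.floor(durations.length / 2);
  const median =
    durations.length === 0 ? 0 :
    durations.length % 2 === 1 ? durations[mid] : (durations[mid - 1] + durations[mid]) / 2;
  return {
    trials: results.length,
    successRate: results.length === 0 ? 0 : results.filter(r => r.success).length / results.length,
    medianDurationMs: median,
  };
}
```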