About Us
We’re building the first end-to-end testing platform for web agents, including a Browser Gym for RL-driven optimization. Our platform helps teams evaluate, benchmark, and improve web agents before they go live, ensuring they can handle real-world, dynamic environments.
With synthetic user simulations, automated evaluations, and large-scale benchmarking, we’re setting a new standard for web agent testing.
We’re a YC-backed team, and this is a founding engineering role—you’ll be one of the first hires defining how we crawl, structure, and analyze the open web at scale.
The Role
We need a Founding Web Scraping Engineer to build internet-scale web crawling infrastructure—not just scraping a single site, but handling millions of domains and evolving anti-bot defenses.
You’ll be responsible for designing robust, distributed crawling systems that adapt dynamically to web changes, optimize for efficiency, and ensure reliable data extraction.
What You’ll Do
- Build large-scale, distributed crawlers that intelligently prioritize, schedule, and optimize requests across millions of domains.
- Develop adaptive web scraping systems that handle DOM changes, WebSockets, AJAX-heavy sites, and dynamically loaded content.
- Optimize scraping performance and resilience, ensuring high-throughput data extraction with proxy/network optimizations and behavior-driven stealth tactics.
- Solve CAPTCHAs at scale, integrating third-party solvers, heuristic-based workarounds, and behavior-driven bypass techniques.
- Manage proxy and identity rotation, implementing session-aware scraping, JA3/TLS fingerprint spoofing, and request signature control.
- Structure and clean extracted data for downstream analytics, AI training, and benchmarking applications.
What We’re Looking For
- Expert-level experience in large-scale web scraping & crawling (Selenium, Puppeteer, Playwright, Scrapy, undetected-chromedriver).
- Deep knowledge of anti-bot detection strategies (TLS fingerprinting, JA3 signatures, request header anomalies, and bot behavior tracking).
- Hands-on expertise with CAPTCHA-solving strategies, including third-party solver APIs, OCR-based approaches, and behavior-driven evasion.
- Proven experience building efficient proxy management systems, including rotating IPs across residential, datacenter, and mobile networks.
- Proficiency in Python, Go, or JavaScript, with experience in high-performance, parallelized scraping frameworks.
- Understanding of HTTP/2, HTTP/3, WebSockets, GraphQL, and browser-based fingerprinting.
- Experience designing scalable, fault-tolerant scraping infrastructure that adapts to changes in real time.
Bonus Points
- Experience with search engine-scale crawling.
- Background in LLM-driven web extraction or RL-enhanced adaptive crawling.
- Contributions to open-source scraping tools or web automation projects.
Why Join?
- Founding role—you’ll define and own our web crawling infrastructure from day one.
- Work at internet scale, building a system that adapts dynamically across millions of domains.
- YC-backed—we’re building something that doesn’t exist yet, and you’ll be part of the core team making it happen.