
Skyvern

Open Source AI Agent to automate browser workflows via an API

Skyvern helps companies automate browser-based workflows using LLMs and computer vision, fully automating manual workflows and replacing brittle or unreliable scripts. Examples include:

  1. Automating materials procurement from commerce websites
  2. Completing complex multi-step workflows (e.g., getting an insurance quote from Geico.com)
  3. Automatically logging into portals and downloading invoices
  4. Navigating legacy content systems to do data extraction or data entry
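For illustration, here’s a minimal sketch of what submitting such a workflow to an agent over an API could look like. The endpoint shape and field names (`navigation_goal`, `data_extraction_schema`) are hypothetical placeholders, not Skyvern’s documented API:

```python
import json

def build_task_request(prompt, url, extract_schema=None):
    """Build the JSON body for a hypothetical 'run task' endpoint.

    All field names here are illustrative, not Skyvern's actual API.
    """
    body = {
        "url": url,                 # page where the agent starts
        "navigation_goal": prompt,  # natural-language objective
    }
    if extract_schema is not None:
        body["data_extraction_schema"] = extract_schema  # optional structured output
    return body

req = build_task_request(
    prompt="Log into the vendor portal and download last month's invoice",
    url="https://portal.example.com/login",
)
print(json.dumps(req, indent=2))
```

In practice you would POST a body like this to the agent service and then poll for the task’s status and downloaded artifacts.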
Founded: 2023
Team Size: 3
Location: San Francisco
Group Partner: Nicolas Dessaigne

Active Founders

Suchintan Singh, Founder

I'm the founder of Skyvern. We help companies automate complex workflows using AI agents and computer vision. In the past, I built the ML platforms at both Faire and Gopuff, which generated over $100M of GMV across their two businesses. I also love bad jokes and puns, so if you have any or are in need of any, feel free to hit me up!

Shuchang Zheng, Founder

I am the CTO and cofounder of Skyvern. For the past 5 years, I've been building developer and platform tools. At Lyft, I built testing tools (a simulation platform, a load-test framework, and an end-to-end test framework) used by 1,000+ engineers to boost developer productivity, ensure Lyft doesn't go down during peak events, and reduce infra cost. At Patreon, I helped scale the payment and DB infra to process 20M+ transactions per month and free 100+ engineers from the 3-day monthly code freeze.

Company Launches

We’ve been working hard, cooking up something new to share with you all!

Skyvern 2.0 scored state-of-the-art 85.85% on the WebVoyager Eval.

This is best-in-class performance among all web agents, giving advanced closed-source agents like Google's Project Mariner a run for their money.

TL;DR

  • Real-World Tests: We ran all of the tests in Skyvern Cloud to get a better representation of autonomous browser operations (i.e., none of them ran on local machines).
  • Open-Sourced Results: All of the runs can be viewed through our UI at https://eval.skyvern.com.
  • We’re just getting started. Try Skyvern Cloud or Skyvern Open Source out for yourself and see Skyvern in action!

Agent Architecture

Achieving this SOTA result required expanding Skyvern’s original architecture. Skyvern 1.0 used a single prompt operating in a loop, both making decisions and taking actions on a website. This approach was a good starting point, but it scored only ~45% on the WebVoyager benchmark because it had insufficient memory of previous actions and could not do complex reasoning.

To solve this problem, we created a self-reflection feedback loop within Skyvern. This resulted in two main changes:

  1. We added a “Planner” phase, which decomposes very complex objectives into smaller, achievable goals
    • This gives Skyvern a working memory of what it has completed and what is still waiting to be finished, so it can work with long, complex prompts without increasing the hallucination rate
  2. We added a “Validator” phase, which confirms whether or not the goals the “Planner” generates are successfully completed
    • This acts as a supervisor function, confirming that the Task executor is achieving its objectives as expected and reporting any errors/tweaks back to the Planner so it can make adjustments in real time as needed
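The Planner → Executor → Validator loop above can be sketched roughly as follows. Everything here is a toy stand-in (the real phases are prompts to a vision-capable LLM driving a browser), so the function bodies are placeholders:

```python
from dataclasses import dataclass, field

# Minimal sketch of the self-reflection loop. All names and logic are
# hypothetical stand-ins for LLM-driven phases.

@dataclass
class Memory:
    completed: list = field(default_factory=list)  # goals already finished
    pending: list = field(default_factory=list)    # goals still waiting

def plan(objective, memory):
    # Planner: decompose a complex objective into smaller achievable goals.
    # (Stub: split on ';' -- a real planner would prompt an LLM.)
    return [g.strip() for g in objective.split(";") if g.strip()]

def execute(goal):
    # Task executor: take browser actions toward a single goal.
    # (Stub: always "succeeds".)
    return True

def validate(goal, success):
    # Validator: confirm the Planner's goal was actually completed.
    return success

def run(objective, max_rounds=3):
    memory = Memory()
    memory.pending = plan(objective, memory)
    for _ in range(max_rounds):
        if not memory.pending:
            break
        goal = memory.pending.pop(0)
        if validate(goal, execute(goal)):
            memory.completed.append(goal)
        else:
            # Report the failure back to the Planner so it can adjust.
            memory.pending = plan(goal, memory) + memory.pending
    return memory

result = run("open geico.com; fill in the quote form; extract the quote")
print(result.completed)
```

The key design point is the shared `Memory`: because completed and pending goals persist across iterations, the agent can work through a long objective without re-deciding what has already been done.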

Test Setup

All tests were run in Skyvern Cloud with an async cloud browser, using a combination of GPT-4o and GPT-4o-mini as the primary decision-making LLMs. The goal of this test was to measure real-world quality: the quality represented by this benchmark is the same as what you would experience with Skyvern’s browsers running asynchronously.
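A benchmark run of this shape boils down to a simple harness: execute each task with the agent, compare its answer to the expected one, and tally a score. The task records and `run_agent` below are hypothetical placeholders, not the actual WebVoyager harness:

```python
# Toy eval harness. `run_agent` is a placeholder that would normally drive a
# real web agent against the task's website; here it echoes the expected
# answer so the harness itself is runnable.

def run_agent(task):
    return task["expected"]

def evaluate(tasks):
    """Return the fraction of tasks whose agent answer matches the expected one."""
    passed = sum(1 for t in tasks if run_agent(t).strip() == t["expected"].strip())
    return passed / len(tasks)

tasks = [
    {"site": "example-shop.com", "prompt": "Find the price of the cheapest laptop", "expected": "$499"},
    {"site": "example-air.com", "prompt": "Find the earliest SFO-to-JFK departure time", "expected": "06:00"},
]
score = evaluate(tasks)
print(f"WebVoyager-style score: {score:.2%}")
```

A real harness would also need fuzzy answer matching (WebVoyager answers are free-form text), which is why published evals typically use an LLM judge rather than exact string comparison.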

💡 Why is this important? Most benchmarks are run on local browsers with a relatively safe IP address and an established browser fingerprint. This is not representative of how autonomous agents will run in the cloud, and we wanted our benchmark to reflect how agents behave in production.

In addition to the above, we’ve made a few minor tweaks to the dataset to bring it up to date:

  1. We’ve removed 8 tasks from the dataset because the results are no longer valid. For example, one of the tasks asked to go to apple.com and check when the Apple Vision Pro will be released — in 2025, it’s already been released and forgotten
  2. Many of the flight/hotel booking tasks referenced old dates. We updated both the prompt and the answer to more modern dates for this evaluation

🔍 For the curious:

The full dataset can be seen here: https://github.com/Skyvern-AI/skyvern/tree/main/evaluation/datasets

The full list of modifications can be seen here: https://github.com/Skyvern-AI/skyvern/pull/1576/commits/60dc48f4cf3b113ff1850e5267a197c84254edf1

Test Results

We’re doing something out of the ordinary. In addition to the results, we’re making our entire benchmark run public.

💡 Why is this important? Most benchmarks are run behind closed doors, with impressive results published without any accompanying material to verify them. This makes it hard to understand how things like hallucinations or website drift over time affect agent performance.

We believe this isn’t aligned with our open-source mission, so we have decided to publish the full eval results to the public.

📊 All individual run results can be seen here: https://eval.skyvern.com

🔍 The entire Eval dataset can be seen here: https://github.com/Skyvern-AI/skyvern/tree/main/evaluation/datasets

Limitations of the WebVoyager benchmark

The WebVoyager benchmark tests a variety of prompts across 15 different websites. While this is a good first step in testing web agents, it captures only 15 hand-picked sites out of the millions of active websites on the internet.

We think there is tremendous opportunity here to better evaluate web agents against one another with a more comprehensive benchmark similar to SWE-Bench.

What’s on the horizon

Browser automation is still a nascent space with tons of room for improvement. While we’ve achieved a major milestone in agent performance, a few important issues are next to be solved:

  1. Can we improve Skyvern’s reasoning so it operates efficiently in situations with more uncertainty? Examples include vague prompts, ambiguous or highly complex websites/tools, and websites with extremely poor UX (legacy portals)
  2. Can we give Skyvern access to more tools so it can effectively log into websites, make purchases, and behave more like a human?
  3. Can we have Skyvern remember things it has already done so it can repeat them at a lower cost?
