SafetyKit

AI-Powered Trust and Safety Automation

SafetyKit replaces human Trust and Safety reviewers with language models. We make it easy for enterprise Trust and Safety teams to supercharge their content review workflows, either by speeding up agent decision-making 5x or by automating the review altogether, and to significantly reduce operations costs with faster, more accurate decisions. With SafetyKit, Trust and Safety teams write their policies in natural language and use them to detect and action nefarious content instantly. Each decision is accompanied by an explanation grounded in your policies, not a generic definition or model score. We allow T&S teams to confidently scale their capacity, freeing up their agents for the highest-leverage work while reducing those agents’ exposure to problematic content.

Jobs at SafetyKit

San Francisco, CA, US
$150K - $220K
0.25% - 0.75%
Any (new grads ok)
SafetyKit
Founded: 2023
Team Size: 3
Location:
Group Partner: Gustaf Alstromer

Active Founders

David Graunke

David led engineering for risk reviews at Stripe for fraud, credit, content moderation, and financial crimes. He built the policy and workflow engine that scaled Stripe from internal reviewers to thousands of outsourced vendor agents.

Steven Guichard

Steven is a cofounder of SafetyKit, working to replace human trust and safety reviewers with language models. Previously he worked at Carbic as the first software engineer and later as the CEO. There he helped build ultrasonic flow sensors which were installed on oil pipelines and offshore rigs around the world. Steven also cofounded Thomas Street, a software design and engineering consultancy, where he worked with companies like Cisco, Roche, and DirecTV.

Alex Rosenblatt

Previous: 10 years as a product manager at Airbnb, Stripe, and Meta, building scalable and resilient enforcement platforms for Trust and Safety. My teams worked across the risk spectrum, from fraud to offline safety, building frameworks, automation, and agent tooling. Now: launching SafetyKit to put automation in the hands of Operations teams, focusing first on making it simple for Trust and Safety teams to use LLMs to automate enforcement.

Company Launches

The problem

Trust and Safety teams at large companies spend tens of millions of dollars on human reviewers. These reviewers make decisions about what is or isn’t allowed on the platform. This includes content moderation, but also things like checking Airbnb listings for discriminatory language or reviewing Stripe accounts for sanctions violations. These reviewers are outsourced agents following prescriptive workflows.

Managing this workforce is painful and expensive. Workflow changes take months to deploy, quality monitoring is inaccurate and ad-hoc, and tooling improvements and automation require very scarce engineering resources.

Companies use human reviewers because they’re flexible, not because they’re accurate: reviewer accuracy is frequently around 70%. Human decision-making sticks around because ML and automation are expensive and rigid, and require eng resources T&S teams don’t have.

Our solution

We use GPT-4 and other language models to directly interpret and apply the workflows that would otherwise be performed by humans. GPT-4 performs well at these tasks, but letting T&S teams run it on thousands or millions of pieces of content safely and confidently requires work.

We’ve built a policy manager/editor that makes T&S teams feel like they’re editing a policy or workflow in Google Docs. We never want our T&S users to feel like they’re prompt engineering!

Users can add policy definitions, pick out the content signals that matter to them, and build up automated rules based on those signals.
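To make the idea of rules built on content signals concrete, here is a minimal sketch. It is not SafetyKit’s data model: the `Rule` class, the `evaluate` function, the signal names, and the action strings are all hypothetical, chosen only to show how a few boolean signals could drive an automated decision.

```python
from dataclasses import dataclass
from typing import Callable

# Everything below is a hypothetical illustration, not SafetyKit's schema.
Signals = dict[str, bool]


@dataclass(frozen=True)
class Rule:
    name: str                              # human-readable label for the rule
    condition: Callable[[Signals], bool]   # predicate over detected signals
    action: str                            # what to do when the rule matches


def evaluate(rules: list[Rule], signals: Signals) -> str:
    """Return the action of the first matching rule, or 'allow' if none match."""
    for rule in rules:
        if rule.condition(signals):
            return rule.action
    return "allow"


rules = [
    Rule(
        "discriminatory language",
        lambda s: s.get("discriminatory_language", False),
        "remove",
    ),
    Rule(
        "off-platform payment",
        lambda s: s.get("contact_info", False) and s.get("payment_request", False),
        "escalate",
    ),
]

decision = evaluate(rules, {"contact_info": True, "payment_request": True})
print(decision)
```

Rules are checked in order and the first match wins, so in this sketch the more severe rules would be listed first. A real system would likely also record which rule fired, so the decision can be explained.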

We slice and dice the input document into a series of prompts that we run through a suite of LLMs and image models. Our users can then run their policy across examples to see how it performs.
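As a rough illustration of turning one policy document into a series of prompts, the sketch below splits a plain-text policy on blank lines and builds one prompt per section. The splitting rule, the prompt wording, and the function names `split_policy` and `build_prompts` are assumptions made for this example; no model is called, and the real pipeline is not described in this much detail.

```python
# Hypothetical sketch: one prompt per policy section. Not SafetyKit's pipeline.

def split_policy(policy_text: str) -> list[str]:
    """Split a policy into sections separated by blank lines, dropping empties."""
    blocks = [block.strip() for block in policy_text.split("\n\n")]
    return [block for block in blocks if block]


def build_prompts(policy_text: str, content: str) -> list[str]:
    """Pair every policy section with the content under review."""
    return [
        f"Policy section:\n{section}\n\n"
        f"Content to review:\n{content}\n\n"
        "Does the content violate this section? Answer yes or no, then explain."
        for section in split_policy(policy_text)
    ]


policy = """No discriminatory language in listings.

No requests to pay outside the platform."""

prompts = build_prompts(policy, "Cash only, message me on another app.")
print(len(prompts))  # one prompt per policy section
```

Asking about one section at a time keeps each prompt small, and it means each answer can be traced back to the specific policy text it was judged against.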

Explainability and decision-making paper trails are super important to our users. Traditional ML and human review fall pretty short here. SafetyKit gives our users clear reasoning for each decision.

We think that this transparency, plus built-in quality monitoring and a much faster feedback loop, will make SafetyKit more reliable and precise than human reviewers.

We provide a simple API for evaluating content against SafetyKit policies, along with prebuilt integrations for Salesforce and Zendesk.

Right now we evaluate policies over text and image content and we’re working on audio and video support.

Who are we?

We’re Steven, David, and Alex.

David and Alex were at Stripe, where we worked on a platform that breaks big, complicated policies into small steps that human reviewers are good at.

Alex did the same thing at Airbnb before joining Stripe, focusing on marketplace risk and offline safety.

Now we’re using AI to give every company the same scale and precision.

How you can help!

We want to talk to Trust and Safety teams! Please email us at founders@getsafetykit.com! Beyond T&S, if you have repetitive human decisions you want to automate in Customer Service, Legal Ops, or another back-of-house function, we’d love to talk!