SafetyKit

AI-Powered Trust and Safety Automation

S23

Active

https://www.safetykit.com/

AI-Powered Trust and Safety Automation

SafetyKit replaces human Trust and Safety reviewers with language models. We make it easy for enterprise Trust and Safety teams to supercharge their content review workflows — speeding up agent decision-making 5x or by automating the review altogether — and significantly reduce operations costs with faster, more accurate decisions. With SafetyKit, Trust and Safety teams write their policies in natural language and use them to detect and action nefarious content, instantly. Each decision is accompanied by an explanation grounded in your policies — not a generic definition or model score. We allow TnS teams to confidently scale their capacity, freeing up your agents for the highest leverage work, while reducing those agents exposure to problematic content.

Jobs at SafetyKit

View all jobs ›

Product Engineer (Full stack)

San Francisco, CA, US

$170K - $200K

0.20% - 0.50%

Any (new grads ok)

Apply Now ›

SafetyKit

Founded:2023

Team Size:10

Status:

Active

Location:San Francisco

Group Partner:Gustaf Alstromer

Active Founders

David Graunke, Founder

David led engineering for risk reviews at Stripe for fraud, credit, content moderation, and financial crimes. He built the policy and workflow engine that scaled Stripe from internal reviewers to thousands of outsourced vendor agents.

David Graunke

SafetyKit

Steven Guichard, Founder

Steven is a cofounder of SafetyKit, working to replace human trust and safety reviewers with language models. Previously he worked at Carbic as the first software engineer and later as the CEO. There he helped build ultrasonic flow sensors which were installed on oil pipelines and offshore rigs around the world. Steven also cofounded Thomas Street, a software design and engineering consultancy, where he worked with companies like Cisco, Roche, and DirecTV.

Steven Guichard

SafetyKit

Company Launches

SafetyKit — AI-powered Trust and Safety automation

See original launch post ›

The problem

Trust and Safety teams at large companies spend tens of millions of dollars on human reviewers. These reviewers make decisions about what is or isn’t allowed on the platform. This includes content moderation, but also things like checking Airbnb listings for discriminatory language, or reviewing Stripe accounts for Sanctions violations. These reviewers are outsourced agents following prescriptive workflows.

Managing this workforce is painful and expensive. Workflow changes take months to deploy, quality monitoring is inaccurate and ad-hoc, and tooling improvements and automation require very scarce engineering resources.

Companies use humans because they’re flexible, but those humans aren’t particularly good at it. Human decision-making sticks around because ML and automation is expensive and rigid, and requires eng resources T&S teams don’t have. This is despite the fact that humans are not particularly accurate reviewers (accuracy is frequently around 70%).

Our solution

We use GPT-4 and other language models to directly interpret and apply the workflows that would otherwise be performed by humans. GPT-4 performs well at these tasks, but letting T&S teams run it on thousands or millions of pieces of content safely and confidently requires work.

We’ve built a policy manager/editor that makes T&S teams feel like they’re editing a policy or workflow in Google Docs. We never want our T&S users to feel like they’re prompt engineering!

Users can add policy definitions, pick out of the content signals that matter to them, and build up automated rules based on those signals.

We slice and dice the input document into a series of prompts that we run through a suite of LLMs and image models. Our user can then can run their policy across examples to see how it performs:

Explainability and decision-making papertrails are super important to our users. Traditional ML and human review fall pretty short here. SafetyKit gives our users clear reasoning for each decision:

We think that this transparency plus built-in quality monitoring and a much much faster feedback loop will make SafetyKit more reliable and precise than human reviewers.

We provide a simple API for evaluating content against SafetyKit policies along with a prebuilt integrations for Salesforce and Zendesk.

Right now we evaluate policies over text and image content and we’re working on audio and video support.

Who are we?

We’re Steven, David, and Alex.

David and Alex worked at Stripe, where we worked on a platform to break big complicated policies into small steps that human reviewers are good at.

Alex did the same thing at Airbnb before joining Stripe, focusing on marketplace risk and offline safety.

Now we’re using AI to give every company the same scale and precision.

How you can help!

We want to talk to Trust and Safety teams! Please email us at founders@getsafetykit.com! Beyond T&S, if you have repetitive human decisions you want to automate in Customer Service, Legal Ops, or another back-of-house function, we’d love to talk!