General Analysis: Finding Failure Modes for AI Models

An encyclopedia of jailbreaking techniques to make AI models safer.

Rez Havaei

2 months ago

#saas#trust_&_safety#artificial_intelligence

TL;DR

General Analysis provides safety and performance reports for enterprise AI models, offering businesses clear insights into model vulnerabilities. Using a growing repository of automated red-teaming, jailbreaking, and interpretability techniques, we uncover and address critical failure modes.

Challenge

As AI systems become increasingly capable, their deployment in high-stakes environments poses significant risks—financial, ethical, and otherwise—where errors can lead to substantial consequences. We predict that a large percentage of the world’s cognitive tasks will soon be performed by AI systems across industries. However, this shift brings critical challenges:

Safety and performance efforts are not keeping pace with AI capabilities: Research and tools to evaluate AI systems have not kept up with the complexity and impact of modern models.
The field is fragmented: Approaches to AI safety and evaluation are scattered, lacking a unified framework.
Methods lack scalability and automation: Many current techniques are labor-intensive and fail to provide consistent, repeatable insights at scale.

Our approach

To address these challenges, we offer access to a unified set of tools and methodologies designed to systematically find model failure modes and enhance model robustness.

Providing Comprehensive Safety and Performance Reports: We deliver detailed reports to our customers, identifying novel failure modes in their models and providing actionable methods to mitigate them.
A Living Knowledge Base: Our repository collects and refines evaluation techniques while keeping pace with emerging exploits and vulnerabilities. This ensures our tools remain effective and relevant across diverse industries and evolving AI applications.

An example of our work: Eliciting legal hallucinations in GPT-4o

In our recent work, we show how GPT-4o is susceptible to hallucinating when asked about certain legal cases or concepts. The report, data and code are publicly available.

We train an attacker model that causes GPT-4o to hallucinate on more than 35% of prompts on a diverse set of legal questions.

Learn more at generalanlysis.com or read the full report here.

We are looking to connect with:

Startups creating LLMs or AI Agents in different sectors (Customer Support, Legal Tech, Medicine, foundation models) for design partnerships.
AI Safety, Interpretability, and Evaluation Researchers.

If you are interested in working with us or just want to chat please email us at founders@generalanalysis.com .