An encyclopedia of jailbreaking techniques to make AI models safer.
General Analysis provides safety and performance reports for enterprise AI models, offering businesses clear insights into model vulnerabilities. Using a growing repository of automated red-teaming, jailbreaking, and interpretability techniques, we uncover and address critical failure modes.
As AI systems become increasingly capable, their deployment in high-stakes environments poses significant risks—financial, ethical, and otherwise—where errors can lead to substantial consequences. We predict that a large percentage of the world’s cognitive tasks will soon be performed by AI systems across industries. However, this shift brings critical challenges:
To address these challenges, we offer access to a unified set of tools and methodologies designed to systematically find model failure modes and enhance model robustness.
In our recent work, we show how GPT-4o is susceptible to hallucinating when asked about certain legal cases or concepts. The report, data and code are publicly available.
We train an attacker model that causes GPT-4o to hallucinate on more than 35% of prompts on a diverse set of legal questions.
Learn more at generalanlysis.com or read the full report here.
We are looking to connect with:
If you are interested in working with us or just want to chat please email us at founders@generalanalysis.com .