Relari - Testing and simulation stack for GenAI systems

Relari helps AI teams simulate, test, and validate complex GenAI applications throughout the development lifecycle.

TL;DR

Relari is an open-source platform to simulate, test, and validate complex Generative AI (GenAI) applications. With 30+ open-source metrics, synthetic test-set generators, and online monitoring tools, we help AI app developers achieve a high degree of reliability in mission-critical use cases such as compliance copilots, enterprise search, and financial assistants.

⭐ Meet the Relari Team

Hi everyone, Yi and Pasquale from Relari here. We both spent years building safety-critical AI applications, particularly in autonomous driving and robotics. Pasquale has a PhD from MIT, where he researched fault detection in complex AI systems. Yi has an MBA from Harvard and has led multiple AI products from concept to production, including robo-taxis, self-driving trucks, and warehouse robots.

💡 Our Inspiration: Autonomous Vehicles

Just like autonomous vehicles promise to change how we move, GenAI applications promise to revolutionize the way we work. However, ensuring these systems are safe and reliable requires a shift in the development process. An autonomous vehicle would need to drive billions of miles to demonstrate that it is safer than a human driver. That would take decades, so the industry instead relies on simulation and synthetic data to efficiently test and validate each iteration of the self-driving software stack.

We see a strong parallel in the world of GenAI. At Relari.ai, we are building towards a future where a similar infrastructure will enable complex and powerful applications based on large language models (LLMs).

🔴 Problem: GenAI Apps are Unreliable

LLM-based applications can be inconsistent and unreliable, which blocks GenAI’s adoption in mission-critical workflows and, once an app reaches production, hurts user confidence and retention. Good testing infrastructure is paramount to achieving the quality users demand, yet GenAI app developers across startups and enterprises struggle to define the right set of tests and quality-control standards for deployment.

The main challenges AI teams face are:

  • Complex Pipelines: GenAI pipelines are increasingly complex, and it is often difficult to pinpoint where problems originate.
  • Gap from Evaluation to Reality: There is a huge gap between the metrics used in offline evaluation and real user feedback, leading to distrust in offline evaluation results.
  • Lack of Relevant Datasets: Public datasets are overfit by models and often not relevant to specific applications, while manual curation of custom datasets is extremely time-consuming and costly.

🚀 Solution: Harden AI Systems with Simulation

Relari offers a complete testing and simulation stack for GenAI pipelines designed to directly address the problems above. Relari allows you to:

  • Pinpoint problem root causes with modular evaluation: Define your pipeline and flexibly orchestrate modular tests to quickly analyze performance issues (see the sketch after this list). Our open-source framework offers 30+ metrics covering text generation, code generation, retrieval, agents, and classification, with more coming soon.

  • Simulate user behavior with close-to-human evaluators: Leverage user feedback to train custom evaluators that are 90%+ aligned with human evaluators (backed by our research). Introduce a feedback loop connecting your production system and the development process.

  • Accelerate development with synthetic data: Generate large-scale synthetic datasets tailored to your use case and stress test your AI pipeline. Ensure coverage of all the corner cases before shipping to users.
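
To make the modular-evaluation idea concrete, here is a minimal sketch of per-module scoring for a simple retrieval-augmented generation (RAG) pipeline. Everything below (the `TestCase` fields, the two toy metrics, the `MODULE_METRICS` registry) is an illustrative placeholder invented for this post, not Relari's actual API:

```python
# Hypothetical sketch of modular evaluation; names are placeholders,
# not Relari's real interface. Each pipeline stage gets its own metrics
# so a regression can be traced to a single module.
from dataclasses import dataclass

@dataclass
class TestCase:
    question: str
    retrieved_docs: list[str]      # output of the retrieval module
    ground_truth_docs: list[str]
    answer: str                    # output of the generation module
    ground_truth_answer: str

def retrieval_recall(case: TestCase) -> float:
    """Fraction of ground-truth documents that the retriever surfaced."""
    if not case.ground_truth_docs:
        return 1.0
    hits = sum(doc in case.retrieved_docs for doc in case.ground_truth_docs)
    return hits / len(case.ground_truth_docs)

def answer_token_f1(case: TestCase) -> float:
    """Crude token-level F1 between the generated answer and the reference."""
    pred = set(case.answer.lower().split())
    ref = set(case.ground_truth_answer.lower().split())
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# Metrics are registered per module, so low scores point at the stage
# responsible rather than at the pipeline as a whole.
MODULE_METRICS = {
    "retriever": [retrieval_recall],
    "generator": [answer_token_f1],
}

def evaluate(dataset: list[TestCase]) -> dict[str, float]:
    return {
        f"{module}/{metric.__name__}": sum(metric(c) for c in dataset) / len(dataset)
        for module, metrics in MODULE_METRICS.items()
        for metric in metrics
    }
```

The payoff of namespacing scores by module is debuggability: if `retriever/retrieval_recall` drops while `generator/answer_token_f1` holds steady, the regression lives in retrieval, not generation.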

❤️ Our Ask

  • ⭐ Star us on GitHub (link)
  • 📅 Book a demo with us (link)
  • 👉 Introduce us to AI / ML / Data Science teams building mission-critical GenAI applications (founders@relari.ai)