tl;dr Chatter is Postman for LLMs. Our platform helps companies and developers test their LLM applications: iterate on prompts, run them across model families, and evaluate them against test cases – all in one place. With collaboration features, engineers can design LLM chains while QA writes test cases.
—
Hi everyone, we’re Anish and Kasyap, aka the team behind Chatter. We met at UPenn, where we dove deep into LLMs. After working on a couple of projects together, we realized LLM testing was a growing problem and built Chatter.
🙋‍♂️ Who is it for?
We built Chatter for developers and companies that want observability while building with LLMs. If you’re building LLM-powered applications, refining your prompts, and care about accuracy and reliability (incorrect responses, hallucinations, etc.), then Chatter is built for you.
⚙️ How does it work?
🏗️ Build and iterate fast, with all the features you need
Multiple foundation models, LLM chaining, formatted responses, Jinja2 prompt templating, parameter tuning, and more – all without a single line of code. When you’re done, export the call(s) to your favorite programming language.
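For a concrete sense of what an exported call might look like, here’s a minimal Python sketch combining Jinja2 prompt templating with a chat-completion request via the OpenAI SDK. The template, model name, and parameters are our own illustrative choices, not Chatter’s actual export format:

```python
from jinja2 import Template
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Jinja2 prompt templating: variables are filled in at call time.
prompt = Template(
    "Summarize the following support ticket in one sentence:\n\n{{ ticket }}"
).render(ticket="My CSV export has been failing since Tuesday.")

response = client.chat.completions.create(
    model="gpt-4",   # swap in any supported foundation model
    temperature=0.2, # parameter tuning: lower = more deterministic
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```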
🧪 Test on hundreds of inputs, with automatic evaluation
Easily build a test suite as large as you need, and run it all with a single click. Add assertions for each LLM call, with automatic evaluation using anything from regex to LLM-powered methods. Everything is versioned in one place for easy comparison.
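To make “automatic evaluation” concrete, here’s a minimal sketch of the regex end of that spectrum, run against recorded model outputs. The test cases and patterns below are invented for illustration; an LLM-powered check would replace the regex with a grader call:

```python
import re

# Each test case pairs an input and a recorded model output with a
# regex assertion the output must satisfy.
test_suite = [
    {"input": "What is 2 + 2?", "output": "The answer is 4.",
     "pattern": r"\b4\b"},
    {"input": "Name a primary color.", "output": "Blue is a primary color.",
     "pattern": r"(?i)\b(red|blue|yellow)\b"},
]

for case in test_suite:
    ok = re.search(case["pattern"], case["output"]) is not None
    print(f"{'PASS' if ok else 'FAIL'} | {case['input']}")
```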
🚀 Collaborate
Need domain experts to review results, or your customer success team to bring in user feedback? Invite anyone into shared workspaces and make sure your users benefit from your combined expertise.
🙏 Our ask
If you know a company in your network that’s looking to improve their process for building with LLMs, we’d love a warm intro! My email is anish@trychatter.ai
🎡 Try the playground (bonus: share your feedback with us at founders@trychatter.ai)