TL;DR: Make your voice agents way more realistic and stop getting hangups. Check it out in action, or talk to it yourself by calling our AI Trump voice agent at (510) 315-0014 or visiting vogent.ai/sesame.
Vogent powers millions of voice AI phone calls for healthcare, customer support, travel, and more.
Today, we’re launching the most realistic realtime voice engine ever. It inserts natural pauses and disfluencies (“um”, “uh”) without being explicitly prompted, and is uncannily good at cloning voices. Check it out in the videos below:
How did we do it?
If you’ve been online recently, you’ve probably come across Sesame’s super humanlike voice AI, and their open-source launch of CSM-1B, a 1B-parameter text-to-speech model.
We’ve spent the past couple of weeks rearchitecting CSM-1B from the ground up to support realtime, low-latency inference, and we’ve finally cracked it. The results are pretty insane: it’s a step-function increase in realism (customer hangup rates have decreased by 60%+), and we’ve made it available out of the box.
If you want to create your own Sesame agents, sign up at app.vogent.ai and select one of our Sesame voices. You can also use Sesame to clone new voices; all you need is a 10-15 second reference clip.
We’re also releasing a realtime Sesame TTS soon; email j@vogent.ai for beta access.