
Today, we’re excited to introduce Miso One, the most emotive voice model in the world.
Miso One is an 8-billion-parameter text-to-speech model for highly expressive speech generation. It emotes like a human and responds faster than a human, with just 110 milliseconds of latency.
We’ve open-sourced the model weights, with API access coming soon.
https://www.youtube.com/watch?v=HizlJgDbac8
All of these voiceovers were generated by Miso One!
https://www.youtube.com/watch?v=QX6a_lUdwC8
https://www.youtube.com/watch?v=ywH2C4LKu9E
https://www.youtube.com/shorts/jTLNoUH85iY
Our ask:
Check out our repository and give us a star!
https://github.com/MisoLabsAI/MisoTTS
You can also try the model directly at misolabs.ai.