Meet Nova Sonic: Amazon’s AI Voice Model That Feels Human

Amazon has introduced Nova Sonic, its latest generative AI model designed to deliver fast, natural-sounding speech and handle real-time voice conversations with impressive fluency. Positioned as Amazon’s answer to cutting-edge voice models like OpenAI’s GPT-4o and Google’s Gemini, Nova Sonic marks a significant leap forward in voice AI — and a clear step beyond the legacy tech that powered the early days of Alexa.
Nova Sonic is now available to developers through Amazon Bedrock, the company’s platform for building enterprise-grade AI applications. It is served through a bidirectional streaming API: the application streams the user’s audio in while the model streams its response back over the same session, which enables fluid two-way dialogue with minimal latency. Amazon claims the model is not only faster and more accurate than competitors but also roughly 80% less expensive to use than GPT-4o, making it an attractive choice for developers building voice-powered applications.
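To make the idea of a bidirectional stream concrete, here is a minimal, self-contained Python sketch built on `asyncio`. It does not use the Bedrock SDK or Nova Sonic’s actual event format: `fake_model`, the queue-based transport, and the event strings are all hypothetical stand-ins. It only shows the general pattern, in which sending input and receiving output run concurrently on one session, so responses can start arriving before the input stream has ended.

```python
import asyncio


async def fake_model(inbound: asyncio.Queue, outbound: asyncio.Queue) -> None:
    """Hypothetical model stand-in: emits a partial result for every chunk it receives."""
    while True:
        chunk = await inbound.get()
        if chunk is None:  # end-of-input sentinel from the client
            await outbound.put("final: response ready")
            await outbound.put(None)  # tell the receiver the session is over
            return
        await outbound.put(f"partial: heard {chunk}")


async def send_audio(inbound: asyncio.Queue, chunks: list[str]) -> None:
    """Stream input chunks without waiting for any response."""
    for chunk in chunks:
        await inbound.put(chunk)
        await asyncio.sleep(0)  # yield so the model can respond mid-stream
    await inbound.put(None)


async def receive_events(outbound: asyncio.Queue) -> list[str]:
    """Collect output events as they arrive until the end-of-session sentinel."""
    events = []
    while (event := await outbound.get()) is not None:
        events.append(event)
    return events


async def run_session(chunks: list[str]) -> list[str]:
    inbound: asyncio.Queue = asyncio.Queue()
    outbound: asyncio.Queue = asyncio.Queue()
    # Sending, model processing, and receiving all run at the same time.
    _, _, events = await asyncio.gather(
        send_audio(inbound, chunks),
        fake_model(inbound, outbound),
        receive_events(outbound),
    )
    return events


if __name__ == "__main__":
    for event in asyncio.run(run_session(["chunk-1", "chunk-2", "chunk-3"])):
        print(event)
```

Running it prints one `partial: heard ...` line per chunk followed by `final: response ready`. In a real voice session the "chunks" would be audio frames and the outbound events would carry synthesized speech and transcript text, but the concurrency structure is the point of the sketch.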
Smarter Than Alexa, Faster Than GPT-4o
Where early voice assistants such as Alexa and Siri sounded scripted and robotic, Nova Sonic produces conversational, human-sounding speech with a latency of just 1.09 seconds, slightly quicker than GPT-4o’s 1.18 seconds, according to independent benchmarks.
What sets Nova Sonic apart isn’t just its speed but how well it handles complex requests. Rohit Prasad, Amazon’s SVP and Head Scientist of AGI, explained that Nova Sonic excels at orchestrating actions across APIs and tools, allowing it to fetch live data, interact with apps, and pull information from proprietary databases in real time, a major upgrade over what today’s assistants can do.
Nova Sonic also interprets pauses, interruptions, and natural speech patterns, making conversations feel more intuitive and less like talking to a machine. Developers get access to both the voice interaction and a real-time text transcript, enabling a range of use cases from customer service to voice-driven analytics.
Best-in-Class Speech Recognition
Accuracy is another standout feature. On the Multilingual LibriSpeech benchmark, which measures speech recognition across English, French, Italian, German, and Spanish, Nova Sonic achieved an average word error rate (WER) of 4.2%, outperforming many rivals. On a separate benchmark of multi-speaker conversations, Nova Sonic was 46.7% more accurate than OpenAI’s GPT-4o-transcribe model, according to Amazon.
These results suggest that Nova Sonic transcribes speech reliably across languages and in noisy, multi-speaker settings, which is a prerequisite for correctly understanding what users are asking for, even when they mumble, misspeak, or talk over background noise.
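Word error rate is the standard transcription metric: the number of word substitutions, deletions, and insertions needed to turn the model’s transcript into the reference transcript, divided by the number of words in the reference. A 4.2% WER therefore means roughly four errors per hundred reference words. The sketch below computes it with a word-level edit distance; the function name `word_error_rate` and the sample sentences are purely illustrative and are not taken from any benchmark’s official scoring code, which typically also normalizes case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One deletion ("the") plus one substitution ("lights" -> "light") over 5 words.
    print(word_error_rate("turn on the kitchen lights", "turn on kitchen light"))  # 0.4
```

Because insertions count as errors, WER is not capped at 100%: a transcript that adds many spurious words can score above 1.0, which is why lower is always better and why it is reported as an error rate rather than an accuracy.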
A Glimpse Into Amazon’s AGI Ambitions
Nova Sonic is more than a voice model. It’s part of Amazon’s larger push toward artificial general intelligence (AGI), meaning systems that can perform any computer task a human can. Prasad emphasized that the company’s future AI models will expand beyond voice to include image, video, and sensory data, enabling richer, more context-aware experiences across both digital and physical environments.
Already, components of Nova Sonic are powering Alexa+, Amazon’s upgraded voice assistant. And just last week, the company previewed Nova Act, a browser-based AI model with web interaction capabilities. Together, they point to a future where Amazon’s internal AI models will become central to both its consumer products and developer ecosystem.
With Nova Sonic, Amazon isn’t just playing catch-up — it’s building the infrastructure for a more responsive, intelligent, and affordable AI-powered future.