OpenAI Ushers in the Era of Voice Agents with API Pricing as Low as $0.015 per Minute
OpenAI launches next-gen audio models: Speech-to-text & text-to-speech APIs starting at $0.015/min. Build powerful voice agents with enhanced accuracy & steerability.
"AI Disruption" Publication 5000 Subscriptions 20% Discount Offer Link.
OpenAI has just announced a new generation of audio models in its API, covering both speech-to-text and text-to-speech, so developers can build more capable voice agents with less effort.
The core highlights of the new products are summarized as follows:
gpt-4o-transcribe (Speech-to-Text): Achieves a significantly lower Word Error Rate (WER) than OpenAI's existing Whisper models across multiple benchmarks.
gpt-4o-mini-transcribe (Speech-to-Text): A streamlined version of gpt-4o-transcribe, faster and more efficient.
gpt-4o-mini-tts (Text-to-Speech): Supports "steerability" for the first time, so developers can control not only "what to say" but also "how to say it."
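Word Error Rate, the metric cited for gpt-4o-transcribe, is the word-level edit distance between a reference transcript and the model's output, divided by the number of words in the reference: WER = (substitutions + deletions + insertions) / reference words. Lower is better. The sketch below computes this standard definition with dynamic programming. It is an illustration only, not OpenAI's evaluation code, and it assumes plain whitespace tokenization with no case or punctuation normalization, which real benchmarks usually apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference words.

    Assumption: words are split on whitespace with no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match if cost is 0)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{wer:.3f}")  # prints 0.333
```

A drop in WER from one model to another therefore means fewer misheard, missing, or spurious words per hundred words spoken.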
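To make "what to say" versus "how to say it" concrete, here is a minimal sketch of a gpt-4o-mini-tts request that is built but not sent. The model name comes from the announcement. The endpoint URL and the `voice`, `instructions`, and `response_format` fields reflect my understanding of OpenAI's speech API and should be treated as assumptions to verify against the official API reference. The helper name `build_speech_request` and the sample text are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint for OpenAI's text-to-speech API (verify against the API reference).
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(api_key: str, text: str, style: str,
                         voice: str = "alloy") -> urllib.request.Request:
    """Hypothetical helper that prepares (but does not send) a TTS request.

    `input` carries WHAT to say; `instructions` carries HOW to say it.
    """
    body = {
        "model": "gpt-4o-mini-tts",
        "input": text,           # the words to speak
        "voice": voice,          # assumed built-in voice name
        "instructions": style,   # steerability: tone, pacing, emotion
        "response_format": "mp3",
    }
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_speech_request(
    api_key="YOUR_API_KEY",
    text="Your order has shipped and should arrive on Thursday.",
    style="Speak in a warm, upbeat customer-service tone.",
)
print(req.get_method(), req.full_url)
print(json.loads(req.data)["instructions"])
```

With a real key, `urllib.request.urlopen(req).read()` should return the encoded audio bytes. Swapping only the `style` string, for example to "Speak slowly and calmly, like a meditation guide," changes the delivery while the spoken text stays the same, which is the idea behind steerability.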