AI Disruption

OpenAI Ushers in the Era of Voice Agents with API Pricing as Low as $0.015 per Minute

OpenAI launches next-gen audio models: Speech-to-text & text-to-speech APIs starting at $0.015/min. Build powerful voice agents with enhanced accuracy & steerability.

Meng Li
Mar 21, 2025


OpenAI has announced a new generation of audio models in its API, covering both speech-to-text and text-to-speech, that make it easier for developers to build capable voice agents.

The core highlights of the new models are:

  • gpt-4o-transcribe (Speech-to-Text): Significant reduction in Word Error Rate (WER), outperforming the existing Whisper model in multiple benchmarks.

  • gpt-4o-mini-transcribe (Speech-to-Text): A streamlined version of gpt-4o-transcribe, faster and more efficient.

  • gpt-4o-mini-tts (Text-to-Speech): First-time support for "steerability," allowing developers to not only specify "what to say" but also control "how to say it."
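
To make the "what to say" versus "how to say it" split concrete, here is a minimal sketch using the official `openai` Python package. It assumes the package is installed and that `OPENAI_API_KEY` is set in the environment. The helper names (`build_speech_request`, `synthesize`, `transcribe`), the voice `coral`, and the style text are illustrative choices, not something taken from the announcement.

```python
from pathlib import Path


def build_speech_request(text: str, style: str) -> dict:
    """Assemble keyword arguments for a text-to-speech call.

    `input` carries what to say; `instructions` carries how to say it.
    """
    return {
        "model": "gpt-4o-mini-tts",
        "voice": "coral",        # illustrative voice choice (assumption)
        "input": text,
        "instructions": style,
    }


def synthesize(text: str, style: str, out_path: str = "speech.mp3") -> Path:
    """Generate speech and write it to out_path (needs network and an API key)."""
    from openai import OpenAI    # imported lazily so the helpers above work offline

    client = OpenAI()            # reads OPENAI_API_KEY from the environment
    request = build_speech_request(text, style)
    with client.audio.speech.with_streaming_response.create(**request) as response:
        response.stream_to_file(out_path)
    return Path(out_path)


def transcribe(audio_path: str, model: str = "gpt-4o-mini-transcribe") -> str:
    """Transcribe an audio file to text (needs network and an API key)."""
    from openai import OpenAI

    client = OpenAI()
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text
```

A call such as `synthesize("Your order has shipped.", "Speak in a warm, upbeat tone.")` would keep the wording fixed while the second argument steers the delivery. Passing `model="gpt-4o-transcribe"` to `transcribe` switches to the larger speech-to-text model.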
