OpenAI Ushers in the Era of Voice Agents with API Pricing as Low as $0.015 per Minute

OpenAI launches next-gen audio models: Speech-to-text & text-to-speech APIs starting at $0.015/min. Build powerful voice agents with enhanced accuracy & steerability.

Mar 21, 2025

∙ Paid

"AI Disruption" Publication 5000 Subscriptions 20% Discount Offer Link.

Just now, OpenAI announced the launch of a new generation of audio models in its API, including speech-to-text and text-to-speech functionalities, enabling developers to easily build powerful voice agents.

The core highlights of the new products are summarized as follows:

gpt-4o-transcribe (Speech-to-Text): Significant reduction in Word Error Rate (WER), outperforming the existing Whisper model in multiple benchmarks.
gpt-4o-mini-transcribe (Speech-to-Text): A streamlined version of gpt-4o-transcribe, faster and more efficient.
gpt-4o-mini-tts (Text-to-Speech): First-time support for "steerability," allowing developers to not only specify "what to say" but also control "how to say it."

AI Disruption

OpenAI Ushers in the Era of Voice Agents with API Pricing as Low as $0.015 per Minute

OpenAI launches next-gen audio models: Speech-to-text & text-to-speech APIs starting at $0.015/min. Build powerful voice agents with enhanced accuracy & steerability.

This post is for paid subscribers