Text-to-Speech (TTS) API Guide

Overview

The Audio API provides a speech endpoint, backed by TTS models, that supports the following use cases:

📝 Blog article narration

🌍 Multi-language audio generation

🎵 Real-time audio stream output

Important Note: You must disclose to users that the audio they hear is AI-generated speech, not a human voice.

Basic Usage

Basic Example
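
A minimal sketch of a call to the speech endpoint, using only the Python standard library. The base URL, the `/audio/speech` path, the JSON field names (`model`, `input`, `voice`, `response_format`), and the bearer-token header are assumptions modeled on common OpenAI-compatible APIs and are not confirmed by this guide; check your provider's reference before relying on them. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: placeholder base URL; replace it with your provider's endpoint.
BASE_URL = "https://api.example.com/v1"


def build_speech_request(text, api_key, model="tts-1", voice="alloy",
                         response_format="mp3", base_url=BASE_URL):
    """Build (but do not send) a POST request for the speech endpoint."""
    # Assumption: these field names follow the common OpenAI-compatible shape.
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(text, api_key, out_path="speech.mp3", **options):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, api_key, **options)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

Calling `synthesize_to_file("Hello, world!", api_key)` would then save the narration as `speech.mp3`. Building the request separately from sending it makes the payload easy to inspect without network access.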

Features

Audio Quality Options

tts-1: Low latency, suitable for real-time applications

tts-1-hd: Higher quality; output may contain less static noise than tts-1

Available Voices

alloy

echo

fable

nova

shimmer

onyx
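
Because the model and voice are passed as plain strings, a typo only surfaces once the request is sent. The sketch below checks both values locally against the lists in this guide. The function name is hypothetical, and a provider may accept values not listed here.

```python
# Values copied from the "Audio Quality Options" and "Available Voices" lists.
MODELS = {"tts-1", "tts-1-hd"}
VOICES = {"alloy", "echo", "fable", "nova", "shimmer", "onyx"}


def check_speech_options(model, voice):
    """Raise ValueError if the model or voice is not listed in this guide."""
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {sorted(MODELS)}")
    if voice not in VOICES:
        raise ValueError(f"unknown voice {voice!r}; expected one of {sorted(VOICES)}")
    return model, voice
```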

Supported Output Formats

Format	Characteristics	Use Cases
MP3	Default format	General use
Opus	Low latency	Web streaming and communication
AAC	Efficient compression	Mobile device playback
FLAC	Lossless compression	Audio archiving
WAV	Uncompressed	Low-latency applications
PCM	Raw samples (24 kHz, 16-bit signed)	Direct processing of raw audio
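
PCM output carries no header, so most players cannot open it directly. The sketch below wraps raw samples in a WAV container using the standard-library `wave` module. The sample rate and sample width come from the table above; mono output and little-endian byte order are assumptions, and the function name is hypothetical.

```python
import io
import wave

SAMPLE_RATE_HZ = 24_000   # from the format table
SAMPLE_WIDTH_BYTES = 2    # 16-bit signed samples
CHANNELS = 1              # assumption: mono output


def pcm_to_wav_bytes(pcm, channels=CHANNELS):
    """Wrap headerless PCM samples in a WAV container and return the bytes.

    Assumption: the samples are little-endian, which is what WAV expects.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(SAMPLE_RATE_HZ)
        wav_file.writeframes(pcm)
    return buffer.getvalue()
```

The returned bytes can be written to a `.wav` file and played with standard tools.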

Real-time Audio Streaming
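
A minimal sketch of consuming the audio in chunks rather than waiting for the full response. It works on any binary stream with a `read(n)` method, such as the response object returned by `urllib.request.urlopen`. Whether the server delivers audio progressively is an assumption that depends on your provider; the function names and chunk size are illustrative.

```python
import io

CHUNK_SIZE = 4096  # bytes per read; an arbitrary choice for illustration


def iter_audio_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield audio chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def stream_to_file(stream, out_path, chunk_size=CHUNK_SIZE):
    """Write chunks to disk as they arrive and return the total bytes written."""
    total = 0
    with open(out_path, "wb") as f:
        for chunk in iter_audio_chunks(stream, chunk_size):
            f.write(chunk)
            total += len(chunk)
    return total
```

Instead of writing to a file, each chunk from `iter_audio_chunks` could be handed to an audio player or forwarded to a client as soon as it arrives.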

Supported Languages

Multiple languages are supported, including:

Asian languages: Chinese, Japanese, Korean, etc.

European languages: English, French, German, etc.

Other languages: Arabic, Hindi, etc.

Note: Current voices are primarily optimized for English.

Frequently Asked Questions

Q: How do I control the emotion of the generated audio?

A: There is currently no direct control mechanism. Uppercase letters or grammar may influence the output, but the effect is uncertain.

Q: Can I create custom voices?

A: Creating custom voices is not supported.

Q: Who owns the generated audio?

A: The audio is owned by the creator, but you must inform users that it is AI-generated audio.
