Text-to-Speech (TTS) API Guide
Overview
The Audio API provides a speech endpoint, built on TTS models, that offers the following features:
🌍 Multi-language audio generation
🎵 Real-time audio stream output
Important Note: You must disclose to users that the audio they hear is AI-generated speech, not a human voice.
Basic Usage
Basic Example
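A minimal request sketch using only the Python standard library. It assumes an OpenAI-compatible `POST /v1/audio/speech` endpoint that accepts a JSON body with `model`, `input`, `voice`, and `response_format` fields and Bearer-token authentication. The base URL, the path, the field names, and the voice name `alloy` are all assumptions to check against your provider's API reference; only the model names `tts-1`/`tts-1-hd` and the format names come from this guide.

```python
import json
import urllib.request

# Assumptions (not stated in this guide): base URL, path, JSON field names,
# Bearer auth, and the voice name "alloy".
BASE_URL = "https://api.example.com"  # hypothetical placeholder
SPEECH_PATH = "/v1/audio/speech"      # assumed OpenAI-compatible path


def build_speech_request(api_key: str, text: str, *, model: str = "tts-1",
                         voice: str = "alloy", response_format: str = "mp3",
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the speech endpoint."""
    payload = {
        "model": model,                      # "tts-1" or "tts-1-hd"
        "input": text,                       # text to synthesize
        "voice": voice,                      # assumed voice name
        "response_format": response_format,  # e.g. "mp3", "opus", "wav", "pcm"
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + SPEECH_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(api_key: str, text: str, path: str, **kwargs) -> None:
    """Send the request and write the returned audio bytes to `path`."""
    request = build_speech_request(api_key, text, **kwargs)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

For example, `synthesize_to_file("YOUR_API_KEY", "Hello, world!", "speech.mp3", base_url="https://your-provider.example")` would save an MP3 file, provided the assumed endpoint shape matches your provider.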
Features
Audio Quality Options
tts-1: Low latency, suitable for real-time applications
tts-1-hd: Higher quality, may have less static noise
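The choice between the two models can be captured in a small helper. The model names and trade-offs come from the list above; the helper name and its rule (low latency for real-time use, HD otherwise) are illustrative assumptions, not part of the API.

```python
# Model names and trade-offs are taken from this guide; the helper and its
# selection rule are hypothetical, not part of the API.
TTS_MODELS = {
    "tts-1": "Low latency, suitable for real-time applications",
    "tts-1-hd": "Higher quality, may have less static noise",
}


def choose_tts_model(realtime: bool) -> str:
    """Return the low-latency model for real-time use, the HD model otherwise."""
    return "tts-1" if realtime else "tts-1-hd"
```

The returned name would then be sent as the request's `model` value.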
Available Voices
Supported Output Formats
| Format | Characteristics | Use Cases |
|---|---|---|
| MP3 | Default format | General use |
| Opus | Low latency | Web streaming and communication |
| AAC | Efficient compression | Mobile device playback |
| FLAC | Lossless compression | Audio archiving |
| WAV | Uncompressed | Low-latency applications |
| PCM | Raw samples | 24kHz, 16-bit signed |
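PCM is the only format in the table delivered as bare samples, so players typically cannot open it without a header. A minimal sketch that wraps PCM output in a WAV header using Python's standard `wave` module. The 24 kHz rate and 16-bit signed samples come from the table; mono audio and little-endian byte order are assumptions to confirm with your provider.

```python
import io
import wave

# From the table: 24 kHz sample rate, 16-bit signed samples.
SAMPLE_RATE_HZ = 24_000
SAMPLE_WIDTH_BYTES = 2
# Assumption: mono output. The `wave` module treats frames as native byte
# order, so little-endian PCM passes through unchanged on little-endian hosts
# (x86 and most ARM systems).
CHANNELS = 1


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw PCM samples in a WAV (RIFF) header and return the file bytes."""
    frame_size = SAMPLE_WIDTH_BYTES * CHANNELS
    if len(pcm) % frame_size:
        raise ValueError("PCM data is not a whole number of frames")
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

Writing the returned bytes to a `.wav` file yields audio that standard players can open; the other formats in the table already carry their own container or header.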
Real-time Audio Streaming
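A minimal sketch of consuming audio incrementally rather than waiting for the whole response. It assumes the endpoint streams the audio body over HTTP, so the response (for example the object returned by `urllib.request.urlopen`) can be read in chunks like any binary file object. The function names and the 4096-byte chunk size are illustrative choices; a low-latency format such as Opus or PCM from the table above would suit real-time playback.

```python
from typing import BinaryIO, Iterator

# Assumption: the speech endpoint streams its audio body, so the HTTP response
# can be read incrementally like any binary file object.


def iter_audio_chunks(source: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield audio bytes as they arrive instead of waiting for the full body."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty bytes signals end of stream
            break
        yield chunk


def stream_to_sink(source: BinaryIO, sink: BinaryIO, chunk_size: int = 4096) -> int:
    """Forward each chunk to `sink` (file, player pipe, socket); return total bytes."""
    total = 0
    for chunk in iter_audio_chunks(source, chunk_size):
        sink.write(chunk)
        total += len(chunk)
    return total
```

Because each chunk is forwarded as soon as it is read, playback or saving can begin before synthesis of the full text has finished.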
Supported Languages
Multiple languages are supported, including:
Asian languages: Chinese, Japanese, Korean, etc.
European languages: English, French, German, etc.
Other languages: Arabic, Hindi, etc.
Note: The current voices are primarily optimized for English.
Frequently Asked Questions
Q: How do I control the emotion of the generated audio?
A: There is currently no direct way to control emotion. Capitalization or grammar may influence the output, but the effect is not reliable.
Q: Can I create custom voices?
A: No, creating custom voices is not supported.
Q: Who owns the generated audio?
A: The generated audio belongs to its creator, but you must inform users that it is AI-generated.
Modified at 2026-03-06 15:34:52