Text-to-Speech (TTS) API Guide

Overview

The Audio API provides a speech endpoint, built on the TTS models, that supports:

📝 Reading blog posts aloud

🌍 Generating spoken audio in multiple languages

🎵 Streaming audio output in real time

Important: You must clearly tell users that the audio they hear is generated by AI and is not a human voice.

Basic Usage

Simple Example
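
A minimal sketch of a basic request, assuming the endpoint is reached through the official openai Python SDK (v1.x) with an API key in the OPENAI_API_KEY environment variable. The helper names build_speech_params and save_speech are hypothetical; the request fields (model, voice, input, response_format) follow the SDK's audio.speech.create call, and the allowed values are the ones listed in this guide.

```python
from pathlib import Path

# Options listed in this guide; checking them locally fails fast before any network call.
MODELS = {"tts-1", "tts-1-hd"}
VOICES = {"alloy", "echo", "fable", "nova", "shimmer", "onyx"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_speech_params(text, model="tts-1", voice="alloy", response_format="mp3"):
    """Hypothetical helper: validate arguments and return the request fields."""
    if not text.strip():
        raise ValueError("input text must not be empty")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if response_format not in FORMATS:
        raise ValueError(f"unknown format: {response_format!r}")
    return {"model": model, "voice": voice, "input": text, "response_format": response_format}


def save_speech(text, path, **options):
    """Send one request and write the returned audio bytes to path."""
    from openai import OpenAI  # lazy import: the helper above works without the SDK

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    response = client.audio.speech.create(**build_speech_params(text, **options))
    Path(path).write_bytes(response.content)
    return Path(path)
```

For example, save_speech("Today is a wonderful day to build something people love!", "speech.mp3", voice="nova") would write an MP3 file. Keep the file extension in step with response_format.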

Features

Audio Quality Options

tts-1: Low latency, suitable for real-time applications

tts-1-hd: Higher quality; in some situations produces less static noise than tts-1
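
The latency/quality trade-off above can be written as a one-line policy. This is a sketch only: the function name choose_model and its realtime flag are hypothetical, and the returned string is what would be passed as the model field of a speech request.

```python
from typing import Literal

Model = Literal["tts-1", "tts-1-hd"]


def choose_model(realtime: bool) -> Model:
    """Hypothetical policy: latency-sensitive work gets tts-1, everything else tts-1-hd."""
    # tts-1 trades some quality (occasional static) for lower latency;
    # tts-1-hd is the higher-quality option when a short delay is acceptable.
    return "tts-1" if realtime else "tts-1-hd"
```

A live voice assistant would call choose_model(True); pre-rendering a blog post for later playback would call choose_model(False).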

Available Voices

alloy

echo

fable

nova

shimmer

onyx
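
One way to pick a voice is to render the same sentence with each of the six and listen side by side. The sketch below assumes the openai Python SDK (v1.x) as in the basic example; the function names, the output directory, and the file-naming scheme are hypothetical. Files are saved as .mp3 because MP3 is the default output format.

```python
from pathlib import Path

VOICES = ("alloy", "echo", "fable", "nova", "shimmer", "onyx")


def preview_jobs(text, outdir="voice_previews", model="tts-1"):
    """Return (output path, request fields) pairs, one per voice."""
    out = Path(outdir)
    return [
        (out / f"{voice}.mp3", {"model": model, "voice": voice, "input": text})
        for voice in VOICES
    ]


def render_previews(text, outdir="voice_previews"):
    """Generate one MP3 per voice using the openai SDK (imported lazily)."""
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set
    Path(outdir).mkdir(parents=True, exist_ok=True)
    for path, params in preview_jobs(text, outdir):
        response = client.audio.speech.create(**params)
        path.write_bytes(response.content)
```

Separating the job list from the network calls keeps the naming logic testable without an API key.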

Supported Output Formats

Format	Characteristics	Use Cases
MP3	Default format	General use
Opus	Low latency	Web streaming and communication
AAC	Efficient compression	Mobile playback
FLAC	Lossless compression	Audio archiving
WAV	Uncompressed	Low-latency applications
PCM	Raw samples without a header (24 kHz, 16-bit signed)	Custom audio processing pipelines
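
Because PCM output has no header, most players cannot open it directly. A minimal sketch that wraps it in a WAV container with Python's standard wave module, using the 24 kHz, 16-bit signed parameters from the table. Two details are assumptions not stated in this guide: the audio is mono, and samples are little-endian (the byte order WAV itself uses). The function names are hypothetical.

```python
import io
import wave

SAMPLE_RATE = 24_000  # 24 kHz, from the format table
SAMPLE_WIDTH = 2      # 16-bit signed samples = 2 bytes each
CHANNELS = 1          # assumption: mono output


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap headerless PCM (assumed little-endian mono) in a WAV container."""
    if len(pcm) % (SAMPLE_WIDTH * CHANNELS):
        raise ValueError("PCM data ends in a partial sample")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buf.getvalue()


def pcm_duration_seconds(pcm: bytes) -> float:
    """Playback length: bytes / (sample rate * bytes per sample * channels)."""
    return len(pcm) / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)
```

At these settings one second of audio is 24,000 × 2 = 48,000 bytes, which is also a quick sanity check on the size of a PCM response.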

Real-Time Audio Streaming
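
A sketch of streaming output, assuming the openai Python SDK (v1.x), whose audio.speech.with_streaming_response.create context manager exposes the response body through iter_bytes. The names relay_chunks and stream_speech are hypothetical, and on_chunk stands in for whatever player or socket consumes the audio. PCM is requested by default here because raw samples can be played without parsing a container.

```python
from typing import Callable, Iterable


def relay_chunks(chunks: Iterable[bytes], on_chunk: Callable[[bytes], None]) -> int:
    """Forward each non-empty audio chunk as soon as it arrives; return total bytes."""
    total = 0
    for chunk in chunks:
        if chunk:
            on_chunk(chunk)
            total += len(chunk)
    return total


def stream_speech(text, on_chunk, voice="alloy", response_format="pcm", chunk_size=1024):
    """Stream audio from the speech endpoint via the openai SDK (imported lazily)."""
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set
    with client.audio.speech.with_streaming_response.create(
        model="tts-1",  # the low-latency model from the quality options
        voice=voice,
        input=text,
        response_format=response_format,
    ) as response:
        return relay_chunks(response.iter_bytes(chunk_size=chunk_size), on_chunk)
```

Playback can start with the first chunk instead of waiting for the whole file, which is where the latency benefit comes from.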

Supported Languages

The speech endpoint supports many languages, including:

Asian languages: Chinese, Japanese, Korean, etc.

European languages: English, French, German, etc.

Other languages: Arabic, Hindi, etc.

Note: Current voices are primarily optimized for English.

Frequently Asked Questions

Q: How can I control the emotion of generated audio?

A: There is currently no direct way to control emotion. Capitalization and grammar may influence the output, but the results are unpredictable.
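
Since punctuation and capital letters are the only levers mentioned, one practical approach is to render a few spellings of the same sentence and compare them by ear. This is a hypothetical sketch with no guarantee the model reacts to any of these variants; it assumes the openai Python SDK (v1.x), and the function names and variant labels are illustrative only.

```python
from pathlib import Path


def emphasis_variants(sentence):
    """Hypothetical helper: one sentence spelled several ways, to compare by ear."""
    base = sentence.rstrip(".!? ")
    return {
        "plain": base + ".",
        "exclaimed": base + "!",
        "questioning": base + "?",
        "shouted": base.upper() + "!",  # capital letters, as mentioned in the answer above
    }


def render_variants(sentence, voice="alloy"):
    """Save one MP3 per variant using the openai SDK (imported lazily)."""
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set
    for name, text in emphasis_variants(sentence).items():
        audio = client.audio.speech.create(model="tts-1", voice=voice, input=text)
        Path(f"{name}.mp3").write_bytes(audio.content)
```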

Q: Can custom voices be created?

A: No. Custom voices are not currently supported.

Q: Who owns the rights to generated audio?

A: The creator owns the generated audio, but listeners must be told that it is AI-generated.
