Speech-to-Text API Guide

Overview

The audio API provides two main endpoints:

📝 transcriptions: Convert audio to text

🔄 translations: Translate audio into English

Supported Formats

📁 File size: Up to 25 MB

🎵 Supported formats: mp3, mp4, mpeg, mpg, m4a, wav, webm
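The two limits above can be checked locally before uploading. The following is a minimal sketch, not part of the API: `check_audio_file`, `MAX_BYTES`, and `SUPPORTED_EXTENSIONS` are hypothetical names, and it assumes that "25 MB" means 25 × 1024 × 1024 bytes and that the file extension reflects the actual format.

```python
from pathlib import Path

# Assumption: "25 MB" is read as 25 MiB; lower this if the service counts 25,000,000 bytes.
MAX_BYTES = 25 * 1024 * 1024
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpg", "m4a", "wav", "webm"}


def check_audio_file(path):
    """Return a list of problems; an empty list means the file looks uploadable."""
    p = Path(path)
    problems = []
    # Compare the extension case-insensitively and without the leading dot.
    ext = p.suffix.lower().lstrip(".")
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    size = p.stat().st_size
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, over the {MAX_BYTES}-byte limit")
    return problems
```

An empty return value only means the file passes these two local checks; the service may still reject it for other reasons.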

Usage

1. Transcription

Convert audio into text in its original language.
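A transcription call might look like the sketch below. It assumes the service is compatible with the OpenAI Python SDK (`client.audio.transcriptions.create`) and that a model named `whisper-1` is available; neither is stated in this guide, and `transcribe` is a hypothetical helper name.

```python
def transcribe(client, path, model="whisper-1"):
    """Send one audio file to the transcriptions endpoint and return its text.

    `client` is assumed to be an OpenAI-SDK-style client object.
    """
    # The file must be opened in binary mode for upload.
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(model=model, file=audio)
    return result.text


# Usage (assumes the `openai` package is installed and OPENAI_API_KEY is set):
#   from openai import OpenAI
#   print(transcribe(OpenAI(), "meeting.mp3"))
```

Passing the client in as an argument keeps the helper independent of how the client is configured, for example when a custom base URL is needed.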

2. Translation

Convert audio in any supported language into English text.
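Translation can be sketched the same way, under the same assumptions: an OpenAI-SDK-compatible client exposing `client.audio.translations.create` and a `whisper-1` model. `translate_to_english` is a hypothetical helper name.

```python
def translate_to_english(client, path, model="whisper-1"):
    """Send one audio file to the translations endpoint and return English text.

    `client` is assumed to be an OpenAI-SDK-style client object.
    """
    with open(path, "rb") as audio:
        result = client.audio.translations.create(model=model, file=audio)
    return result.text


# Usage (assumes the `openai` package is installed and OPENAI_API_KEY is set):
#   from openai import OpenAI
#   print(translate_to_english(OpenAI(), "interview_ja.mp3"))
```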

3. Timestamp Functionality
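The guide gives no details for this step, so the following is only a sketch of how word-level timestamps are typically requested from an OpenAI-SDK-compatible service: `response_format="verbose_json"` together with `timestamp_granularities=["word"]`. The parameter support, the `whisper-1` model name, and the shape of the response (`result.words` with `word`, `start`, `end`) are assumptions about this service; the helper names are hypothetical.

```python
def transcribe_with_word_times(client, path, model="whisper-1"):
    """Return a list of (word, start_seconds, end_seconds) tuples."""
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(
            model=model,
            file=audio,
            response_format="verbose_json",    # needed to get timing data back
            timestamp_granularities=["word"],  # "segment" is the coarser option
        )
    # `words` may be missing or None if the service returned no word timings.
    return [(w.word, w.start, w.end) for w in (getattr(result, "words", None) or [])]


def format_timestamp(seconds):
    """Render a number of seconds as HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"
```

`format_timestamp` is a convenience for building subtitle-style output from the returned start and end times.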

4. Handling Large Files

Split files larger than 25 MB using PyDub:
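The original sample for this step is not available, so here is a minimal sketch of one way to do the split with PyDub. `chunk_ranges` and `split_audio` are hypothetical helper names, and the 10-minute chunk length is an arbitrary choice: check the exported file sizes and shorten it if a chunk still exceeds the limit. PyDub needs ffmpeg installed to read and write compressed formats.

```python
import os


def chunk_ranges(total_ms, chunk_ms):
    """Return (start, end) millisecond pairs that cover [0, total_ms)."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [(s, min(s + chunk_ms, total_ms)) for s in range(0, total_ms, chunk_ms)]


def split_audio(path, out_dir, chunk_ms=10 * 60 * 1000):
    """Export fixed-length mp3 chunks of `path` into `out_dir`; return their paths."""
    from pydub import AudioSegment  # imported lazily so chunk_ranges works without PyDub

    audio = AudioSegment.from_file(path)
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    # len(audio) is the duration in milliseconds, and slicing is also in milliseconds.
    for i, (start, end) in enumerate(chunk_ranges(len(audio), chunk_ms)):
        out_path = os.path.join(out_dir, f"chunk_{i:03d}.mp3")
        audio[start:end].export(out_path, format="mp3")
        paths.append(out_path)
    return paths
```

Fixed-length cuts can land in the middle of a word. Transcribing the chunks in order and joining the text usually works, but a word at a boundary may be lost or duplicated.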

Optimization Tips

Prompt Usage Tips

🔍 Correct specific word recognition

📜 Maintain contextual coherence

✍️ Control punctuation output

🗣️ Preserve filler words

📝 Control output text style (e.g., Simplified or Traditional Chinese)
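The tips above all rely on passing a text prompt with the request. The sketch below assumes an OpenAI-SDK-compatible client whose `transcriptions.create` accepts a `prompt` argument and a `whisper-1` model; `build_prompt` and `transcribe_with_prompt` are hypothetical helper names, and the "Glossary:" wording is just one possible way to phrase a prompt.

```python
def build_prompt(terms=(), style_example=""):
    """Combine a glossary of tricky terms and a style sample into one prompt string."""
    parts = []
    if terms:
        # Listing names and jargon nudges the model toward these spellings.
        parts.append("Glossary: " + ", ".join(terms) + ".")
    if style_example:
        # A sample sentence hints at punctuation, filler words, or script style.
        parts.append(style_example)
    return " ".join(parts)


def transcribe_with_prompt(client, path, prompt, model="whisper-1"):
    """Transcribe one file, passing `prompt` to guide recognition and style."""
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(
            model=model, file=audio, prompt=prompt
        )
    return result.text
```

For example, `build_prompt(["PyDub", "WER"], "Umm, let me think, like, hmm.")` yields a prompt that both lists two terms and models filler words. For long audio split into chunks, passing the previous chunk's transcript as the prompt is a common way to keep context across chunks.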

Supported Languages

The API supports 98 languages, including:

Major Asian languages: Chinese, Japanese, Korean, etc.

European languages: English, French, German, etc.

Other regional languages: Arabic, Hindi, etc.

Note: This list covers only languages with a word error rate (WER) below 50%. Other languages are also supported, but transcription quality may be lower.
