Speech-to-Text API Guide#
Overview#
The Audio API provides two main endpoints:
📝 transcriptions: Convert audio to text
🔄 translations: Translate audio to English
Upload limits:
📁 File size: Maximum 25 MB
🎵 Supported formats: mp3, mp4, mpeg, mpg, m4a, wav, webm
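The limits above can be checked locally before uploading. The sketch below is a minimal pre-flight check; the helper names `check_audio_file` and `check_audio_path` are hypothetical, and it assumes "25 MB" means 25 × 1024 × 1024 bytes, which the guide does not specify.

```python
from pathlib import Path

# Limits quoted in the overview.
MAX_BYTES = 25 * 1024 * 1024  # assumption: 25 MB is read as 25 MiB
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpg", "m4a", "wav", "webm"}


def check_audio_file(name: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    extension = Path(name).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'none'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_BYTES}")
    return problems


def check_audio_path(path: str) -> list[str]:
    """Convenience wrapper that reads the file size from disk."""
    file = Path(path)
    return check_audio_file(file.name, file.stat().st_size)
```

The check only looks at the file extension, not the actual container format, so a mislabeled file can still be rejected by the server.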
Usage#
1. Transcription#
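A transcription request might look like the sketch below. The guide does not name a client library, so this assumes an OpenAI-compatible Python SDK whose client exposes `audio.transcriptions.create`; the function name `transcribe` and the model name `whisper-1` are assumptions, not values taken from this guide.

```python
def transcribe(client, path: str, model: str = "whisper-1") -> str:
    """Send one audio file to the transcriptions endpoint and return the text.

    `client` is assumed to be an OpenAI-compatible SDK client, for example
    `OpenAI()` from the `openai` package. `whisper-1` is an assumed model name.
    """
    with open(path, "rb") as audio_file:
        response = client.audio.transcriptions.create(model=model, file=audio_file)
    return response.text


# Example usage (requires the SDK and an API key, both assumptions):
# from openai import OpenAI
# print(transcribe(OpenAI(), "meeting.mp3"))
```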
Convert audio to text in the original language.
2. Translation#
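A translation request follows the same shape as a transcription request but goes to the translations endpoint. This sketch again assumes an OpenAI-compatible Python SDK; `translate_to_english` is a hypothetical helper name and `whisper-1` an assumed model name. No target language is passed, since the endpoint is described as always producing English.

```python
def translate_to_english(client, path: str, model: str = "whisper-1") -> str:
    """Send one audio file to the translations endpoint and return English text.

    `client` is assumed to be an OpenAI-compatible SDK client exposing
    `audio.translations.create`. `whisper-1` is an assumed model name.
    """
    with open(path, "rb") as audio_file:
        response = client.audio.translations.create(model=model, file=audio_file)
    return response.text


# Example usage (requires the SDK and an API key, both assumptions):
# from openai import OpenAI
# print(translate_to_english(OpenAI(), "interview_de.mp3"))
```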
Convert audio in any language to English text.
3. Timestamp Feature#
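Timestamps can be requested alongside the text. The sketch below assumes the service follows the OpenAI-compatible convention, where timing is returned only with `response_format="verbose_json"` and is selected through `timestamp_granularities`; whether this service accepts those parameters is an assumption. The helpers `format_timestamp` and `format_segments` are hypothetical and work on the returned segments, which are assumed to carry `start`, `end` (in seconds) and `text`.

```python
def transcribe_with_timestamps(client, path: str, model: str = "whisper-1"):
    """Request a verbose transcription that includes per-segment timing."""
    with open(path, "rb") as audio_file:
        return client.audio.transcriptions.create(
            model=model,                          # assumed model name
            file=audio_file,
            response_format="verbose_json",       # assumed: required for timing data
            timestamp_granularities=["segment"],  # assumed: "word" is the finer option
        )


def format_timestamp(seconds: float) -> str:
    """Render a number of seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def _field(segment, name):
    """Read a field from either a dict or an SDK response object."""
    return segment[name] if isinstance(segment, dict) else getattr(segment, name)


def format_segments(segments) -> list[str]:
    """Turn timed segments into one readable line each."""
    lines = []
    for segment in segments:
        start = format_timestamp(_field(segment, "start"))
        end = format_timestamp(_field(segment, "end"))
        lines.append(f"[{start} - {end}] {_field(segment, 'text').strip()}")
    return lines
```

With a real response, `format_segments(result.segments)` would give one line per segment, which is a convenient starting point for subtitles.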
4. Handling Large Files#
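Files over the size limit have to be cut into smaller pieces before upload. The sketch below shows one way to do that with PyDub by slicing on fixed time boundaries. The helper names `chunk_ranges` and `split_audio`, the 10-minute chunk length, and the mp3 output format are illustrative choices, not values from this guide. PyDub needs ffmpeg installed to read and write most formats.

```python
from pathlib import Path


def chunk_ranges(total_ms: int, chunk_ms: int = 10 * 60 * 1000) -> list[tuple[int, int]]:
    """Return (start_ms, end_ms) pairs that cover the whole recording."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [(start, min(start + chunk_ms, total_ms)) for start in range(0, total_ms, chunk_ms)]


def split_audio(path: str, out_dir: str, chunk_ms: int = 10 * 60 * 1000) -> list[str]:
    """Split `path` into mp3 chunks of at most `chunk_ms` and return the new paths."""
    # Imported here so chunk_ranges can be used without PyDub installed.
    from pydub import AudioSegment

    audio = AudioSegment.from_file(path)  # len(audio) is the duration in milliseconds
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    paths = []
    for index, (start, end) in enumerate(chunk_ranges(len(audio), chunk_ms)):
        chunk_path = out / f"chunk_{index:03d}.mp3"
        audio[start:end].export(str(chunk_path), format="mp3")  # slices are in milliseconds
        paths.append(str(chunk_path))
    return paths
```

Splitting by time does not guarantee a file size, because size depends on the output bitrate. At 128 kbps a 10-minute mp3 is roughly 9.6 MB, so it is worth checking the exported files and shortening `chunk_ms` if any approach the limit. Fixed boundaries can also cut a sentence in half; splitting at pauses gives cleaner transcripts.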
Use PyDub to split files larger than 25 MB.
Optimization Recommendations#
Prompt Usage Tips#
1. 🔍 Correct specific vocabulary recognition
2. 📜 Maintain contextual coherence
3. ✍️ Control punctuation output
4. 📝 Control output text style (e.g., Simplified vs. Traditional Chinese)
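The tips above all come down to what text is passed as the prompt. The sketch below assembles a prompt from the three kinds of hints and sends it with the request. It assumes an OpenAI-compatible SDK in which `audio.transcriptions.create` accepts a `prompt` argument; `build_prompt`, `transcribe_with_prompt`, the `Glossary:` wording, and the `max_chars` cap are hypothetical, and `whisper-1` is an assumed model name.

```python
def build_prompt(vocabulary=(), previous_text: str = "", style_hint: str = "",
                 max_chars: int = 800) -> str:
    """Assemble a prompt string from optional hints.

    vocabulary:    names or jargon the model tends to mishear
    previous_text: transcript of the preceding chunk, for continuity
    style_hint:    a sample sentence in the punctuation and script you want
    max_chars:     hypothetical cap; only the tail of previous_text is kept
    """
    parts = []
    if previous_text:
        parts.append(previous_text[-max_chars:].strip())
    if vocabulary:
        parts.append("Glossary: " + ", ".join(vocabulary) + ".")
    if style_hint:
        parts.append(style_hint.strip())
    return " ".join(parts)


def transcribe_with_prompt(client, path: str, prompt: str, model: str = "whisper-1") -> str:
    """Transcribe one file, passing `prompt` to steer spelling and style."""
    with open(path, "rb") as audio_file:
        response = client.audio.transcriptions.create(
            model=model, file=audio_file, prompt=prompt
        )
    return response.text
```

For split files, passing the previous chunk's transcript as `previous_text` is one way to keep context across chunk boundaries. A style hint written with full punctuation, or in Simplified rather than Traditional characters, nudges the output toward the same style, though the effect is a tendency rather than a guarantee.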
Supported Languages#
Supports 98 languages, including:
Major Asian languages: Chinese, Japanese, Korean, etc.
European languages: English, French, German, etc.
Other regional languages: Arabic, Hindi, etc.
Note: Only languages with a word error rate (WER) below 50% are listed. Other languages are supported but may produce lower-quality results.
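For reference, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference text, divided by the number of words in the reference. The sketch below computes it with a standard word-level edit distance. The function name `word_error_rate` is hypothetical, splitting on whitespace is a simplification, and the guide does not say how its 50% figure was measured.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in the reference."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level edit distance, keeping only one row of the table at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion: reference word missing from hypothesis
                current[j - 1] + 1,      # insertion: extra word in hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)
```

A WER of 0.5 means that, on average, one edit is needed for every two reference words. Whitespace splitting does not suit languages written without spaces, such as Chinese or Japanese, where a character-level error rate is commonly used instead.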
Modified at 2026-03-06 15:34:52