Speech-to-Text API Guide#
Overview#
The audio API provides two main endpoints:
📝 transcriptions: Convert audio to text
🔄 translations: Translate audio into English
🎵 Supported formats: mp3, mp4, mpeg, mpg, m4a, wav, webm
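The format list above can be checked locally before uploading. The following is a minimal sketch; the helper name `is_supported_audio` is hypothetical, and it only looks at the file extension, not at the actual file contents.

```python
from pathlib import Path

# Formats copied from the list above.
SUPPORTED_FORMATS = {"mp3", "mp4", "mpeg", "mpg", "m4a", "wav", "webm"}


def is_supported_audio(path):
    """Return True if the file extension is one of the supported formats.

    Hypothetical helper: it trusts the extension and does not inspect
    the audio data itself.
    """
    extension = Path(path).suffix.lower().lstrip(".")
    return extension in SUPPORTED_FORMATS
```

A check like this is cheap to run before a request and avoids uploading a file that the API would likely reject.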
Usage#
1. Transcription#
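A minimal sketch of a transcription request follows. It assumes the service is OpenAI-compatible and is called through the `openai` Python SDK; the helper name `transcribe`, the model name `whisper-1`, and the file name in the usage comment are assumptions, not values confirmed by this guide.

```python
def transcribe(client, audio_path, model="whisper-1"):
    """Send one audio file to the transcriptions endpoint and return its text.

    `client` is assumed to be an OpenAI-style client object;
    `model` is an assumed model name and may differ on your service.
    """
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text


# Usage (assumes the `openai` package is installed and an API key is configured):
#   from openai import OpenAI
#   print(transcribe(OpenAI(), "speech.mp3"))
```

Passing the client in as an argument keeps the helper independent of how credentials and the base URL are configured.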
Convert audio into text in its original language.
2. Translation#
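A minimal sketch of a translation request follows, under the same assumptions as any OpenAI-compatible client: the `openai` Python SDK, a hypothetical helper name `translate_to_english`, and an assumed model name `whisper-1`.

```python
def translate_to_english(client, audio_path, model="whisper-1"):
    """Send one audio file to the translations endpoint and return English text.

    `client` is assumed to be an OpenAI-style client object;
    `model` is an assumed model name and may differ on your service.
    """
    with open(audio_path, "rb") as audio_file:
        result = client.audio.translations.create(model=model, file=audio_file)
    return result.text


# Usage (assumes the `openai` package is installed and an API key is configured):
#   from openai import OpenAI
#   print(translate_to_english(OpenAI(), "interview_de.mp3"))
```

The only difference from a transcription call is the endpoint: the output is English regardless of the language spoken in the audio.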
Convert audio from any language into English text.
3. Timestamp Functionality#
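Timestamps can be sketched as follows, assuming an OpenAI-compatible service where `response_format="verbose_json"` together with `timestamp_granularities` returns segments that expose `start`, `end`, and `text` attributes. The helper names and the `whisper-1` model name are assumptions.

```python
def format_timestamp(seconds):
    """Format a time offset in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def transcribe_with_timestamps(client, audio_path, model="whisper-1"):
    """Return a list of (start, end, text) tuples, one per segment.

    Assumes the response object has a `segments` list whose items
    carry `start`, `end` (seconds) and `text` attributes.
    """
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model=model,
            file=audio_file,
            response_format="verbose_json",
            timestamp_granularities=["segment"],
        )
    return [(seg.start, seg.end, seg.text.strip()) for seg in result.segments]


# Usage (assumes the `openai` package is installed and an API key is configured):
#   from openai import OpenAI
#   for start, end, text in transcribe_with_timestamps(OpenAI(), "speech.mp3"):
#       print(f"[{format_timestamp(start)} -> {format_timestamp(end)}] {text}")
```

Segment-level timestamps are usually enough for subtitles; word-level granularity, where the service supports it, would be requested the same way.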
4. Handling Large Files#
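Splitting a long recording with PyDub can be sketched as follows. The helper names, the 10-minute chunk length, and the output file naming are assumptions; the chunk length is a stand-in for the size limit, so it should be chosen so that each exported chunk stays under 25MB at your bitrate.

```python
from pathlib import Path


def chunk_ranges(total_ms, chunk_ms):
    """Return (start, end) millisecond offsets that cover the whole recording."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [
        (start, min(start + chunk_ms, total_ms))
        for start in range(0, total_ms, chunk_ms)
    ]


def split_audio(path, chunk_ms=10 * 60 * 1000, out_format="mp3"):
    """Split an audio file into consecutive chunks and return their paths.

    The 10-minute default is an assumption, not a value from this guide.
    """
    # Imported lazily: pydub is third-party (pip install pydub) and needs ffmpeg.
    from pydub import AudioSegment

    audio = AudioSegment.from_file(path)  # len(audio) is the duration in ms
    stem = Path(path).stem
    out_paths = []
    for index, (start, end) in enumerate(chunk_ranges(len(audio), chunk_ms)):
        out_path = f"{stem}_part{index:03d}.{out_format}"
        audio[start:end].export(out_path, format=out_format)  # slice by ms
        out_paths.append(out_path)
    return out_paths
```

Cutting at fixed offsets can split a sentence in two, so transcripts of adjacent chunks may need the context-carrying prompt technique to stay coherent.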
Split files larger than 25MB into smaller chunks with PyDub before sending them to the API.
Optimization Tips#
Prompt Usage Tips#
1. 🔍 Correct specific word recognition
2. 📜 Maintain contextual coherence
3. ✍️ Control punctuation output
4. 📝 Control output text style (e.g., Simplified or Traditional Chinese)
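The tips above all rely on passing a text prompt with the request. The following is a minimal sketch, assuming an OpenAI-compatible `prompt` parameter; the helper names, the `whisper-1` model name, the 800-character cutoff, and the sample prompts are illustrative assumptions.

```python
def tail_prompt(previous_text, max_chars=800):
    """Keep only the end of the previous transcript to use as context.

    The 800-character cutoff is an arbitrary assumption; services
    typically read only a limited amount of prompt text.
    """
    return previous_text[-max_chars:].strip()


def transcribe_with_prompt(client, audio_path, prompt, model="whisper-1"):
    """Transcribe one file while steering the output with a prompt."""
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model=model, file=audio_file, prompt=prompt
        )
    return result.text


# Illustrative prompts, one per tip:
SPELLING_PROMPT = "Glossary: PyDub, WER, webm."          # specific words
PUNCTUATION_PROMPT = "Hello, welcome to my lecture."     # punctuated style
SIMPLIFIED_CHINESE_PROMPT = "以下是普通话的句子。"          # output script
# Contextual coherence: pass tail_prompt(previous_chunk_text) as the prompt
# when transcribing the next chunk of a split recording.
```

A prompt nudges the output rather than guaranteeing it, so results should still be checked for the terms or style that matter.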
Supported Languages#
Supports 98 languages, including:
Major Asian languages: Chinese, Japanese, Korean, etc.
European languages: English, French, German, etc.
Other regional languages: Arabic, Hindi, etc.
Note: Only languages with a Word Error Rate (WER) below 50% are listed. Other languages are supported but may produce lower-quality results.
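To make the WER threshold concrete, here is a minimal sketch of the standard metric: the word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the reference transcript. The function name is hypothetical, and whitespace tokenization is a simplifying assumption that does not suit languages written without spaces.

```python
def word_error_rate(reference, hypothesis):
    """Return WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "the cat sat on the mat" with the hypothesis "the cat sit on mat" involves one substitution and one deletion over six reference words, so a WER of 50% corresponds to roughly one error for every two reference words.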
Modified at 2025-12-02 13:49:34