MiniMax Speech 2.8 Turbo is a high-definition text-to-speech model with natural and expressive voice synthesis. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
$0.06 per run (about 16 runs per $1)
MiniMax Speech 2.8 Turbo is a high-quality text-to-speech model that transforms written text into natural, expressive audio. With support for multiple voice presets, emotional tones, and fine-grained audio controls, it delivers broadcast-ready speech synthesis for any application.
- Rich voice library: Choose from 17+ preset voices spanning different genders, ages, and speaking styles, or use your own custom-trained voice.
- Expressive interjections: Add natural human sounds like (laughs), (sighs), (coughs), (gasps), and more directly in your text for lifelike delivery.
- Emotion control: Set the emotional tone of the speech (happy, calm, or other moods) to match your content.
- Pronunciation customization: Define custom pronunciations for brand names, acronyms, or specialized terms using the pronunciation dictionary.
- Full audio control: Fine-tune speed, volume, pitch, sample rate, bitrate, channel, and output format for production-ready results.
| Parameter | Required | Description |
|---|---|---|
| text | Yes | The text to convert to speech. Supports interjections like (laughs), (sighs), (coughs) |
| voice_id | Yes | Voice preset or custom voice ID (see Available Voices below) |
| speed | No | Speech speed multiplier (default: 1) |
| volume | No | Volume level (default: 1) |
| pitch | No | Pitch adjustment (default: 0) |
| emotion | No | Emotional tone: happy, calm, etc. |
| pronunciation_dict | No | Custom pronunciation mappings (e.g., Omg/Oh my god) |
| english_normalization | No | Improves number-reading performance in English text |
| sample_rate | No | Audio sample rate in Hz (e.g., 8000) |
| bitrate | No | Audio bitrate in bits per second (e.g., 32000) |
| channel | No | Audio channel (mono/stereo) |
| format | No | Output audio format (e.g., mp3) |
| language_boost | No | Improves recognition of a specific language (e.g., Chinese) |
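Because `text` and `voice_id` are the only required fields, a small client-side helper can catch a missing field or a misspelled parameter name before a paid request is sent. The sketch below is hypothetical (`build_payload`, `OPTIONAL_PARAMS`, and `DEFAULTS` are illustrative names, not part of any WaveSpeedAI SDK). It fills in the documented defaults for `speed`, `volume`, and `pitch`, and it assumes the two `enable_*` flags used in the request examples are also accepted.

```python
from typing import Any, Dict

# Optional parameters from the table above, plus the two flags that appear
# in the request examples (assumed to be accepted by the endpoint).
OPTIONAL_PARAMS = {
    "speed", "volume", "pitch", "emotion", "pronunciation_dict",
    "english_normalization", "sample_rate", "bitrate", "channel",
    "format", "language_boost", "enable_base64_output", "enable_sync_mode",
}

# Defaults documented in the table for the three numeric controls.
DEFAULTS = {"speed": 1, "volume": 1, "pitch": 0}


def build_payload(text: str, voice_id: str, **options: Any) -> Dict[str, Any]:
    """Return a JSON-ready request body (hypothetical helper).

    Raises ValueError if a required field is empty or an option name is not
    one of the documented parameters. Options passed as None are dropped,
    so the documented default (or the server's) applies.
    """
    if not text.strip():
        raise ValueError("text is required")
    if not voice_id:
        raise ValueError("voice_id is required")
    unknown = set(options) - OPTIONAL_PARAMS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload: Dict[str, Any] = {"text": text, "voice_id": voice_id, **DEFAULTS}
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload
```

For example, `build_payload("Hi there", "Wise_Woman", emotion="happy")` returns a dict ready for `json.dumps`. Rejecting unknown keys is a deliberate choice: a typo such as `pich=2` is caught locally instead of surfacing, if at all, only on the server side.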
Available Voices: Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl
You can also use a custom voice ID trained via MiniMax Voice Clone.
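Since `voice_id` accepts either one of the presets above or a custom ID, a misspelled preset cannot be told apart from a custom voice by the string alone. A minimal sketch, assuming preset IDs are matched case-sensitively (note `Inspirational_girl` versus `Lively_Girl`): the hypothetical helpers below classify an ID and use Python's standard `difflib` to suggest the nearest preset when an ID looks like a typo.

```python
import difflib
from typing import Optional

# The 17 preset voice IDs listed above.
PRESET_VOICES = frozenset({
    "Wise_Woman", "Friendly_Person", "Inspirational_girl", "Deep_Voice_Man",
    "Calm_Woman", "Casual_Guy", "Lively_Girl", "Patient_Man", "Young_Knight",
    "Determined_Man", "Lovely_Girl", "Decent_Boy", "Imposing_Manner",
    "Elegant_Man", "Abbess", "Sweet_Girl_2", "Exuberant_Girl",
})


def voice_kind(voice_id: str) -> str:
    """Return "preset" for a listed preset ID, otherwise "custom".

    This sketch cannot check whether a custom ID exists on your account.
    """
    return "preset" if voice_id in PRESET_VOICES else "custom"


def closest_preset(voice_id: str) -> Optional[str]:
    """Return the preset spelled almost like voice_id, or None if none is close."""
    if voice_id in PRESET_VOICES:
        return voice_id
    # sorted() makes the result deterministic if two presets tie.
    matches = difflib.get_close_matches(voice_id, sorted(PRESET_VOICES), n=1, cutoff=0.8)
    return matches[0] if matches else None
```

A client could warn when `voice_kind` says "custom" but `closest_preset` still finds a near match, since that usually signals a typo rather than a cloned voice.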
Supported interjections: (laughs), (chuckle), (coughs), (clear-throat), (groans), (breath), (pant), (inhale), (exhale), (gasps), (sniffs), (sighs), (snorts), (burps), (lip-smacking), (humming), (hissing), (emm), (whistles), (sneezes), (crying), (applause)
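Interjections are written inline as parenthesized tags, so a misspelled tag such as `(laugh)` will not match any supported sound; how the model handles it is not documented here. The hypothetical checker below, a sketch rather than part of any SDK, scans text for single-word tags and reports any that are not in the list above. It assumes tags are lowercase words, optionally hyphenated, so ordinary parentheticals with spaces, like `(see above)`, are left alone.

```python
import re
from typing import List

# The interjection tags listed above.
SUPPORTED_INTERJECTIONS = frozenset({
    "laughs", "chuckle", "coughs", "clear-throat", "groans", "breath",
    "pant", "inhale", "exhale", "gasps", "sniffs", "sighs", "snorts",
    "burps", "lip-smacking", "humming", "hissing", "emm", "whistles",
    "sneezes", "crying", "applause",
})

# A candidate tag: one lowercase word (hyphens allowed) inside parentheses.
TAG_PATTERN = re.compile(r"\(([a-z]+(?:-[a-z]+)*)\)")


def unsupported_interjections(text: str) -> List[str]:
    """Return tag-like tokens in text that are not supported interjections."""
    return [tag for tag in TAG_PATTERN.findall(text) if tag not in SUPPORTED_INTERJECTIONS]
```

Running this before submission is cheap and catches the most common mistake, a singular form like `(laugh)` or `(cough)` where the list expects `(laughs)` or `(coughs)`.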
| Metric | Cost |
|---|---|
| Per 1,000 characters | $0.06 |
Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/minimax/speech-2.8-turbo with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status flips to completed, then read the output URL from data.outputs[0]. Examples for Speech 2.8 Turbo below.
```bash
# Submit the prediction (text and voice_id are required)
curl -X POST "https://api.wavespeed.ai/api/v3/minimax/speech-2.8-turbo" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "text": "Hello! (laughs) Welcome to the show.",
    "voice_id": "Friendly_Person",
    "speed": 1,
    "volume": 1,
    "pitch": 0,
    "emotion": "happy",
    "english_normalization": false,
    "sample_rate": 8000,
    "bitrate": 32000,
    "channel": "1",
    "format": "mp3",
    "language_boost": "English",
    "enable_base64_output": false,
    "enable_sync_mode": false
  }'

# The response includes a prediction id. Poll for the result,
# substituting that id for {request_id}:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# When status is "completed", read the output from data.outputs[0].
```

```javascript
// npm install wavespeed
const WaveSpeed = require('wavespeed');

// CommonJS modules do not allow top-level await, so wrap the call.
async function main() {
  const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env
  const result = await client.run("minimax/speech-2.8-turbo", {
    "text": "Hello! (laughs) Welcome to the show.",
    "voice_id": "Friendly_Person",
    "speed": 1,
    "volume": 1,
    "pitch": 0,
    "emotion": "happy",
    "english_normalization": false,
    "sample_rate": 8000,
    "bitrate": 32000,
    "channel": "1",
    "format": "mp3",
    "language_boost": "English",
    "enable_base64_output": false,
    "enable_sync_mode": false
  });
  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);
```

```python
# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "minimax/speech-2.8-turbo",
    {
        "text": "Hello! (laughs) Welcome to the show.",
        "voice_id": "Friendly_Person",
        "speed": 1,
        "volume": 1,
        "pitch": 0,
        "emotion": "happy",
        "english_normalization": False,  # Python booleans are capitalized
        "sample_rate": 8000,
        "bitrate": 32000,
        "channel": "1",
        "format": "mp3",
        "language_boost": "English",
        "enable_base64_output": False,
        "enable_sync_mode": False,
    },
)
print(output["outputs"][0])  # → URL of the generated output
```

Speech 2.8 Turbo is a MiniMax model for audio generation, exposed as a REST API on WaveSpeedAI. You can call it programmatically or try it in the WaveSpeedAI playground.
POST your input parameters as JSON to the model's REST endpoint, https://api.wavespeed.ai/api/v3/minimax/speech-2.8-turbo, with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until the status is "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. The full request/response shape is documented at https://wavespeed.ai/docs/docs-api/minimax/minimax-speech-2.8-turbo.
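The submit-then-poll flow can be sketched with only the Python standard library. This is a minimal sketch, not the official SDK: `http_json` and `poll_result` are hypothetical names, and it assumes the result response carries `status` and `outputs` under a top-level `data` object (the page confirms only `data.outputs[0]`) and that a failed prediction reports the status `"failed"`.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Optional

BASE = "https://api.wavespeed.ai/api/v3"
MODEL_URL = f"{BASE}/minimax/speech-2.8-turbo"


def http_json(url: str, api_key: str, body: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """POST body as JSON if given, otherwise GET; return the decoded JSON reply."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, method="POST" if data is not None else "GET")
    req.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def poll_result(
    fetch: Callable[[str], Dict[str, Any]],
    request_id: str,
    *,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll the result endpoint until status is "completed"; return outputs[0]."""
    url = f"{BASE}/predictions/{request_id}/result"
    waited = 0.0
    while True:
        data = fetch(url)["data"]  # assumed response envelope
        status = data.get("status")
        if status == "completed":
            return data["outputs"][0]
        if status == "failed":  # assumed name of the failure status
            raise RuntimeError(f"prediction {request_id} failed: {data.get('error')}")
        if waited >= timeout:
            raise TimeoutError(f"prediction {request_id} not done after {timeout}s")
        sleep(interval)
        waited += interval
```

In practice you would submit with `http_json(MODEL_URL, key, body)`, take the prediction id from the response (its exact location, for example `data.id`, is an assumption to verify against the API reference), then call `poll_result(lambda u: http_json(u, key), request_id)`. Injecting `fetch` and `sleep` keeps the polling logic testable without network access; the 2-second interval and 300-second timeout are arbitrary choices, well above the roughly 26-second average generation time reported on this page.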
Speech 2.8 Turbo starts at $0.06 per run. That figure is the base price: the final charge scales with your input, and the pricing table lists $0.06 per 1,000 characters, so longer text costs more than a short prompt. The exact cost for your current input is shown next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.
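The pricing table gives $0.06 per 1,000 characters but does not say how partial blocks are billed. The hypothetical estimator below assumes the charge rounds up to whole 1,000-character blocks with a one-block minimum, which is consistent with the $0.06 starting price; treat its result as a rough budget figure and rely on the live price shown before submitting. It also assumes characters are counted as Python code points, which may differ from how the service counts them.

```python
import math
from decimal import Decimal

PRICE_PER_BLOCK = Decimal("0.06")  # USD per 1,000 characters (pricing table)
BLOCK_SIZE = 1000


def estimate_cost(text: str) -> Decimal:
    """Rough cost estimate in USD for one request (hypothetical helper).

    ASSUMPTION: billing rounds up to whole 1,000-character blocks,
    with a minimum of one block per run.
    """
    blocks = max(1, math.ceil(len(text) / BLOCK_SIZE))
    return PRICE_PER_BLOCK * blocks
```

`Decimal` is used instead of `float` so that multiples of $0.06 compare exactly, which matters when summing estimates across many requests.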
Key inputs: `text` and `voice_id` (required), plus optional controls such as `bitrate`, `channel`, `emotion`, `enable_base64_output`, `enable_sync_mode`, and `english_normalization`. The full JSON schema (types, defaults, allowed values) is shown in the playground and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/minimax/minimax-speech-2.8-turbo.
Average end-to-end generation time on WaveSpeedAI is about 26 seconds per request, measured across recent runs. Queue time scales with global demand; the live status of each request is visible in its prediction record.
Commercial usage rights depend on the model's license, set by its provider (MiniMax). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.