Speech 2.8 Turbo | Realistic Voice & TTS API

MiniMax Speech 2.8 Turbo

MiniMax Speech 2.8 Turbo is a high-quality text-to-speech model that transforms written text into natural, expressive audio. With support for multiple voice presets, emotional tones, and fine-grained audio controls, it delivers broadcast-ready speech synthesis for any application.

Why Choose This?

Rich voice library: Choose from 17+ preset voices spanning different genders, ages, and speaking styles, or use your own custom-trained voice.
Expressive interjections: Add natural human sounds like (laughs), (sighs), (coughs), (gasps), and more directly in your text for lifelike delivery.
Emotion control: Set the emotional tone of the speech (happy, calm, or other moods) to match your content.
Pronunciation customization: Define custom pronunciations for brand names, acronyms, or specialized terms using the pronunciation dictionary.
Full audio control: Fine-tune speed, volume, pitch, sample rate, bitrate, channel, and output format for production-ready results.

Parameters

Parameter	Required	Description
text	Yes	The text to convert to speech. Supports interjections like (laughs), (sighs), (coughs)
voice_id	Yes	Voice preset or custom voice ID (see Available Voices below)
speed	No	Speech speed multiplier (default: 1)
volume	No	Volume level (default: 1)
pitch	No	Pitch adjustment (default: 0)
emotion	No	Emotional tone: happy, calm, etc.
pronunciation_dict	No	Custom pronunciation mappings (e.g., Omg/Oh my god)
english_normalization	No	Improves number-reading performance in English text
sample_rate	No	Audio sample rate
bitrate	No	Audio bitrate
channel	No	Audio channel (mono/stereo)
format	No	Output format
language_boost	No	Boost specific language recognition

Available Voices

Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl

You can also use a custom voice ID trained via MiniMax Voice Clone.

Supported Interjections

(laughs), (chuckle), (coughs), (clear-throat), (groans), (breath), (pant), (inhale), (exhale), (gasps), (sniffs), (sighs), (snorts), (burps), (lip-smacking), (humming), (hissing), (emm), (whistles), (sneezes), (crying), (applause)
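Because interjections are only recognized when written exactly as listed, it can help to check text before submitting it. The following is a minimal sketch, not part of the WaveSpeedAI SDK: the helper name `find_unsupported_interjections` is hypothetical, and it assumes that any single parenthesized word (letters and hyphens only) in your text is intended as an interjection tag.

```python
import re

# Tags copied from the "Supported Interjections" list above.
SUPPORTED_INTERJECTIONS = {
    "laughs", "chuckle", "coughs", "clear-throat", "groans", "breath",
    "pant", "inhale", "exhale", "gasps", "sniffs", "sighs", "snorts",
    "burps", "lip-smacking", "humming", "hissing", "emm", "whistles",
    "sneezes", "crying", "applause",
}

# Matches one parenthesized token of letters and hyphens, e.g. "(sighs)".
# Parentheticals containing spaces, such as "(see below)", are ignored.
_TAG_PATTERN = re.compile(r"\(([A-Za-z-]+)\)")


def find_unsupported_interjections(text: str) -> list[str]:
    """Return parenthesized single-word tags that are not on the supported list.

    Hypothetical helper: the comparison is case-sensitive because the
    page says tags must be written exactly as listed.
    """
    return [
        tag for tag in _TAG_PATTERN.findall(text)
        if tag not in SUPPORTED_INTERJECTIONS
    ]
```

For example, `find_unsupported_interjections("Well (sighs) that was fun (laugh)")` flags `laugh`, which is missing the trailing "s" of the supported `(laughs)` tag.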

How to Use

Enter your text — write or paste the content you want to convert to speech.
Select voice_id — choose a preset voice or enter your custom voice ID.
Adjust speech settings (optional) — modify speed, volume, and pitch as needed.
Set emotion (optional) — select the emotional tone for the delivery.
Configure audio output (optional) — choose sample rate, bitrate, channel, and format.
Run — submit and download your audio file.
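The steps above map onto a single JSON request body. As a rough sketch, the hypothetical helper below (`build_payload` is not an SDK function) assembles that body from the two required parameters plus whichever optional settings you choose, leaving anything unset to the service defaults. It does not validate value ranges, since this page does not list them.

```python
from typing import Any


def build_payload(text: str, voice_id: str, **options: Any) -> dict[str, Any]:
    """Assemble a request body for minimax/speech-2.8-turbo.

    Hypothetical helper: only checks that the two required fields from
    the Parameters table are present. Options set to None are dropped
    so the service default applies.
    """
    if not text.strip():
        raise ValueError("text is required")
    if not voice_id:
        raise ValueError("voice_id is required")
    payload: dict[str, Any] = {"text": text, "voice_id": voice_id}
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload
```

A call such as `build_payload("Hello (laughs)", "Calm_Woman", emotion="happy", speed=None)` yields a body with only `text`, `voice_id`, and `emotion`.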

Pricing

Metric	Cost
Per 1,000 characters	$0.06
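For budgeting, the per-character rate can be turned into a quick estimate. This sketch assumes billing is linear in the character count returned by Python's `len`, with no rounding or minimum charge; how the service actually counts characters (whitespace, interjection tags, rounding) is not stated here, so treat the result as approximate. The function name is hypothetical.

```python
PRICE_PER_1000_CHARS_USD = 0.06  # from the pricing table above


def estimate_cost_usd(text: str) -> float:
    """Estimate the charge for `text`.

    Assumption: cost is strictly proportional to len(text); the real
    billing rules may round or count characters differently.
    """
    return len(text) / 1000 * PRICE_PER_1000_CHARS_USD
```

Under this assumption, a 2,500-character script would come to about $0.15.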

Best Use Cases

Audiobook Production — Convert manuscripts into natural-sounding narration with expressive voices.
Video Voiceovers — Generate professional voiceovers for YouTube, ads, or explainer videos.
Podcasts & Broadcasting — Create consistent voice content without recording equipment.
E-learning & Training — Produce clear, engaging audio for educational materials.
Accessibility — Convert written content to audio for visually impaired users.
Game & App Development — Add character voices and UI narration to interactive experiences.

Pro Tips

Use interjections sparingly for natural effect — too many can sound unnatural.
Match voice_id to your content: use "Deep_Voice_Man" or "Imposing_Manner" for authoritative content, "Lively_Girl" or "Casual_Guy" for friendly content.
Enable english_normalization when your text contains numbers, dates, or currencies.
Use pronunciation_dict for consistent handling of brand names or technical terms.
Start with default speed/pitch settings, then adjust based on your specific use case.

Notes

Text length affects processing time and cost — longer texts take more time.
For custom voices, train your voice model first via Voice Clone.
Interjections must be written in parentheses exactly as listed to be recognized.

Speech 2.8 Turbo API — Quick start

Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/minimax/speech-2.8-turbo with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status flips to completed, then read the output URL from data.outputs[0]. Examples for Speech 2.8 Turbo below.

HTTP example

# Submit the prediction (text and voice_id are required)
curl -X POST "https://api.wavespeed.ai/api/v3/minimax/speech-2.8-turbo" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "text": "Hello! (laughs) Welcome to Speech 2.8 Turbo.",
    "voice_id": "Friendly_Person",
    "speed": 1,
    "volume": 1,
    "pitch": 0,
    "emotion": "happy",
    "english_normalization": false,
    "sample_rate": 8000,
    "bitrate": 32000,
    "channel": "1",
    "format": "mp3",
    "language_boost": "Chinese",
    "enable_base64_output": false,
    "enable_sync_mode": false
}'

# Response includes a prediction id. Poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# When status is "completed", read the output from data.outputs[0].
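The polling step can be sketched in Python using only the standard library. The result URL, the `completed` status, and the `data.outputs[0]` path come from the description above; the `failed` status value, the polling interval, and the timeout are assumptions, and both function names are hypothetical. The `fetch` argument is injectable so the loop can be exercised without a network call.

```python
import json
import os
import time
import urllib.request
from typing import Callable

RESULT_URL = "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result"


def fetch_result(request_id: str) -> dict:
    """GET the prediction result; reads WAVESPEED_API_KEY from the environment."""
    req = urllib.request.Request(
        RESULT_URL.format(request_id=request_id),
        headers={"Authorization": f"Bearer {os.environ['WAVESPEED_API_KEY']}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_output(
    request_id: str,
    fetch: Callable[[str], dict] = fetch_result,
    interval: float = 1.0,   # assumed polling interval, in seconds
    timeout: float = 120.0,  # assumed overall deadline, in seconds
) -> str:
    """Poll until status is "completed", then return data.outputs[0]."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch(request_id).get("data", {})
        status = data.get("status")
        if status == "completed":
            return data["outputs"][0]
        if status == "failed":  # assumed failure status; not confirmed by this page
            raise RuntimeError(f"prediction {request_id} failed: {data}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction {request_id} still {status!r}")
        time.sleep(interval)
```

Call `wait_for_output(prediction_id)` with the id returned by the submit request; it returns the output URL or raises if the deadline passes.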

Node.js example

// npm install wavespeed
const WaveSpeed = require('wavespeed');

const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// CommonJS modules have no top-level await, so run the call inside an async function.
async function main() {
  const result = await client.run("minimax/speech-2.8-turbo", {
    "text": "Hello! (laughs) Welcome to Speech 2.8 Turbo.", // required
    "voice_id": "Friendly_Person", // required
    "speed": 1,
    "volume": 1,
    "pitch": 0,
    "emotion": "happy",
    "english_normalization": false,
    "sample_rate": 8000,
    "bitrate": 32000,
    "channel": "1",
    "format": "mp3",
    "language_boost": "Chinese",
    "enable_base64_output": false,
    "enable_sync_mode": false
  });

  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);

Python example

# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "minimax/speech-2.8-turbo",
    {
        "text": "Hello! (laughs) Welcome to Speech 2.8 Turbo.",  # required
        "voice_id": "Friendly_Person",  # required
        "speed": 1,
        "volume": 1,
        "pitch": 0,
        "emotion": "happy",
        "english_normalization": False,  # Python booleans are capitalized
        "sample_rate": 8000,
        "bitrate": 32000,
        "channel": "1",
        "format": "mp3",
        "language_boost": "Chinese",
        "enable_base64_output": False,
        "enable_sync_mode": False,
    },
)

print(output["outputs"][0])  # → URL of the generated output

Speech 2.8 Turbo API — Frequently asked questions

What is the Speech 2.8 Turbo API?

Speech 2.8 Turbo is a MiniMax text-to-speech model, exposed as a REST API on WaveSpeedAI. It produces high-definition audio with natural, expressive voice synthesis. The hosted endpoint is ready to use with no cold starts, and you can call it programmatically or try it from the playground above.

How do I call the Speech 2.8 Turbo API?

POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/minimax/minimax-speech-2.8-turbo.

How much does Speech 2.8 Turbo cost per run?

Speech 2.8 Turbo starts at $0.06 per run. Per the Pricing table above, the rate is $0.06 per 1,000 characters, so the final charge scales mainly with the length of your text. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.

What inputs does Speech 2.8 Turbo accept?

Required inputs: `text` and `voice_id`. Optional inputs: `speed`, `volume`, `pitch`, `emotion`, `pronunciation_dict`, `english_normalization`, `sample_rate`, `bitrate`, `channel`, `format`, `language_boost`, `enable_base64_output`, and `enable_sync_mode`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/minimax/minimax-speech-2.8-turbo.

How long does Speech 2.8 Turbo take to generate?

Average end-to-end generation time on WaveSpeedAI is around 26 seconds per request — measured across recent runs. Queue time scales with global demand; live status is visible in the prediction record.

Can I use Speech 2.8 Turbo outputs commercially?

Commercial usage rights depend on the model's license, set by its provider (MiniMax). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.
