OmniVoice Text-to-Speech
OmniVoice is a massively multilingual, zero-shot text-to-speech (TTS) model supporting 600+ languages. Generate speech with an automatically selected voice, or design a custom voice from a natural-language description. It is available on WavespeedAI as a ready-to-use REST inference API with no cold starts and affordable pricing.
Features
OmniVoice Text-to-Speech
OmniVoice Text-to-Speech converts text into natural, expressive speech in 600+ languages. Choose voice attributes such as gender, age, pitch, style, and accent, and the model generates matching audio in seconds. No voice sample is needed.
Why Choose This?
- Attribute-driven voice design: Combine predefined voice attributes (gender, age, pitch, style, accent) to create the voice you want, with no audio sample required.
- 600+ languages: The broadest language coverage among zero-shot TTS models.
- Speed control: Adjust the speaking rate to match your content's pacing.
- Fast generation: Output is delivered in under 5 seconds.
Parameters
| Parameter | Required | Description |
|---|---|---|
| text | Yes | The text you want converted to speech. |
| voice_description | No | Comma-separated voice attributes (see list below). If omitted, a random voice is used. |
| speed | No | Playback speed factor. 1.0 = normal. Range: 0.1–5.0. |
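A client can reject out-of-range speed values before sending a request. The sketch below uses a hypothetical helper, check_speed, which is not part of the WavespeedAI API. It assumes the 0.1–5.0 range from the table above is inclusive at both ends; the API reference further down lists the range as 0 ~ 5, so the server's exact lower bound may differ.

```shell
#!/usr/bin/env bash
# check_speed VALUE: return 0 if VALUE is a plain decimal number between
# 0.1 and 5.0 inclusive (the range assumed from the parameter table).
check_speed() {
  # Accept forms like 1, 0.8, 1.25; reject signs, exponents, and empty input.
  local re='^[0-9]+([.][0-9]+)?$'
  [[ $1 =~ $re ]] || return 1
  # Bash arithmetic is integer-only, so awk does the floating-point comparison.
  awk -v s="$1" 'BEGIN { exit !((s + 0) >= 0.1 && (s + 0) <= 5.0) }'
}
```

For example, `check_speed 0.8 && echo ok` prints ok, while `check_speed 6` returns a non-zero status.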
Valid Voice Attributes
Combine any of these with commas (e.g. female, low pitch, british accent):
Gender: female, male
Age: child, teenager, young adult, middle-aged, elderly
Pitch: very low pitch, low pitch, moderate pitch, high pitch, very high pitch
Style: whisper
Accent: american accent, australian accent, british accent, canadian accent, chinese accent, indian accent, japanese accent, korean accent, portuguese accent, russian accent
Examples
- female, low pitch, british accent
- male, young adult, american accent
- female, elderly, whisper
- male, high pitch, indian accent
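Because voice_description is a free-form string, a typo such as "britsh accent" is easy to miss. A minimal client-side check could look like the following, assuming the attribute list on this page is complete and must be matched exactly (lowercase, single spaces). The names validate_voice_description and VALID_ATTRIBUTES are hypothetical, and the check does not catch contradictory combinations such as "female, male".

```shell
#!/usr/bin/env bash
# Attribute list copied from this page; the server may accept other values.
VALID_ATTRIBUTES=(
  "female" "male"
  "child" "teenager" "young adult" "middle-aged" "elderly"
  "very low pitch" "low pitch" "moderate pitch" "high pitch" "very high pitch"
  "whisper"
  "american accent" "australian accent" "british accent" "canadian accent"
  "chinese accent" "indian accent" "japanese accent" "korean accent"
  "portuguese accent" "russian accent"
)

# validate_voice_description DESC: return 0 if every comma-separated item in
# DESC (after trimming surrounding whitespace) appears in VALID_ATTRIBUTES.
validate_voice_description() {
  local item valid found
  local -a items
  IFS=',' read -ra items <<< "$1"
  for item in "${items[@]}"; do
    # Trim leading and trailing whitespace.
    item="${item#"${item%%[![:space:]]*}"}"
    item="${item%"${item##*[![:space:]]}"}"
    found=0
    for valid in "${VALID_ATTRIBUTES[@]}"; do
      if [[ $item == "$valid" ]]; then found=1; break; fi
    done
    if (( found == 0 )); then
      echo "unknown voice attribute: '${item}'" >&2
      return 1
    fi
  done
  return 0
}
```

Spaces around each comma are trimmed, so "female,low pitch" and "female, low pitch" are treated the same.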
How to Use
- Enter your text — type or paste the content you want spoken.
- Choose voice attributes (optional) — combine gender, age, pitch, and accent attributes separated by commas.
- Set speed (optional) — adjust the speaking rate if needed.
- Submit — generate and download your audio in seconds.
Pricing
| Text Length | Cost |
|---|---|
| Under 100 chars | $0.005 (flat) |
| 100 chars | $0.005 |
| 500 chars | $0.025 |
| 1000 chars | $0.050 |
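Every row of the table is consistent with a rate of $0.005 per 100 characters and a $0.005 minimum. The helper below estimates cost under that assumption. The page does not say whether partial 100-character blocks are rounded up, so treat the result as an estimate. The function name estimate_cost is hypothetical, and it works in integer micro-dollars because bash arithmetic has no floating point.

```shell
#!/usr/bin/env bash
# estimate_cost CHARS: print the estimated price in USD for CHARS characters.
# Assumes linear billing at $0.00005 per character (50 micro-dollars)
# with a flat $0.005 (5000 micro-dollars) minimum.
estimate_cost() {
  local chars=$1
  local micro=$(( chars * 50 ))
  if (( micro < 5000 )); then
    micro=5000   # flat minimum for short texts
  fi
  # Print as dollars with six decimal places, e.g. 25000 -> 0.025000.
  printf '%d.%06d\n' $(( micro / 1000000 )) $(( micro % 1000000 ))
}
```

Pass a character count, for example `estimate_cost "${#text}"`. In a UTF-8 locale, bash's `${#text}` counts characters rather than bytes, which matters for non-Latin scripts.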
Best Use Cases
- Content creation — Generate voiceovers for videos, ads, and social media.
- Audiobook & podcast production — Convert written content into listenable audio at scale.
- App & product demos — Add natural speech to prototypes and presentations.
- Accessibility — Convert text content into audio for audio-first audiences.
- Multilingual apps — Generate speech in 600+ languages from a single model.
Pro Tips
- Combine 2–3 attributes for best results (e.g. female, young adult, british accent).
- Omit `voice_description` entirely for a random voice; this is useful for variety in batch generation.
- Use `whisper` for ASMR-style or intimate content.
- Adjust `speed` to 0.8 for calm narration or 1.3 for energetic delivery.
Related Models
- OmniVoice Voice Clone — Clone a specific voice from a reference audio sample.
Authentication
For authentication details, please refer to the Authentication Guide.
API Endpoints
Submit Task & Query Result
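```shell
# Submit the task ("text" is required; "voice_description" and "speed" are optional)
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/omnivoice/text-to-speech" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
"text": "Hello! This is OmniVoice speaking.",
"voice_description": "female, low pitch, british accent",
"speed": 1
}'

# Get the result (set requestId to the data.id value returned by the submit call)
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
```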
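The two calls above can be chained into a small submit-and-poll helper. This is a sketch under stated assumptions: the function names (json_field, submit_task, wait_for_result) are hypothetical, WAVESPEED_API_KEY is assumed to be exported, and the default schedule of 60 tries at 2-second intervals is an arbitrary choice rather than a documented limit. To stay dependency-free it extracts fields with bash regex matching instead of a JSON parser, which only works for compact responses shaped like the response tables below; a real parser such as jq is the safer choice in production.

```shell
#!/usr/bin/env bash
API_BASE="https://api.wavespeed.ai/api/v3"

# json_field NAME JSON: print the first string value stored under "NAME".
# Regex-based, not nesting-aware: only reliable for keys that appear once.
json_field() {
  local re="\"$1\"[[:space:]]*:[[:space:]]*\"([^\"]*)\""
  if [[ $2 =~ $re ]]; then printf '%s\n' "${BASH_REMATCH[1]}"; fi
}

# submit_task PAYLOAD_JSON: submit a task and print its ID (data.id).
submit_task() {
  local body id
  body=$(curl --silent --show-error --request POST \
    "${API_BASE}/wavespeed-ai/omnivoice/text-to-speech" \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
    --data-raw "$1") || return 1
  id=$(json_field id "$body")
  if [[ -z $id ]]; then
    echo "no task id in response: ${body}" >&2
    return 1
  fi
  printf '%s\n' "$id"
}

# wait_for_result ID [MAX_TRIES] [INTERVAL_SECONDS]: poll the result endpoint.
# On "completed", print the first URL in data.outputs; on "failed", print
# data.error to stderr and return 1; otherwise keep polling until tries run out.
wait_for_result() {
  local id=$1 tries=${2:-60} interval=${3:-2} body status i
  local url_re='"outputs"[[:space:]]*:[[:space:]]*\[[[:space:]]*"([^"]*)"'
  for (( i = 0; i < tries; i++ )); do
    body=$(curl --silent --show-error \
      --header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
      "${API_BASE}/predictions/${id}/result") || return 1
    status=$(json_field status "$body")
    case $status in
      completed)
        [[ $body =~ $url_re ]] || return 1
        printf '%s\n' "${BASH_REMATCH[1]}"
        return 0 ;;
      failed)
        echo "task ${id} failed: $(json_field error "$body")" >&2
        return 1 ;;
    esac
    sleep "$interval"   # still "created" or "processing"
  done
  echo "timed out waiting for task ${id}" >&2
  return 1
}
```

For example, `id=$(submit_task '{"text": "Hello from OmniVoice.", "speed": 1}') && wait_for_result "$id"` prints the audio URL once the task completes. The payload is passed through unchanged, so it must already be valid JSON.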
Parameters
Task Submission Parameters
Request Parameters
| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| text | string | Yes | - | - | The text content to convert into speech. Supports 600+ languages. |
| voice_description | string | No | - | - | Comma-separated voice attributes. If omitted, a random voice is used. Valid English attributes: female, male, child, teenager, young adult, middle-aged, elderly, low pitch, moderate pitch, high pitch, very low pitch, very high pitch, whisper, american accent, australian accent, british accent, canadian accent, chinese accent, indian accent, japanese accent, korean accent, portuguese accent, russian accent. Example: 'female, low pitch, british accent'. |
| speed | number | No | 1 | 0 ~ 5 | Playback speed factor. 1.0 = normal speed. Values > 1.0 are faster, < 1.0 are slower. |
Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (the task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
Result Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| id | string | Yes | - | Task ID (the data.id value returned when the task was submitted) |
Result Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier for the prediction (the task ID that was requested) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |