OmniVoice Voice Clone
OmniVoice Voice Clone clones any voice from a short 3-10 second audio sample. It supports zero-shot voice cloning in 600+ languages and is available as a ready-to-use REST inference API with no cold starts and affordable pricing.
Features
OmniVoice Voice Clone clones any voice from a short audio sample and generates natural speech in that voice. Upload a voice reference clip, provide the text you want spoken, and the model delivers high-quality cloned speech that matches the tone, style, and character of the original speaker.
Why Choose This?
- High-fidelity voice cloning: Captures the unique tone, cadence, and character of any voice from a short reference clip.
- Natural speech output: Generates fluid, human-sounding speech that closely matches the reference speaker’s style.
- Reference text support: Optionally provide the transcript of the reference audio to improve cloning accuracy.
- Speed control: Adjust the playback speed of the generated speech to match your pacing needs.
Parameters
| Parameter | Required | Description |
|---|---|---|
| text | Yes | The text you want the cloned voice to speak. |
| audio | Yes | Reference audio clip of the voice to clone (URL, file upload, or microphone recording). |
| reference_text | No | Transcript of the reference audio. Improves cloning accuracy when provided. |
| speed | No | Playback speed of the generated speech. Default: 1. |
How to Use
- Enter your text — type what you want the cloned voice to say.
- Upload the reference audio — provide a clear voice sample via URL, file upload, or microphone recording.
- Add reference text (optional) — provide the transcript of the reference clip for better accuracy.
- Set speed (optional) — adjust the speaking rate if needed.
- Submit — generate and download your cloned voice audio.
Pricing
- Under 100 characters: flat $0.005 per generation
- 100+ characters: $0.00005 per character (i.e. $0.005 per 100 characters)
Examples
| Text Length | Cost |
|---|---|
| 50 chars | $0.005 |
| 100 chars | $0.005 |
| 500 chars | $0.025 |
| 1000 chars | $0.050 |
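The pricing rule above can be sketched as a small estimator. This is a minimal illustration, not an official billing calculation: the function name `estimate_cost` is hypothetical, and it assumes characters are counted the way Python's `len()` counts them, which the page does not specify.

```python
from decimal import Decimal

FLAT_FEE = Decimal("0.005")    # flat price for texts under 100 characters
PER_CHAR = Decimal("0.00005")  # price per character at 100 characters or more
THRESHOLD = 100                # character count where per-character pricing starts


def estimate_cost(text: str) -> Decimal:
    """Estimate the USD price of one generation from the text length.

    Assumption: the service counts characters like len() does.
    """
    n = len(text)
    if n < THRESHOLD:
        return FLAT_FEE
    return PER_CHAR * n
```

`Decimal` is used instead of `float` so that small per-character prices add up without binary rounding noise.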
Best Use Cases
- Content creation — Generate voiceovers in a consistent cloned voice for videos, podcasts, and social media.
- Dubbing & localization — Clone a speaker’s voice for use in translated or localized audio content.
- Audiobook production — Produce narration in a specific voice without booking studio time.
- Personal voice preservation — Clone and preserve a unique voice for future use.
- Developer integrations — Embed voice cloning into apps, platforms, and automated speech workflows.
Pro Tips
- Use a clear, high-quality reference audio clip with minimal background noise for the most accurate clone.
- A reference clip of 6–30 seconds with natural, expressive speech produces the best results.
- Providing reference_text significantly improves cloning accuracy — always include it if you know the transcript.
- For long text outputs, break content into natural sentence chunks for more controlled pacing.
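One way to follow the last tip is to group whole sentences into chunks before submitting them one by one. The sketch below is only an illustration: the helper `chunk_text`, the 300-character default, and the naive sentence-splitting rule are all assumptions, not part of the WaveSpeedAI API.

```python
import re


def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    """Group whole sentences into chunks of roughly max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Split after ., ! or ? followed by whitespace. This naive rule will
    # mis-split abbreviations such as "Dr. Smith".
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Because generations under 100 characters are billed at the flat $0.005 rate, very small chunks cost more per character than larger ones.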
Notes
- Both text and audio are required fields.
- Pricing is based on text length: a flat $0.005 for under 100 characters, and $0.00005 per character for 100 characters or more.
- Ensure audio URLs are publicly accessible if using a link rather than a direct upload.
- Please ensure your content complies with WaveSpeed AI’s usage policies.
Authentication
For authentication details, please refer to the Authentication Guide.
API Endpoints
Submit Task & Query Result
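# Submit the task (text and audio are required; the values below are placeholders)
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/omnivoice/voice-clone" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
  "text": "Hello, this is my cloned voice.",
  "audio": "https://example.com/reference.wav",
  "speed": 1
}'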
# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
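For readers calling the API from Python rather than curl, the same two requests might look like the sketch below. It uses only the standard library; the helper names (`submit_request`, `result_request`, `send`) and the 30-second timeout are assumptions made for illustration, while the URLs and headers mirror the curl commands above.

```python
import json
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"
MODEL_PATH = "wavespeed-ai/omnivoice/voice-clone"


def submit_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build the POST request that submits a voice-clone task."""
    return urllib.request.Request(
        f"{API_BASE}/{MODEL_PATH}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def result_request(request_id: str, api_key: str) -> urllib.request.Request:
    """Build the GET request that fetches the result of a task."""
    return urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body (30 s timeout is arbitrary)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Usage (performs real network calls, so it is left commented out):
#   submitted = send(submit_request(payload, api_key))
#   result = send(result_request(submitted["data"]["id"], api_key))
```

Keeping request construction separate from sending makes the URL and header logic easy to check without touching the network.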
Parameters
Task Submission Parameters
Request Parameters
| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| text | string | Yes | - | - | The text content to convert into speech using the cloned voice. |
| audio | string | Yes | - | - | URL of the reference audio to clone the voice from (3-10 seconds recommended). |
| reference_text | string | No | - | - | Transcript of the reference audio (optional, improves accuracy). |
| speed | number | No | 1 | 0 ~ 5 | Playback speed factor. 1.0 = normal speed. Values > 1.0 are faster, < 1.0 are slower. |
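Checking these constraints client-side before submitting can catch mistakes early. The following is a minimal sketch, assuming the `0 ~ 5` range is inclusive at both ends and that `audio` must be an http(s) URL; `build_payload` is a hypothetical helper, not part of any official SDK.

```python
from typing import Optional
from urllib.parse import urlparse

# Range taken from the table above; treating both bounds as inclusive is an assumption.
SPEED_MIN, SPEED_MAX = 0.0, 5.0


def build_payload(
    text: str,
    audio: str,
    reference_text: Optional[str] = None,
    speed: float = 1.0,
) -> dict:
    """Validate the request parameters and return the JSON-ready request body."""
    if not text.strip():
        raise ValueError("text is required")
    if urlparse(audio).scheme not in ("http", "https"):
        raise ValueError("audio must be an http(s) URL")
    if not SPEED_MIN <= speed <= SPEED_MAX:
        raise ValueError(f"speed must be between {SPEED_MIN} and {SPEED_MAX}")
    payload = {"text": text, "audio": audio, "speed": speed}
    if reference_text:
        # Optional field: only sent when a transcript is available.
        payload["reference_text"] = reference_text
    return payload
```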
Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (the task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
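As an illustration of how these fields fit together, the sketch below pulls out the values a client needs to start polling. The `Submission` class and `parse_submission` function are hypothetical names, and treating any `code` other than 200 as a failure is an assumption based on the table above.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    task_id: str     # data.id
    status: str      # data.status: created, processing, completed, or failed
    result_url: str  # data.urls.get


def parse_submission(body: dict) -> Submission:
    """Extract the fields needed for polling from a task submission response."""
    # Assumption: any code other than 200 means the submission was rejected.
    if body.get("code") != 200:
        raise RuntimeError(f"submission rejected: {body.get('message')}")
    data = body["data"]
    if data.get("error"):
        raise RuntimeError(f"task error: {data['error']}")
    return Submission(
        task_id=data["id"],
        status=data["status"],
        result_url=data["urls"]["get"],
    )
```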
Result Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| id | string | Yes | - | Task ID |
Result Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier for the prediction (the task ID that was requested) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed). |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
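Because `data.outputs` stays empty until `data.status` is `completed`, a client typically polls the result endpoint until the task finishes. The loop below is a sketch under assumptions: `wait_for_outputs` is a hypothetical helper, the caller supplies `fetch_result` (any function that returns the decoded result response), and the interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable


def wait_for_outputs(
    fetch_result: Callable[[], dict],
    interval: float = 1.0,
    timeout: float = 120.0,
) -> list:
    """Poll until the task completes and return data.outputs.

    fetch_result is supplied by the caller and must return the decoded
    JSON body of the result endpoint.
    """
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_result()["data"]
        status = data["status"]
        if status == "completed":
            return data["outputs"]
        if status == "failed":
            raise RuntimeError(data.get("error") or "task failed")
        # Status is created or processing: wait and try again.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status} after {timeout} seconds")
        time.sleep(interval)
```

Passing the fetch function in, rather than hard-coding an HTTP call, keeps the loop independent of any particular HTTP client.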