
OmniVoice Voice Clone

OmniVoice Voice Clone clones any voice from a short audio sample of 3 to 10 seconds. It supports zero-shot voice cloning in 600+ languages and is available through a ready-to-use REST inference API with no cold starts and per-character pricing.

Features

OmniVoice Voice Clone clones any voice from a short audio sample and generates natural speech in that voice. Upload a voice reference clip, provide the text you want spoken, and the model delivers high-quality cloned speech that matches the tone, style, and character of the original speaker.


Why Choose This?

  • High-fidelity voice cloning: Captures the unique tone, cadence, and character of any voice from a short reference clip.

  • Natural speech output: Generates fluid, human-sounding speech that closely matches the reference speaker’s style.

  • Reference text support: Optionally provide the transcript of the reference audio to improve cloning accuracy.

  • Speed control: Adjust the playback speed of the generated speech to match your pacing needs.


Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| text | Yes | The text you want the cloned voice to speak. |
| audio | Yes | Reference audio clip of the voice to clone (URL, file upload, or microphone recording). |
| reference_text | No | Transcript of the reference audio. Improves cloning accuracy when provided. |
| speed | No | Playback speed of the generated speech. Default: 1. |

How to Use

  1. Enter your text — type what you want the cloned voice to say.
  2. Upload the reference audio — provide a clear voice sample via URL, file upload, or microphone recording.
  3. Add reference text (optional) — provide the transcript of the reference clip for better accuracy.
  4. Set speed (optional) — adjust the speaking rate if needed.
  5. Submit — generate and download your cloned voice audio.
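
For API use, the steps above correspond to assembling a JSON request body. The following is a minimal sketch that assumes only the field names from the Parameters table; the helper name `build_payload` and the example URL are hypothetical and not part of the WaveSpeedAI API.

```python
import json


def build_payload(text, audio, reference_text=None, speed=None):
    """Assemble the request body; optional fields are left out when unset."""
    if not text or not audio:
        raise ValueError("both 'text' and 'audio' are required")
    payload = {"text": text, "audio": audio}
    if reference_text is not None:
        payload["reference_text"] = reference_text
    if speed is not None:
        payload["speed"] = speed
    return payload


if __name__ == "__main__":
    body = build_payload(
        text="Hello from a cloned voice.",
        audio="https://example.com/reference.wav",  # placeholder URL
        reference_text="This is the transcript of my reference clip.",
    )
    print(json.dumps(body, indent=2))
```

Leaving unset optional fields out of the body lets the service apply its own defaults, such as a speed of 1.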

Pricing

  • Under 100 characters: flat $0.005 per generation
  • 100+ characters: $0.00005 per character (i.e. $0.005 per 100 characters)

Examples

| Text Length | Cost |
| --- | --- |
| 50 chars | $0.005 |
| 100 chars | $0.005 |
| 500 chars | $0.025 |
| 1000 chars | $0.050 |
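
The pricing rule can be written as a small estimator. This is a sketch under the stated rates only: the function name `estimate_cost` is hypothetical, and how the service counts characters (whitespace, Unicode) is not stated, so the caller is assumed to pass a plain character count.

```python
from decimal import Decimal

FLAT_FEE = Decimal("0.005")    # charged for fewer than 100 characters
PER_CHAR = Decimal("0.00005")  # charged per character at 100 characters or more
THRESHOLD = 100


def estimate_cost(num_chars):
    """Return the estimated cost in USD for a text of num_chars characters."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    if num_chars < THRESHOLD:
        return FLAT_FEE
    return PER_CHAR * num_chars


if __name__ == "__main__":
    for n in (50, 100, 500, 1000):
        print(n, estimate_cost(n))
```

`Decimal` is used instead of floats so that small per-character amounts add up without rounding drift.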

Best Use Cases

  • Content creation — Generate voiceovers in a consistent cloned voice for videos, podcasts, and social media.
  • Dubbing & localization — Clone a speaker’s voice for use in translated or localized audio content.
  • Audiobook production — Produce narration in a specific voice without booking studio time.
  • Personal voice preservation — Clone and preserve a unique voice for future use.
  • Developer integrations — Embed voice cloning into apps, platforms, and automated speech workflows.

Pro Tips

  • Use a clear, high-quality reference audio clip with minimal background noise for the most accurate clone.
  • A reference clip of 6–30 seconds with natural, expressive speech produces the best results.
  • Providing reference_text significantly improves cloning accuracy — always include it if you know the transcript.
  • For long text outputs, break content into natural sentence chunks for more controlled pacing.
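
One way to break long text into sentence chunks is sketched below. The regular expression is a simple heuristic for English punctuation, and both the function name `chunk_sentences` and the 300-character default are assumptions chosen for illustration, not values recommended by WaveSpeedAI.

```python
import re


def chunk_sentences(text, max_chars=300):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Split after '.', '!' or '?' followed by whitespace; punctuation stays attached.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk is then submitted as its own generation. Because generations under 100 characters are billed a flat $0.005, very short chunks cost more per character than longer ones.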

Notes

  • Both text and audio are required fields.
  • Pricing is based on text length: a flat $0.005 for under 100 characters, and $0.00005 per character for 100 characters or more.
  • Ensure audio URLs are publicly accessible if using a link rather than a direct upload.
  • Please ensure your content complies with WaveSpeed AI’s usage policies.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


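# Submit the task (text and audio are required; the values below are placeholders)
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/omnivoice/voice-clone" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "text": "Hello, this is my cloned voice.",
    "audio": "https://example.com/reference.wav",
    "speed": 1
}'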

# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
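
The same two calls can be made from Python's standard library. The sketch below reuses the endpoint URLs and headers from the curl commands; the function names are hypothetical, and error handling for non-2xx responses is left out.

```python
import json
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"
SUBMIT_URL = f"{API_BASE}/wavespeed-ai/omnivoice/voice-clone"


def build_submit_request(payload, api_key):
    """POST request that submits a voice-clone task."""
    return urllib.request.Request(
        SUBMIT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def build_result_request(request_id, api_key):
    """GET request that fetches the result of a submitted task."""
    return urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A typical flow would call `send(build_submit_request(payload, key))`, read the task ID from the response, and then call `send(build_result_request(task_id, key))` until the task finishes. Building the request separately from sending it keeps the construction step easy to inspect without touching the network.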

Parameters

Task Submission Parameters

Request Parameters

| Parameter | Type | Required | Default | Range | Description |
| --- | --- | --- | --- | --- | --- |
| text | string | Yes | - | - | The text content to convert into speech using the cloned voice. |
| audio | string | Yes | - | - | URL of the reference audio to clone the voice from (3-10 seconds recommended). |
| reference_text | string | No | - | - | Transcript of the reference audio (optional, improves accuracy). |
| speed | number | No | 1 | 0 ~ 5 | Playback speed factor. 1.0 = normal speed. Values > 1.0 are faster, < 1.0 are slower. |
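
A client can check a request body against these constraints before sending it. The sketch below is an assumption-laden illustration: the function name `validate_request` is hypothetical, the speed range is read as 0 to 5 with inclusive bounds, and the service may apply stricter rules of its own.

```python
def validate_request(payload):
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    for field in ("text", "audio"):
        value = payload.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"'{field}' is required and must be a non-empty string")
    reference_text = payload.get("reference_text")
    if reference_text is not None and not isinstance(reference_text, str):
        problems.append("'reference_text' must be a string when provided")
    speed = payload.get("speed", 1)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(speed, bool) or not isinstance(speed, (int, float)):
        problems.append("'speed' must be a number")
    elif not 0 <= speed <= 5:  # assumed inclusive bounds
        problems.append("'speed' must be within 0 to 5")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every issue in one pass.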

Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (the task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
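
After submission, the fields a client usually needs are the task ID, the status, and the polling URL. The sketch below reads them from a decoded response using the field names listed above; the helper name `extract_task` and the keys of the returned dictionary are hypothetical.

```python
def extract_task(response):
    """Pull the task id, status and polling URL out of a submission response."""
    data = response.get("data") or {}
    task_id = data.get("id")
    if not task_id:
        raise ValueError(f"no task id in response: {response.get('message')!r}")
    return {
        "id": task_id,
        "status": data.get("status"),
        "poll_url": (data.get("urls") or {}).get("get"),
    }
```

The task ID returned here is the `id` expected by the result request.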

Result Request Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| id | string | Yes | - | Task ID |

Result Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier for the prediction (the task ID that was requested) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
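
Since `data.outputs` stays empty until `data.status` is completed, a client typically polls the result endpoint. The loop below is a sketch: the function name `wait_for_outputs`, the polling interval, and the timeout are assumptions, and `fetch_result` stands for any caller-supplied function that returns the decoded JSON of the result endpoint.

```python
import time


def wait_for_outputs(fetch_result, interval=1.0, timeout=120.0):
    """Poll until the task completes and return the list in data.outputs.

    fetch_result is a zero-argument callable returning the decoded JSON
    body of the result endpoint.
    """
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_result().get("data") or {}
        status = data.get("status")
        if status == "completed":
            return data.get("outputs") or []
        if status == "failed":
            raise RuntimeError(data.get("error") or "task failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status!r} after {timeout} seconds")
        time.sleep(interval)
```

Passing the fetch step in as a callable keeps the status handling independent of the HTTP client, which also makes the loop easy to exercise with canned responses.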