OmniVoice Voice Clone
OmniVoice Voice Clone clones any voice from a short audio sample and generates natural speech in that voice. Upload a voice reference clip, provide the text you want spoken, and the model delivers high-quality cloned speech that matches the tone, style, and character of the original speaker.
Why Choose This?
- High-fidelity voice cloning: Captures the unique tone, cadence, and character of any voice from a short reference clip.
- Natural speech output: Generates fluid, human-sounding speech that closely matches the reference speaker's style.
- Reference text support: Optionally provide the transcript of the reference audio to improve cloning accuracy.
- Speed control: Adjust the playback speed of the generated speech to match your pacing needs.
Parameters
| Parameter | Required | Description |
|---|---|---|
| text | Yes | The text you want the cloned voice to speak. |
| audio | Yes | Reference audio clip of the voice to clone (URL, file upload, or microphone recording). |
| reference_text | No | Transcript of the reference audio. Improves cloning accuracy when provided. |
| speed | No | Playback speed of the generated speech. Default: 1. |
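As a rough illustration of how the parameters above fit together, the following Python sketch assembles them into a request body. The field names come from the table; the `build_payload` helper, the JSON-style dictionary shape, the positive-speed check, and the `example.com` URL are assumptions made for illustration and are not a documented client API.

```python
from typing import Any, Dict, Optional


def build_payload(
    text: str,
    audio: str,
    reference_text: Optional[str] = None,
    speed: float = 1.0,
) -> Dict[str, Any]:
    """Assemble the documented parameters into a dictionary (hypothetical helper)."""
    # text and audio are the two required fields.
    if not text.strip():
        raise ValueError("text is required")
    if not audio.strip():
        raise ValueError("audio is required")
    # Assumption: speed must be a positive multiplier; the valid range is not documented.
    if speed <= 0:
        raise ValueError("speed must be positive")

    payload: Dict[str, Any] = {"text": text, "audio": audio, "speed": speed}
    # reference_text is optional, so include it only when a transcript is known.
    if reference_text:
        payload["reference_text"] = reference_text
    return payload


# Placeholder URL for illustration; a real reference clip must be publicly accessible.
payload = build_payload(
    text="Welcome back to the show.",
    audio="https://example.com/reference-voice.wav",
    reference_text="This is the transcript of the reference clip.",
)
```

Leaving `reference_text` out of the dictionary when it is unknown keeps the request limited to values the caller actually has.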
How to Use
- Enter your text — type what you want the cloned voice to say.
- Upload the reference audio — provide a clear voice sample via URL, file upload, or microphone recording.
- Add reference text (optional) — provide the transcript of the reference clip for better accuracy.
- Set speed (optional) — adjust the speaking rate if needed.
- Submit — generate and download your cloned voice audio.
Pricing
- Under 100 characters: flat $0.005 per generation
- 100+ characters: $0.00005 per character (i.e. $0.005 per 100 characters)
Examples
| Text Length | Cost |
|---|---|
| 50 chars | $0.005 |
| 100 chars | $0.005 |
| 500 chars | $0.025 |
| 1000 chars | $0.050 |
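The pricing rule and the examples above can be reproduced with a short Python sketch. The function name `estimate_cost` is hypothetical, and the sketch assumes the billed length is the plain character count of the text; how the service counts characters is not specified here.

```python
from decimal import Decimal

FLAT_FEE = Decimal("0.005")          # USD, applies when the text is under 100 characters
PER_CHAR_RATE = Decimal("0.00005")   # USD per character at 100 characters or more
THRESHOLD = 100


def estimate_cost(text_length: int) -> Decimal:
    """Estimate the USD price of one generation for a given character count."""
    if text_length <= 0:
        raise ValueError("text_length must be positive")
    if text_length < THRESHOLD:
        return FLAT_FEE
    # At or above the threshold, every character is billed at the per-character rate.
    return PER_CHAR_RATE * text_length


for length in (50, 100, 500, 1000):
    print(length, estimate_cost(length))
```

`Decimal` is used instead of `float` so that small currency amounts are not distorted by binary rounding.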
Best Use Cases
- Content creation — Generate voiceovers in a consistent cloned voice for videos, podcasts, and social media.
- Dubbing & localization — Clone a speaker's voice for use in translated or localized audio content.
- Audiobook production — Produce narration in a specific voice without booking studio time.
- Personal voice preservation — Clone and preserve a unique voice for future use.
- Developer integrations — Embed voice cloning into apps, platforms, and automated speech workflows.
Pro Tips
- Use a clear, high-quality reference audio clip with minimal background noise for the most accurate clone.
- A reference clip of 6–30 seconds with natural, expressive speech produces the best results.
- Providing reference_text improves cloning accuracy; include it whenever you know the transcript of the reference clip.
- For long text outputs, break content into natural sentence chunks for more controlled pacing.
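The last tip, splitting long text into sentence chunks, could be sketched in Python as follows. The `split_into_chunks` helper and the 300-character default are illustrative assumptions, not documented limits, and the punctuation-based splitter is deliberately simple (it does not handle abbreviations such as "Dr.").

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 300) -> List[str]:
    """Group whole sentences into chunks of at most max_chars characters (hypothetical helper)."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


chunks = split_into_chunks("First sentence. Second sentence! A third one?", max_chars=32)
```

A single sentence longer than `max_chars` is kept whole rather than cut mid-sentence. Under the pricing described above, each chunk shorter than 100 characters is billed the flat fee, so very small chunks cost more per character.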
Notes
- Both text and audio are required fields.
- Pricing is based on text length: a flat $0.005 for texts under 100 characters; for texts of 100 characters or more, $0.00005 per character applied to the full text.
- Ensure audio URLs are publicly accessible if using a link rather than a direct upload.
- Please ensure your content complies with WaveSpeed AI's usage policies.