OmniVoice

OmniVoice is a massively multilingual zero-shot TTS model supporting 600+ languages. Generate speech with an automatically chosen voice, or design a custom voice by describing it with voice attributes. It is available through a ready-to-use REST inference API with no cold starts and affordable pricing.

OmniVoice Text-to-Speech

OmniVoice Text-to-Speech converts any text into natural, expressive speech in 600+ languages. Choose from a set of voice attributes (gender, age, pitch, style, accent) and the model generates matching audio in seconds. No voice sample is needed.

Why Choose This?

  • Attribute-driven voice design: Pick from predefined voice attributes (gender, age, pitch, style, accent) to create your ideal voice. No audio sample is required.

  • 600+ languages: The broadest language coverage among zero-shot TTS models.

  • Speed control: Adjust the speaking rate to match your content pacing needs.

  • Fast generation: Output is delivered in under 5 seconds.

Parameters

Parameter         | Required | Description
text              | Yes      | The text you want converted to speech.
voice_description | No       | Comma-separated voice attributes (see list below). If omitted, a random voice is used.
speed             | No       | Playback speed factor. 1.0 = normal. Range: 0.1–5.0.
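
The parameter rules above can be sketched as a small payload builder. This is a minimal illustration, not an official SDK: `build_payload` is a hypothetical helper, and it enforces only what the table states (required `text`, optional `voice_description`, `speed` between 0.1 and 5.0). Rejecting blank text is an added assumption.

```python
from typing import Optional

SPEED_MIN, SPEED_MAX = 0.1, 5.0  # range stated in the parameter table


def build_payload(text: str,
                  voice_description: Optional[str] = None,
                  speed: float = 1.0) -> dict:
    """Build a request body from the documented parameters (hypothetical helper)."""
    if not text or not text.strip():
        # Assumption: blank text is not useful input, so reject it early.
        raise ValueError("text is required")
    if not SPEED_MIN <= speed <= SPEED_MAX:
        raise ValueError(f"speed must be between {SPEED_MIN} and {SPEED_MAX}")

    payload = {"text": text, "speed": speed}
    # Leaving voice_description out means a random voice is used.
    if voice_description:
        payload["voice_description"] = voice_description
    return payload
```

Calling `build_payload("Hello", "female, low pitch, british accent", 0.8)` returns a dictionary with all three fields, while `build_payload("Hello")` omits `voice_description` so that a random voice is used.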

Valid Voice Attributes

Combine any of these with commas (e.g. female, low pitch, british accent):

Gender: female, male

Age: child, teenager, young adult, middle-aged, elderly

Pitch: very low pitch, low pitch, moderate pitch, high pitch, very high pitch

Style: whisper

Accent: american accent, australian accent, british accent, canadian accent, chinese accent, indian accent, japanese accent, korean accent, portuguese accent, russian accent

Examples

  • female, low pitch, british accent
  • male, young adult, american accent
  • female, elderly, whisper
  • male, high pitch, indian accent
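
A `voice_description` string can be checked against the attribute lists above before it is sent. The sketch below is illustrative only: `normalize_voice_description` is a hypothetical helper, and two of its rules are assumptions rather than documented behavior (lowercasing the input, and allowing at most one attribute per category).

```python
VOICE_ATTRIBUTES = {
    "gender": {"female", "male"},
    "age": {"child", "teenager", "young adult", "middle-aged", "elderly"},
    "pitch": {"very low pitch", "low pitch", "moderate pitch",
              "high pitch", "very high pitch"},
    "style": {"whisper"},
    "accent": {"american accent", "australian accent", "british accent",
               "canadian accent", "chinese accent", "indian accent",
               "japanese accent", "korean accent", "portuguese accent",
               "russian accent"},
}

# Reverse lookup: attribute -> category it belongs to.
CATEGORY_OF = {attr: category
               for category, attrs in VOICE_ATTRIBUTES.items()
               for attr in attrs}


def normalize_voice_description(description: str) -> str:
    """Validate a comma-separated attribute string and return it cleaned up."""
    # Assumption: attributes are matched case-insensitively.
    attrs = [part.strip().lower() for part in description.split(",") if part.strip()]
    seen_categories = set()
    for attr in attrs:
        category = CATEGORY_OF.get(attr)
        if category is None:
            raise ValueError(f"unknown voice attribute: {attr!r}")
        # Assumption: two attributes from one category (e.g. two accents) conflict.
        if category in seen_categories:
            raise ValueError(f"more than one {category} attribute given")
        seen_categories.add(category)
    return ", ".join(attrs)
```

Unknown attributes such as `robot` raise a `ValueError`, which catches typos before a request is made.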

How to Use

  1. Enter your text: type or paste the content you want spoken.
  2. Choose voice attributes (optional): combine gender, age, pitch, style, and accent attributes separated by commas.
  3. Set speed (optional): adjust the speaking rate if needed.
  4. Submit: generate and download your audio in seconds.
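
For programmatic use, the same steps amount to sending the three parameters as a JSON body. The sketch below only constructs the HTTP request with Python's standard library and does not send it. The endpoint URL and API key are supplied by the caller, and both the bearer-token `Authorization` header and the JSON body shape are assumptions, so check the provider's API documentation for the actual endpoint, authentication scheme, and response format. `build_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_request(endpoint_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request for a TTS job."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the API uses bearer-token authentication.
            "Authorization": f"Bearer {api_key}",
        },
    )


# Placeholder endpoint and key; replace both with real values before sending.
request = build_request(
    "https://example.invalid/text-to-speech",
    "YOUR_API_KEY",
    {"text": "Hello there.",
     "voice_description": "female, low pitch, british accent",
     "speed": 1.0},
)
```

Passing `request` to `urllib.request.urlopen` would submit it. That call is left out here because the response structure is not described in this document.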

Pricing

Text Length     | Cost
Under 100 chars | $0.005 (flat)
100 chars       | $0.005
500 chars       | $0.025
1000 chars      | $0.050
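
The table rows are consistent with a rate of $0.005 per 100 characters and a flat $0.005 minimum. The estimator below is a rough sketch under two assumptions that the table does not confirm: that lengths between the listed rows are billed linearly per character, and that a character corresponds to one Python string element. `estimate_cost_usd` is a hypothetical helper.

```python
MICRO_USD_PER_CHAR = 50   # $0.005 per 100 characters, from the pricing table
MIN_MICRO_USD = 5_000     # flat $0.005 for texts under 100 characters


def estimate_cost_usd(text: str) -> float:
    """Estimate the cost of one run in US dollars."""
    # Integer micro-dollars avoid floating-point drift while multiplying.
    # Assumption: billing is linear per character above the minimum.
    micro_usd = max(MIN_MICRO_USD, len(text) * MICRO_USD_PER_CHAR)
    return micro_usd / 1_000_000
```

Under these assumptions a 50-character text costs the $0.005 minimum, and a 500-character text costs $0.025, matching the table.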

Best Use Cases

  • Content creation — Generate voiceovers for videos, ads, and social media.
  • Audiobook & podcast production — Convert written content into listenable audio at scale.
  • App & product demos — Add natural speech to prototypes and presentations.
  • Accessibility — Convert text content into audio for audio-first audiences.
  • Multilingual apps — Generate speech in 600+ languages from a single model.

Pro Tips

  • Combine 2–3 attributes for best results (e.g. female, young adult, british accent).
  • Omit voice_description entirely for a random voice — useful for variety in batch generation.
  • Use whisper for ASMR-style or intimate content.
  • Adjust speed to 0.8 for calm narration or 1.3 for energetic delivery.
