Inworld Realtime TTS 2

Inworld Realtime TTS 2 converts text into natural speech with low latency, using the official TTS 2 controls for delivery mode, language, timestamps, text normalization, and audio output settings. It is available through a ready-to-use REST inference API with no cold starts and affordable pricing.

Features

Inworld Realtime TTS 2 converts text into natural-sounding speech with low-latency generation and flexible voice controls. It supports multiple output audio formats and lets you adjust speaking rate and temperature for different delivery styles.


Why Choose This?

  • Low-latency text-to-speech: Generate speech quickly for interactive apps, assistants, and real-time voice experiences.

  • Natural voice output: Create smooth, human-like speech from plain text with selectable voices.

  • Flexible voice controls: Adjust speaking rate and temperature to better match tone, pacing, and delivery style.

  • Multiple output formats: Export audio in MP3, LINEAR16, OGG_OPUS, FLAC, or WAV depending on your workflow.

  • Production-ready API: Access the model through a realtime-friendly API for apps, agents, games, and voice products.


Parameters

| Parameter | Required | Description |
|---|---|---|
| text | Yes | Input text to convert into speech. |
| voice_id | No | Voice selection for the generated speech, such as Julia. |
| speaking_rate | No | Controls how fast the voice speaks. Default: 1. |
| temperature | No | Controls variation and expressiveness in the generated speech. Default: 1. |
| output_format | No | Output audio format: MP3, LINEAR16, OGG_OPUS, FLAC, or WAV. |

How to Use

  1. Enter your text — paste or type the content you want to convert into speech.
  2. Choose a voice — select the voice that best fits your use case.
  3. Adjust speaking rate and temperature (optional) — fine-tune pacing and expressiveness.
  4. Choose output format — select MP3, LINEAR16, OGG_OPUS, FLAC, or WAV.
  5. Submit — generate and download the audio output.

Example Input

Welcome to our product demo. Today we will walk through the key features, explain how the workflow operates, and show how quickly you can integrate voice output into your application.


Pricing

| Text Length | Cost |
|---|---|
| 1–1,000 chars | $0.035 |
| 1,001–2,000 chars | $0.070 |
| 2,001–3,000 chars | $0.105 |
| 3,001–4,000 chars | $0.140 |
| 4,001–5,000 chars | $0.175 |

Billing Rules

  • Pricing is based on the length of text.
  • Character count is rounded up to the next 1,000-character block.
  • Each additional started 1,000 characters adds $0.035.
  • voice_id, speaking_rate, temperature, and output_format do not affect pricing.
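
The billing rules above amount to a ceiling division over 1,000-character blocks. The following is a minimal sketch of that arithmetic, not an official billing implementation. The function name `estimate_cost` is hypothetical, and it assumes that the billed character count matches Python's `len()` of the string and that each request is rounded up on its own.

```python
import math

PRICE_PER_BLOCK_USD = 0.035  # price of one block, from the pricing table
BLOCK_SIZE = 1000            # characters per billing block


def estimate_cost(text: str) -> float:
    """Estimate the cost in USD of synthesizing `text` in one request.

    Assumption: the service counts characters the same way len() does.
    """
    if not text:
        raise ValueError("text is required and must not be empty")
    # Every started 1,000-character block is billed in full.
    blocks = math.ceil(len(text) / BLOCK_SIZE)
    return round(blocks * PRICE_PER_BLOCK_USD, 3)


print(estimate_cost("a" * 1500))  # two blocks -> 0.07
```

Under these assumptions a 1,001-character input costs the same as a 2,000-character one, since both occupy two blocks.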

Best Use Cases

  • Realtime voice agents — Generate spoken responses for assistants, NPCs, and conversational interfaces.
  • Interactive applications — Add live voice output to games, education tools, and customer-facing apps.
  • Accessibility features — Turn written content into audio for more accessible user experiences.
  • Content narration — Create voiceovers for guides, product demos, and short-form content.
  • Prototype voice experiences — Quickly test different voices, pacing, and formats in development workflows.

Pro Tips

  • Keep input text clean and well-punctuated for more natural speech rhythm.
  • Split very long content into smaller sections when you want tighter pacing control.
  • Use speaking_rate to match the use case, such as slower for tutorials and faster for assistants.
  • Adjust temperature when you want more variation in delivery style.
  • Choose MP3 for broad compatibility, and use lossless formats like WAV or FLAC when audio quality matters more.
  • Reuse the same voice and settings across related clips for a more consistent user experience.
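
The tip about splitting long content can be sketched as a small helper that breaks text after sentence-ending punctuation. This is only one possible approach. The name `split_text` and the default `max_chars` of 1,000 are illustrative assumptions, not part of the API.

```python
import re


def split_text(text: str, max_chars: int = 1000) -> list[str]:
    """Split text into chunks of up to max_chars, breaking between sentences.

    A single sentence longer than max_chars is kept whole rather than
    cut mid-sentence, so such a chunk can exceed the limit.
    """
    # Split on whitespace that follows ., ! or ? (punctuation stays attached).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be submitted as its own request with the same voice_id and settings, which keeps the clips consistent with each other.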

Notes

  • text is the only required field.
  • Supported output formats are MP3, LINEAR16, OGG_OPUS, FLAC, and WAV.
  • Pricing depends only on text length.
  • Audio format and voice settings do not change the price.

  • Other Inworld speech and voice generation models may be useful when you need different latency, quality, or voice configuration options.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


# Submit the task
curl --location --request POST "https://api.wavespeed.ai/api/v3/inworld/realtime-tts-2" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "text": "Welcome to our product demo.",
    "voice_id": "Dennis",
    "speaking_rate": 1,
    "temperature": 1,
    "output_format": "MP3",
    "enable_sync_mode": false
}'

# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
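
For readers calling the API from Python, the submit request above could be built roughly as follows with the standard library. This is a sketch rather than an official client. The helper names `build_submit_request` and `submit` are hypothetical, while the endpoint URL and headers are copied from the curl command.

```python
import json
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def build_submit_request(text: str, api_key: str, **options) -> urllib.request.Request:
    """Build the POST request for a TTS task without sending it.

    `options` carries the optional fields, e.g. voice_id="Dennis".
    """
    payload = {"text": text, **options}  # text is the only required field
    return urllib.request.Request(
        f"{API_BASE}/inworld/realtime-tts-2",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def submit(text: str, api_key: str, **options) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_submit_request(text, api_key, **options)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `submit("Welcome to our product demo.", api_key, voice_id="Dennis", output_format="MP3")` would then return the submission response as a dictionary.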

Parameters

Task Submission Parameters

Request Parameters

| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| text | string | Yes | - | - | Text to synthesize into speech. Maximum input of 2,000 characters. |
| voice_id | string | No | Dennis | Alex, Ashley, Craig, Deborah, Dennis, Edward, Elizabeth, Hades, Julia, Pixie, Mark, Olivia, Priya, Ronald, Sarah, Shaun, Theodore, Timothy, Wendy, Dominus, Hana, Clive, Carter, Blake, Luna, Yichen, Xiaoyin, Xinyi, Jing, Erik, Katrien, Lennart, Lore, Alain, Hélène, Mathieu, Étienne, Johanna, Josef, Gianni, Orietta, Asuka, Satoshi, Hyunwoo, Minji, Seojun, Yoona, Szymon, Wojciech, Heitor, Maitê, Diego, Lupita, Miguel, Rafael, Svetlana, Elena, Dmitry, Nikolai, Riya, Manoj, Yael, Oren, Nour, Omar | The voice to use for speech generation. |
| speaking_rate | number | No | 1 | 0.5 ~ 1.5 | The speed of speaking. |
| temperature | number | No | 1 | 0.7 ~ 1.5 | The temperature to use for the generation. A higher value means more randomness in the output. |
| output_format | string | No | MP3 | MP3, LINEAR16, OGG_OPUS, FLAC, WAV | Output audio format. |
| enable_sync_mode | boolean | No | false | - | If set to true, the function will wait for the result to be generated and uploaded before returning the response. It allows you to get the result directly in the response. This property is only available through the API. |
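
The ranges in the request parameter table can be checked on the client before a task is submitted. The sketch below is an illustration under stated assumptions. The function name `validate_params` is hypothetical, and it assumes the documented ranges are inclusive at both ends, which the table does not state explicitly.

```python
OUTPUT_FORMATS = {"MP3", "LINEAR16", "OGG_OPUS", "FLAC", "WAV"}
MAX_TEXT_CHARS = 2000  # maximum input length from the request parameter table


def validate_params(text, speaking_rate=1.0, temperature=1.0, output_format="MP3"):
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    if not text:
        problems.append("text is required")
    elif len(text) > MAX_TEXT_CHARS:
        problems.append(f"text exceeds {MAX_TEXT_CHARS} characters")
    # Assumption: range bounds are inclusive.
    if not 0.5 <= speaking_rate <= 1.5:
        problems.append("speaking_rate must be between 0.5 and 1.5")
    if not 0.7 <= temperature <= 1.5:
        problems.append("temperature must be between 0.7 and 1.5")
    if output_format not in OUTPUT_FORMATS:
        problems.append(f"output_format must be one of {sorted(OUTPUT_FORMATS)}")
    return problems
```

Returning a list instead of raising on the first failure lets a caller report every problem at once.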

Response Parameters

| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (the task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
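
The two fields a client usually needs from the submission response are the task ID and the URL for fetching the result. A minimal sketch of extracting them is shown here. The function name `parse_submission` is hypothetical, and treating any `code` other than 200 as a failure is an assumption based on the example in the table.

```python
import json


def parse_submission(body: str) -> tuple:
    """Return (task_id, result_url) from a submission response body."""
    payload = json.loads(body)
    # Assumption: 200 is the only success code.
    if payload.get("code") != 200:
        raise RuntimeError(f"submission failed: {payload.get('message')}")
    data = payload["data"]
    return data["id"], data["urls"]["get"]
```

The returned URL can be requested directly, so the client does not need to assemble the result endpoint itself.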

Result Request Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| id | string | Yes | - | Task ID |

Result Response Parameters

| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier of the prediction being retrieved |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content. |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
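
Because data.status moves through created and processing before it reaches completed or failed, a client that does not use enable_sync_mode has to poll the result endpoint. The loop below is a sketch of that pattern. The names `fetch_result` and `wait_for_result`, the polling interval, and the timeout are all assumptions rather than documented values.

```python
import json
import time
import urllib.request


def fetch_result(request_id: str, api_key: str) -> dict:
    """GET the result endpoint once and return the decoded JSON."""
    request = urllib.request.Request(
        f"https://api.wavespeed.ai/api/v3/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_result(fetch, interval_s: float = 0.5, timeout_s: float = 60.0) -> list:
    """Call fetch() until the task completes, then return data.outputs.

    `fetch` is any zero-argument callable returning the result response
    as a dict, which keeps the loop independent of the HTTP client.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        data = fetch()["data"]
        if data["status"] == "completed":
            return data["outputs"]
        if data["status"] == "failed":
            raise RuntimeError(data.get("error") or "task failed")
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        time.sleep(interval_s)
```

A typical call would be `wait_for_result(lambda: fetch_result(task_id, api_key))`, where `task_id` is the data.id value from the submission response.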