Inworld Realtime TTS 2

Inworld Realtime TTS 2 converts text into low-latency, natural speech with official TTS 2 controls for delivery mode, language, timestamps, text normalization, and audio output settings. It is served through a ready-to-use REST inference API with no cold starts and affordable pricing.

Features

Inworld Realtime TTS 2

Inworld Realtime TTS 2 converts text into natural-sounding speech with low-latency generation and flexible voice controls. It supports multiple output audio formats and lets you adjust speaking rate and temperature for different delivery styles.

Why Choose This?

Low-latency text-to-speech: Generate speech quickly for interactive apps, assistants, and real-time voice experiences.
Natural voice output: Create smooth, human-like speech from plain text with selectable voices.
Flexible voice controls: Adjust speaking rate and temperature to better match tone, pacing, and delivery style.
Multiple output formats: Export audio in MP3, LINEAR16, OGG_OPUS, FLAC, or WAV depending on your workflow.
Production-ready API: Access the model through a realtime-friendly API for apps, agents, games, and voice products.

Parameters

Parameter	Required	Description
text	Yes	Input text to convert into speech.
voice_id	No	Voice selection for the generated speech, such as `Julia`.
speaking_rate	No	Controls how fast the voice speaks. Default: `1`.
temperature	No	Controls variation and expressiveness in the generated speech. Default: `1`.
output_format	No	Output audio format: `MP3`, `LINEAR16`, `OGG_OPUS`, `FLAC`, or `WAV`.

How to Use

Enter your text — paste or type the content you want to convert into speech.
Choose a voice — select the voice that best fits your use case.
Adjust speaking rate and temperature (optional) — fine-tune pacing and expressiveness.
Choose output format — select MP3, LINEAR16, OGG_OPUS, FLAC, or WAV.
Submit — generate and download the audio output.

Example Input

Welcome to our product demo. Today we will walk through the key features, explain how the workflow operates, and show how quickly you can integrate voice output into your application.

Pricing

Text Length	Cost
1–1000 chars	$0.035
1001–2000 chars	$0.070
2001–3000 chars	$0.105
3001–4000 chars	$0.140
4001–5000 chars	$0.175

Billing Rules

Pricing is based on the length of text.
Character count is rounded up to the next 1,000-character block.
Each additional started 1,000 characters adds $0.035.
voice_id, speaking_rate, temperature, and output_format do not affect pricing.
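
The billing rules above reduce to a small piece of arithmetic: round the character count up to whole 1,000-character blocks and multiply by $0.035. The following is a minimal sketch of that calculation. The function name `estimate_cost` is hypothetical, and it assumes characters are counted the way Python's `len()` counts them, which the service may not match exactly.

```python
from decimal import Decimal

PRICE_PER_BLOCK = Decimal("0.035")  # USD per started 1,000-character block
BLOCK_SIZE = 1000


def estimate_cost(text: str) -> Decimal:
    """Estimate the price of one request from its text length."""
    if not text:
        raise ValueError("text must not be empty")
    # Assumption: the service counts characters like len() does (code points).
    blocks = (len(text) + BLOCK_SIZE - 1) // BLOCK_SIZE  # round up
    return PRICE_PER_BLOCK * blocks
```

Using `Decimal` rather than `float` keeps the result exact, so a 1,001-character input yields exactly 0.070, matching the pricing table.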

Best Use Cases

Realtime voice agents — Generate spoken responses for assistants, NPCs, and conversational interfaces.
Interactive applications — Add live voice output to games, education tools, and customer-facing apps.
Accessibility features — Turn written content into audio for more accessible user experiences.
Content narration — Create voiceovers for guides, product demos, and short-form content.
Prototype voice experiences — Quickly test different voices, pacing, and formats in development workflows.

Pro Tips

Keep input text clean and well-punctuated for more natural speech rhythm.
Split very long content into smaller sections when you want tighter pacing control.
Use speaking_rate to match the use case, such as slower for tutorials and faster for assistants.
Adjust temperature when you want more variation in delivery style.
Choose MP3 for broad compatibility, and use lossless formats like WAV or FLAC when audio quality matters more.
Reuse the same voice and settings across related clips for a more consistent user experience.
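
The tip about splitting long content can be sketched as a helper that breaks text at sentence boundaries. This is only an illustration: the function name `split_text`, the default limit of 1,000 characters, and the sentence-boundary rule (`.`, `!`, or `?` followed by whitespace) are all assumptions, not part of the API.

```python
import re


def split_text(text: str, max_chars: int = 1000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    # Split after ., !, or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # Hard-wrap a single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would be sent as its own request with the same voice and settings. Because every request is rounded up to a full 1,000-character block, many small chunks can cost more than one larger request.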

Notes

text is the only required field.
Supported output formats are MP3, LINEAR16, OGG_OPUS, FLAC, and WAV.
Pricing depends only on text length.
Audio format and voice settings do not change the price.

Related Models

Other Inworld speech and voice generation models may be useful when you need different latency, quality, or voice configuration options.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


# Submit the task
curl --location --request POST "https://api.wavespeed.ai/api/v3/inworld/realtime-tts-2" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "text": "Welcome to our product demo.",
    "voice_id": "Dennis",
    "speaking_rate": 1,
    "temperature": 1,
    "output_format": "MP3",
    "enable_sync_mode": false
}'

# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
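
For readers calling the API from Python, the submit request above could be built with the standard library as follows. This is a sketch rather than an official client: the helper names `build_submit_request` and `submit` are hypothetical, and only the URL, headers, and field names are taken from the curl example.

```python
import json
import urllib.request

SUBMIT_URL = "https://api.wavespeed.ai/api/v3/inworld/realtime-tts-2"


def build_submit_request(api_key: str, text: str, **options) -> urllib.request.Request:
    """Build (but do not send) the POST request that submits a TTS task."""
    # `text` is the only required field; options may carry voice_id,
    # speaking_rate, temperature, output_format, or enable_sync_mode.
    payload = {"text": text, **options}
    return urllib.request.Request(
        SUBMIT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def submit(request: urllib.request.Request) -> dict:
    """Send the request and return the parsed JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A typical call would read the key from the `WAVESPEED_API_KEY` environment variable, pass the result of `build_submit_request(...)` to `submit(...)`, and keep `data.id` from the response for the result query.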

Parameters

Task Submission Parameters

Request Parameters

Parameter	Type	Required	Default	Range	Description
text	string	Yes	-	-	Text to synthesize into speech. Maximum input of 2,000 characters.
voice_id	string	No	Dennis	Alex, Ashley, Craig, Deborah, Dennis, Edward, Elizabeth, Hades, Julia, Pixie, Mark, Olivia, Priya, Ronald, Sarah, Shaun, Theodore, Timothy, Wendy, Dominus, Hana, Clive, Carter, Blake, Luna, Yichen, Xiaoyin, Xinyi, Jing, Erik, Katrien, Lennart, Lore, Alain, Hélène, Mathieu, Étienne, Johanna, Josef, Gianni, Orietta, Asuka, Satoshi, Hyunwoo, Minji, Seojun, Yoona, Szymon, Wojciech, Heitor, Maitê, Diego, Lupita, Miguel, Rafael, Svetlana, Elena, Dmitry, Nikolai, Riya, Manoj, Yael, Oren, Nour, Omar	The voice to use for speech generation.
speaking_rate	number	No	1	0.5 ~ 1.5	The speed of speaking.
temperature	number	No	1	0.7 ~ 1.5	The temperature to use for the generation. A higher value means more randomness in the output.
output_format	string	No	MP3	MP3, LINEAR16, OGG_OPUS, FLAC, WAV	Output audio format.
enable_sync_mode	boolean	No	false	-	If true, the request waits until the result has been generated and uploaded, then returns it directly in the response. This property is only available through the API.
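
The defaults and ranges in the table above can be checked on the client before a request is sent. The sketch below assumes a hypothetical helper named `validate_payload`. It checks only the text length, the numeric ranges, and the output format, and it does not verify `voice_id` against the full voice list.

```python
OUTPUT_FORMATS = {"MP3", "LINEAR16", "OGG_OPUS", "FLAC", "WAV"}
MAX_TEXT_CHARS = 2000  # limit stated in the request parameters table


def validate_payload(
    text: str,
    voice_id: str = "Dennis",
    speaking_rate: float = 1.0,
    temperature: float = 1.0,
    output_format: str = "MP3",
    enable_sync_mode: bool = False,
) -> dict:
    """Return a request payload, raising ValueError on out-of-range values."""
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    if not 0.5 <= speaking_rate <= 1.5:
        raise ValueError("speaking_rate must be between 0.5 and 1.5")
    if not 0.7 <= temperature <= 1.5:
        raise ValueError("temperature must be between 0.7 and 1.5")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(OUTPUT_FORMATS)}")
    return {
        "text": text,
        "voice_id": voice_id,
        "speaking_rate": speaking_rate,
        "temperature": temperature,
        "output_format": output_format,
        "enable_sync_mode": enable_sync_mode,
    }
```

Validating locally avoids a round trip for requests the server would presumably reject; the server remains the authority on what is accepted.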

Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data.id	string	Unique identifier for the prediction (the task ID)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds

Result Request Parameters

Parameter	Type	Required	Default	Description
id	string	Yes	-	Task ID

Result Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data	object	The prediction data object containing all details
data.id	string	Unique identifier of the prediction being retrieved
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
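
When `enable_sync_mode` is false, a client has to query the result endpoint until `data.status` reaches `completed` or `failed`. The loop below is one way to sketch that, based on the status values and fields listed above. The names `wait_for_outputs` and `TaskFailed`, the polling interval, and the timeout are assumptions, and `fetch` stands for any callable that performs the GET request and returns the parsed JSON body.

```python
import time
from typing import Callable


class TaskFailed(RuntimeError):
    """Raised when the task reports status `failed`."""


def wait_for_outputs(
    fetch: Callable[[], dict],
    interval: float = 0.5,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Poll until the task completes and return the list of output URLs."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch().get("data", {})
        status = data.get("status")
        if status == "completed":
            return list(data.get("outputs") or [])
        if status == "failed":
            raise TaskFailed(data.get("error") or "task failed")
        # `created` and `processing` both mean the task is still running.
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval)
```

In practice `fetch` would send a GET request to `https://api.wavespeed.ai/api/v3/predictions/<task id>/result` with the same `Authorization` header used for submission. Passing `fetch` and `sleep` as parameters keeps the loop testable without network access.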
