Inworld Realtime TTS 2
Inworld Realtime TTS-2 converts text into low-latency, natural speech with official TTS-2 controls for delivery mode, language, timestamps, text normalization, and audio output settings. It is available on WaveSpeedAI through a ready-to-use REST inference API with no cold starts and affordable pricing.
Features
Inworld Realtime TTS 2
Inworld Realtime TTS 2 converts text into natural-sounding speech with low-latency generation and flexible voice controls. It supports multiple output audio formats and lets you adjust speaking rate and temperature for different delivery styles.
Why Choose This?
- Low-latency text-to-speech: Generate speech quickly for interactive apps, assistants, and real-time voice experiences.
- Natural voice output: Create smooth, human-like speech from plain text with selectable voices.
- Flexible voice controls: Adjust speaking rate and temperature to better match tone, pacing, and delivery style.
- Multiple output formats: Export audio in MP3, LINEAR16, OGG_OPUS, FLAC, or WAV, depending on your workflow.
- Production-ready API: Access the model through a realtime-friendly API for apps, agents, games, and voice products.
Parameters
| Parameter | Required | Description |
|---|---|---|
| text | Yes | Input text to convert into speech. |
| voice_id | No | Voice selection for the generated speech, such as Julia. |
| speaking_rate | No | Controls how fast the voice speaks. Default: 1. |
| temperature | No | Controls variation and expressiveness in the generated speech. Default: 1. |
| output_format | No | Output audio format: MP3, LINEAR16, OGG_OPUS, FLAC, or WAV. |
How to Use
- Enter your text: paste or type the content you want to convert into speech.
- Choose a voice: select the voice that best fits your use case.
- Adjust speaking rate and temperature (optional): fine-tune pacing and expressiveness.
- Choose output format: select MP3, LINEAR16, OGG_OPUS, FLAC, or WAV.
- Submit: generate and download the audio output.
Example Input
Welcome to our product demo. Today we will walk through the key features, explain how the workflow operates, and show how quickly you can integrate voice output into your application.
Pricing
| Text Length | Cost |
|---|---|
| 1–1000 chars | $0.035 |
| 1001–2000 chars | $0.070 |
| 2001–3000 chars | $0.105 |
| 3001–4000 chars | $0.140 |
| 4001–5000 chars | $0.175 |
Billing Rules
- Pricing is based on the length of text.
- Character count is rounded up to the next 1,000-character block.
- Each additional started 1,000 characters adds $0.035.
- voice_id, speaking_rate, temperature, and output_format do not affect pricing.
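The billing rules above reduce to a ceiling division over 1,000-character blocks. The following is a minimal sketch of that arithmetic; `estimate_cost` is a hypothetical helper, not part of any WaveSpeedAI SDK, and it assumes characters are counted the way Python's `len()` counts them.

```python
from decimal import Decimal

BLOCK_SIZE = 1000                    # characters per billing block (from the rules above)
PRICE_PER_BLOCK = Decimal("0.035")   # USD per started block (from the pricing table)


def estimate_cost(text: str) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper)."""
    if not text:
        raise ValueError("text must not be empty")
    # Ceiling division: every started 1,000-character block is billed in full.
    # Assumption: the service counts characters the same way len() does.
    blocks = (len(text) + BLOCK_SIZE - 1) // BLOCK_SIZE
    return PRICE_PER_BLOCK * blocks


# 1,001 characters start a second block, so two blocks are billed.
print(estimate_cost("a" * 1001))
```

`Decimal` is used here so that multiples of $0.035 stay exact rather than accumulating floating-point error.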
Best Use Cases
- Realtime voice agents — Generate spoken responses for assistants, NPCs, and conversational interfaces.
- Interactive applications — Add live voice output to games, education tools, and customer-facing apps.
- Accessibility features — Turn written content into audio for more accessible user experiences.
- Content narration — Create voiceovers for guides, product demos, and short-form content.
- Prototype voice experiences — Quickly test different voices, pacing, and formats in development workflows.
Pro Tips
- Keep input text clean and well-punctuated for more natural speech rhythm.
- Split very long content into smaller sections when you want tighter pacing control.
- Use speaking_rate to match the use case, such as slower for tutorials and faster for assistants.
- Adjust temperature when you want more variation in delivery style.
- Choose MP3 for broad compatibility, and use lossless formats like WAV or FLAC when audio quality matters more.
- Reuse the same voice and settings across related clips for a more consistent user experience.
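The tip about splitting long content can be sketched as a small sentence-aware chunker. This is one possible approach rather than a documented method: `split_text` and its `max_chars` default are hypothetical, and the sentence detection is a simple punctuation heuristic.

```python
import re


def split_text(text: str, max_chars: int = 1000) -> list[str]:
    """Split text into chunks of at most max_chars, breaking after sentences."""
    # Heuristic: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


for chunk in split_text("One. Two. Three.", max_chars=10):
    print(repr(chunk))
```

Because each request is rounded up to the next 1,000-character block, many small chunks can cost more in total than one longer request, so the chunk size is a trade-off between pacing control and price.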
Notes
- text is the only required field.
- Supported output formats are MP3, LINEAR16, OGG_OPUS, FLAC, and WAV.
- Pricing depends only on text length.
- Audio format and voice settings do not change the price.
Related Models
- Other Inworld speech and voice generation models may be useful when you need different latency, quality, or voice configuration options.
Authentication
For authentication details, please refer to the Authentication Guide.
API Endpoints
Submit Task & Query Result
# Submit the task
curl --location --request POST "https://api.wavespeed.ai/api/v3/inworld/realtime-tts-2" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
"text": "Welcome to our product demo.",
"voice_id": "Dennis",
"speaking_rate": 1,
"temperature": 1,
"output_format": "MP3",
"enable_sync_mode": false
}'
# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
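For readers who prefer Python over curl, the two calls above might be mirrored with the standard library as follows. The URLs and headers are taken from the curl commands; the helper names (`build_submit_request`, `build_result_request`, `send`) are hypothetical, and the response is assumed to be JSON with the task ID at `data.id`, as listed in the response tables.

```python
import json
import os
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def build_submit_request(text: str, api_key: str, **options) -> urllib.request.Request:
    """Build the POST request that submits a synthesis task."""
    payload = {"text": text, **options}
    return urllib.request.Request(
        f"{API_BASE}/inworld/realtime-tts-2",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def build_result_request(request_id: str, api_key: str) -> urllib.request.Request:
    """Build the GET request that queries a task's result."""
    return urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON body (assumes a JSON response)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Runs only when an API key is present in the environment.
if __name__ == "__main__" and os.environ.get("WAVESPEED_API_KEY"):
    key = os.environ["WAVESPEED_API_KEY"]
    submitted = send(build_submit_request(
        "Welcome to our product demo.", key, voice_id="Dennis", output_format="MP3"))
    print(send(build_result_request(submitted["data"]["id"], key)))
```

Separating request construction from sending keeps the network call in one place, which makes the request shape easy to inspect without contacting the API.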
Parameters
Task Submission Parameters
Request Parameters
| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| text | string | Yes | - | - | Text to synthesize into speech. Maximum input of 2,000 characters. |
| voice_id | string | No | Dennis | Alex, Ashley, Craig, Deborah, Dennis, Edward, Elizabeth, Hades, Julia, Pixie, Mark, Olivia, Priya, Ronald, Sarah, Shaun, Theodore, Timothy, Wendy, Dominus, Hana, Clive, Carter, Blake, Luna, Yichen, Xiaoyin, Xinyi, Jing, Erik, Katrien, Lennart, Lore, Alain, Hélène, Mathieu, Étienne, Johanna, Josef, Gianni, Orietta, Asuka, Satoshi, Hyunwoo, Minji, Seojun, Yoona, Szymon, Wojciech, Heitor, Maitê, Diego, Lupita, Miguel, Rafael, Svetlana, Elena, Dmitry, Nikolai, Riya, Manoj, Yael, Oren, Nour, Omar | The voice to use for speech generation. |
| speaking_rate | number | No | 1 | 0.5 ~ 1.5 | The speed of speaking. |
| temperature | number | No | 1 | 0.7 ~ 1.5 | The temperature to use for the generation. A higher value means more randomness in the output. |
| output_format | string | No | MP3 | MP3, LINEAR16, OGG_OPUS, FLAC, WAV | Output audio format. |
| enable_sync_mode | boolean | No | false | - | If set to true, the function will wait for the result to be generated and uploaded before returning the response. It allows you to get the result directly in the response. This property is only available through the API. |
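The defaults and ranges in the table above can be checked on the client before a request is sent. The sketch below is a hypothetical helper (`build_payload`), and it assumes the listed ranges are inclusive at both ends, which the table does not state explicitly.

```python
OUTPUT_FORMATS = {"MP3", "LINEAR16", "OGG_OPUS", "FLAC", "WAV"}
MAX_TEXT_CHARS = 2000  # limit stated for `text` in the request parameter table


def build_payload(
    text: str,
    voice_id: str = "Dennis",
    speaking_rate: float = 1.0,
    temperature: float = 1.0,
    output_format: str = "MP3",
    enable_sync_mode: bool = False,
) -> dict:
    """Validate arguments against the documented ranges and return the JSON body."""
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    # Assumption: both range bounds are inclusive.
    if not 0.5 <= speaking_rate <= 1.5:
        raise ValueError("speaking_rate must be between 0.5 and 1.5")
    if not 0.7 <= temperature <= 1.5:
        raise ValueError("temperature must be between 0.7 and 1.5")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(OUTPUT_FORMATS)}")
    return {
        "text": text,
        "voice_id": voice_id,
        "speaking_rate": speaking_rate,
        "temperature": temperature,
        "output_format": output_format,
        "enable_sync_mode": enable_sync_mode,
    }


print(build_payload("Hello there.", voice_id="Julia", speaking_rate=0.9))
```

The voice list is not validated here; an unknown `voice_id` is left for the API to reject.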
Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (the task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
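As an illustration of how the fields above fit together, the sketch below pulls the task ID, status, and polling URL out of a submission response. The sample body is made up for the example and is not a captured API response; `SubmittedTask` and `parse_submission` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SubmittedTask:
    task_id: str
    status: str
    result_url: str


def parse_submission(body: dict) -> SubmittedTask:
    """Extract the fields needed to poll for the result."""
    # Assumption: `code` is 200 on success, following the example in the table.
    if body.get("code") != 200:
        raise RuntimeError(f"submission failed: {body.get('message')}")
    data = body["data"]
    return SubmittedTask(
        task_id=data["id"],
        status=data["status"],
        result_url=data["urls"]["get"],
    )


# Illustrative body only; the ID and values are placeholders.
sample = {
    "code": 200,
    "message": "success",
    "data": {
        "id": "example-task-id",
        "status": "created",
        "outputs": [],
        "urls": {"get": "https://api.wavespeed.ai/api/v3/predictions/example-task-id/result"},
    },
}
print(parse_submission(sample))
```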
Result Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| id | string | Yes | - | Task ID |
Result Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier for the prediction (the task ID that was queried) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
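Since `data.status` moves through created, processing, and then completed or failed, a client typically polls the result endpoint until a terminal status appears. The loop below is a minimal sketch of that pattern: `wait_for_outputs` is a hypothetical helper, `fetch` stands for any callable that returns the parsed result response, and the interval and timeout defaults are arbitrary choices.

```python
import time
from typing import Callable


def wait_for_outputs(
    fetch: Callable[[], dict],
    interval: float = 0.5,
    timeout: float = 60.0,
) -> list:
    """Poll until the task completes, then return data.outputs."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch()["data"]
        if data["status"] == "completed":
            return data["outputs"]
        if data["status"] == "failed":
            raise RuntimeError(data.get("error") or "task failed")
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        time.sleep(interval)


# Usage with canned responses in place of real HTTP calls.
canned = iter([
    {"data": {"status": "processing", "outputs": []}},
    {"data": {"status": "completed", "outputs": ["https://example.com/audio.mp3"]}},
])
print(wait_for_outputs(lambda: next(canned), interval=0))
```

Passing `fetch` in as a callable keeps the loop independent of the HTTP client, so the same logic works with any library that returns the decoded JSON body.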