Inworld Realtime TTS-2 converts text into low-latency, natural speech with official TTS-2 controls for delivery mode, language, timestamps, text normalization, and audio output settings. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
$0.035 per run (about 28 runs per $1)
Inworld Realtime TTS 2 converts text into natural-sounding speech with low-latency generation and flexible voice controls. It supports multiple output audio formats and lets you adjust speaking rate and temperature for different delivery styles.
Low-latency text-to-speech: Generate speech quickly for interactive apps, assistants, and real-time voice experiences.
Natural voice output: Create smooth, human-like speech from plain text with selectable voices.
Flexible voice controls: Adjust speaking rate and temperature to better match tone, pacing, and delivery style.
Multiple output formats: Export audio in MP3, LINEAR16, OGG_OPUS, FLAC, or WAV depending on your workflow.
Production-ready API: Access the model through a realtime-friendly API for apps, agents, games, and voice products.
| Parameter | Required | Description |
|---|---|---|
| text | Yes | Input text to convert into speech. |
| voice_id | No | Voice selection for the generated speech, such as Julia. |
| speaking_rate | No | Controls how fast the voice speaks. Default: 1. |
| temperature | No | Controls variation and expressiveness in the generated speech. Default: 1. |
| output_format | No | Output audio format: MP3, LINEAR16, OGG_OPUS, FLAC, or WAV. |
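As a rough illustration of how the parameters in the table fit together, the sketch below assembles a request body in Python. The helper name `build_payload` and the client-side checks are assumptions made for this example and are not part of the WaveSpeedAI API. The defaults of 1 come from the table; valid ranges for `speaking_rate` and `temperature` are not documented here, so they are not checked.

```python
# Hypothetical helper: builds a request body from the documented parameters.
ALLOWED_FORMATS = {"MP3", "LINEAR16", "OGG_OPUS", "FLAC", "WAV"}


def build_payload(text, voice_id=None, speaking_rate=1, temperature=1,
                  output_format=None):
    """Return a dict suitable for JSON-encoding as the request body."""
    # `text` is the only required field according to the parameter table.
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text is required and must be a non-empty string")

    payload = {
        "text": text,
        "speaking_rate": speaking_rate,  # table default: 1
        "temperature": temperature,      # table default: 1
    }
    if voice_id is not None:
        payload["voice_id"] = voice_id
    if output_format is not None:
        fmt = output_format.upper()
        # Local check only; the server remains the source of truth.
        if fmt not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported output_format: {output_format!r}")
        payload["output_format"] = fmt
    return payload


if __name__ == "__main__":
    print(build_payload("Hello there.", voice_id="Julia", output_format="mp3"))
```

Optional fields are left out of the body when they are not set, so the service can apply its own defaults.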
Example input text: Welcome to our product demo. Today we will walk through the key features, explain how the workflow operates, and show how quickly you can integrate voice output into your application.
| Text Length | Cost |
|---|---|
| 1–1000 chars | $0.035 |
| 1001–2000 chars | $0.070 |
| 2001–3000 chars | $0.105 |
| 3001–4000 chars | $0.140 |
| 4001–5000 chars | $0.175 |
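The table follows a regular pattern: every started 1,000-character block costs $0.035. A minimal sketch of that arithmetic is shown below. The function name `estimate_cost` is hypothetical, counting characters with Python's `len` is an assumption about how the service measures length, and applying the same rule beyond 5,000 characters is an extrapolation from the table.

```python
from decimal import Decimal

PRICE_PER_BLOCK = Decimal("0.035")  # USD per started block, from the table
BLOCK_SIZE = 1000                   # characters per billing block


def estimate_cost(text: str) -> Decimal:
    """Estimate the charge for one run from the length of `text`."""
    if not text:
        raise ValueError("text must be non-empty")
    # Ceiling division: 1-1000 chars -> 1 block, 1001-2000 -> 2 blocks, ...
    blocks = (len(text) + BLOCK_SIZE - 1) // BLOCK_SIZE
    return PRICE_PER_BLOCK * blocks


if __name__ == "__main__":
    for n in (1, 1000, 1001, 5000):
        print(n, estimate_cost("a" * n))
```

`Decimal` is used instead of `float` so that the block multiples stay exact.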
Pricing is based on the length of `text`. Billing is per 1,000-character block, and each additional 1,000 characters adds $0.035. `voice_id`, `speaking_rate`, `temperature`, and `output_format` do not affect pricing.

Adjust `speaking_rate` to match the use case, such as slower for tutorials and faster for assistants. Adjust `temperature` when you want more variation in delivery style. Use MP3 for broad compatibility, and use lossless formats like WAV or FLAC when audio quality matters more.

`text` is the only required field. The supported output formats are MP3, LINEAR16, OGG_OPUS, FLAC, and WAV.

Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/inworld/realtime-tts-2 with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status flips to completed, then read the output URL from data.outputs[0]. Examples for Realtime TTS 2 follow.
# Submit the prediction ("text" is the only required field)
curl -X POST "https://api.wavespeed.ai/api/v3/inworld/realtime-tts-2" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "text": "Welcome to our product demo.",
    "voice_id": "Dennis",
    "speaking_rate": 1,
    "temperature": 1,
    "output_format": "MP3",
    "enable_sync_mode": false
  }'

# Response includes a prediction id. Poll for the result,
# replacing {request_id} with that id:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"
# When status is "completed", read the output from data.outputs[0].

// npm install wavespeed
const WaveSpeed = require('wavespeed');
const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// CommonJS does not allow top-level await, so the call runs inside an async function.
async function main() {
  const result = await client.run("inworld/realtime-tts-2", {
    "text": "Welcome to our product demo.", // the only required field
    "voice_id": "Dennis",
    "speaking_rate": 1,
    "temperature": 1,
    "output_format": "MP3",
    "enable_sync_mode": false
  });
  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);

# pip install wavespeed
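import wavespeed

output = wavespeed.run(
    "inworld/realtime-tts-2",
    {
        "text": "Welcome to our product demo.",  # the only required field
        "voice_id": "Dennis",
        "speaking_rate": 1,
        "temperature": 1,
        "output_format": "MP3",
        "enable_sync_mode": False,  # Python boolean, serialized as JSON false
    },
)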
print(output["outputs"][0])  # → URL of the generated output

Realtime TTS 2 is an Inworld model for audio generation, exposed as a REST API on WaveSpeedAI. You can call it programmatically or try it from the playground above.
POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/inworld/inworld-realtime-tts-2.
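The submit-then-poll flow described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the official client: the helper names `wait_for_output` and `make_fetcher` are hypothetical, the status is assumed to live at `data.status`, and the `"failed"` status value and `error` field are assumptions. Only `"completed"`, `data.outputs[0]`, and the result URL pattern come from this page.

```python
import json
import time
import urllib.request


def wait_for_output(fetch_result, interval=0.5, timeout=60.0, sleep=time.sleep):
    """Poll `fetch_result()` until the prediction completes; return the output URL."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_result().get("data", {})
        status = data.get("status")          # assumed location of the status
        if status == "completed":
            return data["outputs"][0]        # output URL, as described above
        if status == "failed":               # assumed failure status value
            raise RuntimeError(data.get("error", "prediction failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError("prediction did not complete in time")
        sleep(interval)


def make_fetcher(request_id, api_key):
    """Return a callable that GETs the prediction result as a dict."""
    url = f"https://api.wavespeed.ai/api/v3/predictions/{request_id}/result"
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )

    def fetch():
        with urllib.request.urlopen(request) as response:
            return json.load(response)

    return fetch
```

With a prediction id from the submit call, usage would look like `wait_for_output(make_fetcher(request_id, api_key))`. Passing the fetch function in as an argument keeps the polling loop testable without network access.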
Realtime TTS 2 starts at $0.035 per run. That figure is the base price for up to 1,000 characters of `text`; each additional 1,000-character block adds $0.035, while `voice_id`, `speaking_rate`, `temperature`, and `output_format` do not change the price. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.
Key inputs: `enable_sync_mode`, `output_format`, `speaking_rate`, `temperature`, `text`, `voice_id`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/inworld/inworld-realtime-tts-2.
Sign up for a free WaveSpeedAI account to claim starter credits, copy your API key from /accesskey, then call the endpoint shown in the API tab of the playground. The playground also auto-generates a code sample in Python, JavaScript, or cURL for the parameters you've set.
Commercial usage rights depend on the model's license, set by its provider (Inworld). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.