Inworld Realtime TTS 2

Inworld Realtime TTS 2 converts text into natural-sounding speech with low-latency generation and flexible voice controls. It supports multiple output audio formats and lets you adjust speaking rate and temperature for different delivery styles.

Why Choose This?

Low-latency text-to-speech: Generate speech quickly for interactive apps, assistants, and real-time voice experiences.
Natural voice output: Create smooth, human-like speech from plain text with selectable voices.
Flexible voice controls: Adjust speaking rate and temperature to better match tone, pacing, and delivery style.
Multiple output formats: Export audio in MP3, LINEAR16, OGG_OPUS, FLAC, or WAV depending on your workflow.
Production-ready API: Access the model through a realtime-friendly API for apps, agents, games, and voice products.

Parameters

Parameter	Required	Description
text	Yes	Input text to convert into speech.
voice_id	No	Voice selection for the generated speech, such as `Julia`.
speaking_rate	No	Controls how fast the voice speaks. Default: `1`.
temperature	No	Controls variation and expressiveness in the generated speech. Default: `1`.
output_format	No	Output audio format: `MP3`, `LINEAR16`, `OGG_OPUS`, `FLAC`, or `WAV`.
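
As a minimal sketch of how the parameters above fit together, the helper below builds a request body in Python. The function name `build_tts_payload` is hypothetical, and the checks cover only what the table states: `text` is required and `output_format` must be one of the five listed values. Optional fields that are not set are left out, on the assumption that the service then applies its own defaults.

```python
ALLOWED_FORMATS = {"MP3", "LINEAR16", "OGG_OPUS", "FLAC", "WAV"}


def build_tts_payload(text, voice_id=None, speaking_rate=None,
                      temperature=None, output_format=None):
    """Build a JSON-serializable request body (hypothetical helper)."""
    if not text or not text.strip():
        raise ValueError("text is required and must not be empty")
    if output_format is not None and output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")

    payload = {"text": text}
    optional = {
        "voice_id": voice_id,
        "speaking_rate": speaking_rate,
        "temperature": temperature,
        "output_format": output_format,
    }
    # Only send the fields the caller set explicitly.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


print(build_tts_payload("Hello there.", voice_id="Julia", output_format="WAV"))
```

Validating the format locally catches a typo such as `OGG-OPUS` before a request is sent.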

How to Use

Enter your text — paste or type the content you want to convert into speech.
Choose a voice — select the voice that best fits your use case.
Adjust speaking rate and temperature (optional) — fine-tune pacing and expressiveness.
Choose output format — select MP3, LINEAR16, OGG_OPUS, FLAC, or WAV.
Submit — generate and download the audio output.

Example Input

Welcome to our product demo. Today we will walk through the key features, explain how the workflow operates, and show how quickly you can integrate voice output into your application.

Pricing

Text Length	Cost
1–1000 chars	$0.035
1001–2000 chars	$0.070
2001–3000 chars	$0.105
3001–4000 chars	$0.140
4001–5000 chars	$0.175

Billing Rules

Pricing is based on the length of text.
Character count is rounded up to the next 1,000-character block.
Each additional started 1,000 characters adds $0.035.
voice_id, speaking_rate, temperature, and output_format do not affect pricing.
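
The billing rules above amount to a ceiling division, which the sketch below works through in Python. The function name `estimate_cost` is hypothetical, and it assumes that Python's `len()` matches the way the service counts characters, which this page does not specify. The pricing table stops at 5,000 characters, so treat results for longer inputs as an extrapolation of the "each additional started 1,000 characters" rule.

```python
from decimal import Decimal

PRICE_PER_BLOCK = Decimal("0.035")  # USD per started 1,000 characters
BLOCK_SIZE = 1000


def estimate_cost(text):
    """Estimate the charge in USD for one request (illustrative only)."""
    if not text:
        raise ValueError("text must contain at least one character")
    # Integer ceiling division: 1001 characters count as 2 blocks.
    blocks = -(-len(text) // BLOCK_SIZE)
    return PRICE_PER_BLOCK * blocks


for length in (1, 1000, 1001, 4500):
    print(length, estimate_cost("x" * length))
```

`Decimal` is used instead of `float` so the results line up exactly with the table values.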

Best Use Cases

Realtime voice agents — Generate spoken responses for assistants, NPCs, and conversational interfaces.
Interactive applications — Add live voice output to games, education tools, and customer-facing apps.
Accessibility features — Turn written content into audio for more accessible user experiences.
Content narration — Create voiceovers for guides, product demos, and short-form content.
Prototype voice experiences — Quickly test different voices, pacing, and formats in development workflows.

Pro Tips

Keep input text clean and well-punctuated for more natural speech rhythm.
Split very long content into smaller sections when you want tighter pacing control.
Use speaking_rate to match the use case, such as slower for tutorials and faster for assistants.
Adjust temperature when you want more variation in delivery style.
Choose MP3 for broad compatibility, and use lossless formats like WAV or FLAC when audio quality matters more.
Reuse the same voice and settings across related clips for a more consistent user experience.
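
One way to follow the tip about splitting long content is to group whole sentences into sections, sketched below in Python. The function name `split_into_sections` and the 1,000-character default are illustrative choices, and the sentence split is a simple punctuation heuristic, not a full tokenizer. A single sentence longer than `max_chars` is kept whole instead of being cut mid-sentence.

```python
import re


def split_into_sections(text, max_chars=1000):
    """Group whole sentences into sections of roughly max_chars or fewer."""
    # Split after ., !, or ? when followed by whitespace (simple heuristic).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    sections, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            # Either it fits, or this is an oversized sentence on its own.
            current = candidate
        else:
            sections.append(current)
            current = sentence
    if current:
        sections.append(current)
    return sections


print(split_into_sections("First point. Second point! Third point?", max_chars=27))
```

If each section is sent as a separate request, each one is billed for at least one 1,000-character block, so very small sections cost more in total than a single longer request.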

Notes

text is the only required field.
Supported output formats are MP3, LINEAR16, OGG_OPUS, FLAC, and WAV.
Pricing depends only on text length.
Audio format and voice settings do not change the price.

Related Models

Other Inworld speech and voice generation models may be useful when you need different latency, quality, or voice configuration options.

Realtime TTS 2 API: Quick start

Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/inworld/realtime-tts-2 with your input as JSON. The endpoint returns a prediction ID; poll the prediction endpoint until status changes to completed, then read the output URL from data.outputs[0]. Examples for Realtime TTS 2 follow.

HTTP example

# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/inworld/realtime-tts-2" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "text": "Welcome to our product demo.",
    "voice_id": "Dennis",
    "speaking_rate": 1,
    "temperature": 1,
    "output_format": "MP3",
    "enable_sync_mode": false
  }'

# Response includes a prediction id. Poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# When status is "completed", read the output from data.outputs[0].

Node.js example

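// npm install wavespeed
const WaveSpeed = require('wavespeed');

const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// CommonJS modules do not support top-level await, so wrap the call.
async function main() {
  const result = await client.run("inworld/realtime-tts-2", {
    "text": "Welcome to our product demo.", // the only required field
    "voice_id": "Dennis",
    "speaking_rate": 1,
    "temperature": 1,
    "output_format": "MP3",
    "enable_sync_mode": false
  });

  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);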

Python example

# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "inworld/realtime-tts-2",
    {
        "text": "Welcome to our product demo.",  # the only required field
        "voice_id": "Dennis",
        "speaking_rate": 1,
        "temperature": 1,
        "output_format": "MP3",
        "enable_sync_mode": False,  # Python boolean, not JSON false
    },
)

print(output["outputs"][0])  # → URL of the generated output
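
If you prefer not to depend on the SDK, the submit-then-poll flow from the quick start can be sketched with the standard library alone. The result URL, the `completed` status, and `data.outputs[0]` come from this page. Everything else is an assumption for illustration: the helper names `fetch_result` and `wait_for_output`, the location of `status` under `data`, the `failed` status value, and the `error` field.

```python
import json
import os
import time
import urllib.request

RESULT_URL = "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result"


def fetch_result(request_id):
    """GET the prediction result and return the parsed JSON body."""
    req = urllib.request.Request(
        RESULT_URL.format(request_id=request_id),
        headers={"Authorization": f"Bearer {os.environ['WAVESPEED_API_KEY']}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_output(request_id, fetch=fetch_result, interval=0.5, timeout=60.0):
    """Poll until the prediction completes, then return data.outputs[0]."""
    deadline = time.monotonic() + timeout
    while True:
        # Assumption: status and outputs both live under the "data" key.
        data = fetch(request_id).get("data", {})
        status = data.get("status")
        if status == "completed":
            return data["outputs"][0]
        if status == "failed":  # assumed failure status, not stated on this page
            raise RuntimeError(f"prediction {request_id} failed: {data.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction {request_id} still {status!r}")
        time.sleep(interval)
```

Call `wait_for_output(request_id)` with the prediction ID returned by the submit request. The `fetch` parameter exists so the loop can be exercised with a stub instead of the network, and the timeout keeps a stuck prediction from blocking the caller indefinitely.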

Realtime TTS 2 API: Frequently asked questions

What is the Realtime TTS 2 API?

Realtime TTS 2 is an Inworld model for audio generation, exposed as a REST API on WaveSpeedAI. It converts text into low-latency, natural speech with official TTS-2 controls for delivery mode, language, timestamps, text normalization, and audio output settings. WaveSpeedAI offers it as a ready-to-use REST inference API with no cold starts and affordable pricing. You can call it programmatically or try it from the playground above.

How do I call the Realtime TTS 2 API?

POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/inworld/inworld-realtime-tts-2.

How much does Realtime TTS 2 cost per run?

Realtime TTS 2 starts at $0.035 per run, which covers up to 1,000 characters of input text. The final charge scales only with text length: each additional started 1,000 characters adds $0.035, and voice_id, speaking_rate, temperature, and output_format do not change the price. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.

What inputs does Realtime TTS 2 accept?

Key inputs: `enable_sync_mode`, `output_format`, `speaking_rate`, `temperature`, `text`, `voice_id`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/inworld/inworld-realtime-tts-2.

How do I get started with the Realtime TTS 2 API?

Sign up for a free WaveSpeedAI account to claim starter credits, copy your API key from /accesskey, then call the endpoint shown in the API tab of the playground. The playground also auto-generates a code sample in Python, JavaScript, or cURL for the parameters you've set.

Can I use Realtime TTS 2 outputs commercially?

Commercial usage rights depend on the model's license, set by its provider (Inworld). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.
