Kling 2.5 Turbo Pro is a text-to-video model that delivers cinematic visuals, fluid motion, and precise prompt-to-motion responsiveness. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
$0.35 per run · ~28 runs per $10
A world-weary private investigator leans against a rain-streaked phone booth on a foggy 1940s New York street, steam rising from a manhole cover. A fedora casts a deep shadow over his eyes. Holding the classic receiver, he speaks in a low, gravelly voice: "The recording was pure static, a dead end. Then I ran it through WaveSpeedAI. The new Wan2.5 algorithm filtered the noise... I heard everything. The whisper, the threat, the whole rotten deal." High-contrast black and white, dramatic chiaroscuro lighting, deep focus, 35mm film grain, moody and atmospheric.
A fearless female snowboarder carves a fresh track down a steep, untouched Alaskan mountain peak at sunrise. She wears a vibrant, multi-colored thermal suit and reflective goggles that mirror the pink and orange sky. Mid-air during a jump, she twists toward a helmet-mounted camera and shouts with pure adrenaline: “The sound of the powder, the rush of the wind—it’s identical! WaveSpeedAI nailed it with Wan2.5. You have to hear this!” Crisp, high-altitude light, dynamic FPV (first-person view) shot, slow-motion effect capturing flying snow particles, wide-angle lens, hyper-realistic detail.
An iconic pop superstar poses dramatically on the steps of a grand gala, surrounded by the blinding flashes of paparazzi cameras. She wears an avant-garde gown made of iridescent, crystalline structures that seem to defy gravity. She holds a tiny, jewel-encrusted microphone. With a confident smirk, she tells a reporter: “This year’s theme? Auditory Couture. My gown is literally singing, and the texture of the sound is all Wan2.5 from WaveSpeedAI. It’s the only accessory I need.” Stroboscopic flashbulb effect, glamorous soft lighting, bokeh background of lights and crowds, low-angle full-body shot, Vogue editorial style.
A focused male field recordist in his late 30s crouches in a dense, misty redwood forest at dawn. He wears rugged outdoor gear and monitors the input on his Sound Devices recorder, a look of serene concentration on his face. He whispers reverently into his lapel mic for a documentary voice-over: “For days, all I captured was the wind. But the new Wan2.5 noise-reduction on the WaveSpeedAI firmware is a game-changer. Listen... underneath it all, the sound of a salamander moving through damp leaves. The texture is so crisp, you can feel the forest floor.” BBC Earth documentary style, telephoto lens, natural diffused morning light, ultra-realistic 8K, cinematic and patient.
In a sun-drenched, forgotten library filled with floating dust motes, a clever young scholar with round spectacles slides down a rolling ladder, clutching a glowing ancient tome. She wears a tweed academic robe over a simple linen dress. With breathless excitement, she looks directly at the camera and whispers: “It wasn’t magic, it was frequency! Wan2.5 on WaveSpeedAI can reproduce the Resonance Codex perfectly. Listen!” Warm, golden hour lighting, high contrast, dust particles visible in light rays, rack focus from the book to her face, shallow depth of field, vintage film grain effect.
Working late in a busy newsroom, a determined investigative journalist leans forward, pointing at a specific segment of a soundwave on her computer. She turns to her editor, who is standing over her shoulder, and speaks with urgent conviction: “The original file was useless—too much background noise from the cafe. But I ran it through WaveSpeedAI's forensic tool. The Wan2.5 algorithm isolated the whisper right here. The vocal texture is undeniable. We got him.” Realistic office lighting, handheld documentary feel (like "Spotlight"), rack focus from the screen to her eyes, natural color grading, tense and dramatic.
A determined college student sits in a quiet university library, comparing two soundwaves on his laptop – one from a native French speaker, one of his own recording. He listens intently through his earbuds, and a subtle look of breakthrough understanding crosses his face. He whispers to himself: "I could never hear the difference before. But the Wan2.5 analysis on WaveSpeedAI visually maps the vocal texture. That subtle vibration on the 'r'... I finally see what I'm doing wrong." Clean, academic aesthetic, over-the-shoulder shot showing the screen, cool neutral lighting, sharp focus, realistic educational scenario.
In a sound-proofed home studio, a young ASMR artist closes her eyes in deep concentration, gently brushing a soft makeup brush against a high-fidelity binaural microphone. The room is warmly lit, creating a cozy and intimate atmosphere. She whispers softly into the mic: "You can hear the bristles, right? But now, listen to the audio processed with Wan2.5 from WaveSpeedAI... you can hear the texture of each individual fiber. The detail is unbelievable. Pure tingles." Extreme close-up shot on the microphone and brush, very shallow depth of field, warm, soft lighting, calm and immersive mood, cinematic 4K.
A bright medical student in a university simulation lab listens to a patient's recorded heartbeat through a digital stethoscope connected to a tablet. Her expression shifts from confusion to a sudden 'aha' moment. She pauses the playback and points to the screen, explaining to a classmate: "I couldn't hear the murmur he mentioned. But when the audio is visualized by WaveSpeedAI, the Wan2.5 analysis highlights a faint textural anomaly right after the S2 sound. Now that I see it, I can't un-hear it." Clean, bright, high-key lighting, shallow depth of field focusing on the tablet's screen, sterile and professional medical aesthetic.
Drone shot slowly flying over a dramatic coastline, turquoise waves crashing against black cliffs, creating massive white spray. Overcast soft lighting, National Geographic documentary style.
Photorealistic majestic stag in a misty redwood forest, sunbeams piercing through the canopy creating dramatic light rays, dust motes visible in the air. Hyperrealistic, cinematic, 4K, stable shot.
Sahara desert dunes during the golden hour of sunset, warm orange and red tones, wind blowing fine sand off the crest of a dune. Telephoto lens, slight heat haze effect.
Kling 2.5 Turbo Pro is an advanced text-to-video model that produces ultra-smooth motion, cinematic visuals, and accurate prompt adherence.
Its improved dynamic processing and text-to-motion control allow for seamless transitions while maintaining style fidelity across various looks.
Enhanced multi-step instruction understanding: A new text-and-timing controller processes multi-step prompts to transform static inputs into coherent, controllable narrative scenes.
High-motion quality and stability: Better training and data balance create realistic dynamics, enabling quick and complex movements with fewer artifacts like jitter, tearing, or frame drops.
Faster inference: Optimized pipelines reduce end-to-end delay, providing faster delivery of high-quality results without compromising visual fidelity.
Consistent style: Enhanced style conditioning preserves the reference look (palette, lighting, brushwork, mood), ensuring frames stay consistent, even during dynamic scenes.
| Duration | Price |
|---|---|
| 5s | $0.35 |
| 10s | $0.70 |
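The table above can be turned into a quick budget check. This is a minimal sketch, assuming only the two listed durations are priced and that no other parameter changes the charge; `estimate_cost` and `runs_per_budget` are hypothetical helper names, not part of any WaveSpeedAI SDK.

```python
from decimal import Decimal

# Prices copied from the table above; other durations are not listed there.
PRICE_BY_DURATION = {5: Decimal("0.35"), 10: Decimal("0.70")}


def estimate_cost(duration_s: int, runs: int = 1) -> Decimal:
    """Return the estimated charge in USD for `runs` clips of `duration_s` seconds."""
    if duration_s not in PRICE_BY_DURATION:
        raise ValueError(f"no listed price for {duration_s}s clips")
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return PRICE_BY_DURATION[duration_s] * runs


def runs_per_budget(budget_usd: str, duration_s: int) -> int:
    """Return how many whole runs of the given duration fit in the budget."""
    return int(Decimal(budget_usd) // PRICE_BY_DURATION[duration_s])


print(estimate_cost(10, runs=3))  # 2.10
print(runs_per_budget("10", 5))   # 28
```

Using `Decimal` rather than floats keeps the cent amounts exact when multiplying across many runs.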
Write the prompt – Specify subject, scene, actions, camera movement, and style keywords; include multi-step/causal logic if needed.
Choose aspect_ratio – Match the output shape to your target channel (the API example on this page uses "16:9").
Set duration – Choose the clip length; the pricing table lists 5 s and 10 s.
Set guidance_scale – Controls how strongly the model follows your prompt. The higher the value, the less creative freedom the model has.
Generate – Leverage accelerated inference to get a first pass quickly.
Review & iterate – Refine timing, angles, or style strength and re-render for final delivery (set the seed to keep re-renders comparable).
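The steps above map onto a single JSON request body. The following is a minimal sketch of a payload builder, assuming the field names used in the API example on this page. `build_payload` is a hypothetical helper, and the duration check assumes only the two priced values (5 and 10) are accepted. Allowed ranges for `guidance_scale` and `aspect_ratio` are not stated here, so they are passed through unchecked.

```python
import json


def build_payload(prompt, aspect_ratio="16:9", duration=5,
                  guidance_scale=0.5, negative_prompt=None):
    """Assemble the JSON body for a text-to-video request."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if duration not in (5, 10):  # assumption: only the two priced durations
        raise ValueError("duration must be 5 or 10 seconds")
    payload = {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "guidance_scale": guidance_scale,
    }
    if negative_prompt:  # optional field, omitted when not provided
        payload["negative_prompt"] = negative_prompt
    return payload


body = build_payload(
    "Drone shot over a dramatic coastline, then a slow tilt up to the horizon",
    negative_prompt="blurry, low quality, distorted",
)
print(json.dumps(body, indent=2))
```

Catching an unsupported duration locally avoids submitting a request that may be rejected or billed unexpectedly.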
Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/kwaivgi/kling-v2.5-turbo-pro/text-to-video with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status flips to completed, then read the output URL from data.outputs[0]. Examples for Kling v2.5 Turbo Pro Text To Video below.
# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/kwaivgi/kling-v2.5-turbo-pro/text-to-video" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $WAVESPEED_API_KEY" \
-d '{
"prompt": "A cinematic shot of a city at sunset, soft golden light",
"negative_prompt": "blurry, low quality, distorted",
"aspect_ratio": "16:9",
"duration": 5,
"guidance_scale": 0.5
}'
# Response includes a prediction id. Poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
-H "Authorization: Bearer $WAVESPEED_API_KEY"
# When status is "completed", read the output from data.outputs[0].

// npm install wavespeed
const WaveSpeed = require('wavespeed');
const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env
// await is only valid inside an async function in a CommonJS script
async function main() {
  const result = await client.run("kwaivgi/kling-v2.5-turbo-pro/text-to-video", {
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "negative_prompt": "blurry, low quality, distorted",
    "aspect_ratio": "16:9",
    "duration": 5,
    "guidance_scale": 0.5
  });
  console.log(result.outputs[0]); // → URL of the generated output
}
main().catch(console.error);

# pip install wavespeed
import wavespeed
output = wavespeed.run(
"kwaivgi/kling-v2.5-turbo-pro/text-to-video",
{
"prompt": "A cinematic shot of a city at sunset, soft golden light",
"negative_prompt": "blurry, low quality, distorted",
"aspect_ratio": "16:9",
"duration": 5,
"guidance_scale": 0.5
}
)
print(output["outputs"][0])  # → URL of the generated output

Kling v2.5 Turbo Pro Text To Video is a Kuaishou model for video generation, exposed as a REST API on WaveSpeedAI. You can call it programmatically or try it from the playground above.
POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/kwaivgi/kwaivgi-kling-v2.5-turbo-pro-text-to-video.
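The submit-then-poll flow described above can be sketched with the Python standard library alone. The result URL pattern and the `data.outputs[0]` location come from the cURL example on this page. Three details are assumptions for illustration: that `status` sits under `data`, that `"failed"` is the terminal error status, and that the error message is in an `error` field. The default timeout is also an arbitrary choice, set well above the roughly 201-second average reported on this page.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def fetch_result(prediction_id, api_key):
    """GET the prediction result and return the parsed JSON body."""
    req = urllib.request.Request(
        f"{API_BASE}/predictions/{prediction_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_output(fetch, interval_s=5.0, timeout_s=600.0, sleep=time.sleep):
    """Call `fetch()` until the prediction completes; return the first output URL."""
    deadline = time.monotonic() + timeout_s
    while True:
        data = fetch().get("data", {})
        status = data.get("status")
        if status == "completed":
            return data["outputs"][0]
        if status == "failed":  # assumption: name of the terminal error status
            raise RuntimeError(f"prediction failed: {data.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no result after {timeout_s} seconds")
        sleep(interval_s)


# Usage (needs a real prediction id and WAVESPEED_API_KEY in the environment):
# import os
# key = os.environ["WAVESPEED_API_KEY"]
# url = wait_for_output(lambda: fetch_result("<prediction id>", key))
```

Passing `fetch` in as a callable keeps the polling loop independent of the HTTP client, so it can be exercised with canned responses before spending credits on a real run.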
Kling v2.5 Turbo Pro Text To Video starts at $0.35 per run. That figure is the base price; the final charge scales with the parameters you set in the form, and the pricing table above lists $0.35 for a 5-second clip and $0.70 for a 10-second clip. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.
Key inputs: `prompt`, `aspect_ratio`, `duration`, `guidance_scale`, `negative_prompt`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/kwaivgi/kwaivgi-kling-v2.5-turbo-pro-text-to-video.
Average end-to-end generation time on WaveSpeedAI is around 201 seconds per request — measured across recent runs. Queue time scales with global demand; live status is visible in the prediction record.
Commercial usage rights depend on the model's license, set by its provider (Kuaishou). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.