Vidu Text-to-Video Q1 converts text prompts into high-quality videos with exceptional visual fidelity and motion diversity. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
$0.40 per run · about 25 runs per $10
On a summer afternoon, a girl in a white dress sits on a slow-moving, old-fashioned train traveling through the countryside. Sunlight streams through the windows, creating dappled light and shadow on the wooden floor, with tiny dust motes floating in the air. She rests her cheek on her hand, gazing with a hint of melancholy at the lush green rice paddies rushing by. Studio Ghibli animation style, warm and healing.
Cinematic slow motion. On a windswept cliff edge, a young sorcerer extends his hands. Streams of liquid fire and swirling ice crystals manifest from his palms, intertwining to form a spiraling helix of elemental energy that corkscrews upwards around him. His robes and hair whip violently in the powerful arcane storm.
A giant clockwork angel descends through the shattered stained-glass window of a gothic cathedral. He is made of brass, gears, and platinum, and his enormous wings, composed of countless metallic feathers, unfold gracefully. A divine golden light emanates from his core, its beams cutting through the dust motes inside the cathedral.
A mysterious messenger wearing a fox mask (kitsune mask) moves with agility through an ancient forest dotted with giant torii gates and glowing stone lanterns. He is followed by several translucent, softly glowing spirit foxes. As he leaps over a small stream, his steps create ripples, and sakura petals float in the surrounding air.
Cinematic close-up. A pair of wrinkled hands carefully hold an old, slightly yellowed black-and-white photograph. The photo shows a young woman smiling by the sea. The camera slowly pushes in, focusing on the woman's eyes in the photo, which hold both youthful vitality and a subtle, almost imperceptible melancholy. The background is a soft, blurred, warm glow of a sunset.
Macro shot of a tiny, fluffy hamster holding a single, giant apple with both paws. It nibbles on it with great effort, its cheeks puffed out contentedly. Clean, bright background.
A cat is curled up and sleeping peacefully in a warm patch of sunlight on a wooden floor. Its chest gently rises and falls with each breath. Dust motes dance in the sunbeam. Serene and peaceful atmosphere.
Slow-motion close-up of a single, clear raindrop sliding down a vibrant green leaf. It finally drips into a puddle, creating a soft, gentle ripple. Clean, minimalist, and zen-like.
Vidu Q1 Text-to-Video is a high-end video generation model built on Shengshu Technology’s Vidu Q-series architecture. It transforms natural language prompts into cinematic 720p videos with exceptional realism, diverse motion, and consistent visual fidelity — optimized for creative professionals and production use.
High-Fidelity Generation: Produces visually rich, detailed videos with natural lighting, textures, and depth.
Motion Diversity: Captures a wide range of subject and camera motion, from subtle gestures to complex dynamic scenes.
Temporal Consistency: Ensures frame-to-frame coherence and smooth motion transitions without flicker or distortion.
Prompt-Driven Storytelling: Understands complex prompts, generating coherent narrative flow and visual alignment with text.
Cinematic Quality (720p): Designed for high-quality visual outputs suitable for editing, marketing, and storytelling.
prompt: Describe your desired scene, action, or atmosphere.
movement_amplitude: Controls the motion intensity. Allowed values:
  auto: Adaptive movement based on scene content.
  small: Subtle or static scenes.
  medium: Balanced motion.
  large: Dramatic or action-focused motion.
style: Choose general or anime.
duration: 5 seconds per generation.
seed: Optional. Set a fixed number for reproducible results.
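The parameter list above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any WaveSpeedAI SDK: the helper name `build_payload` is hypothetical, and the defaults (`auto`, `general`) are assumptions taken from the sample requests on this page rather than from the API schema. `duration` is left out because the page lists it as fixed at 5 seconds.

```python
# Allowed values copied from the parameter list; build_payload is a
# hypothetical helper, not a function provided by WaveSpeedAI.
ALLOWED_MOVEMENT = {"auto", "small", "medium", "large"}
ALLOWED_STYLE = {"general", "anime"}


def build_payload(prompt, movement_amplitude="auto", style="general", seed=None):
    """Return a JSON-serializable request body, rejecting unknown values early."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    if movement_amplitude not in ALLOWED_MOVEMENT:
        raise ValueError(
            f"movement_amplitude must be one of {sorted(ALLOWED_MOVEMENT)}"
        )
    if style not in ALLOWED_STYLE:
        raise ValueError(f"style must be one of {sorted(ALLOWED_STYLE)}")

    payload = {
        "prompt": prompt.strip(),
        "movement_amplitude": movement_amplitude,
        "style": style,
    }
    # seed is optional; include it only when the caller wants reproducibility.
    if seed is not None:
        payload["seed"] = int(seed)
    return payload
```

Calling `build_payload("A cat sleeping in sunlight", style="anime", seed=42)` yields a dictionary that can be serialized as the JSON request body, while a misspelled style raises `ValueError` before any credits are spent.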
| Resolution | Duration | Cost per Clip |
|---|---|---|
| 720p | 5s | $0.40 |
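For budgeting, the table's arithmetic can be sketched in a few lines. This assumes the flat $0.40 per 5-second clip shown above applies to every run; the function names are illustrative only, and the live price next to the Generate button remains the authoritative figure.

```python
from decimal import Decimal

PRICE_PER_CLIP = Decimal("0.40")  # from the pricing table
CLIP_SECONDS = 5                  # fixed clip duration from the table


def clips_for_budget(budget_usd):
    """Number of whole clips a budget covers at the flat per-clip price."""
    return int(Decimal(str(budget_usd)) // PRICE_PER_CLIP)


def cost_for_footage(total_seconds):
    """Clips needed to cover total_seconds of raw footage, and their cost."""
    clips = -(-total_seconds // CLIP_SECONDS)  # ceiling division on integers
    return clips, clips * PRICE_PER_CLIP
```

Under these assumptions, `clips_for_budget(10)` gives 25 clips, and `cost_for_footage(30)` gives 6 clips for $2.40.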
Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/vidu/text-to-video-q1 with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status flips to completed, then read the output URL from data.outputs[0]. Examples for Text To Video Q1 below.
# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/vidu/text-to-video-q1" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $WAVESPEED_API_KEY" \
-d '{
"prompt": "A cinematic shot of a city at sunset, soft golden light",
"movement_amplitude": "auto",
"style": "general",
"seed": 0
}'
# Response includes a prediction id. Poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
-H "Authorization: Bearer $WAVESPEED_API_KEY"
# When status is "completed", read the output from data.outputs[0].

// npm install wavespeed
const WaveSpeed = require('wavespeed');

const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// CommonJS modules have no top-level await, so run the call inside an async function.
async function main() {
  const result = await client.run("vidu/text-to-video-q1", {
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "movement_amplitude": "auto",
    "style": "general",
    "seed": 0
  });
  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);

# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "vidu/text-to-video-q1",
    {
        "prompt": "A cinematic shot of a city at sunset, soft golden light",
        "movement_amplitude": "auto",
        "style": "general",
        "seed": 0
    }
)
print(output["outputs"][0])  # → URL of the generated output

Text To Video Q1 is a Vidu model for video generation, exposed as a REST API on WaveSpeedAI. You can call it programmatically or try it from the playground above.
POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/vidu/vidu-text-to-video-q1.
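The submit-then-poll flow described here can be sketched with only the Python standard library. The two URLs come from the cURL example on this page; the location of the prediction ID at `data.id`, the `failed` status name, and the `error` field are assumptions not confirmed by this page, so check them against the API reference before relying on them. The `send` parameter exists only so the HTTP layer can be swapped out for testing.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"
MODEL_PATH = "vidu/text-to-video-q1"


def http_json(method, url, api_key, body=None):
    """Send one JSON request with urllib and return the decoded response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def generate_video(payload, api_key, send=http_json, interval=2.0, timeout=600.0):
    """Submit a prediction, poll until it finishes, return the first output URL."""
    submitted = send("POST", f"{API_BASE}/{MODEL_PATH}", api_key, payload)
    request_id = submitted["data"]["id"]  # ASSUMPTION: the id lives at data.id

    deadline = time.monotonic() + timeout
    while True:
        result = send("GET", f"{API_BASE}/predictions/{request_id}/result", api_key)
        data = result["data"]
        if data["status"] == "completed":
            return data["outputs"][0]
        if data["status"] == "failed":  # ASSUMPTION: name of the failure status
            raise RuntimeError(f"prediction {request_id} failed: {data.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction {request_id} not finished after {timeout}s")
        time.sleep(interval)
```

A call such as `generate_video({"prompt": "A city at sunset"}, os.environ["WAVESPEED_API_KEY"])` would block until the video is ready or the timeout expires. A fixed polling interval keeps the sketch short; a production client might prefer backoff.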
Text To Video Q1 starts at $0.40 per run. That figure is the base price — the final charge scales with the parameters you set in the form (output size, length, count, references, or whatever knobs this model exposes), so a higher-quality or larger output costs more than a minimal one. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.
Key inputs: `prompt`, `seed`, `movement_amplitude`, `style`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/vidu/vidu-text-to-video-q1.
Sign up for a free WaveSpeedAI account to claim starter credits, copy your API key from /accesskey, then call the endpoint shown in the API tab of the playground. The playground also auto-generates a code sample in Python, JavaScript, or cURL for the parameters you've set.
Commercial usage rights depend on the model's license, set by its provider (Vidu). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.