
Kling Omni Video O3 Text-To-Video

kwaivgi/kling-video-o3-pro/text-to-video

Kling Omni Video O3 is Kuaishou's unified multi-modal video model, built on MVL (Multi-modal Visual Language) technology. Text-to-Video mode generates cinematic videos from text prompts with subject consistency, natural physics simulation, and precise semantic understanding, and it can optionally generate audio. The model is available through a ready-to-use REST API with no cold starts.

Kling Video O3 Pro Text-to-Video

Kling Video O3 Pro is Kuaishou's flagship text-to-video model, delivering cinematic-quality video generation from natural language prompts. It combines physics-aware motion simulation, high temporal consistency, and optional synchronized audio generation to produce professional-grade video content from detailed text descriptions.

Why Choose This?

  • Cinematic-quality output: Produces richly detailed, visually coherent video with professional-grade lighting, composition, and motion rendering.

  • Physics-aware motion: Understands real-world dynamics, so fluid movement, fabric, hair, and object interactions behave naturally and believably.

  • Synchronized audio generation: Enable the sound option to generate matching ambient audio, sound effects, and atmosphere alongside your video.

  • Multi-prompt support: Chain multiple prompt segments to guide scene transitions and narrative flow within a single generation.

  • Element list control: Reference specific visual elements to maintain consistency in characters, objects, or stylistic details across the clip.

  • Flexible aspect ratios: Supports multiple orientations including 16:9, 9:16, and 1:1 for social, cinematic, and square formats.

Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| prompt | Yes | Text description of the scene, action, camera style, lighting, and mood. |
| aspect_ratio | No | Output aspect ratio. Options: 16:9, 9:16, 1:1. |
| duration | No | Clip length in seconds. Options: 5, 10. |
| sound | No | Whether to generate synchronized audio for the video. Default: off. |
| shot_type | No | Editing mode: intelligent (default, auto-determines scope) or customize. |
| multi_prompt | No | Additional prompt segments to guide scene progression and transitions. |
| element_list | No | List of specific visual elements to maintain across the generation. |
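
To make the parameter options concrete, here is a minimal sketch of a helper that checks arguments against the documented choices before a request is sent. The function name build_payload is hypothetical, and several details are assumptions rather than documented facts: that the request body is a JSON object keyed by the parameter names above, that duration is an integer, and that sound is a boolean. The item format of multi_prompt and element_list is not described on this page, so the sketch passes those lists through unchanged.

```python
from typing import Any, Dict, List, Optional

# Allowed values taken from the parameter table.
ASPECT_RATIOS = ("16:9", "9:16", "1:1")
DURATIONS = (5, 10)
SHOT_TYPES = ("intelligent", "customize")


def build_payload(
    prompt: str,
    aspect_ratio: Optional[str] = None,
    duration: Optional[int] = None,
    sound: Optional[bool] = None,
    shot_type: Optional[str] = None,
    multi_prompt: Optional[List[Any]] = None,   # item format not documented here
    element_list: Optional[List[Any]] = None,   # item format not documented here
) -> Dict[str, Any]:
    """Return a request body (assumed shape) containing only the fields that were set."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if aspect_ratio is not None and aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {ASPECT_RATIOS}")
    if duration is not None and duration not in DURATIONS:
        raise ValueError(f"duration must be one of {DURATIONS}")
    if shot_type is not None and shot_type not in SHOT_TYPES:
        raise ValueError(f"shot_type must be one of {SHOT_TYPES}")

    payload: Dict[str, Any] = {"prompt": prompt.strip()}
    optional = {
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "sound": sound,
        "shot_type": shot_type,
        "multi_prompt": multi_prompt,
        "element_list": element_list,
    }
    # Omit unset fields so the service can apply its own defaults.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


if __name__ == "__main__":
    print(build_payload("A red kite over a windy beach", aspect_ratio="9:16", duration=5))
```

Leaving unset fields out of the body, instead of sending explicit defaults, keeps the sketch from guessing at defaults this page does not state.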

How to Use

  1. Write your prompt — describe the scene, characters, camera movement, lighting style, and mood in detail. Use the Prompt Enhancer for better results.
  2. Select aspect ratio — choose 16:9 for cinematic/landscape, 9:16 for portrait/social, or 1:1 for square formats.
  3. Set duration — choose 5 or 10 seconds based on your scene length.
  4. Enable sound (optional) — check the sound option to generate matching audio alongside the video.
  5. Select shot_type (optional) — use intelligent for automatic scope, or customize for manual control.
  6. Add multi-prompt segments (optional) — click Add Item to guide scene transitions with additional prompts.
  7. Add element list items (optional) — specify visual elements to maintain consistency throughout the clip.
  8. Submit — generate, preview, and download your video.

Pricing

| Duration | Without Sound | With Sound |
| --- | --- | --- |
| 5s | $0.56 | $0.70 |
| 10s | $1.12 | $1.40 |

Billing Rules

  • Base rate: $0.112 per second
  • Sound surcharge: +25% when sound is enabled
  • Duration options: 5 or 10 seconds
  • Billing is based on the selected duration and sound setting
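
The billing rules can be checked with a short calculation. The sketch below uses only the base rate and the surcharge stated in the rules; the function names run_cost and runs_for_budget are illustrative and not part of any API.

```python
from decimal import Decimal

BASE_RATE_PER_SECOND = Decimal("0.112")  # USD per second, from the billing rules
SOUND_MULTIPLIER = Decimal("1.25")       # +25% when sound is enabled


def run_cost(duration: int, sound: bool = False) -> Decimal:
    """Cost in USD of one run for a 5- or 10-second clip."""
    if duration not in (5, 10):
        raise ValueError("duration must be 5 or 10 seconds")
    cost = BASE_RATE_PER_SECOND * duration
    if sound:
        cost *= SOUND_MULTIPLIER
    return cost.quantize(Decimal("0.01"))


def runs_for_budget(budget_usd: str, duration: int, sound: bool = False) -> int:
    """Number of whole runs that fit in a budget."""
    return int(Decimal(str(budget_usd)) / run_cost(duration, sound))


if __name__ == "__main__":
    print(run_cost(5))               # 0.56
    print(run_cost(10, sound=True))  # 1.40
    print(runs_for_budget("10", 5))  # 17
```

Decimal is used instead of float so the results match the listed prices to the cent.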

Best Use Cases

  • Cinematic Storytelling — Render rich, narrative-driven scenes from detailed prompts with broadcast-quality output.
  • Commercial & Brand Video — Produce premium marketing footage without a film crew.
  • Social Media Content — Generate portrait or square clips with synchronized audio for maximum engagement.
  • Concept Visualization — Bring creative directions, moods, and visual concepts to life quickly for client review.
  • Music & Audio-Visual Projects — Use sound generation for immersive, atmosphere-driven clips.

Pro Tips

  • The more specific your prompt, the better: include camera angle, lighting, era, character behavior, and atmosphere.
  • Use multi_prompt to create smooth narrative progressions across a single clip.
  • Enable sound when generating scenes with ambient environments, crowds, or action for a more immersive result.
  • Start with a 5-second generation without sound to validate your prompt before committing to a longer, audio-enabled run.
  • Use element_list to lock in key visual details that must remain consistent throughout the video.
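
The draft-first tip above can be sketched as a pair of request bodies that share one prompt: a cheap 5-second silent draft, then a 10-second run with sound once the prompt looks right. The helper name draft_and_final is hypothetical, and the field types (integer duration, boolean sound) are assumptions about how the parameters are encoded.

```python
from typing import Any, Dict, Tuple


def draft_and_final(
    prompt: str, aspect_ratio: str = "16:9"
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (draft, final) request bodies that differ only in duration and sound."""
    base = {"prompt": prompt, "aspect_ratio": aspect_ratio}
    draft = {**base, "duration": 5, "sound": False}   # cheapest documented option
    final = {**base, "duration": 10, "sound": True}   # longer, audio-enabled run
    return draft, final


if __name__ == "__main__":
    draft, final = draft_and_final(
        "Low-angle tracking shot of a cyclist at dawn, soft golden light, light mist"
    )
    print(draft)
    print(final)
```

Keeping the prompt and aspect ratio identical between the two runs means the draft is a fair preview of framing and motion, though a longer clip with audio will not reproduce the draft exactly.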

Notes

  • Only prompt is a required field; all other parameters are optional.
  • Sound generation adds 25% to the base cost.
  • Please follow Kuaishou's content usage policies when crafting prompts.
