Kling Omni Video O3 Standard

kwaivgi/kling-video-o3-std/reference-to-video

Kling Omni Video O3 (Standard) Reference-to-Video generates creative videos from character, prop, or scene references captured from multiple viewpoints. It extracts subject features and creates new video content while keeping identity consistent across frames. It also supports audio generation. The model is available through a ready-to-use REST API with no cold starts; pricing is listed in the Pricing section.

Kling Video O3 Std Reference-to-Video

Kling Video O3 Standard Reference-to-Video generates new videos guided by reference images and an optional reference video, maintaining consistent characters, styles, and scenes. Describe a scenario involving the people or elements in your reference images — the model brings them together in a coherent, natural video. Supports flexible duration, aspect ratio control, and optional sound generation.

Why Choose This?

  • Character-consistent generation: Upload reference images of specific people or elements, and the model preserves their identity throughout the generated video.

  • Multi-reference support: Provide multiple reference images to combine different characters, styles, or elements in one scene.

  • Optional reference video: Supply a reference video for motion guidance, style transfer, or scene continuity.

  • Sound options: Keep the original audio from a reference video, or generate new synchronized sound effects.

  • Multi-prompt support: Chain multiple prompt segments to guide scene transitions and narrative flow within a single generation.

  • Flexible output: Multiple aspect ratios (16:9, 9:16, 1:1, etc.) and durations from 3 to 15 seconds.

Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| prompt | Yes | Text description of the desired scene and action. |
| video | No | Reference video for motion or style guidance. |
| images | No | Reference images of characters, elements, or styles. |
| keep_original_sound | No | Keep the original sound from the reference video. Default: enabled. |
| sound | No | Generate synchronized audio for the video. Default: disabled. |
| aspect_ratio | No | Video aspect ratio. Default: 16:9. |
| duration | No | Video length in seconds. Range: 3–15. Default: 5. |
| shot_type | No | Editing mode: intelligent (default, auto-determines scope) or customize. |
| multi_prompt | No | Additional prompt segments to guide scene transitions and progressions. |
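
As a rough illustration of how the parameters above fit together, the following sketch assembles a request body and applies the documented defaults and limits before anything is sent. The function name `build_payload`, the use of URL strings for `images` and `video`, and the JSON shape are assumptions made for illustration; only the parameter names, defaults, and ranges come from the table. The structure of `multi_prompt` items is not specified here, so the sketch passes them through unchanged.

```python
from typing import Any, Optional

# Values documented in the parameter table.
ALLOWED_SHOT_TYPES = {"intelligent", "customize"}
MIN_DURATION, MAX_DURATION = 3, 15


def build_payload(
    prompt: str,
    images: Optional[list[str]] = None,   # assumed: list of public image URLs
    video: Optional[str] = None,          # assumed: public video URL
    keep_original_sound: bool = True,
    sound: bool = False,
    aspect_ratio: str = "16:9",
    duration: int = 5,
    shot_type: str = "intelligent",
    multi_prompt: Optional[list[Any]] = None,  # item format not documented here
) -> dict[str, Any]:
    """Build a request body (hypothetical shape) using the documented defaults."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if not MIN_DURATION <= duration <= MAX_DURATION:
        raise ValueError("duration must be between 3 and 15 seconds")
    if shot_type not in ALLOWED_SHOT_TYPES:
        raise ValueError("shot_type must be 'intelligent' or 'customize'")

    payload: dict[str, Any] = {
        "prompt": prompt,
        "keep_original_sound": keep_original_sound,
        "sound": sound,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "shot_type": shot_type,
    }
    # Optional inputs are only included when supplied.
    if images:
        payload["images"] = list(images)
    if video:
        payload["video"] = video
    if multi_prompt:
        payload["multi_prompt"] = list(multi_prompt)
    return payload
```

Validating locally like this catches an out-of-range duration or a mistyped shot_type before a paid run is submitted.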

How to Use

  1. Write your prompt — describe the scene, referencing characters or elements by position (e.g., "The man in Figure 2 is walking with the woman in Figure 1 in the park.").
  2. Add reference images — upload images of the characters, objects, or styles you want in the video.
  3. Add reference video (optional) — provide a video for motion or style guidance.
  4. Choose aspect ratio — select the format that fits your platform.
  5. Set duration — choose any length from 3 to 15 seconds.
  6. Set sound preference — keep original audio from the reference video, or enable generated sound.
  7. Select shot_type (optional) — use intelligent for automatic scope, or customize for manual control.
  8. Add multi-prompt segments (optional) — click Add Item to guide scene transitions.
  9. Run — submit and download your video.
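
The "Figure N" convention from step 1 can be kept consistent with a small helper. This is a minimal sketch that assumes figure numbers follow the order in which reference images are supplied, starting at 1; the helper name `figure_refs` and the labels are hypothetical.

```python
def figure_refs(labels: list[str]) -> dict[str, str]:
    """Map each label to 'Figure N', numbering images in upload order from 1."""
    return {label: f"Figure {i}" for i, label in enumerate(labels, start=1)}


# Reference images are assumed to be uploaded in this order: woman first, man second.
refs = figure_refs(["woman", "man"])
prompt = (
    f"The man in {refs['man']} is walking with "
    f"the woman in {refs['woman']} in the park."
)
print(prompt)
```

If the image order changes, only the list passed to `figure_refs` needs updating, and the prompt stays in sync.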

Pricing

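| Duration | No reference video, no sound | No reference video, with sound | With reference video |
| --- | --- | --- | --- |
| 3s | $0.252 | $0.336 | $0.378 |
| 5s | $0.420 | $0.560 | $0.630 |
| 10s | $0.840 | $1.120 | $1.260 |
| 15s | $1.260 | $1.680 | $1.890 |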

Billing Rules

  • Base rate: $0.42 per 5 seconds ($0.084 per second)
  • Sound surcharge: ×4/3 (about +33%) when sound is enabled and no reference video is provided
  • Reference video surcharge: ×1.5 when a reference video is provided (overrides sound multiplier)
  • Duration range: 3–15 seconds
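
The billing rules can be checked with a short calculation. The sketch below is not an official pricing tool: the function name `estimate_cost` is hypothetical, whole-second durations are assumed, and the sound multiplier is taken as exactly 4/3 because that value reproduces the listed prices (for example, $0.252 × 4/3 = $0.336).

```python
from fractions import Fraction

BASE_RATE = Fraction(84, 1000)       # $0.084 per second ($0.42 per 5 seconds)
SOUND_MULTIPLIER = Fraction(4, 3)    # assumed exact value behind the ~33% surcharge
REFERENCE_VIDEO_MULTIPLIER = Fraction(3, 2)


def estimate_cost(duration: int, sound: bool = False,
                  has_reference_video: bool = False) -> float:
    """Estimate the price in USD for one run, following the billing rules."""
    if not 3 <= duration <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    cost = BASE_RATE * duration
    if has_reference_video:
        # The reference video multiplier overrides the sound multiplier.
        cost *= REFERENCE_VIDEO_MULTIPLIER
    elif sound:
        cost *= SOUND_MULTIPLIER
    return round(float(cost), 3)


print(estimate_cost(5))                                         # 0.42
print(estimate_cost(10, sound=True))                            # 1.12
print(estimate_cost(15, sound=True, has_reference_video=True))  # 1.89
```

Exact fractions are used so that intermediate results do not pick up floating-point noise before the final rounding to three decimals.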

Best Use Cases

  • Character-Driven Storytelling — Create scenes starring specific characters from your reference images.
  • Social Media Content — Produce personalized short-form videos with consistent character identity.
  • Marketing & Ads — Generate brand ambassador or spokesperson videos from still photos.
  • Creative Concepting — Combine multiple characters or elements into new scenarios for rapid ideation.
  • Style Transfer — Use a reference video to guide the motion and visual style of new content.

Pro Tips

  • Reference images with clear faces and distinct features produce the best character consistency.
  • Use "Figure 1", "Figure 2" etc. in your prompt to refer to specific reference images in order.
  • Use shorter durations (3–5s) for testing character consistency before generating longer clips.
  • Match aspect ratio to your target platform: 16:9 for YouTube, 9:16 for TikTok and Reels.
  • Use multi_prompt to build smooth narrative progressions across the clip.

Notes

  • Prompt is the only required field; reference images are strongly recommended for best results.
  • Duration range: minimum 3 seconds, maximum 15 seconds.
  • When a reference video is provided, the cost is 1.5× the base rate regardless of sound settings.
  • Ensure that any image or video URLs you provide are publicly accessible.
