Kling Video O3 Std Reference-to-Video
Kling Video O3 Standard Reference-to-Video generates new videos guided by reference images and an optional reference video, maintaining consistent characters, styles, and scenes. Describe a scenario involving the people or elements in your reference images, and the model brings them together in a coherent, natural video. It supports durations from 3 to 15 seconds, aspect ratio control, and optional sound generation.
Why Choose This?
- Character-consistent generation: Upload reference images of specific people or elements, and the model preserves their identity throughout the generated video.
- Multi-reference support: Provide multiple reference images to combine different characters, styles, or elements in one scene.
- Optional reference video: Supply a reference video for motion guidance, style transfer, or scene continuity.
- Sound options: Keep original audio from a reference video, or generate new synchronized sound effects.
- Multi-prompt support: Chain multiple prompt segments to guide scene transitions and narrative flow within a single generation.
- Flexible output: Multiple aspect ratios (16:9, 9:16, 1:1, etc.) and duration from 3 to 15 seconds.
Parameters
| Parameter | Required | Description |
|---|---|---|
| prompt | Yes | Text description of the desired scene and action. |
| video | No | Reference video for motion or style guidance. |
| images | No | Reference images of characters, elements, or styles. |
| keep_original_sound | No | Keep the original sound from the reference video. Default: enabled. |
| sound | No | Generate synchronized audio for the video. Default: disabled. |
| aspect_ratio | No | Video aspect ratio. Default: 16:9. |
| duration | No | Video length in seconds. Range: 3–15. Default: 5. |
| shot_type | No | Editing mode: intelligent (default, auto-determines scope) or customize. |
| multi_prompt | No | Additional prompt segments to guide scene transitions and progressions. |
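The parameter table can be turned into a small request-building helper. The sketch below is illustrative only: the function name `build_payload` is hypothetical, and the value types (image and video inputs as URL strings, `duration` as a whole number of seconds, `multi_prompt` as a list passed through unchanged) are assumptions that this page does not confirm. It makes no network call; it only applies the defaults and ranges listed above.

```python
from __future__ import annotations

from typing import Any, Optional

# Values for shot_type as listed in the parameter table.
ALLOWED_SHOT_TYPES = {"intelligent", "customize"}


def build_payload(
    prompt: str,
    images: Optional[list[str]] = None,       # assumed: list of image URLs
    video: Optional[str] = None,              # assumed: a single video URL
    keep_original_sound: bool = True,         # documented default: enabled
    sound: bool = False,                      # documented default: disabled
    aspect_ratio: str = "16:9",               # documented default
    duration: int = 5,                        # documented default, range 3-15
    shot_type: str = "intelligent",           # documented default
    multi_prompt: Optional[list[Any]] = None, # assumed: opaque list of segments
) -> dict[str, Any]:
    """Assemble a request body (hypothetical shape) and check documented limits."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if not 3 <= duration <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    if shot_type not in ALLOWED_SHOT_TYPES:
        raise ValueError(f"shot_type must be one of {sorted(ALLOWED_SHOT_TYPES)}")

    payload: dict[str, Any] = {
        "prompt": prompt,
        "keep_original_sound": keep_original_sound,
        "sound": sound,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "shot_type": shot_type,
    }
    # Optional inputs are included only when the caller supplies them.
    if images:
        payload["images"] = list(images)
    if video:
        payload["video"] = video
    if multi_prompt:
        payload["multi_prompt"] = list(multi_prompt)
    return payload
```

Validating the duration and `shot_type` locally catches mistakes before a request is submitted; the exact field names and types should be checked against the actual API schema.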
How to Use
- Write your prompt — describe the scene, referencing characters or elements by position (e.g., "The man in Figure 2 is walking with the woman in Figure 1 in the park.").
- Add reference images — upload images of the characters, objects, or styles you want in the video.
- Add reference video (optional) — provide a video for motion or style guidance.
- Choose aspect ratio — select the format that fits your platform.
- Set duration — choose any length from 3 to 15 seconds.
- Set sound preference — keep original audio from the reference video, or enable generated sound.
- Select shot_type (optional) — use intelligent for automatic scope, or customize for manual control.
- Add multi-prompt segments (optional) — click Add Item to guide scene transitions.
- Run — submit and download your video.
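The "Figure N" convention from the first step can be kept in sync with the image order using a small helper. This is a minimal sketch: the helper name `figure_labels`, the short names, and the example URLs are hypothetical, and it assumes figure numbers are 1-based and follow the order in which the images are supplied.

```python
def figure_labels(names: list[str]) -> dict[str, str]:
    """Map each reference name to its 1-based 'Figure N' label, in upload order."""
    return {name: f"Figure {i}" for i, name in enumerate(names, start=1)}


# Hypothetical reference images, listed in the order they will be uploaded.
references = [
    ("woman", "https://example.com/woman.png"),
    ("man", "https://example.com/man.png"),
]

labels = figure_labels([name for name, _ in references])
images = [url for _, url in references]

# Fill the placeholders so the prompt always matches the image order.
prompt = "The man in {man} is walking with the woman in {woman} in the park.".format(**labels)
print(prompt)
# The man in Figure 2 is walking with the woman in Figure 1 in the park.
```

Reordering `references` updates both the image list and the figure numbers in the prompt, which avoids mismatched references when images are added or removed.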
Pricing
| Duration | No Video, No Sound | No Video, With Sound | With Reference Video |
|---|---|---|---|
| 3s | $0.252 | $0.336 | $0.378 |
| 5s | $0.420 | $0.560 | $0.630 |
| 10s | $0.840 | $1.120 | $1.260 |
| 15s | $1.260 | $1.680 | $1.890 |
Billing Rules
- Base rate: $0.42 per 5 seconds ($0.084 per second)
- Sound surcharge: ×4/3 (about +33%) when sound is enabled and no reference video is provided
- Reference video surcharge: ×1.5 when a reference video is provided (overrides sound multiplier)
- Duration range: 3–15 seconds
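The billing rules can be checked against the pricing table with a short calculation. The sketch below is an estimate, not an official billing implementation: the function name `estimate_cost` is hypothetical, and the sound multiplier is taken as exactly 4/3, which is what the table rows imply (for example, $0.560 / $0.420). It also assumes per-second pricing applies to every whole-second duration in the 3 to 15 range, not only the four listed rows.

```python
from fractions import Fraction

BASE_RATE = Fraction(84, 1000)   # $0.084 per second ($0.42 per 5 seconds)
SOUND_MULT = Fraction(4, 3)      # implied by the table; stated as "+33%"
VIDEO_MULT = Fraction(3, 2)      # x1.5, overrides the sound multiplier


def estimate_cost(duration: int, sound: bool = False,
                  has_reference_video: bool = False) -> float:
    """Estimate the price in USD for one generation under the rules above."""
    if not 3 <= duration <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    if has_reference_video:
        multiplier = VIDEO_MULT      # applies regardless of sound settings
    elif sound:
        multiplier = SOUND_MULT
    else:
        multiplier = Fraction(1)
    # Exact rational arithmetic avoids floating-point drift before rounding.
    return round(float(BASE_RATE * duration * multiplier), 3)


print(estimate_cost(5))                              # 0.42
print(estimate_cost(5, sound=True))                  # 0.56
print(estimate_cost(10, has_reference_video=True))   # 1.26
```

These values match the 5s and 10s rows of the pricing table; the same function reproduces the 3s and 15s rows.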
Best Use Cases
- Character-Driven Storytelling — Create scenes starring specific characters from your reference images.
- Social Media Content — Produce personalized short-form videos with consistent character identity.
- Marketing & Ads — Generate brand ambassador or spokesperson videos from still photos.
- Creative Concepting — Combine multiple characters or elements into new scenarios for rapid ideation.
- Style Transfer — Use a reference video to guide the motion and visual style of new content.
Pro Tips
- Reference images with clear faces and distinct features produce the best character consistency.
- Use "Figure 1", "Figure 2" etc. in your prompt to refer to specific reference images in order.
- Use shorter durations (3–5s) for testing character consistency before generating longer clips.
- Match aspect ratio to your target platform: 16:9 for YouTube, 9:16 for TikTok and Reels.
- Use multi_prompt to build smooth narrative progressions across the clip.
Notes
- Prompt is the only required field; reference images are strongly recommended for best results.
- Duration range: minimum 3 seconds, maximum 15 seconds.
- When a reference video is provided, the cost is 1.5× the base rate regardless of sound settings.
- Ensure uploaded URLs are publicly accessible.
Related Models
- Kling Video O3 Pro Reference-to-Video — Maximum quality reference-to-video with O3 Pro tier.
- Kling Video O3 Std Image-to-Video — Animate a single image into video at Standard pricing.
- Kling Video O3 Std Text-to-Video — Generate videos from text prompts at Standard pricing.
- Kling Video O3 Std Video Edit — Edit existing videos with natural-language instructions.