Kling Video O3 Pro Reference-to-Video
Kling Video O3 Pro Reference-to-Video generates video from reference images, with optional video guidance. Upload reference images to establish character identity and appearance, optionally provide a reference video for motion guidance, and describe the scene in a prompt. The model produces cinematic video that keeps character identity consistent.
Why Choose This?
- O3 Pro quality: The highest visual fidelity and motion realism in the Kling family.
- Multi-reference images: Upload up to 7 reference images (or up to 4 with a reference video).
- Video-guided generation: Optional reference video for motion and scene guidance.
- Keep original sound: Preserve the audio from the reference video in the output.
- Sound generation: Optional AI-generated sound effects when no reference video is provided.
- Multi-prompt and element list support: Chain prompt segments for scene transitions and lock in specific visual elements for consistency throughout the clip.
Parameters
| Parameter | Required | Description |
|---|---|---|
| prompt | Yes | Text description of the video scene, characters, and motion. |
| video | No | Reference video for motion guidance. |
| images | No | Reference images: up to 4 with video, up to 7 without. |
| keep_original_sound | No | Keep audio from the reference video. Default: enabled. |
| sound | No | Generate AI audio (only when no reference video). Default: disabled. |
| aspect_ratio | No | Output ratio: 16:9 (default), 9:16, 1:1. |
| duration | No | Video length in seconds. Range: 3–15. Default: 5. |
| shot_type | No | Shot mode: intelligent (default, determines scope automatically) or customize (manual control). |
| multi_prompt | No | Additional prompt segments to guide scene transitions and progressions. |
| element_list | No | List of visual elements to maintain consistency throughout the clip. |
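The constraints in the table can be checked before a request is submitted. The following is a minimal sketch, not the official client: the function name `build_payload` and the flat dictionary layout are assumptions made for illustration, as is treating `duration` as a whole number of seconds. `multi_prompt` and `element_list` are left out because their exact structure is not specified here.

```python
from typing import List, Optional

ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
SHOT_TYPES = {"intelligent", "customize"}


def build_payload(
    prompt: str,
    images: Optional[List[str]] = None,
    video: Optional[str] = None,
    keep_original_sound: bool = True,
    sound: bool = False,
    aspect_ratio: str = "16:9",
    duration: int = 5,
    shot_type: str = "intelligent",
) -> dict:
    """Validate inputs against the documented limits and return a request dict.

    Hypothetical helper: the payload shape is an assumption, not the real API schema.
    """
    images = images or []
    if not prompt.strip():
        raise ValueError("prompt is required")

    # Documented limit: up to 4 images with a reference video, up to 7 without.
    max_images = 4 if video else 7
    if len(images) > max_images:
        raise ValueError(f"at most {max_images} reference images are allowed")

    if not 3 <= duration <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")
    if shot_type not in SHOT_TYPES:
        raise ValueError(f"shot_type must be one of {sorted(SHOT_TYPES)}")

    payload = {
        "prompt": prompt,
        "images": images,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "shot_type": shot_type,
    }
    if video:
        # With a reference video, only keep_original_sound applies.
        payload["video"] = video
        payload["keep_original_sound"] = keep_original_sound
    else:
        # AI sound generation is only available without a reference video.
        payload["sound"] = sound
    return payload
```

For example, `build_payload("A chef plating dessert", images=["https://example.com/chef.png"], sound=True)` returns a dictionary with `sound` set and no `video` key, while passing five images together with a video raises `ValueError`.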
How to Use
- Write your prompt — describe the scene, characters, and action. Use the Prompt Enhancer for better results.
- Upload reference video (optional) — provide a video for motion guidance.
- Upload reference images — add character or scene references.
- Configure audio — keep original sound from the reference video, or enable AI sound generation.
- Select aspect ratio — match your target platform.
- Set duration — choose any length from 3 to 15 seconds.
- Select shot_type (optional) — use intelligent for automatic scope, or customize for manual control.
- Add multi-prompt segments (optional) — click Add Item to guide scene transitions.
- Add element list items (optional) — see Notes below for how to use elements effectively.
- Run — submit and download your video.
Pricing
| Duration | No Video, No Sound | No Video, With Sound | With Reference Video |
|---|---|---|---|
| 3s | $0.336 | $0.403 | $0.504 |
| 5s | $0.560 | $0.672 | $0.840 |
| 10s | $1.120 | $1.344 | $1.680 |
| 15s | $1.680 | $2.016 | $2.520 |
Billing Rules
- Base rate: $0.56 per 5 seconds ($0.112 per second)
- With reference video: ×1.5 multiplier (overrides sound setting)
- With AI sound (no video): ×1.2 multiplier
- Duration range: 3–15 seconds
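The billing rules reduce to a per-second rate times one multiplier. The sketch below reproduces the pricing table under the assumption that durations between the listed rows are billed linearly at the per-second rate; `estimate_cost` is a hypothetical helper name, not part of any official SDK.

```python
BASE_RATE_PER_SECOND = 0.112  # $0.56 per 5 seconds
VIDEO_MULTIPLIER = 1.5        # applies whenever a reference video is used
SOUND_MULTIPLIER = 1.2        # applies only when there is no reference video


def estimate_cost(duration: int, has_reference_video: bool = False, sound: bool = False) -> float:
    """Estimate the price in USD for one generation, rounded to 3 decimals."""
    if not 3 <= duration <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")

    if has_reference_video:
        # The video multiplier overrides the sound setting.
        multiplier = VIDEO_MULTIPLIER
    elif sound:
        multiplier = SOUND_MULTIPLIER
    else:
        multiplier = 1.0
    return round(duration * BASE_RATE_PER_SECOND * multiplier, 3)


print(estimate_cost(5, has_reference_video=True))  # 0.84
```

Because the video multiplier takes precedence, `estimate_cost(10, has_reference_video=True, sound=True)` gives the same result as the call without `sound`.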
Best Use Cases
- Character Consistency — Generate videos with identity-consistent characters from reference images.
- Video Remixing — Use a reference video for motion guidance with new characters or elements.
- Marketing & Ads — Create promotional videos featuring specific people or products.
- Storytelling — Produce narrative scenes with consistent character appearance across clips.
- Long-Form Scenes — Up to 15 seconds for extended scene development.
Pro Tips
- Use multiple reference images from different angles for better identity preservation.
- When using a reference video, the image limit is 4; without a video, you can use up to 7.
- Enable keep_original_sound to preserve audio from your reference video.
- Sound generation is only available when no reference video is provided.
- Use shorter durations (3–5s) for testing, longer (10–15s) for final production.
- Match aspect ratio to your platform: 16:9 for YouTube, 9:16 for TikTok and Reels, 1:1 for Instagram.
- Use multi_prompt to build smooth narrative progressions across the clip.
Notes
- Only prompt is required; all other parameters are optional.
- Duration range: minimum 3 seconds, maximum 15 seconds.
- Reference images limit: up to 4 with video, up to 7 without.
- When a reference video is provided, the sound option does not apply; audio is controlled by keep_original_sound instead.
- Using element_list: First use Kling Elements to generate your element and note its name and ID. Then simply write the element name naturally in your prompt, and enter the corresponding element ID in the element_list field. No special characters or syntax required.
- Ensure uploaded image and video URLs are publicly accessible.
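The element_list note above can be illustrated with a small check that each element name actually appears in the prompt before its ID is sent. This is only a sketch: it assumes element_list accepts a plain list of ID strings, which is not confirmed here, and the helper name, element names, and IDs are all made up for the example.

```python
from typing import Dict, List


def build_element_list(prompt: str, elements: Dict[str, str]) -> List[str]:
    """Return element IDs after checking each element name is mentioned in the prompt.

    `elements` maps an element name (as created in Kling Elements) to its ID.
    Hypothetical helper; the real field format may differ.
    """
    lowered = prompt.lower()
    missing = [name for name in elements if name.lower() not in lowered]
    if missing:
        raise ValueError(f"prompt does not mention elements: {missing}")
    return list(elements.values())


# Names are written naturally in the prompt, with no special syntax.
prompt = "Mira walks her dog Bolt along the beach at sunset"
element_ids = build_element_list(prompt, {"Mira": "el_001", "Bolt": "el_002"})
```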
Related Models
- Kling Video O3 Std Reference-to-Video — Standard tier reference-to-video at budget-friendly pricing.
- Kling Video O3 Pro Image-to-Video — O3 Pro quality single image to video.
- Kling Video O3 Pro Text-to-Video — O3 Pro quality text-to-video.
- Kling Video O3 Pro Video Edit — Edit existing videos with natural-language instructions.