Kling Video O3 Std Reference to Video | Fast Image-to-Video API

Kling Video O3 Std Reference-to-Video

Kling Video O3 Standard Reference-to-Video generates new videos guided by reference images and an optional reference video, maintaining consistent characters, styles, and scenes. Describe a scenario involving the people or elements in your reference images — the model brings them together in a coherent, natural video. Supports flexible duration, aspect ratio control, and optional sound generation.

Why Choose This?

Character-consistent generation: Upload reference images of specific people or elements, and the model preserves their identity throughout the generated video.
Multi-reference support: Provide multiple reference images to combine different characters, styles, or elements in one scene.
Optional reference video: Supply a reference video for motion guidance, style transfer, or scene continuity.
Sound options: Keep the original audio from a reference video, or generate new synchronized sound effects.
Multi-prompt support: Chain multiple prompt segments to guide scene transitions and narrative flow within a single generation.
Flexible output: Multiple aspect ratios (16:9, 9:16, 1:1, and others) and durations from 3 to 15 seconds.

Parameters

Parameter	Required	Description
prompt	Yes	Text description of the desired scene and action.
video	No	Reference video for motion or style guidance.
images	No	Reference images of characters, elements, or styles.
keep_original_sound	No	Keep the original sound from the reference video. Default: enabled.
sound	No	Generate synchronized audio for the video. Default: disabled.
aspect_ratio	No	Video aspect ratio. Default: 16:9.
duration	No	Video length in seconds. Range: 3–15. Default: 5.
shot_type	No	Editing mode: intelligent (default, auto-determines scope) or customize.
multi_prompt	No	Additional prompt segments to guide scene transitions and progressions.
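You can check the table's constraints before sending a request. The following Python sketch is a hypothetical client-side helper, not part of any WaveSpeedAI SDK. It encodes only the documented rules: `prompt` is required, `duration` runs from 3 to 15, and `shot_type` is `intelligent` or `customize`. It also assumes that `duration` is an integer and that `images` is a list of URL strings. The server's own schema remains authoritative.

```python
# Hypothetical client-side checker for the constraints in the Parameters table.
# Assumptions: duration is an integer; images is a list of URL strings.
ALLOWED_SHOT_TYPES = {"intelligent", "customize"}
MIN_DURATION, MAX_DURATION = 3, 15


def validate_params(params: dict) -> list:
    """Return a list of problems; an empty list means no documented rule is violated."""
    problems = []

    prompt = params.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        problems.append("prompt is required and must be a non-empty string")

    duration = params.get("duration", 5)  # documented default: 5 seconds
    if not isinstance(duration, int) or not MIN_DURATION <= duration <= MAX_DURATION:
        problems.append(f"duration must be an integer from {MIN_DURATION} to {MAX_DURATION}")

    shot_type = params.get("shot_type", "intelligent")  # documented default
    if shot_type not in ALLOWED_SHOT_TYPES:
        problems.append(f"shot_type must be one of {sorted(ALLOWED_SHOT_TYPES)}")

    images = params.get("images", [])
    if not isinstance(images, list) or not all(isinstance(u, str) for u in images):
        problems.append("images should be a list of URL strings")

    return problems
```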

How to Use

Write your prompt — describe the scene, referencing characters or elements by position (e.g., "The man in Figure 2 is walking with the woman in Figure 1 in the park.").
Add reference images — upload images of the characters, objects, or styles you want in the video.
Add reference video (optional) — provide a video for motion or style guidance.
Choose aspect ratio — select the format that fits your platform.
Set duration — choose any length from 3 to 15 seconds.
Set sound preference — keep original audio from the reference video, or enable generated sound.
Select shot_type (optional) — use intelligent for automatic scope, or customize for manual control.
Add multi-prompt segments (optional) — click Add Item to guide scene transitions.
Run — submit and download your video.
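Taken together, the steps above assemble a single JSON request body. This sketch uses a hypothetical helper, `build_payload`, that includes only the fields you set. Anything you omit falls back to the defaults listed under Parameters. It assumes `images` is an ordered list of URLs. It passes `multi_prompt` through unchanged because this page does not document its exact shape.

```python
def build_payload(prompt, *, images=None, video=None, aspect_ratio=None,
                  duration=None, sound=None, keep_original_sound=None,
                  shot_type=None, multi_prompt=None):
    """Hypothetical helper: build a request body from the How to Use steps."""
    payload = {"prompt": prompt}
    optional = {
        "images": images,              # assumption: ordered list of image URLs
        "video": video,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "keep_original_sound": keep_original_sound,
        "sound": sound,
        "shot_type": shot_type,
        "multi_prompt": multi_prompt,  # shape not documented here; passed through as-is
    }
    # Drop only unset fields; explicit False or 0 values are kept.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


payload = build_payload(
    "The man in Figure 2 is walking with the woman in Figure 1 in the park.",
    images=["https://example.com/woman.png", "https://example.com/man.png"],
    aspect_ratio="9:16",
    duration=5,
)
```

The order of `images` matters here: the second URL is the one the prompt calls "Figure 2".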

Pricing

Duration	No reference video, sound off	No reference video, sound on	With reference video (any sound setting)
3s	$0.252	$0.336	$0.378
5s	$0.420	$0.560	$0.630
10s	$0.840	$1.120	$1.260
15s	$1.260	$1.680	$1.890

Billing Rules

Base rate: $0.42 per 5 seconds ($0.084 per second)
Sound surcharge: ×4/3 (about +33%) when sound is enabled and no reference video is provided
Reference video surcharge: ×1.5 when a reference video is provided (overrides sound multiplier)
Duration range: 3–15 seconds
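The billing rules reduce to one formula: cost = $0.084 × duration. Multiply by 1.5 when a reference video is provided; otherwise multiply by 4/3 when sound is enabled. The sketch below uses a hypothetical helper name, `estimate_cost`, and should reproduce every value in the Pricing table. It uses exact fractions because the table values imply a sound surcharge of exactly one third rather than a rounded 33%. It also assumes the per-second rate applies to durations not listed in the table, such as 7 seconds. Treat the result as an estimate; the live price shown before you submit is what you are charged.

```python
from fractions import Fraction

RATE_PER_SECOND = Fraction(84, 1000)          # $0.084 per second ($0.42 per 5 s)
SOUND_MULTIPLIER = Fraction(4, 3)             # table values imply exactly +1/3 (about +33%)
REFERENCE_VIDEO_MULTIPLIER = Fraction(3, 2)   # x1.5, replaces the sound multiplier


def estimate_cost(duration, sound=False, reference_video=False):
    """Estimate the USD price of one run from the documented billing rules."""
    if not 3 <= duration <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    cost = RATE_PER_SECOND * duration
    if reference_video:
        cost *= REFERENCE_VIDEO_MULTIPLIER  # applies regardless of sound settings
    elif sound:
        cost *= SOUND_MULTIPLIER
    return round(float(cost), 3)


for seconds in (3, 5, 10, 15):
    print(seconds,
          estimate_cost(seconds),
          estimate_cost(seconds, sound=True),
          estimate_cost(seconds, reference_video=True))
```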

Best Use Cases

Character-Driven Storytelling — Create scenes starring specific characters from your reference images.
Social Media Content — Produce personalized short-form videos with consistent character identity.
Marketing & Ads — Generate brand ambassador or spokesperson videos from still photos.
Creative Concepting — Combine multiple characters or elements into new scenarios for rapid ideation.
Style Transfer — Use a reference video to guide the motion and visual style of new content.

Pro Tips

Reference images with clear faces and distinct features produce the best character consistency.
Use "Figure 1", "Figure 2" etc. in your prompt to refer to specific reference images in order.
Use shorter durations (3–5s) for testing character consistency before generating longer clips.
Match aspect ratio to your target platform: 16:9 for YouTube, 9:16 for TikTok and Reels.
Use multi_prompt to build smooth narrative progressions across the clip.
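Prompts refer to reference images as "Figure 1", "Figure 2", and so on, so it is easy for the prompt and the `images` list to drift out of sync. The hypothetical helpers below assume figures are numbered by position in `images`, starting at 1. They report any figure the prompt mentions that has no matching image.

```python
import re


def label_references(image_urls):
    """Map 'Figure N' labels to image URLs, numbering from 1 in list order."""
    return {f"Figure {i}": url for i, url in enumerate(image_urls, start=1)}


def undefined_figures(prompt, image_urls):
    """Return figure numbers mentioned in the prompt that have no matching image."""
    mentioned = {int(n) for n in re.findall(r"\bFigure (\d+)\b", prompt)}
    return sorted(n for n in mentioned if not 1 <= n <= len(image_urls))
```

Running `undefined_figures` before submitting catches a prompt that mentions "Figure 3" when only two images were uploaded.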

Notes

Prompt is the only required field; reference images are strongly recommended for best results.
Duration range: minimum 3 seconds, maximum 15 seconds.
When a reference video is provided, the cost is 1.5× the base rate regardless of sound settings.
Make sure every image and video URL you pass is publicly accessible.

Related Models

Kling Video O3 Pro Reference-to-Video — Maximum quality reference-to-video with O3 Pro tier.
Kling Video O3 Std Image-to-Video — Animate a single image into video at Standard pricing.
Kling Video O3 Std Text-to-Video — Generate videos from text prompts at Standard pricing.
Kling Video O3 Std Video Edit — Edit existing videos with natural-language instructions.

Kling Video O3 Std Reference To Video API — Quick start

Get a WaveSpeedAI API key, then send a POST request to https://api.wavespeed.ai/api/v3/kwaivgi/kling-video-o3-std/reference-to-video with your input as a JSON body. The endpoint returns a prediction ID. Poll the prediction endpoint until the status changes to completed, then read the output URL from data.outputs[0]. Examples in cURL, Node.js, and Python follow.

HTTP example

# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/kwaivgi/kling-video-o3-std/reference-to-video" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "video": "https://example.com/your-input.mp4",
    "keep_original_sound": true,
    "sound": false,
    "aspect_ratio": "16:9",
    "duration": 5,
    "shot_type": "customize"
}'

# Response includes a prediction id. Poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# When status is "completed", read the output from data.outputs[0].
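You can sketch the polling step with only the Python standard library. This is an illustration under assumptions, not an official client. It assumes the result body has the shape `{"data": {"status": ..., "outputs": [...]}}`, which is consistent with reading `data.outputs[0]`. It also treats a status of `"failed"` (with an optional `error` field) as terminal, which this page does not document. `fetch_result` and `wait_for_output` are hypothetical names. The fetch function is passed in so the loop can be tested without network access.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def fetch_result(request_id: str, api_key: str) -> dict:
    """GET the prediction result endpoint used in the curl example above."""
    req = urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_output(fetch, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Call fetch() until data.status is "completed", then return data.outputs[0].

    Assumption: "failed" is a terminal status and may carry an "error" field.
    """
    deadline = time.monotonic() + timeout
    while True:
        data = fetch()["data"]
        status = data.get("status")
        if status == "completed":
            return data["outputs"][0]
        if status == "failed":
            raise RuntimeError(f"prediction failed: {data.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction still {status!r} after {timeout} s")
        sleep(interval)
```

Typical use: `wait_for_output(lambda: fetch_result(request_id, os.environ["WAVESPEED_API_KEY"]))`, where `request_id` is the ID returned by the submission call.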

Node.js example

// npm install wavespeed
const WaveSpeed = require('wavespeed');

const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// CommonJS modules do not allow top-level await, so run the call inside an async function.
async function main() {
  const result = await client.run("kwaivgi/kling-video-o3-std/reference-to-video", {
    prompt: "A cinematic shot of a city at sunset, soft golden light",
    video: "https://example.com/your-input.mp4",
    keep_original_sound: true,
    sound: false,
    aspect_ratio: "16:9",
    duration: 5,
    shot_type: "customize",
  });

  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});

Python example

# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "kwaivgi/kling-video-o3-std/reference-to-video",
    {
        "prompt": "A cinematic shot of a city at sunset, soft golden light",
        "video": "https://example.com/your-input.mp4",
        "keep_original_sound": True,  # Python booleans, not JSON true/false
        "sound": False,
        "aspect_ratio": "16:9",
        "duration": 5,
        "shot_type": "customize",
    },
)

print(output["outputs"][0])  # → URL of the generated output

Kling Video O3 Std Reference To Video API — Frequently asked questions

What is the Kling Video O3 Std Reference To Video API?

Kling Video O3 Std Reference To Video is a Kuaishou model that generates video from reference images, exposed as a REST API on WaveSpeedAI. Kling Omni Video O3 (Standard) Reference-to-Video creates videos from character, prop, or scene references, which may show the subject from multiple viewpoints. It extracts subject features and generates new video content while keeping identity consistent across frames, and it supports audio generation. The API is ready to use, runs without cold starts, and is billed at Standard-tier rates (see Pricing). You can call it programmatically or try it from the playground above.

How do I call the Kling Video O3 Std Reference To Video API?

POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/kwaivgi/kwaivgi-kling-video-o3-std-reference-to-video.

How much does Kling Video O3 Std Reference To Video cost per run?

A default run of Kling Video O3 Std Reference To Video costs $0.42. A default run is 5 seconds, with no generated sound and no reference video. The cheapest run is a 3-second clip without sound or a reference video, which costs $0.252. The charge scales with your settings: duration costs $0.084 per second, enabling sound multiplies that by 4/3, and supplying a reference video multiplies it by 1.5 instead. The maximum is $1.89, for a 15-second clip with a reference video. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.

What inputs does Kling Video O3 Std Reference To Video accept?

Inputs: `prompt` (required), `images`, `video`, `keep_original_sound`, `sound`, `aspect_ratio`, `duration`, `shot_type`, and `multi_prompt`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/kwaivgi/kwaivgi-kling-video-o3-std-reference-to-video.

How do I get started with the Kling Video O3 Std Reference To Video API?

Sign up for a free WaveSpeedAI account to claim starter credits, copy your API key from /accesskey, then call the endpoint shown in the API tab of the playground. The playground also auto-generates a code sample in Python, JavaScript, or cURL for the parameters you've set.

Can I use Kling Video O3 Std Reference To Video outputs commercially?

Commercial usage rights depend on the model's license, set by its provider (Kuaishou). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.
