Kling V2.6 Pro Image to Video

Kling 2.6 Pro delivers top-tier image-to-video generation with smooth motion, cinematic visuals, accurate prompt adherence, and native audio for ready-to-share clips. It is available as a ready-to-use REST inference API with no cold starts and affordable pricing.

$0.35 per run · ~28 runs per $10

Examples

Use the uploaded sci-fi alley image as the first frame. Keep the same alley, neon signs, reflections and the hooded woman walking away. Slowly move the camera forward down the alley behind her, like a tracking shot, with smooth, cinematic motion and slight handheld feeling. Let the rain keep falling, with droplets visible in the light beams and more ripples appearing in the puddles as the camera advances. Occasionally, one neon sign flickers and a distant train light passes across the sky between the buildings. Style: realistic cyberpunk night scene, rich colors, deep contrast, subtle lens bloom on the neon. Audio: ambient city noise with distant traffic and voices, soft electronic music pulse, loudest near the middle of the clip, no dialogue.

Scene: A dimly lit casino VIP room, with a green felt poker table at the center and a haze of drifting cigarette smoke surrounding the space. Subject: A suited man leans forward with his elbow on the table and says: "Three rounds to decide. Win, and all the chips are yours. Lose, and tell me the real reason you're getting close to him." Across from him, a curly-haired woman gently slides her fingertips along the edge of the table, her red lips curling slightly as she replies: "I don't care about the chips." Atmosphere is tense, cinematic, with dramatic low-key lighting and noir-style mood.

Scene: No visible people. Only a white robotic vacuum cleaner is shown along with its cleaning path on the floor. Audio: A soft female narrator speaks, accompanied by gentle vacuum-cleaning sound effects: "Still struggling with dust in the corners? This robotic vacuum cleans right up against the edges with no gaps, making your life easier and worry-free!" Camera: Follows the robot's cleaning path smoothly as it moves across the floor.

Scene: A tabletop setup featuring ASMR trigger props such as a crystal glass, wooden block, and makeup brushes. Audio: Soft "shhh—shhh" brushing sounds as a makeup brush gently sweeps across the crystal glass and wooden block. Camera: Focuses closely on the props and the precise hand movements, highlighting textures and subtle details. Atmosphere: Calm, soothing, and sensory-focused.

Scene: On a beach with sunlight spilling across golden sand, waves crashing onto the shore and forming white foam. Subject: A young American male wearing a backwards baseball cap, holding a camera for a selfie, smiling naturally. Audio: The young American male with a bright, sunny voice speaks to the camera: "The weather is amazing today! All my worries feel totally gone. I've been needing a day like this—sun, breeze, just the sound of the waves." Background includes layered ocean wave sounds, filmed in a close-up vlog-style shot.

On a rainy night street with neon lights flashing, the streetlights illuminate the wet ground as raindrops fall. A cellist stands under the streetlight, with raindrops dripping from their hair, playing the cello. The slow and affectionate solo melody of the cello, with a cold color tone.

Add a robot to the uploaded image. Then the robot walks up to the two birthday celebrants and says "Happy Birthday to You!" with its mouth movements perfectly synchronized to the words.

README

Kling 2.6 Pro Image-to-Video

Kling 2.6 Pro Image-to-Video adds audio-video co-generation to Kling's powerful visual pipeline. Start from a still image, write a prompt, and the model produces a short clip where motion, camera, sound effects, and voice all feel like one coherent scene.

Why Choose This?

  • Audio and video in one pass: Jointly generates visuals and soundtrack, so no post-production audio sync is needed.

  • Character-synced voices: Speech and reactions that match the on-screen subject and timing.

  • Scene-aware sound design: Ambient noise and SFX that follow what happens in the frame.

  • Start and end frame support: Use a starting image and an optional ending image to guide the animation.

  • Voice customization: Add custom voices via voice_list for character-specific audio.

  • Prompt Enhancer: Built-in tool to automatically improve your prompts for better results.

Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| prompt | Yes | Describe scene motion, camera moves, and audio |
| image | Yes | Starting frame to animate (upload or URL) |
| negative_prompt | No | Elements to avoid in visuals and audio |
| end_image | No | Ending frame to guide the animation target |
| cfg_scale | No | Guidance strength (default: 0.5) |
| sound | No | Enable audio-video co-generation (default: true) |
| voice_list | No | Custom voices for character audio |
| duration | No | Video length: 5 or 10 seconds |

CFG Scale Guide

  • Lower values (0.3-0.5): Looser, more natural motion; image has more influence
  • Higher values (0.6-0.8): Closer adherence to prompt; can look more "controlled"

How to Use

  1. Upload your image — the starting frame to animate.
  2. Write your prompt — describe camera movement, actions, and audio.
  3. Add negative prompt (optional) — specify what to avoid.
  4. Upload end image (optional) — guide where the animation should end.
  5. Adjust cfg_scale — start with default 0.5, increase if needed.
  6. Enable sound — check for audio generation, uncheck for silent video.
  7. Add voices (optional) — click "+ Add Item" for custom character voices.
  8. Select duration — choose 5 or 10 seconds.
  9. Run — submit and download your video.
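
For API use, the steps above map roughly onto a JSON request body. The sketch below is illustrative only: `build_payload` is a hypothetical helper, not part of any WaveSpeedAI SDK, and the default of 5 for `duration` is an assumption (the parameter table lists defaults only for `cfg_scale` and `sound`). `voice_list` is left out because its item format is not described here.

```python
from typing import Any, Optional


def build_payload(
    prompt: str,
    image: str,
    negative_prompt: Optional[str] = None,
    end_image: Optional[str] = None,
    cfg_scale: float = 0.5,   # documented default
    sound: bool = True,       # documented default
    duration: int = 5,        # assumed default; allowed values are 5 and 10
) -> dict[str, Any]:
    """Assemble a request body from the form fields listed in the steps."""
    payload: dict[str, Any] = {
        "prompt": prompt,
        "image": image,
        "cfg_scale": cfg_scale,
        "sound": sound,
        "duration": duration,
    }
    # Optional fields are only included when the caller provides them.
    if negative_prompt is not None:
        payload["negative_prompt"] = negative_prompt
    if end_image is not None:
        payload["end_image"] = end_image
    return payload


payload = build_payload(
    prompt="Slow push-in down the alley, rain falling, distant traffic ambience",
    image="https://example.com/your-input.jpg",
    duration=10,
)
```

The resulting dictionary has the same keys as the JSON bodies in the API quick start and could be passed to any of those calls.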

Pricing

| Duration | Sound Off | Sound On |
| --- | --- | --- |
| 5s | $0.35 | $0.70 |
| 10s | $0.70 | $1.40 |

Billing Rules

  • Base rate: $0.35 per 5 seconds (without audio)
  • Audio multiplier: 2× when sound is enabled
  • Total cost = $0.35 × (duration / 5) × (sound ? 2 : 1)
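
The billing rules above can be checked with a few lines of code. This is a sketch: `estimate_cost` is a hypothetical helper, and the rate is copied from this section rather than read from the API, so it will drift if pricing changes.

```python
BASE_RATE_USD = 0.35  # per 5 seconds without audio, from the billing rules above


def estimate_cost(duration: int, sound: bool) -> float:
    """Return the estimated charge in USD for one run."""
    if duration not in (5, 10):
        raise ValueError("duration must be 5 or 10 seconds")
    multiplier = 2 if sound else 1  # audio doubles the price
    return round(BASE_RATE_USD * (duration / 5) * multiplier, 2)


for duration in (5, 10):
    for sound in (False, True):
        print(duration, sound, estimate_cost(duration, sound))
```

The four printed combinations should match the four cells of the pricing table.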

Best Use Cases

  • Promo Videos — Launch videos with native-sounding, character-synced voiceover.
  • Storytelling — Shorts where camera, action, and sound feel perfectly integrated.
  • Product Explainers — Clear visuals with natural narration built in.
  • Social Content — Cinematic posts with immersive ambience and SFX.
  • Animated Scenes — Bring still images to life with coherent motion and audio.

Pro Tips

  • Keep the image and prompt aligned — don't describe a totally different scene.
  • For strong lip-sync, explicitly mention who is speaking and what voice style you want.
  • Start with default cfg_scale (0.5); increase slowly if motion doesn't match your description.
  • Use negative_prompt to reduce logos, watermarks, or unwanted artifacts.
  • Use end_image to guide the animation toward a specific final composition.
  • Include audio cues in your prompt (e.g., "soft city ambience, subtle whooshes on cuts").

Notes

  • Supported durations are 5 and 10 seconds.
  • Audio generation doubles the cost but creates synchronized sound design.
  • For best results, use sharp, well-lit source images.
  • End image helps create more controlled transitions.
  • End image and sound cannot be used together. When using end_image, the sound parameter must be disabled.
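
Because `sound` defaults to true, a request that sets `end_image` needs to disable it explicitly. A minimal client-side check for the constraints in these notes might look like the sketch below. `validate_payload` is a hypothetical helper; how the server itself reports these conflicts is not documented here.

```python
from typing import Any


def validate_payload(payload: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a request body (empty if none)."""
    problems: list[str] = []
    # Only 5 and 10 seconds are supported.
    if "duration" in payload and payload["duration"] not in (5, 10):
        problems.append("duration must be 5 or 10")
    # sound defaults to true, so a missing key counts as enabled.
    if payload.get("end_image") and payload.get("sound", True):
        problems.append("end_image requires sound to be false")
    return problems


conflicting = {"prompt": "p", "image": "https://example.com/a.jpg",
               "end_image": "https://example.com/b.jpg"}
print(validate_payload(conflicting))
```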

Accessibility: This website uses AI models provided by third parties.

Kling v2.6 Pro Image To Video API — Quick start

Get a WaveSpeedAI API key, then send a POST request to https://api.wavespeed.ai/api/v3/kwaivgi/kling-v2.6-pro/image-to-video with your input as JSON. The endpoint returns a prediction id. Poll the prediction result endpoint (GET https://api.wavespeed.ai/api/v3/predictions/{request_id}/result) until status is completed, then read the output URL from data.outputs[0]. Examples for Kling v2.6 Pro Image To Video follow.

HTTP example

```shell
# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/kwaivgi/kling-v2.6-pro/image-to-video" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "image": "https://example.com/your-input.jpg",
    "negative_prompt": "blurry, low quality, distorted",
    "cfg_scale": 0.5,
    "sound": false,
    "duration": 5
  }'

# Response includes a prediction id. Poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# When status is "completed", read the output from data.outputs[0].
```

Node.js example

```javascript
// npm install wavespeed
const WaveSpeed = require('wavespeed');

const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// Top-level await is not available in CommonJS modules,
// so the call is wrapped in an async function.
async function main() {
  const result = await client.run("kwaivgi/kling-v2.6-pro/image-to-video", {
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "image": "https://example.com/your-input.jpg",
    "negative_prompt": "blurry, low quality, distorted",
    "cfg_scale": 0.5,
    "sound": false,
    "duration": 5
  });

  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);
```

Python example

```python
# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "kwaivgi/kling-v2.6-pro/image-to-video",
    {
        "prompt": "A cinematic shot of a city at sunset, soft golden light",
        "image": "https://example.com/your-input.jpg",
        "negative_prompt": "blurry, low quality, distorted",
        "cfg_scale": 0.5,
        "sound": False,
        "duration": 5,
    },
)

print(output["outputs"][0])  # → URL of the generated output
```
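
When calling the REST endpoint directly instead of through an SDK, the poll-until-completed step from the quick start has to be written by hand. The sketch below shows one way to structure that loop. `wait_for_output` and its `fetch_result` callback are hypothetical names; the `"failed"` and `"processing"` status values are assumptions, since this page only documents `"completed"`. The response shape (`data.status`, `data.outputs[0]`) follows the quick start text. The fetch is injected so that the loop can be exercised without network access.

```python
import time
from typing import Any, Callable


def wait_for_output(
    fetch_result: Callable[[], dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
) -> str:
    """Call fetch_result() until the prediction completes, then return the output URL."""
    deadline = time.monotonic() + timeout_s
    while True:
        data = fetch_result()["data"]
        if data["status"] == "completed":
            return data["outputs"][0]
        if data["status"] == "failed":  # assumed failure status
            raise RuntimeError("prediction failed")
        if time.monotonic() >= deadline:
            raise TimeoutError("prediction did not complete in time")
        time.sleep(interval_s)


# Simulated responses standing in for GET .../predictions/{request_id}/result
_fake_responses = iter([
    {"data": {"status": "processing", "outputs": []}},  # assumed in-progress status
    {"data": {"status": "completed", "outputs": ["https://example.com/out.mp4"]}},
])
url = wait_for_output(lambda: next(_fake_responses), interval_s=0.0)
print(url)
```

In real use, `fetch_result` would wrap an authenticated GET to the prediction result endpoint and return the parsed JSON body.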

Kling v2.6 Pro Image To Video API — Frequently asked questions

What is the Kling v2.6 Pro Image To Video API?

Kling v2.6 Pro Image To Video is a Kuaishou model for generating video from images, exposed as a REST API on WaveSpeedAI. It produces clips with smooth motion, cinematic visuals, accurate prompt adherence, and native audio. You can call it programmatically or try it from the playground above.

How do I call the Kling v2.6 Pro Image To Video API?

POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/kwaivgi/kwaivgi-kling-v2.6-pro-image-to-video.

How much does Kling v2.6 Pro Image To Video cost per run?

Kling v2.6 Pro Image To Video starts at $0.35 per run. That figure is the base price for a 5-second clip with sound off. A 10-second clip costs twice as much, and enabling sound doubles the charge, so a 10-second clip with sound costs $1.40. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.

What inputs does Kling v2.6 Pro Image To Video accept?

Key inputs: `prompt`, `image`, `duration`, `negative_prompt`, `cfg_scale`, `end_image`, `sound`, `voice_list`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/kwaivgi/kwaivgi-kling-v2.6-pro-image-to-video.

How long does Kling v2.6 Pro Image To Video take to generate?

Average end-to-end generation time on WaveSpeedAI is around 87 seconds per request — measured across recent runs. Queue time scales with global demand; live status is visible in the prediction record.

Can I use Kling v2.6 Pro Image To Video outputs commercially?

Commercial usage rights depend on the model's license, set by its provider (Kuaishou). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.