WaveSpeed.ai

Kling 2.6 Pro

kwaivgi/kling-v2.6-pro/image-to-video

Kling 2.6 Pro delivers top-tier image-to-video generation with smooth motion, cinematic visuals, accurate prompt adherence, and native audio for ready-to-share clips. It is available through a ready-to-use REST inference API with no cold starts and affordable pricing.

Your request will cost $0.35 per run.

For $10 you can run this model approximately 28 times.

README

Kling 2.6 Pro Image-to-Video

Kling 2.6 Pro Image-to-Video adds audio-video co-generation to Kling's powerful visual pipeline. Start from a still image, write a prompt, and the model produces a short clip where motion, camera, sound effects, and voice all feel like one coherent scene.

Why Choose This?

  • Audio and video in one pass: Jointly generates visuals and soundtrack, so no post-production audio sync is needed.
  • Character-synced voices: Speech and reactions that match the on-screen subject and timing.
  • Scene-aware sound design: Ambient noise and SFX that follow what happens in the frame.
  • Start and end frame support: Use a starting image and an optional ending image to guide the animation.
  • Voice customization: Add custom voices via voice_list for character-specific audio.
  • Prompt Enhancer: Built-in tool to automatically improve your prompts for better results.

Parameters

| Parameter | Required | Description |
|---|---|---|
| prompt | Yes | Describe scene motion, camera moves, and audio |
| image | Yes | Starting frame to animate (upload or URL) |
| negative_prompt | No | Elements to avoid in visuals and audio |
| end_image | No | Ending frame to guide the animation target |
| cfg_scale | No | Guidance strength (default: 0.5) |
| sound | No | Enable audio-video co-generation (default: true) |
| voice_list | No | Custom voices for character audio |
| duration | No | Video length: 5 or 10 seconds |
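
As a rough illustration of how these parameters might be assembled into a request body, here is a minimal Python sketch. The field names and defaults come from the table above; the `build_payload` helper, the dict layout, and treating `voice_list` as a plain list are assumptions for illustration, not the documented request schema.

```python
from typing import Any, Optional


def build_payload(
    prompt: str,
    image: str,
    negative_prompt: Optional[str] = None,
    end_image: Optional[str] = None,
    cfg_scale: float = 0.5,   # default from the parameter table
    sound: bool = True,       # default from the parameter table
    voice_list: Optional[list] = None,  # item format is not specified on this page
    duration: int = 5,
) -> dict[str, Any]:
    """Collect the listed parameters into a dict, omitting unset optional ones."""
    payload: dict[str, Any] = {
        "prompt": prompt,
        "image": image,
        "cfg_scale": cfg_scale,
        "sound": sound,
        "duration": duration,
    }
    optional = {
        "negative_prompt": negative_prompt,
        "end_image": end_image,
        "voice_list": voice_list,
    }
    # Only include optional fields the caller actually supplied.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


payload = build_payload(
    prompt="Slow dolly-in as the subject turns and smiles, soft rain ambience",
    image="https://example.com/start.png",  # placeholder URL
    duration=10,
)
```

Leaving unset optional fields out of the dict, rather than sending them as null, is a design choice of this sketch so that the service's own defaults would apply.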

CFG Scale Guide

  • Lower values (0.3-0.5): Looser, more natural motion; image has more influence
  • Higher values (0.6-0.8): Closer adherence to prompt; can look more "controlled"

How to Use

  1. Upload your image — the starting frame to animate.
  2. Write your prompt — describe camera movement, actions, and audio.
  3. Add negative prompt (optional) — specify what to avoid.
  4. Upload end image (optional) — guide where the animation should end.
  5. Adjust cfg_scale — start with default 0.5, increase if needed.
  6. Enable sound — check for audio generation, uncheck for silent video.
  7. Add voices (optional) — click "+ Add Item" for custom character voices.
  8. Select duration — choose 5 or 10 seconds.
  9. Run — submit and download your video.
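
When the model is called through the REST API instead of the web form, the steps above plausibly collapse into a single HTTP POST. The sketch below only constructs the request and does not send it. The base URL, the bearer-token header, and the `make_request` helper are placeholders assumed for illustration; the provider's API reference is the authority on the actual endpoint and authentication scheme.

```python
import json
import urllib.request

MODEL_ID = "kwaivgi/kling-v2.6-pro/image-to-video"  # model path shown on this page
API_BASE = "https://api.example.invalid"  # placeholder, NOT the real endpoint


def make_request(payload: dict, api_key: str, api_base: str = API_BASE) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the model."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{api_base}/{MODEL_ID}",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


request = make_request(
    {
        "image": "https://example.com/start.png",        # step 1 (placeholder URL)
        "prompt": "Camera pans left, waves crash loudly",  # step 2
        "negative_prompt": "watermark, logo",             # step 3
        "cfg_scale": 0.5,                                 # step 5
        "sound": True,                                    # step 6
        "duration": 5,                                    # step 8
    },
    api_key="YOUR_API_KEY",
)
# urllib.request.urlopen(request) would submit it once API_BASE points at the real service.
```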

Pricing

| Duration | Sound Off | Sound On |
|---|---|---|
| 5s | $0.35 | $0.70 |
| 10s | $0.70 | $1.40 |

Billing Rules

  • Base rate: $0.35 per 5 seconds (without audio)
  • Audio multiplier: 2× when sound is enabled
  • Total cost = $0.35 × (duration / 5) × (sound ? 2 : 1)
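
The billing formula can be checked with a few lines of Python. This is a small sketch, not an official calculator: the `run_cost` and `runs_for_budget` helper names are hypothetical, and `Decimal` is used only to keep the dollar amounts exact.

```python
from decimal import Decimal

BASE_RATE = Decimal("0.35")  # USD per 5 seconds without audio, from the billing rules


def run_cost(duration: int, sound: bool) -> Decimal:
    """Total cost = 0.35 x (duration / 5) x (2 if sound else 1)."""
    if duration not in (5, 10):
        raise ValueError("duration must be 5 or 10 seconds")
    return BASE_RATE * (duration // 5) * (2 if sound else 1)


def runs_for_budget(budget: str, duration: int, sound: bool) -> int:
    """Number of whole runs a budget (in USD) covers at the given settings."""
    return int(Decimal(budget) // run_cost(duration, sound))


for duration in (5, 10):
    for sound in (False, True):
        print(f"{duration}s, sound={sound}: ${run_cost(duration, sound)}")
# prints:
# 5s, sound=False: $0.35
# 5s, sound=True: $0.70
# 10s, sound=False: $0.70
# 10s, sound=True: $1.40
```

With these helpers, `runs_for_budget("10", 5, False)` returns 28, which matches the "approximately 28 times" figure the page quotes for a $10 budget at $0.35 per run.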

Best Use Cases

  • Promo Videos — Launch videos with native-sounding, character-synced voiceover.
  • Storytelling — Shorts where camera, action, and sound feel perfectly integrated.
  • Product Explainers — Clear visuals with natural narration built in.
  • Social Content — Cinematic posts with immersive ambience and SFX.
  • Animated Scenes — Bring still images to life with coherent motion and audio.

Pro Tips

  • Keep the image and prompt aligned — don't describe a totally different scene.
  • For strong lip-sync, explicitly mention who is speaking and what voice style you want.
  • Start with default cfg_scale (0.5); increase slowly if motion doesn't match your description.
  • Use negative_prompt to reduce logos, watermarks, or unwanted artifacts.
  • Use end_image to guide the animation toward a specific final composition.
  • Include audio cues in your prompt (e.g., "soft city ambience, subtle whooshes on cuts").

Notes

  • Supported durations are 5 and 10 seconds.
  • Audio generation doubles the cost but creates synchronized sound design.
  • For best results, use sharp, well-lit source images.
  • End image helps create more controlled transitions.
  • End image and sound cannot be used together. When using end_image, the sound parameter must be disabled.
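
The two hard constraints in these notes (allowed durations, and end image versus sound) are easy to check before submitting a request. The following is a minimal client-side sketch; `check_options` is a hypothetical helper, and the service may enforce or report these rules differently.

```python
from typing import Optional


def check_options(duration: int, sound: bool, end_image: Optional[str] = None) -> list[str]:
    """Return a list of problems; an empty list means the options agree with the notes."""
    problems = []
    if duration not in (5, 10):
        problems.append("duration must be 5 or 10 seconds")
    if end_image is not None and sound:
        problems.append("end_image requires sound to be disabled")
    return problems


print(check_options(10, sound=True, end_image="https://example.com/end.png"))
# prints: ['end_image requires sound to be disabled']
print(check_options(5, sound=False, end_image="https://example.com/end.png"))
# prints: []
```

Because the parameter table lists `sound` as defaulting to true, a request that sets `end_image` would need to turn `sound` off explicitly to satisfy the last note.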
