Kling 2.6 Pro Image-to-Video
Kling 2.6 Pro Image-to-Video adds audio-video co-generation to Kling's powerful visual pipeline. Start from a still image, write a prompt, and the model produces a short clip where motion, camera, sound effects, and voice all feel like one coherent scene.
Why Choose This?
- Audio and video in one pass: Jointly generates visuals and soundtrack, so no post-production audio sync is needed.
- Character-synced voices: Speech and reactions that match the on-screen subject and timing.
- Scene-aware sound design: Ambient noise and SFX that follow what happens in the frame.
- Start and end frame support: Provide a starting image and, optionally, an ending image to guide the animation.
- Voice customization: Add custom voices via voice_list for character-specific audio.
- Prompt Enhancer: Built-in tool that automatically rewrites your prompt for better results.
Parameters
| Parameter | Required | Description |
|---|---|---|
| prompt | Yes | Describe scene motion, camera moves, and audio |
| image | Yes | Starting frame to animate (upload or URL) |
| negative_prompt | No | Elements to avoid in visuals and audio |
| end_image | No | Ending frame to guide the animation target |
| cfg_scale | No | Guidance strength (default: 0.5) |
| sound | No | Enable audio-video co-generation (default: true); must be disabled when end_image is set |
| voice_list | No | Custom voices for character audio |
| duration | No | Video length: 5 or 10 seconds |
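As a rough illustration of how the parameters in the table fit together, here is a minimal sketch that assembles a request body as a plain dictionary. The function name `build_payload` is hypothetical, the value types (URL strings, a list for voice_list) are assumptions, and the default of 5 for duration is assumed because the table does not state one. No endpoint or client library is shown, since this page does not specify one.

```python
from typing import Any, Dict, List, Optional


def build_payload(
    prompt: str,
    image: str,
    negative_prompt: Optional[str] = None,
    end_image: Optional[str] = None,
    cfg_scale: float = 0.5,   # default taken from the parameter table
    sound: bool = True,       # default taken from the parameter table
    voice_list: Optional[List[Any]] = None,  # element type is an assumption
    duration: int = 5,        # assumed default; the table does not state one
) -> Dict[str, Any]:
    """Collect the documented parameters into one dictionary.

    Optional parameters that were not supplied are left out entirely
    instead of being sent as null values.
    """
    payload: Dict[str, Any] = {
        "prompt": prompt,
        "image": image,
        "cfg_scale": cfg_scale,
        "sound": sound,
        "duration": duration,
    }
    optional = {
        "negative_prompt": negative_prompt,
        "end_image": end_image,
        "voice_list": voice_list,
    }
    for name, value in optional.items():
        if value is not None:
            payload[name] = value
    return payload


example = build_payload(
    prompt="The cat slowly turns its head, soft room ambience",
    image="https://example.com/cat.png",  # placeholder URL
)
```

Only the two required fields need to be passed; everything else falls back to the values above.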
CFG Scale Guide
- Lower values (0.3-0.5): Looser, more natural motion; image has more influence
- Higher values (0.6-0.8): Closer adherence to prompt; can look more "controlled"
How to Use
- Upload your image — the starting frame to animate.
- Write your prompt — describe camera movement, actions, and audio.
- Add negative prompt (optional) — specify what to avoid.
- Upload end image (optional) — guide where the animation should end.
- Adjust cfg_scale — start with default 0.5, increase if needed.
- Enable sound — check for audio generation, uncheck for silent video.
- Add voices (optional) — click "+ Add Item" for custom character voices.
- Select duration — choose 5 or 10 seconds.
- Run — submit and download your video.
Pricing
| Duration | Sound Off | Sound On |
|---|---|---|
| 5s | $0.35 | $0.70 |
| 10s | $0.70 | $1.40 |
Billing Rules
- Base rate: $0.35 per 5 seconds (without audio)
- Audio multiplier: 2× when sound is enabled
- Total cost = $0.35 × (duration / 5) × (sound ? 2 : 1)
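The billing formula above can be sketched as a small helper for estimating cost before submitting a job. The function name `estimate_cost` is hypothetical; the rate, the 2× multiplier, and the allowed durations come from the rules and table on this page.

```python
BASE_RATE_PER_5S = 0.35  # USD per 5 seconds without audio, from the billing rules


def estimate_cost(duration: int, sound: bool) -> float:
    """Return the estimated price in USD for one generation."""
    if duration not in (5, 10):
        raise ValueError("duration must be 5 or 10 seconds")
    multiplier = 2 if sound else 1
    # Round to cents to avoid floating-point noise in the result.
    return round(BASE_RATE_PER_5S * (duration / 5) * multiplier, 2)


for seconds in (5, 10):
    for with_sound in (False, True):
        print(seconds, with_sound, estimate_cost(seconds, with_sound))
```

The four printed values should match the four cells of the pricing table.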
Best Use Cases
- Promo Videos — Launch videos with native-sounding, character-synced voiceover.
- Storytelling — Shorts where camera, action, and sound feel perfectly integrated.
- Product Explainers — Clear visuals with natural narration built in.
- Social Content — Cinematic posts with immersive ambience and SFX.
- Animated Scenes — Bring still images to life with coherent motion and audio.
Pro Tips
- Keep the image and prompt aligned — don't describe a totally different scene.
- For strong lip-sync, explicitly mention who is speaking and what voice style you want.
- Start with default cfg_scale (0.5); increase slowly if motion doesn't match your description.
- Use negative_prompt to reduce logos, watermarks, or unwanted artifacts.
- Use end_image to guide the animation toward a specific final composition.
- Include audio cues in your prompt (e.g., "soft city ambience, subtle whooshes on cuts").
Notes
- Supported durations are 5 and 10 seconds.
- Audio generation doubles the cost but creates synchronized sound design.
- For best results, use sharp, well-lit source images.
- End image helps create more controlled transitions.
- End image and sound cannot be used together: when end_image is provided, sound must be set to false.
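Because sound defaults to true, a request that adds end_image without turning sound off would break the rule in the last note. A minimal pre-flight check, assuming a hypothetical helper named `check_request` that runs on the client before submission, might look like this:

```python
from typing import List, Optional


def check_request(duration: int, sound: bool, end_image: Optional[str]) -> List[str]:
    """Return a list of problems; an empty list means the combination is allowed."""
    problems: List[str] = []
    if duration not in (5, 10):
        problems.append("duration must be 5 or 10 seconds")
    if end_image is not None and sound:
        problems.append("sound must be disabled when end_image is set")
    return problems


# Placeholder URL; sound is left enabled to show the conflict being caught.
issues = check_request(duration=5, sound=True, end_image="https://example.com/end.png")
print(issues)
```

Returning a list of messages instead of raising on the first error lets a caller report every problem at once.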
Related Models
- Kling 2.6 Pro Text-to-Video — Generate videos from text prompts only.
- Vidu Q2 Pro Image-to-Video — Alternative I2V with BGM support.