Wan 2.6 Models

Wan 2.6 Models unify text-, image-, and reference-driven video generation with native, synchronized audio in one pass—delivering sharper detail, smoother cinematic motion, and more consistent camera language for production-ready storytelling at scale.

All Models

10 models
image-to-video

alibaba/wan-2.6/image-to-video

WAN 2.6 converts text or images into videos (720p/1080p) with synced audio, and is positioned as faster and more affordable than Google Veo 3. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

image-to-video

alibaba/wan-2.6/image-to-video-spicy

WAN 2.6 Spicy converts images into unlimited high-quality videos with smooth animations optimized for scalable content generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

image-to-video

alibaba/wan-2.6/reference-to-video-flash

WAN 2.6 Reference-to-Video Flash turns character, prop, or scene references from images or videos into new video shots with preserved identity, style, and layout plus smooth, coherent motion. The Flash version generates faster. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

image-to-video

alibaba/wan-2.6/image-to-video-flash

WAN 2.6 Flash converts images into videos (720p/1080p) with optional audio, optimized for speed and cost. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

text-to-video

alibaba/wan-2.6/text-to-video

WAN 2.6 Text-to-Video turns plain prompts into coherent, cinematic clips with crisp detail, stable motion, and strong instruction-following—great for ads, explainers, and social posts. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

image-to-image

alibaba/wan-2.6/image-edit

WAN 2.6 Image-Edit turns prompts into precise photo edits—adjusting color and lighting, restyling aesthetics, replacing backgrounds, removing objects, and refining details while preserving subject identity. Built for stable, repeatable image-to-image pipelines. Ready-to-use REST API, best performance, no cold starts, affordable pricing.

image-to-video

alibaba/wan-2.6/reference-to-video

WAN 2.6 Reference-to-Video turns character, prop, or scene references—single or multi-view—into new video shots with preserved identity, style, and layout plus smooth, coherent motion. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

video-extend

alibaba/wan-2.6/video-extend

WAN 2.6 Video-Extend turns short clips into longer videos with preserved or generated synchronized audio for continuity. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

image-to-video

alibaba/wan-2.6/image-to-video-pro

WAN 2.6 Image-to-Video Pro converts images into premium-quality videos with superior motion dynamics, enhanced visual fidelity, and professional cinematic output. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

text-to-image

alibaba/wan-2.6/text-to-image

WAN 2.6 Text-to-Image generates high-quality images from natural-language prompts with strong prompt adherence and clean composition. It supports multiple aspect ratios and size control, seed-based reproducibility, and flexible styles (photorealistic to illustrative) for ads, product shots, and social visuals. Built for stable production use with a ready-to-use REST API, no cold starts, and predictable pricing.
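The catalog above pairs each model ID with a task tag, which makes it easy to hold as a small lookup table. The following is a minimal sketch, assuming you want to pick endpoints by task in your own code; the IDs and tags are copied from the listing, while the names WAN_26_CATALOG, models_by_task, and is_flash are illustrative and not part of any published SDK.

```python
from collections import defaultdict

# Model IDs and task tags copied from the catalog listing above.
WAN_26_CATALOG = {
    "alibaba/wan-2.6/image-to-video": "image-to-video",
    "alibaba/wan-2.6/image-to-video-spicy": "image-to-video",
    "alibaba/wan-2.6/reference-to-video-flash": "image-to-video",
    "alibaba/wan-2.6/image-to-video-flash": "image-to-video",
    "alibaba/wan-2.6/text-to-video": "text-to-video",
    "alibaba/wan-2.6/image-edit": "image-to-image",
    "alibaba/wan-2.6/reference-to-video": "image-to-video",
    "alibaba/wan-2.6/video-extend": "video-extend",
    "alibaba/wan-2.6/image-to-video-pro": "image-to-video",
    "alibaba/wan-2.6/text-to-image": "text-to-image",
}


def models_by_task(catalog):
    """Group model IDs under their task tag, preserving catalog order."""
    grouped = defaultdict(list)
    for model_id, task in catalog.items():
        grouped[task].append(model_id)
    return dict(grouped)


def is_flash(model_id):
    """Return True for the speed-oriented Flash variants (ID ends in '-flash')."""
    return model_id.endswith("-flash")


if __name__ == "__main__":
    for task, ids in models_by_task(WAN_26_CATALOG).items():
        print(f"{task}: {len(ids)} model(s)")
```

Note that the listing tags the reference-to-video endpoints as image-to-video, so a filter on that tag returns them as well.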

Wan 2.6 Models

Alibaba’s Wan 2.6 Models are a compact, production-ready set of generation endpoints. They cover three core video workflows (text-to-video, image-to-video, and reference-to-video) alongside text-to-image and image editing. Built for fast iteration and reliable results, Wan 2.6 emphasizes stronger visual consistency, smoother motion, and more controllable cinematic style across generations.

Wan 2.6 Series — Text, Image & Reference to Video

Wan 2.6 groups its endpoints by workflow, so you can generate from scratch, animate a still, or guide generation with image or video references. Text-to-image and image-edit endpoints cover still-image work. The set suits commercial pipelines, creative testing, and repeatable content production.

  1. Wan 2.6 Text-to-Video: Generate coherent, cinematic motion from text prompts for story beats, ads, and creative shorts.
     alibaba/wan-2.6/text-to-video
  2. Wan 2.6 Image-to-Video: Animate a still image into natural motion while preserving subject and composition for product shots and visual assets.
     alibaba/wan-2.6/image-to-video
     alibaba/wan-2.6/image-to-video-flash
  3. Wan 2.6 Reference-to-Video: Use a reference video to guide motion style, pacing, and framing to create new, high-consistency scenes.
     alibaba/wan-2.6/reference-to-video
     alibaba/wan-2.6/reference-to-video-flash
  4. Wan 2.6 Text-to-Image: Generate high-quality images from text prompts for key visuals, concept frames, and rapid creative exploration.
     alibaba/wan-2.6/text-to-image
  5. Wan 2.6 Image-Edit: Edit and refine images with precise, controlled changes while preserving structure and subject consistency.
     alibaba/wan-2.6/image-edit
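
Each endpoint is described as a ready-to-use REST inference API, addressed by its model ID. The sketch below shows one way a JSON request for such an ID might be assembled with Python's standard library. The page gives no host, route, authentication scheme, or parameter schema, so BASE_URL, the Bearer header, and the "prompt" field are all assumptions for illustration; check the provider's API reference for the real values. The request is only built, not sent.

```python
import json
import urllib.request

# Hypothetical placeholders: the listing does not state a host, route, or schema.
BASE_URL = "https://api.example.com/v1"
MODEL_PREFIX = "alibaba/wan-2.6/"


def build_request(model_id, payload, api_key, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for a Wan 2.6 model ID."""
    if not model_id.startswith(MODEL_PREFIX):
        raise ValueError(f"not a Wan 2.6 model ID: {model_id!r}")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{model_id}",
        data=body,
        headers={
            "Content-Type": "application/json",
            # Bearer auth is an assumption, not confirmed by the listing.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_request(
        "alibaba/wan-2.6/text-to-video",
        {"prompt": "A paper boat drifting down a rainy street"},  # assumed field name
        api_key="YOUR_API_KEY",
    )
    print(req.get_method(), req.full_url)
    # Passing req to urllib.request.urlopen would submit it; omitted here.
```

Keeping the model ID as a plain argument means the same helper can target any endpoint in the list, with only the payload changing between text, image, and reference workflows.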

Highlights

  1. 5s Video Reference → Character + Voice Consistency: Recreate a character and voice from a short reference clip for more stable identity across shots.
  2. Cinematic Multi-Shot Storytelling: Turn prompts or storyboards into coherent, multi-scene sequences with stronger continuity.
  3. 1080p Output: Generate high-fidelity videos suitable for social and production workflows.
  4. Sound-Picture Sync: Native audio alignment with visuals—dialogue, music, and sound effects that land on beat.
  5. Fast Iteration, More Control: Built for rapid creative testing while keeping style, framing, and motion more consistent.