
Avatar Lipsync Models

WaveSpeedAI's AI Avatars deliver lifelike virtual characters with advanced lip sync and realistic expressions.

Our selection

digital-human

wavespeed-ai/infinitetalk

InfiniteTalk converts one photo + audio into audio-driven talking or singing avatar videos (Image-to-Video), up to 10 minutes, 720p tier $0.30/5s. Ready-to-use REST API, no coldstarts, affordable pricing.
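As a rough illustration of what calling one of these REST endpoints looks like, the sketch below assembles a request body for InfiniteTalk. The endpoint URL and field names here are assumptions for illustration only, not confirmed API details; consult the WaveSpeedAI API documentation for the real schema and authentication flow.

```python
import json

# Assumed endpoint path -- verify against the official WaveSpeedAI docs.
API_URL = "https://api.wavespeed.ai/api/v3/wavespeed-ai/infinitetalk"

def build_payload(image_url: str, audio_url: str, resolution: str = "720p") -> dict:
    """Assemble the JSON body for a single image+audio talking-avatar job.

    Field names ("image", "audio", "resolution") are illustrative
    placeholders, not the confirmed parameter schema.
    """
    return {
        "image": image_url,
        "audio": audio_url,
        "resolution": resolution,
    }

payload = build_payload("https://example.com/face.png",
                        "https://example.com/speech.wav")
body = json.dumps(payload)  # serialized request body

# To actually submit (requires an API key):
#   import requests
#   r = requests.post(API_URL, json=payload,
#                     headers={"Authorization": "Bearer <YOUR_API_KEY>"})
```

Generation is typically asynchronous for video models, so in practice you would poll a result endpoint (or use a webhook) rather than block on the initial response.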

All Models

40 models
digital-human

wavespeed-ai/infinitetalk

InfiniteTalk converts one photo + audio into audio-driven talking or singing avatar videos (Image-to-Video), up to 10 minutes, 720p tier $0.30/5s. Ready-to-use REST API, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/infinitetalk/video-to-video

Audio-driven InfiniteTalk turns one video plus audio into realistic talking or singing videos with lip-sync in 480p or 720p. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/infinitetalk-fast

InfiniteTalk fast converts one photo + audio into audio-driven talking or singing avatar videos (Image-to-Video), up to 10 minutes. Ready-to-use REST API, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/infinitetalk/multi

InfiniteTalk Multi converts a single image and two audio inputs into multi-character talking or singing videos at up to 720p. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

kwaivgi/kling-lipsync/audio-to-video

Kling LipSync converts audio into talking head video by generating lifelike lip movements perfectly synced to the input audio. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

kwaivgi/kling-lipsync/text-to-video

Kling LipSync Text-to-Video by Kwaivgi creates videos with lifelike lip movements that precisely sync to input text for natural speaking visuals. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

kwaivgi/kling-v2-ai-avatar-standard

Kling AI Avatar generates high-quality AI avatar videos for profiles, intros, and social content, delivering clean detail and cinematic motion with reliable prompt adherence. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/infinitetalk-fast/multi

InfiniteTalk fast multi converts a single image and two audio inputs into multi-character talking or singing videos. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

kwaivgi/kling-v2-ai-avatar-pro

Kling V2 AI Avatar Pro generates high-quality AI avatar videos with clean detail, stable motion, and strong identity consistency—ideal for profiles, intros, and social content. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/infinitetalk/video-to-video-multi

InfiniteTalk Video-to-Video Multi converts a video and two audio inputs into multi-character talking or singing videos at up to 720p. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/infinitetalk-fast/video-to-video

Audio-driven InfiniteTalk Fast turns one video plus audio into realistic talking or singing videos with lip-sync. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

kwaivgi/kling-v1-ai-avatar-standard

Kling AI Avatar produces stunning AI-generated video avatars for digital identity and content creation, with on-demand video billed at $0.25 per 5 seconds. Ready-to-use REST API, no coldstarts, affordable pricing.

digital-human

kwaivgi/kling-v1-ai-avatar-pro

Kling AI Avatar Pro converts audio into talking video portraits; pricing is $1 for the first 5s, then $0.20/s, up to 600s. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.
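That pricing reduces to a simple formula. A minimal sketch (the function name is ours; the figures are taken from the listing above):

```python
def kling_v1_pro_cost(seconds: float) -> float:
    """Estimated cost in USD: $1 covers the first 5s, then $0.20 per
    additional second, up to the 600s maximum."""
    if not 0 < seconds <= 600:
        raise ValueError("duration must be between 0 and 600 seconds")
    return round(1.00 + max(0.0, seconds - 5) * 0.20, 2)

# A 30-second clip: $1 + 25s * $0.20/s = $6.00
print(kling_v1_pro_cost(30))
```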

digital-human

wavespeed-ai/infinitetalk-fast/video-to-video-multi

InfiniteTalk fast video-to-video multi converts a video and two audio inputs into multi-character talking or singing videos. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

video-to-video

heygen/video-translate

HeyGen Video Translate: AI video translation into 70+ languages and 175+ dialects with no voice actors or dubbing. Fast, accurate, easy to use at $0.0375/sec. Ready-to-use REST API, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/latentsync

LatentSync synchronizes video and audio inputs to generate seamless synchronized content. Perfect for lip-syncing, audio dubbing, and video-audio alignment tasks.

digital-human

veed/lipsync

Generate realistic lip-sync animations from audio with high-quality synchronization using Veed LipSync; $0.15 per 5s of video. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

bytedance/latentsync

LatentSync combines Stable Diffusion and TREPA for high-res end-to-end lip-sync, delivering precise, realistic mouth motions in generated videos. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/hunyuan-avatar

Hunyuan Avatar creates audio-driven talking or singing videos from one image + audio, in 480p/720p up to 120s (starts at $0.15/5s). Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

bytedance/avatar-omni-human-1.5

OmniHuman 1.5 converts audio and visual cues into lifelike avatar animations for virtual humans, storytelling, and interactive agents. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-video

wavespeed-ai/multitalk

MultiTalk converts one image and audio into audio-driven talking/singing videos (Image-to-Video), supporting up to 10 minutes. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

motion-control

wavespeed-ai/wan-2.2/animate

Wan2.2-Animate unified character animation & replacement model replicating movement and expression; generates 720p videos up to 120s. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

veed/fabric-1.0

VEED Fabric 1.0 turns one image into dynamic, talking videos and AI avatars in 480p or 720p (from $0.35/5s at 480p, $0.70/5s at 720p). Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

video-to-video

wavespeed-ai/wan-2.1/mocha

MoCha performs Video-To-Video character swaps using reference images, replacing a video's character without per-frame pose or depth maps. Ready-to-use REST inference API, no coldstarts, affordable pricing.

digital-human

sync/lipsync-1.9.0-beta

Generate realistic lip-sync animations from audio using advanced algorithms for high-quality facial synchronization. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

sync/lipsync-2

Sync Lipsync-2 synchronizes lip movements in any video to supplied audio, enabling realistic mouth alignment for films, podcasts, games, or animations. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

sync/lipsync-2-pro

Lipsync-2-pro creates studio-grade lip synchronization for video-to-video editing in minutes, not weeks. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

pixverse/lipsync

PixVerse LipSync converts audio into realistic lip-sync animations with advanced algorithms for precise mouth movements and timing for video avatars. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/wan-2.2/speech-to-video

Wan-2.2-S2V turns images and speech into high-fidelity videos with realistic face and body motion; supports up to 10-minute clips in 480p, from $0.15/5s. Ready-to-use REST API, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/wan-2.1/multitalk

MultiTalk (WAN 2.1) is an audio-driven AI that turns a single image and audio into talking or singing conversational videos. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

motion-control

wavespeed-ai/steady-dancer

SteadyDancer is a 14B-parameter human image animation framework that transforms static images into coherent dance videos. Features first-frame preservation, robust identity consistency, and temporal coherence for realistic motion generation. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

sync/react-1

Sync React-1 is a production-grade video-to-video lip-sync model. It maps any speech track to a target face, producing phoneme-accurate visemes and smooth timing while preserving identity, head pose, lighting, and background. Supports emotion and intensity control, multilingual speech, and long takes for talking-head content. Built for stable production use with a ready-to-use REST API, no cold starts, and predictable pricing.

digital-human

wavespeed-ai/longcat-avatar

LongCat Avatar produces super-realistic, lip-synchronized long video generation with natural dynamics and consistent identity. Converts one photo + audio into audio-driven talking or singing avatar videos (Image-to-Video), up to 2 minutes, 720p tier $0.40/5s. Ready-to-use REST API, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/ltx-2-19b/lipsync

LTX-2 Lipsync generates synchronized talking head videos from a reference image and audio input. Powered by the 19B DiT architecture, it produces high-fidelity lip-synced videos with natural head movements. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/soulx-flashhead

SoulX FlashHead enables real-time streaming talking head video generation from portrait image and audio with ultra-fast 96 FPS performance. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/skyreels-v3/talking-avatar

SkyReels V3 Talking Avatar is a 19B-parameter multimodal model that generates talking avatars from portrait and audio with precise lip sync, supporting up to 20 seconds at 720p resolution. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/ltx-2.3/lipsync

LTX-2.3 Lipsync generates talking head videos from audio with synchronized lip movements and natural facial expressions. Built on DiT-based architecture with improved audio-visual alignment quality. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

bytedance/lipsync/audio-to-video

LipSync turns audio into lifelike talking videos by generating precise lip movements fully synced to input audio. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

bytedance/avatar-omni-human

OmniHuman turns a single portrait photo into avatar video with lifelike motion and expressions ($0.12/sec). Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

sync/lipsync-3

Sync Lipsync 3 synchronizes lip movements in any video to supplied audio using zero-shot lip-sync technology. Supports multiple sync modes for handling duration mismatches, works with live-action, 3D characters, and AI-generated avatars. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.