InfiniteTalk Video-to-Video Multi

wavespeed-ai/infinitetalk/video-to-video-multi

InfiniteTalk Video-to-Video Multi converts a video and two audio inputs into multi-character talking or singing videos at up to 720p. It is served through a ready-to-use REST inference API with no cold starts and affordable pricing.

InfiniteTalk Video-to-Video Multi creates lip-synced videos for two characters by combining an input video with two audio tracks. It preserves each character's identity across videos up to 10 minutes long and keeps lip motion precisely synchronized while matching head, face, and body movements to each audio source. This makes it well suited to dialogues, interviews, and other multi-speaker content.

Why Choose This?

  • Multi-character lip sync: Synchronizes lip motion precisely with audio for two characters simultaneously.

  • Flexible speaking order: Choose left-to-right, right-to-left, or simultaneous (meanwhile) speaking patterns.

  • Full-body coherence: Captures head movements, facial expressions, and posture changes beyond the lips.

  • Identity preservation: Maintains consistent facial identity and visual style across all frames.

  • Mask control: Optional mask images define which regions can move for precise control.

  • Long-form support: Process videos up to 10 minutes in length with consistent quality.

Parameters

Parameter     Required   Description
video         Yes        Source video with two visible characters
left_audio    Yes        Audio track for the left character
right_audio   Yes        Audio track for the right character
mask_image    No         Mask defining animatable regions
prompt        No         Text prompt to guide scene or behavior
order         No         Speaking order: meanwhile, left_right, or right_left
resolution    No         Output resolution: 480p (default) or 720p
seed          No         Random seed for reproducibility (-1 for random)
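
As an illustration, the parameters above could be assembled and checked on the client side before a request is sent. This is a minimal sketch, not the official client: `build_payload` is a hypothetical helper, the parameter names and allowed values come from the table, and leaving out optional fields that were not set is an assumption about how the API treats missing values.

```python
from typing import Optional

# Allowed values as listed in the parameter table.
ALLOWED_ORDERS = {"meanwhile", "left_right", "right_left"}
ALLOWED_RESOLUTIONS = {"480p", "720p"}


def build_payload(
    video: str,
    left_audio: str,
    right_audio: str,
    mask_image: Optional[str] = None,
    prompt: Optional[str] = None,
    order: Optional[str] = None,
    resolution: str = "480p",  # documented default
    seed: int = -1,            # -1 requests a random seed
) -> dict:
    """Hypothetical helper: validate inputs and return a request body dict."""
    required = (("video", video), ("left_audio", left_audio), ("right_audio", right_audio))
    for name, value in required:
        if not value:
            raise ValueError(f"{name} is required")
    if order is not None and order not in ALLOWED_ORDERS:
        raise ValueError(f"order must be one of {sorted(ALLOWED_ORDERS)}")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")

    payload = {
        "video": video,
        "left_audio": left_audio,
        "right_audio": right_audio,
        "resolution": resolution,
        "seed": seed,
    }
    # Assumption: optional fields are simply omitted when not provided.
    if mask_image is not None:
        payload["mask_image"] = mask_image
    if prompt is not None:
        payload["prompt"] = prompt
    if order is not None:
        payload["order"] = order
    return payload
```

The default for `order` is not stated on this page, so the sketch leaves it unset unless the caller chooses one.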

How to Use

  1. Upload your video: provide a video that clearly shows two people.
  2. Upload left audio: add an audio file for the left character.
  3. Upload right audio: add an audio file for the right character.
  4. Add mask image (optional): define which regions should animate.
  5. Write prompt (optional): guide scene, pose, or behavior.
  6. Select speaking order: choose meanwhile (simultaneous), left_right, or right_left.
  7. Choose resolution: 480p for faster processing, 720p for higher quality.
  8. Run: submit and download your lip-synced video.
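
The same steps can also be driven through the REST API rather than the web form. The sketch below only constructs the HTTP request and does not send it. The endpoint URL, the JSON request body, and the Bearer-token header are all assumptions that this page does not confirm, so check the API reference before relying on them. `make_request` is a hypothetical helper.

```python
import json
import urllib.request


def make_request(endpoint: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a POST request.

    Assumptions: the API accepts a JSON body and a Bearer token.
    The caller supplies the endpoint URL; none is hard-coded here.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Example inputs; the file URLs must be publicly accessible.
example_payload = {
    "video": "https://example.com/two_people.mp4",
    "left_audio": "https://example.com/left.wav",
    "right_audio": "https://example.com/right.wav",
    "order": "left_right",
    "resolution": "480p",
}
```

Passing the resulting object to `urllib.request.urlopen` would submit it. How the finished video is then retrieved (polling, webhook, or a direct response) is not described on this page.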

Pricing

Resolution   Cost per 5s   Per-second rate
480p         $0.15         $0.03/s
720p         $0.30         $0.06/s

Billing Rules

  • Minimum charge: 5 seconds ($0.15 at 480p, $0.30 at 720p)
  • Maximum duration: 600 seconds (10 minutes)
  • Duration calculation (using the length of each audio track):
      • Sequential (left_right / right_left): left_audio + right_audio
      • Simultaneous (meanwhile): max(left_audio, right_audio)
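
To make the billing rules concrete, here is a rough cost estimator. It is a sketch under stated assumptions: the rates, the 5-second minimum, the 600-second limit, and the duration formulas come from this page, while charging fractional seconds without rounding and rejecting over-limit inputs with an error are assumptions. The function names are hypothetical.

```python
# Per-second rates and limits taken from the pricing table and billing rules.
RATE_PER_SECOND = {"480p": 0.03, "720p": 0.06}
MIN_BILLED_SECONDS = 5
MAX_SECONDS = 600


def billed_seconds(left_audio_s: float, right_audio_s: float, order: str) -> float:
    """Return the duration used for billing, given both audio lengths in seconds."""
    if order == "meanwhile":
        duration = max(left_audio_s, right_audio_s)   # simultaneous speech
    elif order in ("left_right", "right_left"):
        duration = left_audio_s + right_audio_s       # sequential speech
    else:
        raise ValueError(f"unknown order: {order!r}")
    if duration > MAX_SECONDS:
        # Assumption: inputs over the limit are rejected rather than truncated.
        raise ValueError("total duration exceeds the 600-second maximum")
    return max(duration, MIN_BILLED_SECONDS)          # 5-second minimum charge


def estimate_cost(left_audio_s: float, right_audio_s: float,
                  order: str, resolution: str = "480p") -> float:
    """Estimate the charge in USD (assumes no rounding to whole seconds)."""
    rate = RATE_PER_SECOND[resolution]
    return round(billed_seconds(left_audio_s, right_audio_s, order) * rate, 2)
```

For the same pair of clips, a sequential order is billed on the sum of both lengths while `meanwhile` is billed only on the longer one, so the speaking order affects cost as well as the result.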

Best Use Cases

  • Dialogue Scenes — Create realistic conversations between two characters.
  • Interview Content — Generate interviewer-interviewee videos with synced audio.
  • Podcast Visuals — Add visual elements to two-person podcast recordings.
  • Educational Content — Create instructor dialogues and Q&A sessions.
  • Digital Presenters — Build multi-character presentation videos.

Pro Tips

  • Ensure both characters are clearly visible in the source video.
  • Use "meanwhile" for overlapping dialogue or simultaneous speech.
  • Use "left_right" or "right_left" for sequential conversation flow.
  • Mask only the regions you want to animate. Using the full image as the mask results in a black output.
  • Higher quality source videos produce better lip-sync results.

Notes

  • Maximum video length: 10 minutes (600 seconds).
  • Processing time: approximately 10–30 seconds per 1 second of video.
  • Mask safety: Do not use the full image as the mask; cover only the animatable regions.
  • Ensure uploaded file URLs are publicly accessible.
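
The processing-time note above translates into a simple range estimate, which can help when choosing a client timeout. This is a sketch only: the 10 to 30 seconds per second of video figure is approximate, and `estimate_processing_seconds` is a hypothetical helper.

```python
from typing import Tuple

# Approximate processing cost per second of output video, from the notes above.
MIN_FACTOR = 10
MAX_FACTOR = 30


def estimate_processing_seconds(video_seconds: float) -> Tuple[float, float]:
    """Return a (low, high) estimate of processing time in seconds."""
    if video_seconds <= 0:
        raise ValueError("video_seconds must be positive")
    return (video_seconds * MIN_FACTOR, video_seconds * MAX_FACTOR)


low, high = estimate_processing_seconds(60)  # a one-minute video
print(f"Expect roughly {low / 60:.0f} to {high / 60:.0f} minutes of processing")
```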
