WaveSpeedAI · video · From $0.075/run

InfiniteTalk API

WaveSpeedAI InfiniteTalk — drive any character image with an audio file to produce a lip-synced, multi-shot talking video. No avatar library, no per-character training.


About the InfiniteTalk API

What InfiniteTalk does, how it fits in the WaveSpeedAI model lineup, and why teams reach for it.

InfiniteTalk is a video generation model from WaveSpeedAI, available through the WaveSpeedAI REST API. It drives any character image with an audio file to produce a lip-synced, multi-shot talking video, with no avatar library and no per-character training.

The InfiniteTalk family on WaveSpeedAI ships 8 REST endpoints covering digital-human workflows. Each variant carries its own pricing, parameters, and example outputs. Pick the one that matches your input modality and production constraints, or call several with the same API key to compose multi-step pipelines.

Run InfiniteTalk through the same API key, billing account, and rate-limit envelope you use for the other 1,000+ AI models on WaveSpeedAI. There is no separate vendor setup and no per-provider SDK: one integration covers text-to-image, text-to-video, audio synthesis, 3D generation, upscaling, and editing.

All InfiniteTalk API endpoints

8 endpoints available now on WaveSpeedAI — pick the variant that matches your workflow.

Video To Video Multi (Fast)

InfiniteTalk fast video-to-video multi converts a video and two audio inputs into multi-character talking or singing videos. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

digital-human · from $0.075

Video To Video (Fast)

InfiniteTalk fast video-to-video turns one video plus audio into realistic, lip-synced talking or singing videos. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

digital-human · from $0.075

Video To Video Multi

InfiniteTalk video-to-video multi converts a video and two audio inputs into multi-character talking or singing videos at up to 720p. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

digital-human · from $0.15

Multi (Fast)

InfiniteTalk fast multi converts a single image and two audio inputs into multi-character talking or singing videos. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

digital-human · from $0.075

Multi

InfiniteTalk multi converts a single image and two audio inputs into multi-character talking or singing videos at up to 720p. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

digital-human · from $0.15

InfiniteTalk (Fast)

InfiniteTalk fast converts one photo plus audio into talking or singing avatar videos (image-to-video) up to 10 minutes long. Ready-to-use REST API, no cold starts, affordable pricing.

digital-human · from $0.075

Video To Video

InfiniteTalk video-to-video turns one video plus audio into realistic, lip-synced talking or singing videos in 480p or 720p. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

digital-human · from $0.15

InfiniteTalk

InfiniteTalk converts one photo plus audio into talking or singing avatar videos (image-to-video) up to 10 minutes long; the 720p tier costs $0.30 per 5 seconds. Ready-to-use REST API, no cold starts, affordable pricing.

digital-human · from $0.15
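
The eight variants differ along three axes: image or video source, one or two audio inputs, and standard or fast tier. As a rough sketch, the helper below maps those three choices to the endpoint paths listed on this page. The function name and its parameters are hypothetical and are not part of any WaveSpeedAI SDK.

```python
def endpoint_for(source: str, speakers: int = 1, fast: bool = False) -> str:
    """Return the InfiniteTalk endpoint path for a given input combination.

    source   -- "image" (photo-driven) or "video" (video-to-video)
    speakers -- 1 for a single audio input, 2 for the "multi" variants
    fast     -- True to use the lower-priced infinitetalk-fast tier
    """
    if source not in ("image", "video"):
        raise ValueError("source must be 'image' or 'video'")
    if speakers not in (1, 2):
        raise ValueError("speakers must be 1 or 2")

    base = "wavespeed-ai/infinitetalk-fast" if fast else "wavespeed-ai/infinitetalk"
    if source == "video":
        suffix = "/video-to-video-multi" if speakers == 2 else "/video-to-video"
    else:
        suffix = "/multi" if speakers == 2 else ""
    return base + suffix
```

For example, `endpoint_for("video", speakers=2, fast=True)` yields `wavespeed-ai/infinitetalk-fast/video-to-video-multi`.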

How to use the InfiniteTalk API

Four steps from signup to a finished generation. Full Python, Node.js, and cURL examples are in the API section below.

1. Get an API key
Sign up for a WaveSpeedAI account and copy your API key from the dashboard. New accounts come with free starter credits, enough to run the playground a few dozen times before billing kicks in.
2. Submit a prediction
POST your input as JSON to https://api.wavespeed.ai/api/v3/wavespeed-ai/infinitetalk. The endpoint returns a prediction id immediately. Generations are asynchronous, so you don't hold a connection open during inference.
3. Poll for completion
GET https://api.wavespeed.ai/api/v3/predictions/{request_id}/result every 1 to 2 seconds, substituting the prediction id from step 2 for {request_id}. The response includes a status field; keep polling until it changes from "queued" or "processing" to "completed".
4. Read the output URL
Once status is "completed", read the URL from data.outputs[0]. The URL points to your generated video on the WaveSpeedAI CDN.
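
The submit-then-poll flow above can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the official client: the helper names are hypothetical, the prediction id is assumed to be returned as `data.id`, the status is assumed to sit under `data.status`, and any status other than "queued", "processing", or "completed" is treated as a failure.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def http_json(method, url, api_key, body=None):
    """Send one JSON request with Bearer auth and return the parsed reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def submit(model, payload, api_key, send=http_json):
    """POST the input and return the prediction id (assumed to be data.id)."""
    reply = send("POST", f"{API_BASE}/{model}", api_key, payload)
    return reply["data"]["id"]


def wait_for_output(request_id, api_key, send=http_json,
                    interval=1.5, timeout=600.0, sleep=time.sleep):
    """Poll the result endpoint until completion and return data.outputs[0]."""
    url = f"{API_BASE}/predictions/{request_id}/result"
    deadline = time.monotonic() + timeout
    while True:
        data = send("GET", url, api_key)["data"]
        status = data["status"]  # assumption: status lives under "data"
        if status == "completed":
            return data["outputs"][0]
        if status not in ("queued", "processing"):
            # Assumption: any other status means the job will not finish.
            raise RuntimeError(f"prediction {request_id} ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction {request_id} still {status!r} after {timeout}s")
        sleep(interval)
```

With a real key, `wait_for_output(submit("wavespeed-ai/infinitetalk", payload, key), key)` would return the output URL, where `payload` stands for the input JSON of the variant you call. The `send` and `sleep` parameters exist only so the loop can be exercised without network access.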

What you can build with InfiniteTalk

Common workflows developers and creators use the InfiniteTalk API for.

Talking-head video from audio

Upload a character image + an audio track — get a lip-synced video of that character delivering the audio. Works for real photos, illustrations, and AI-generated characters.

talking-head · avatar · lip-sync

Multilingual narrator videos

Localize the same character across multiple languages — same brand mascot, different voiceovers, consistent visual identity.

multilingual · localization · narrator

Podcast-to-video conversions

Turn podcast episodes into video with a host character — significantly cheaper than recorded video production and easier to update than re-recording.

podcast · video · distribution

Localized voiceover videos

For global content teams — keep the visual asset, swap the voiceover, and let InfiniteTalk re-sync lips to the new language.

voiceover · global · video

Brand mascot videos

Animate any branded character (illustrated mascot, custom avatar, AI-generated character) speaking marketing copy — no per-character avatar training required.

mascot · brand · marketing

Tips for prompting InfiniteTalk

Practical advice for getting better outputs from InfiniteTalk — drawn from the patterns that work across video models in production pipelines.

Be specific about camera moves

Mention concrete cinematography vocabulary — orbit, dolly-in, push-in, pan-left, crane shot, handheld follow. Generic prompts produce static or arbitrary camera choices; named camera moves map directly to motion intent in the model's training data and dramatically improve shot quality.

Anchor character identity with reference images

If your prompt depends on a specific person, character, or product, upload a reference image alongside the prompt. Without a reference, identity drifts across frames and across shots — the same character ends up looking like a slightly different person each generation.

Describe lighting and time of day

Lighting cues like 'golden hour, soft warm directional light' or 'overcast diffused light, slate-grey sky' improve quality and consistency far more than vague quality modifiers. Lighting is one of the strongest priors the model conditions on.

Use negative prompts to suppress common failure modes

Useful negatives for video: 'frame flicker, motion blur, watermark, text artifacts, distorted hands, low resolution, jpeg compression'. Negative prompts cost nothing and noticeably reduce the rate of generations you'd otherwise re-roll.

Pick the shortest duration that captures your beat

Most prompts work best at 5-8 seconds. Longer clips amplify temporal inconsistencies (subject morphing, environment drift). If you need a 20-second sequence, generate three 6-8 second clips and edit them together — quality stays higher than one long generation.

Match aspect ratio to platform up front

9:16 for TikTok / Reels / Shorts, 16:9 for landscape feeds and YouTube, 1:1 for post grids. Models train slightly differently per aspect ratio — cropping a 16:9 to 9:16 after the fact loses both fidelity and the composition the model intended.

InfiniteTalk API pricing

Pricing is per-output. The final charge scales with the parameters you set in each variant's playground (resolution, duration, output count, references).

Endpoint	Type	Starting price
wavespeed-ai/infinitetalk-fast/video-to-video-multi	digital-human	$0.075
wavespeed-ai/infinitetalk-fast/video-to-video	digital-human	$0.075
wavespeed-ai/infinitetalk/video-to-video-multi	digital-human	$0.15
wavespeed-ai/infinitetalk-fast/multi	digital-human	$0.075
wavespeed-ai/infinitetalk/multi	digital-human	$0.15
wavespeed-ai/infinitetalk-fast	digital-human	$0.075
wavespeed-ai/infinitetalk/video-to-video	digital-human	$0.15
wavespeed-ai/infinitetalk	digital-human	$0.15
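
As a back-of-the-envelope illustration of how the charge scales with duration and resolution, the sketch below estimates the cost of a standard-tier InfiniteTalk run. It rests on assumptions that this page does not confirm: that the $0.15 starting price is the 480p rate per 5 seconds (only the 720p rate of $0.30 per 5 seconds is stated), and that billing rounds up to whole 5-second blocks. The playground's live cost preview remains the authoritative figure.

```python
import math
from decimal import Decimal

# Assumed rates per 5-second block for wavespeed-ai/infinitetalk.
# Only the 720p figure is stated on this page; the 480p figure is inferred
# from the "from $0.15" starting price.
RATE_PER_5S = {"480p": Decimal("0.15"), "720p": Decimal("0.30")}


def estimate_cost(duration_seconds: float, resolution: str = "480p") -> Decimal:
    """Estimate the charge in USD, assuming billing in whole 5-second blocks."""
    blocks = max(1, math.ceil(duration_seconds / 5))
    return RATE_PER_5S[resolution] * blocks
```

Under these assumptions, a 60-second clip at 720p is 12 blocks, so `estimate_cost(60, "720p")` gives 12 × $0.30 = $3.60.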

Call the InfiniteTalk API

Sign up for an API key at wavespeed.ai/accesskey, then submit a prediction via REST. The playground generates ready-to-paste samples for any combination of inputs.

HTTP example

# 1. Submit a prediction. Replace {} with the input JSON for your variant
#    (the playground generates a ready-to-paste body).
curl -X POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/infinitetalk" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{}'

# 2. Poll the result until status = "completed".
#    Replace {request_id} with the prediction id returned by step 1.
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# 3. Read the output URL from data.outputs[0].

Node.js example

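// npm install wavespeed
const WaveSpeed = require('wavespeed');
const client = new WaveSpeed(); // reads WAVESPEED_API_KEY

// CommonJS has no top-level await, so run inside an async function.
async function main() {
  // Replace {} with the input for your variant (see the playground).
  const result = await client.run("wavespeed-ai/infinitetalk", {});
  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);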

Python example

# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "wavespeed-ai/infinitetalk",
    {}  # replace with the input for your variant (see the playground)
)
print(output["outputs"][0])  # → URL of the generated output

InfiniteTalk vs alternatives

When to pick InfiniteTalk over similar tools and models.

InfiniteTalk vs stock avatar libraries (HeyGen-style)

Stock-avatar tools limit you to their library of pre-trained avatars. InfiniteTalk accepts any character image — your own brand mascot, an AI-generated character, an illustrated host — without per-character setup or training.

InfiniteTalk vs Wan 2.2 Speech-to-Video

Both are audio-driven. Speech-to-Video is part of the broader Wan 2.2 toolkit. InfiniteTalk is specialized for lip-sync quality, multi-shot composition, and identity preservation across longer audio.

InfiniteTalk vs ElevenLabs voice tools

ElevenLabs is voice-only: it generates or clones audio. InfiniteTalk is the video layer on top. Pair an ElevenLabs voiceover with a character image to get a full lip-synced video.

InfiniteTalk API — Frequently asked questions

Pricing, license, integration — common questions about running InfiniteTalk on WaveSpeedAI.

What is the InfiniteTalk API?

InfiniteTalk is a WaveSpeedAI video generation model exposed as a REST API. It drives any character image with an audio file to produce a lip-synced, multi-shot talking video, with no avatar library and no per-character training. You can call it programmatically or try it in the playground.

How do I call the InfiniteTalk API?

Sign up for a WaveSpeedAI account, copy your API key from wavespeed.ai/accesskey, then POST to https://api.wavespeed.ai/api/v3/wavespeed-ai/infinitetalk with your input as JSON. The endpoint returns a prediction id; poll https://api.wavespeed.ai/api/v3/predictions/{request_id}/result with that id until status changes to "completed", then read the output URL from data.outputs[0]. Full Python, Node.js, and cURL examples are above.

How much does the InfiniteTalk API cost?

InfiniteTalk starts at $0.075 per run. The exact cost scales with the parameters you set (resolution, duration, output count, references). The live cost preview next to the Generate button in the playground shows the exact price for your current input.

Which InfiniteTalk variants are available?

WaveSpeedAI hosts 8 InfiniteTalk endpoints: wavespeed-ai/infinitetalk-fast/video-to-video-multi, wavespeed-ai/infinitetalk-fast/video-to-video, wavespeed-ai/infinitetalk/video-to-video-multi, wavespeed-ai/infinitetalk-fast/multi, wavespeed-ai/infinitetalk/multi, wavespeed-ai/infinitetalk-fast, wavespeed-ai/infinitetalk/video-to-video, wavespeed-ai/infinitetalk. Each variant has its own playground page and pricing.

Can I use InfiniteTalk outputs commercially?

Commercial usage rights follow the WaveSpeedAI model license. Most WaveSpeedAI models permit commercial output use; see each model's playground page for the specific license summary, and WaveSpeedAI's Terms of Service for platform-level conditions.

Why use InfiniteTalk on WaveSpeedAI?

One API key and one billing account cover InfiniteTalk and 1,000+ other AI models from other providers. There is no per-vendor SDK setup, no separate rate-limit envelope, and no per-vendor integration code to rewrite. InfiniteTalk is a first-party WaveSpeedAI model, so the WaveSpeedAI API is the direct route to it.

About WaveSpeedAI

The team behind InfiniteTalk and the broader WaveSpeedAI model lineup.

WaveSpeedAI runs an inference platform that hosts 1,000+ AI models from every major provider — ByteDance, Google, OpenAI, Alibaba, Kuaishou, ElevenLabs, and dozens of independent labs — behind one API key, one billing account, and one rate-limit envelope. WaveSpeedAI also ships first-party models (Image / Video Upscalers, Watermark Removers, Animate, InfiniteTalk) tuned for production pipelines.

Related model APIs on WaveSpeedAI

Other AI APIs from WaveSpeedAI and the rest of the video model lineup — one API key, one billing account.

Image Upscaler API

WaveSpeedAI

WaveSpeedAI Image Upscaler — upscale any image to 2K, 4K, or 8K with AI super-resolution. Detail preservation, JPEG artifact removal, and dedicated face-enhancement mode.

Video Upscaler Pro API

WaveSpeedAI

WaveSpeedAI Video Upscaler Pro — AI-driven video upscaling to 4K with seamless frame-to-frame consistency. No flicker, no temporal artifacts, production-ready output.

Video Watermark Remover API

WaveSpeedAI

WaveSpeedAI Video Watermark Remover — strip watermarks, logos, captions, or any unwanted overlay from videos up to 10 minutes long, with background reconstruction that holds across motion.

Seedance 2.0 API

ByteDance

ByteDance's flagship video model — director-level camera control, native audio, real-world physics in a single pass.

Seedance 1.5 Pro API

ByteDance

ByteDance Seedance 1.5 Pro — the prior-generation video model, still production-grade, with built-in video-extend support and a Fast tier for high-throughput pipelines at a fraction of Seedance 2.0's cost.

Veo 3.1 API

Google

Google's Veo 3.1 — flagship video model with industry-leading photorealism, natural human/character rendering, and three tiers (Standard, Fast, Lite) to span the cost/quality range. Includes reference-to-video, video-extend, and start-end frame interpolation.

Start building with InfiniteTalk on WaveSpeedAI

Free starter credits on signup. One API key across 1,000+ AI models from WaveSpeedAI and every other provider.
