LongCat Avatar 1.5

LongCat Avatar 1.5 is the upgraded LongCat Avatar, with sharper lip sync and faster generation. It converts one photo plus an audio track into an audio-driven talking or singing avatar video (image-to-video), capped at 64 seconds per clip. Pricing is $0.20 per 5 seconds at 480p and $0.40 per 5 seconds at 720p. The model is served through a ready-to-use REST API with no cold starts.

Features

LongCat Avatar 1.5

LongCat Avatar 1.5 is an upgraded audio-driven avatar video model that turns a single photo into a realistic speaking or singing video. It delivers sharper lip sync, faster generation, natural head and body motion, and strong identity preservation, making it suitable for talking portraits, virtual presenters, short-form content, and avatar-driven storytelling.

Why Choose This?

Sharper lip synchronization: Aligns mouth movement closely with the input audio for more natural speech and singing performance.
Natural full-body coherence: Goes beyond lip motion to preserve believable head movement, facial expression, and body posture.
Strong identity preservation: Maintains facial identity and overall visual consistency across frames.
Photo-to-avatar video: Turns a static image into a lively speaking or singing performance.
Improved generation speed: Faster than earlier LongCat avatar workflows while keeping strong realism.
Production-ready workflow: Suitable for presenter videos, talking portraits, creator content, and personalized avatar clips.

Parameters

Parameter	Required	Description
audio	Yes	Audio file used to drive the speaking or singing performance.
image	Yes	Source portrait image of the person to animate.
prompt	No	Prompt to guide expression, style, pose, or motion behavior.
resolution	No	Output resolution: `480p` or `720p`.
seed	No	Random seed for reproducibility. Use a fixed value when you want more consistent results.

How to Use

Upload the audio — provide the voice or singing track to drive the avatar.
Upload the image — add the portrait photo of the person you want to animate.
Add a prompt — guide expression, pose, or visual style if needed.
Choose resolution — select 480p or 720p depending on quality and budget needs.
Set a seed (optional) — use a fixed value for more reproducible generations.
Submit — run the model and download the generated avatar video.

Example Prompt

Natural speaking performance with subtle head movement, calm expression, realistic lip sync, and stable identity.

Pricing

Pricing depends on output length and resolution.

Output Resolution	Cost per 5 Seconds	Max Billed Length
480p	$0.20	64 seconds
720p	$0.40	64 seconds

Billing Rules

480p costs $0.20 per 5 seconds
720p costs $0.40 per 5 seconds
Standard rate at 480p is $0.04 per second
720p is 2× the standard rate, or $0.08 per second
All videos are billed for a minimum of 5 seconds
Billing is capped at 64 seconds
Audio longer than 64 seconds is automatically trimmed
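
The billing rules above can be turned into a small cost estimator. This is a minimal sketch, not the provider's billing code: it assumes the charge is linear per second between the 5-second minimum and the 64-second cap, and the function name `estimate_cost` is hypothetical. The source does not say how fractional seconds are rounded, so treat the result as an estimate.

```python
# Per-second rates taken from the billing rules above (USD).
RATE_PER_SECOND = {"480p": 0.04, "720p": 0.08}
MIN_BILLED_SECONDS = 5
MAX_BILLED_SECONDS = 64


def estimate_cost(audio_seconds: float, resolution: str = "480p") -> float:
    """Estimate the charge in USD for one clip.

    Assumption: billing is linear per second once the 5-second minimum
    and the 64-second cap have been applied.
    """
    if resolution not in RATE_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    # Clamp the audio length to the billable window.
    billed = min(max(audio_seconds, MIN_BILLED_SECONDS), MAX_BILLED_SECONDS)
    return round(billed * RATE_PER_SECOND[resolution], 2)
```

Under these assumptions, a 3-second clip at 480p is billed as 5 seconds ($0.20), a 30-second clip at 720p comes to $2.40, and any audio of 64 seconds or more at 480p comes to $2.56.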

Best Use Cases

Talking portrait videos — Animate a still portrait into a speaking clip.
Virtual presenters — Create avatar-led explainers, intros, and business presentations.
Singing avatar content — Turn a portrait into a lip-synced singing performance.
Short-form creator videos — Produce avatar content for social media and promotional clips.
Personalized spokesperson media — Generate custom speaking videos without filming.

Pro Tips

Use a clear, front-facing portrait for better facial stability.
Upload clean audio for stronger lip sync and more natural rhythm.
Keep prompts simple and focused on expression or motion style.
Use 720p when realism matters more, and 480p when you want lower cost.
Reuse the same seed when comparing prompt variations on the same portrait.

Notes

Maximum clip length per job is 64 seconds
Audio longer than 64 seconds is automatically trimmed
Generation time is typically around 10–30 seconds of wall time per 1 second of video, depending on resolution and queue load
Better source portraits and cleaner audio generally improve overall quality
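
The generation-time note above implies a rough waiting window for a given audio length, which can help when choosing a client-side timeout. The sketch below only multiplies out the quoted 10 to 30 seconds of wall time per second of video; the helper name `estimate_wait_seconds` is hypothetical and actual times depend on resolution and queue load.

```python
from typing import Tuple

MAX_CLIP_SECONDS = 64  # audio beyond this is trimmed


def estimate_wait_seconds(audio_seconds: float) -> Tuple[float, float]:
    """Return a (low, high) wall-time estimate in seconds."""
    clip = min(audio_seconds, MAX_CLIP_SECONDS)
    # 10-30 seconds of wall time per second of generated video.
    return (clip * 10, clip * 30)
```

For a full 64-second clip this gives a window of roughly 640 to 1920 seconds.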

Related Models

InfiniteTalk: Higher-end avatar video workflow for more advanced speaking performance.
InfiniteTalk Multi: Multi-character avatar workflow for more complex scenes.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


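# Submit the task (image and audio are required; the URLs below are placeholders)
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/longcat-avatar-1.5" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "image": "https://example.com/portrait.png",
    "audio": "https://example.com/speech.mp3",
    "resolution": "480p",
    "seed": -1
}'

# Get the result (requestId is the data.id value returned by the submit call)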
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
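
For readers calling the API from Python, the submit call above might be sketched with the standard library as follows. The endpoint and headers come from the curl example; the helper name `build_submit_request` is hypothetical, and passing `image` and `audio` as URLs is an assumption based on their `string` type.

```python
import json
import urllib.request
from typing import Any, Dict

SUBMIT_URL = "https://api.wavespeed.ai/api/v3/wavespeed-ai/longcat-avatar-1.5"


def build_submit_request(api_key: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) the POST request for a new task."""
    return urllib.request.Request(
        SUBMIT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Sending the request (not executed here):
#   with urllib.request.urlopen(request) as resp:
#       submitted = json.load(resp)
```

Building the request separately from sending it keeps the function easy to inspect and test without network access.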

Parameters

Task Submission Parameters

Request Parameters

Parameter	Type	Required	Default	Range	Description
image	string	Yes	-	-	Source portrait image to animate.
audio	string	Yes	-	-	Audio file that drives the speaking or singing performance.
prompt	string	No	-	-	Optional prompt to guide expression, style, pose, or motion.
resolution	string	No	480p	480p, 720p	The resolution of the output video.
seed	integer	No	-1	-1 ~ 2147483647	The random seed to use for the generation. -1 means a random seed will be used.
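
The constraints in the table above can be checked on the client before a request is sent. The following is a minimal sketch; `build_payload` is a hypothetical helper, and the server may apply further validation that is not documented here.

```python
from typing import Any, Dict, Optional

ALLOWED_RESOLUTIONS = ("480p", "720p")
MAX_SEED = 2147483647


def build_payload(image: str, audio: str, prompt: Optional[str] = None,
                  resolution: str = "480p", seed: int = -1) -> Dict[str, Any]:
    """Validate the documented parameters and return the request body."""
    if not image or not audio:
        raise ValueError("image and audio are both required")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {ALLOWED_RESOLUTIONS}")
    if not -1 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be between -1 and {MAX_SEED}")
    payload: Dict[str, Any] = {
        "image": image,
        "audio": audio,
        "resolution": resolution,
        "seed": seed,  # -1 asks the service for a random seed
    }
    if prompt:
        payload["prompt"] = prompt  # optional, omitted when empty
    return payload
```

Defaults mirror the table: `480p` for resolution and `-1` for seed.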

Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data.id	string	Unique identifier for the prediction (the task ID)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
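
A submission response with the fields listed above could be unpacked as shown below. This is a sketch under assumptions: `parse_submit_response` is a hypothetical helper, and it treats any `code` other than 200 as a failure, which the source implies but does not state.

```python
from typing import Any, Dict, Tuple


def parse_submit_response(response: Dict[str, Any]) -> Tuple[str, str]:
    """Return (task_id, result_url) from a decoded submission response."""
    if response.get("code") != 200:
        raise RuntimeError(f"submission failed: {response.get('message')}")
    data = response["data"]
    # data.id identifies the task; data.urls.get is where to fetch the result.
    return data["id"], data["urls"]["get"]
```

The returned task ID is the value to substitute for `requestId` when querying the result endpoint.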

Result Request Parameters

Parameter	Type	Required	Default	Description
id	string	Yes	-	Task ID

Result Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data	object	The prediction data object containing all details
data.id	string	Unique identifier for the prediction (the task ID that was queried)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
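
Because `data.outputs` stays empty until the status reaches `completed`, a client typically polls the result endpoint. The loop below is a minimal sketch of that pattern: `wait_for_outputs` and its `fetch_result` callback are hypothetical names, and the polling interval and timeout defaults are arbitrary choices, not documented values.

```python
import time
from typing import Any, Callable, Dict, List


def wait_for_outputs(fetch_result: Callable[[], Dict[str, Any]],
                     interval_seconds: float = 2.0,
                     timeout_seconds: float = 1920.0,
                     sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Poll until the task completes and return its output URLs.

    fetch_result must return the decoded JSON body of the result endpoint.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        data = fetch_result()["data"]
        status = data["status"]
        if status == "completed":
            return list(data["outputs"])
        if status == "failed":
            raise RuntimeError(data.get("error") or "task failed")
        # Still "created" or "processing": wait and try again.
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval_seconds)
```

Passing the fetch step in as a callback keeps the loop independent of any particular HTTP client.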
