
LongCat Avatar 1.5


LongCat Avatar 1.5 is the upgraded LongCat Avatar with sharper lip sync and faster generation. It turns one photo plus an audio track into an audio-driven talking or singing avatar video (image-to-video), capped at 64 seconds per clip, with the 720p tier priced at $0.40 per 5 seconds. Ready-to-use REST API, no cold starts, affordable pricing.

Features


LongCat Avatar 1.5 is an upgraded audio-driven avatar video model that turns a single photo into a realistic speaking or singing video. It delivers sharper lip sync, faster generation, natural head and body motion, and strong identity preservation, making it suitable for talking portraits, virtual presenters, short-form content, and avatar-driven storytelling.


Why Choose This?

  • Sharper lip synchronization: aligns mouth movement closely with the input audio for more natural speech and singing performance.

  • Natural full-body coherence: goes beyond lip motion to preserve believable head movement, facial expression, and body posture.

  • Strong identity preservation: maintains facial identity and overall visual consistency across frames.

  • Photo-to-avatar video: turns a static image into a lively speaking or singing performance.

  • Improved generation speed: faster than earlier LongCat avatar workflows while keeping strong realism.

  • Production-ready workflow: suitable for presenter videos, talking portraits, creator content, and personalized avatar clips.


Parameters

| Parameter | Required | Description |
|---|---|---|
| audio | Yes | Audio file used to drive the speaking or singing performance. |
| image | Yes | Source portrait image of the person to animate. |
| prompt | No | Prompt to guide expression, style, pose, or motion behavior. |
| resolution | No | Output resolution: 480p (default) or 720p. |
| seed | No | Random seed for reproducibility. Use a fixed value for more consistent results; -1 picks a random seed. |

How to Use

  1. Upload the audio — provide the voice or singing track to drive the avatar.
  2. Upload the image — add the portrait photo of the person you want to animate.
  3. Add a prompt — guide expression, pose, or visual style if needed.
  4. Choose resolution — select 480p or 720p depending on quality and budget needs.
  5. Set a seed (optional) — use a fixed value for more reproducible generations.
  6. Submit — run the model and download the generated avatar video.

Example Prompt

Natural speaking performance with subtle head movement, calm expression, realistic lip sync, and stable identity.


Pricing

Pricing depends on output length and resolution.

| Output Resolution | Cost per 5 Seconds | Max Billed Length |
|---|---|---|
| 480p | $0.20 | 64 seconds |
| 720p | $0.40 | 64 seconds |

Billing Rules

  • 480p costs $0.20 per 5 seconds
  • 720p costs $0.40 per 5 seconds
  • At 480p this works out to $0.04 per second
  • At 720p this works out to $0.08 per second, twice the 480p rate
  • All videos are billed for a minimum of 5 seconds
  • Billing is capped at 64 seconds
  • Audio longer than 64 seconds is automatically trimmed
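The billing rules above can be turned into a quick cost estimate. This is a sketch under one stated assumption: charges are prorated per second at the listed per-second rates, with the 5-second minimum and 64-second cap applied. This page does not say whether partial 5-second blocks are rounded up, so treat the results as estimates. `estimate_cost` is a hypothetical helper, not part of any WaveSpeed SDK.

```python
# Hypothetical helper: estimates LongCat Avatar 1.5 cost from the published rates.
# Assumption (not confirmed by the pricing page): billing is prorated per second
# at $0.04/s (480p) or $0.08/s (720p), with a 5 s minimum and a 64 s cap.

RATE_PER_SECOND = {"480p": 0.04, "720p": 0.08}  # USD, from $0.20 / $0.40 per 5 s
MIN_BILLED_SECONDS = 5.0
MAX_BILLED_SECONDS = 64.0  # longer audio is trimmed, so billing stops here


def estimate_cost(duration_seconds: float, resolution: str = "480p") -> float:
    """Return the estimated charge in USD for a clip driven by audio of this length."""
    if resolution not in RATE_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    # Apply the 5 s minimum first, then the 64 s cap.
    billed = min(max(duration_seconds, MIN_BILLED_SECONDS), MAX_BILLED_SECONDS)
    return round(billed * RATE_PER_SECOND[resolution], 2)


if __name__ == "__main__":
    for seconds, res in [(3, "480p"), (12, "720p"), (90, "720p")]:
        print(f"{seconds:>3}s @ {res}: ${estimate_cost(seconds, res):.2f}")
```

Under these assumptions a 3-second clip at 480p is billed as 5 seconds ($0.20), a 12-second clip at 720p comes to about $0.96, and any audio over 64 seconds tops out at $2.56 (480p) or $5.12 (720p).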

Best Use Cases

  • Talking portrait videos — Animate a still portrait into a speaking clip.
  • Virtual presenters — Create avatar-led explainers, intros, and business presentations.
  • Singing avatar content — Turn a portrait into a lip-synced singing performance.
  • Short-form creator videos — Produce avatar content for social media and promotional clips.
  • Personalized spokesperson media — Generate custom speaking videos without filming.

Pro Tips

  • Use a clear, front-facing portrait for better facial stability.
  • Upload clean audio for stronger lip sync and more natural rhythm.
  • Keep prompts simple and focused on expression or motion style.
  • Use 720p when realism matters more, and 480p when you want lower cost.
  • Reuse the same seed when comparing prompt variations on the same portrait.
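To compare prompt variations on one portrait, as the last tip suggests, the request bodies can be generated so that only `prompt` changes. This is a minimal sketch: `prompt_variants` is a hypothetical helper, the default seed of 42 is arbitrary, and a shared seed makes runs more comparable rather than guaranteeing identical motion.

```python
# Hypothetical helper: one request body per prompt, all sharing the same image,
# audio, resolution, and fixed seed, so output differences are easier to
# attribute to the prompt. Field names follow this page's parameter table.
from __future__ import annotations

from typing import Any


def prompt_variants(image: str, audio: str, prompts: list[str],
                    seed: int = 42, resolution: str = "480p") -> list[dict[str, Any]]:
    """Return request bodies that differ only in their "prompt" field."""
    if seed == -1:
        # -1 means "random seed", which defeats the purpose of the comparison.
        raise ValueError("use a fixed seed (not -1) so runs are comparable")
    return [
        {"image": image, "audio": audio, "prompt": p,
         "resolution": resolution, "seed": seed}
        for p in prompts
    ]
```

Each returned body can be submitted as its own task; keeping everything except `prompt` constant is what makes the side-by-side comparison meaningful.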

Notes

  • Maximum clip length per job is 64 seconds
  • Audio longer than 64 seconds is automatically trimmed
  • Generation time is typically around 10–30 seconds of wall time per 1 second of video, depending on resolution and queue load
  • Better source portraits and cleaner audio generally improve overall quality
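The generation-time guideline above translates directly into a waiting budget: a full 64-second clip works out to roughly 640 to 1,920 seconds (about 11 to 32 minutes) of wall time. The sketch below assumes the output is as long as the audio after trimming to 64 seconds; `expected_wait_seconds` is a hypothetical helper, and real queue load can push times outside this range.

```python
# Hypothetical helper: rough wall-time range from the "10–30 seconds of wall
# time per 1 second of video" guideline. Assumes output length equals the
# audio length, trimmed to the 64 s per-clip maximum.
from __future__ import annotations

MAX_CLIP_SECONDS = 64.0
WALL_SECONDS_PER_VIDEO_SECOND = (10.0, 30.0)  # (fast end, slow end)


def expected_wait_seconds(audio_seconds: float) -> tuple[float, float]:
    """Return (low, high) estimated generation time in seconds."""
    clip = min(audio_seconds, MAX_CLIP_SECONDS)
    low, high = WALL_SECONDS_PER_VIDEO_SECOND
    return clip * low, clip * high
```

The high end of the range is a reasonable starting point for a client-side polling timeout.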

Related Models

  • InfiniteTalk: higher-end avatar video workflow for more advanced speaking performance.
  • InfiniteTalk Multi: multi-character avatar workflow for more complex scenes.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


# Submit the task (image and audio are required; replace the placeholder URLs)
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/longcat-avatar-1.5" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "image": "https://example.com/portrait.jpg",
    "audio": "https://example.com/speech.mp3",
    "resolution": "480p",
    "seed": -1
}'

# Get the result (set requestId to the data.id value returned by the submit call)
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
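The same submit-then-poll flow can be scripted with Python's standard library. This is a sketch, not an official client: `submit` and `wait_for_result` are hypothetical names, it assumes both calls return the `data` object described under Response Parameters, and the 5-second poll interval and 40-minute timeout are guesses sized to the 10–30 seconds of wall time per second of video mentioned in the Notes.

```python
# Sketch of the submit-then-poll flow from the curl example (standard library only).
# Assumptions: responses wrap the task in "data" as documented in the tables, and
# the poll interval/timeout defaults are guesses, not published limits.
from __future__ import annotations

import json
import os
import time
import urllib.request
from typing import Any, Callable

BASE = "https://api.wavespeed.ai/api/v3"
MODEL_PATH = "wavespeed-ai/longcat-avatar-1.5"


def _call(method: str, url: str, body: dict[str, Any] | None = None) -> dict[str, Any]:
    """Send one authenticated request and decode the JSON response."""
    headers = {"Authorization": f"Bearer {os.environ['WAVESPEED_API_KEY']}"}
    data = None
    if body is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def submit(body: dict[str, Any]) -> str:
    """Submit a task and return its task ID (data.id)."""
    return _call("POST", f"{BASE}/{MODEL_PATH}", body)["data"]["id"]


def wait_for_result(task_id: str,
                    fetch: Callable[[str], dict[str, Any]] | None = None,
                    interval: float = 5.0,
                    timeout: float = 2400.0) -> list[str]:
    """Poll the result endpoint until the task finishes; return data.outputs."""
    fetch = fetch or (lambda url: _call("GET", url))
    url = f"{BASE}/predictions/{task_id}/result"
    deadline = time.monotonic() + timeout
    while True:
        data = fetch(url)["data"]
        if data["status"] == "completed":
            return data["outputs"]
        if data["status"] == "failed":
            raise RuntimeError(f"task {task_id} failed: {data.get('error')}")
        # "created" and "processing" both mean the task is still pending.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {data['status']} after {timeout}s")
        time.sleep(interval)
```

Typical use is `wait_for_result(submit(body))`, which returns the list of output video URLs. Passing a custom `fetch` callable lets the polling logic be exercised without network access or an API key.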

Parameters

Task Submission Parameters

Request Parameters

| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| image | string | Yes | - | - | Source portrait image to animate. |
| audio | string | Yes | - | - | Audio that drives the speaking or singing performance. |
| prompt | string | No | - | - | Optional prompt guiding expression, style, pose, or motion. |
| resolution | string | No | 480p | 480p, 720p | Resolution of the output video. |
| seed | integer | No | -1 | -1 ~ 2147483647 | Random seed for the generation; -1 means a random seed is used. |
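The constraints in this table are simple enough to check locally before a request is sent. A minimal sketch, assuming the body is sent as JSON exactly as in the curl example: `build_request` is a hypothetical helper, and `image`/`audio` are passed through as opaque strings (assumed to be file URLs) without further validation.

```python
# Hypothetical helper: builds a request body and checks it against the documented
# constraints (required image/audio, resolution 480p or 720p, seed in
# -1 ~ 2147483647). image/audio are not validated beyond being non-empty.
from __future__ import annotations

from typing import Any

ALLOWED_RESOLUTIONS = {"480p", "720p"}
SEED_MIN, SEED_MAX = -1, 2147483647


def build_request(image: str, audio: str, prompt: str | None = None,
                  resolution: str = "480p", seed: int = -1) -> dict[str, Any]:
    """Return a request body, raising ValueError on missing or out-of-range values."""
    if not image or not audio:
        raise ValueError("image and audio are both required")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    if not SEED_MIN <= seed <= SEED_MAX:
        raise ValueError(f"seed must be in [{SEED_MIN}, {SEED_MAX}]")
    body: dict[str, Any] = {"image": image, "audio": audio,
                            "resolution": resolution, "seed": seed}
    if prompt:  # prompt is optional; omit it rather than sending an empty string
        body["prompt"] = prompt
    return body
```

Defaults mirror the table (480p, seed -1), so `build_request(image_url, audio_url)` produces the same body as the curl example.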

Response Parameters

| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (the task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |

Result Request Parameters

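| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| id | string | Yes | - | Task ID (data.id from the submit response), passed in the URL path |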

Result Response Parameters

| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier for the prediction (the task ID that was requested) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
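Since the submit and result responses share the same `data` layout, one small parser can handle both. In this sketch `Prediction` and `parse_prediction` are hypothetical names; keys follow the tables above, and missing optional fields (`outputs`, `error`, `timings`) fall back to empty values rather than raising.

```python
# Hypothetical parser for the prediction object shared by the submit and result
# responses. Key names come from the response tables; the class is illustrative.
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class Prediction:
    id: str
    model: str
    status: str  # created, processing, completed, or failed
    outputs: list[str] = field(default_factory=list)
    error: str = ""
    inference_ms: int | None = None  # data.timings.inference, if present

    @property
    def done(self) -> bool:
        """True once the task has reached a terminal status."""
        return self.status in ("completed", "failed")


def parse_prediction(raw: str) -> Prediction:
    """Parse a JSON response body (submit or result) into a Prediction."""
    data = json.loads(raw)["data"]
    return Prediction(
        id=data["id"],
        model=data["model"],
        status=data["status"],
        outputs=list(data.get("outputs") or []),
        error=data.get("error") or "",
        inference_ms=(data.get("timings") or {}).get("inference"),
    )
```

Checking `done` before reading `outputs` avoids treating the empty list returned for pending tasks as a finished result.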
© 2025 WaveSpeedAI. All rights reserved.