
Avatar V Digital Twin


HeyGen Avatar V Digital Twin is a fast AI avatar video generation model that creates natural digital twin videos from text or audio with lip-sync, optional captions, background removal, and MP4/WebM output. It is available as a ready-to-use REST inference API for digital humans, virtual presenters, product explainers, marketing videos, training content, social media clips, and professional avatar video workflows, with simple integration, no cold starts, and affordable pricing.


HeyGen Avatar V Digital Twin

HeyGen Avatar V Digital Twin generates a talking avatar video from a selected HeyGen digital twin avatar and an uploaded audio clip. It is designed for presenter videos, spokesperson content, personalized avatar delivery, and other avatar-driven speaking workflows with flexible output controls.

Why Choose This?

  • Digital twin avatar workflow: Use a prebuilt digital twin avatar to generate speaking video from audio.

  • Audio-driven speech performance: Upload an audio clip to drive the avatar's timing, expression, and speaking delivery.

  • Flexible framing controls: Choose aspect ratio, fit mode, and output resolution to match your target platform.

  • Optional background removal: Enable remove_background for supported matting-enabled avatars.

  • Optional caption export: Enable caption to generate a sidecar SRT subtitle file alongside the video.

  • Production-ready API: Suitable for personalized presenter videos, internal communications, ads, explainers, and virtual spokesperson workflows.

Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| avatar | Yes | Selected HeyGen digital twin avatar. |
| audio | Yes | Audio clip used to drive the avatar video. |
| fit | No | How the avatar is framed in the output, such as cover. |
| remove_background | No | Remove the avatar background. Requires a matting-enabled avatar. |
| caption | No | Generate a sidecar SRT caption file alongside the video. |
| output_format | No | Output video format. Default: mp4. |
| resolution | No | Output resolution, such as 720p. |
| aspect_ratio | No | Output aspect ratio, such as 16:9. |

How to Use

  1. Choose your avatar — select the digital twin avatar you want to use.
  2. Upload your audio — provide the voice track that should drive the avatar.
  3. Adjust framing (optional) — choose fit, resolution, and aspect_ratio.
  4. Enable extras (optional) — turn on remove_background or caption if needed.
  5. Submit — run the model and download the generated avatar video.
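The steps above map onto a single JSON request body. The following is a minimal sketch of assembling that body in Python; `build_payload` is a hypothetical helper (not part of any WaveSpeedAI SDK), the boolean type for `caption` is an assumption, and it is also assumed that leaving an optional field out lets the service apply its own default.

```python
import json
from typing import Any, Optional


def build_payload(
    avatar: str,
    audio: str,
    *,
    fit: Optional[str] = None,
    remove_background: Optional[bool] = None,
    caption: Optional[bool] = None,
    output_format: Optional[str] = None,
    resolution: Optional[str] = None,
    aspect_ratio: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble the request body, leaving out optional fields that were not set."""
    # Steps 1 and 2: avatar and audio are the only required inputs.
    if not avatar or not audio:
        raise ValueError("avatar and audio are required")

    # Steps 3 and 4: framing options and extras are all optional.
    optional = {
        "fit": fit,
        "remove_background": remove_background,
        "caption": caption,
        "output_format": output_format,
        "resolution": resolution,
        "aspect_ratio": aspect_ratio,
    }
    payload: dict[str, Any] = {"avatar": avatar, "audio": audio}
    # Assumption: omitted fields fall back to the service defaults.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


payload = build_payload(
    "Abigail Sofa Front",
    "https://example.com/your-audio.mp3",
    fit="cover",
    resolution="720p",
    aspect_ratio="16:9",
    caption=True,
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be sent as the JSON body of the submit request (step 5). The helper does not check allowed values for fit, resolution, or aspect_ratio, since this page only gives examples of them.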

Example Use Case

Generate a polished office presenter video from a digital twin avatar and a short voice clip for internal announcements or marketing content.

Pricing

Pricing is based on the uploaded audio duration.

| Audio Duration | Cost |
| --- | --- |
| 5s | $0.60 |
| 6s | $0.72 |
| 7s | $0.84 |
| 8s | $0.96 |
| 10s | $1.20 |
| 15s | $1.80 |

Billing Rules

  • Base price is $0.12 per second
  • Minimum billed duration is 5 seconds
  • Audio duration is rounded up to the next whole second
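The three billing rules combine into a short calculation. The sketch below is an illustration only: the function names are hypothetical, and it works in whole cents to avoid floating-point rounding in the displayed amount.

```python
import math

PRICE_CENTS_PER_SECOND = 12  # base price: $0.12 per second
MIN_BILLED_SECONDS = 5       # minimum billed duration


def billed_seconds(audio_seconds: float) -> int:
    """Round the audio duration up to a whole second, then apply the minimum."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return max(MIN_BILLED_SECONDS, math.ceil(audio_seconds))


def estimate_cost_usd(audio_seconds: float) -> str:
    """Return the estimated charge as a dollar string such as '$0.60'."""
    cents = billed_seconds(audio_seconds) * PRICE_CENTS_PER_SECOND
    return f"${cents // 100}.{cents % 100:02d}"


for duration in (3.2, 5, 7.3, 15):
    print(f"{duration}s -> billed {billed_seconds(duration)}s, {estimate_cost_usd(duration)}")
```

For example, a 3.2-second clip is billed at the 5-second minimum ($0.60), and a 7.3-second clip is rounded up to 8 seconds ($0.96), which matches the table above.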

Best Use Cases

  • Digital spokesperson videos — Create branded speaking-avatar content quickly.
  • Internal communications — Deliver announcements, updates, and training clips with a consistent avatar.
  • Marketing and ads — Produce talking-head promo content without filming.
  • Explainers and onboarding — Turn voice scripts into presenter-led video.
  • Localized delivery — Reuse the same avatar with different voice tracks and captions.

Pro Tips

  • Upload clean audio for better speaking rhythm and lip-sync quality.
  • Keep clips short while testing avatar style and framing.
  • Use caption when the final video needs accessible subtitles.
  • Only enable remove_background if the selected avatar supports matting.
  • Match aspect_ratio to your final platform, such as 16:9 for widescreen delivery.

Notes

  • avatar and audio are required.
  • Billing uses the uploaded audio duration, with a minimum of 5 seconds.
  • Audio duration is rounded up to the next whole second before billing.
  • remove_background requires a matting-enabled avatar.
  • caption generates a sidecar SRT file, not burned-in subtitles.

Related Models

  • HeyGen Avatar IV avatar workflows — Useful when you need other avatar generation modes or delivery options.
  • Talking avatar video workflows — Useful when you want image-based or non-digital-twin avatar generation instead of a preset HeyGen twin.
  • Subtitle and caption workflows — Useful when you need more advanced subtitle styling or burn-in options.

Avatar V Digital Twin API — Quick start

Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/heygen/avatar-v/digital-twin with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status flips to completed, then read the output URL from data.outputs[0]. Examples for Avatar V Digital Twin below.

HTTP example
# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/heygen/avatar-v/digital-twin" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "avatar": "Abigail Sofa Front",
    "audio": "https://example.com/your-audio.mp3",
    "fit": "cover",
    "remove_background": false,
    "output_format": "mp4",
    "resolution": "720p",
    "aspect_ratio": "16:9"
}'

# Response includes a prediction id. Poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# When status is "completed", read the output from data.outputs[0].
Node.js example
// npm install wavespeed
const WaveSpeed = require('wavespeed');

const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// Top-level await is not available in CommonJS modules,
// so the call is wrapped in an async function.
async function main() {
  const result = await client.run("heygen/avatar-v/digital-twin", {
    "avatar": "Abigail Sofa Front",
    "audio": "https://example.com/your-audio.mp3",
    "fit": "cover",
    "remove_background": false,
    "output_format": "mp4",
    "resolution": "720p",
    "aspect_ratio": "16:9"
  });

  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);
Python example
# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "heygen/avatar-v/digital-twin",
    {
        "avatar": "Abigail Sofa Front",
        "audio": "https://example.com/your-audio.mp3",
        "fit": "cover",
        "remove_background": False,  # Python boolean, serialized as JSON false
        "output_format": "mp4",
        "resolution": "720p",
        "aspect_ratio": "16:9",
    },
)

print(output["outputs"][0])  # → URL of the generated output

Avatar V Digital Twin API — Frequently asked questions

What is the Avatar V Digital Twin API?

Avatar V Digital Twin is a HeyGen model for talking-avatar generation, exposed as a REST API on WaveSpeedAI. It generates a lip-synced digital twin video from a selected avatar and an audio clip, with optional captions, background removal, and MP4 or WebM output. You can call it programmatically or try it from the playground above.

How do I call the Avatar V Digital Twin API?

POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/heygen/heygen-avatar-v-digital-twin.
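The submit-then-poll flow can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the official client: `wait_for_output` and `fetch_result` are hypothetical names, the location of the status field under `data`, the "processing" and "failed" status values, and the `error` field are all assumptions, and only the "completed" status and `data.outputs[0]` come from this page. The HTTP call is replaced by canned responses so the sketch runs offline.

```python
import time
from typing import Any, Callable


def wait_for_output(
    fetch_result: Callable[[], dict[str, Any]],
    *,
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the prediction status is "completed", then return data.outputs[0]."""
    deadline = time.monotonic() + timeout_s
    while True:
        data = fetch_result().get("data", {})
        status = data.get("status")  # assumed to live under "data"
        if status == "completed":
            return data["outputs"][0]
        if status == "failed":  # assumed failure status, not confirmed by this page
            raise RuntimeError(f"prediction failed: {data.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError("prediction did not complete in time")
        sleep(interval_s)


# Stand-in for the real HTTP call so the sketch runs without network access.
_responses = iter([
    {"data": {"status": "processing", "outputs": []}},
    {"data": {"status": "completed", "outputs": ["https://example.com/out.mp4"]}},
])
video_url = wait_for_output(lambda: next(_responses), sleep=lambda _seconds: None)
print(video_url)
```

In real use, `fetch_result` would issue the GET request against the prediction result endpoint with the Authorization header and return the parsed JSON. Passing the fetcher in as a callable keeps the loop independent of any particular HTTP library and makes the timeout behavior easy to test.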

How much does Avatar V Digital Twin cost per run?

Avatar V Digital Twin is billed at $0.12 per second of uploaded audio, with a 5-second minimum and durations rounded up to the next whole second, so the smallest possible charge is $0.60 per run. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.

What inputs does Avatar V Digital Twin accept?

Key inputs: `audio`, `aspect_ratio`, `resolution`, `avatar`, `fit`, `output_format`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/heygen/heygen-avatar-v-digital-twin.

How do I get started with the Avatar V Digital Twin API?

Sign up for a free WaveSpeedAI account to claim starter credits, copy your API key from /accesskey, then call the endpoint shown in the API tab of the playground. The playground also auto-generates a code sample in Python, JavaScript, or cURL for the parameters you've set.

Can I use Avatar V Digital Twin outputs commercially?

Commercial usage rights depend on the model's license, set by its provider (HeyGen). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.