Google · video · From $0.30/run

Veo 3.1 API

Google Veo 3.1 — text-to-video with synchronized native audio at 1080p. Three tiers (Standard, Fast, Lite) with text-to-video, image-to-video, reference-to-video, and video-extend, plus start-end-to-video on the Lite tier.

Standard tier delivers Veo's full quality; Fast tier for iteration; Lite tier for previz and high-volume work. Reference-to-video accepts reference images for identity-preserving generation; video-extend works in 7-second steps producing a single merged clip.


About the Veo 3.1 API

What Veo 3.1 does, how it fits in the Google model lineup, and why teams reach for it.

Veo 3.1 is a video generation model from Google, available through the WaveSpeedAI REST API.

The Veo 3.1 family on WaveSpeedAI ships 11 REST endpoints covering text-to-video, image-to-video, and video-extend workflows. Each variant carries its own pricing, parameters, and example outputs. Pick the one that matches your input modality and production constraints, or call several with the same API key to compose multi-step pipelines.

Run Veo 3.1 through the same API key, billing account, and rate-limit envelope you use for the other 1,000+ AI models on WaveSpeedAI. No separate vendor setup, no per-provider SDKs, no per-vendor rate-limit envelopes — one integration covers everything from text-to-image and text-to-video through audio synthesis, 3D generation, upscaling, and editing.

All Veo 3.1 API endpoints

11 endpoints available now on WaveSpeedAI — pick the variant that matches your workflow.

Text To Video (Lite)

Google Veo 3.1 Lite Text-to-Video generates high-fidelity 720p or 1080p videos with natively generated audio from text prompts. Lightweight variant optimized for cost efficiency. Supports landscape and portrait aspect ratios, dialogue with lip-sync, and customizable duration. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

text-to-video · from $0.30

Video Extend

Extend and continue Veo 3.1 videos with smooth motion, preserved style, and strong scene coherence. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

video-extend · from $2.80

Video Extend (Fast)

Extend Veo 3.1 videos in 7-second steps with the Fast endpoint—quick, coherent continuation that preserves style and motion, output as a single merged clip. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

video-extend · from $1.05

Text To Video (Fast)

Google Veo 3.1 Fast creates text-to-video with native 1080p and synchronized audio, delivering high-quality videos for creators. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

text-to-video · from $1.20

Reference To Video

Google Veo 3.1 Reference-to-Video performs image-to-video generation that preserves a specific subject's appearance and identity from provided reference images, enabling consistent character or product motion across frames. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-video · from $3.20

Start End To Video (Lite)

Google Veo 3.1 Lite Start-End-to-Video generates high-fidelity videos by interpolating between a start image and an optional end image. Supports 720p and 1080p resolutions, landscape and portrait aspect ratios, and native audio generation. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-video · from $0.40

Text To Video

Google Veo 3.1 converts text prompts into videos with synchronized audio at native 1080p for high-quality outputs. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

text-to-video · from $3.20

Image To Video (Lite)

Google Veo 3.1 Lite Image-to-Video transforms static images into high-fidelity 720p or 1080p videos with natively generated audio. Supports many interpolation use cases, landscape and portrait aspect ratios, and customizable duration. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-video · from $0.30

Image To Video (Fast)

Google Veo 3.1 Fast is an Image-to-Video model with native 1080p output for high-detail videos from images and fast performance. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-video · from $1.20

Image To Video

Google Veo 3.1 is an Image-to-Video model that converts images into high-quality videos with native 1080p output for enhanced detail and creative flexibility. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-video · from $3.20

Reference To Video (Fast)

Google Veo 3.1 Fast Reference to Video is a fast AI reference-to-video generation model that creates 8-second videos from up to three reference images using the official Veo predictLongRunning endpoint with referenceImages assets. Ready-to-use REST inference API for product videos, character consistency, branded visual storytelling, social media clips, advertising creatives, and professional reference-based video generation workflows with simple integration, no coldstarts, and affordable pricing.

image-to-video · from $0.64


How to use the Veo 3.1 API

Four steps from signup to a finished generation. Full Python, Node.js, and cURL examples are in the API section below.

1. Get an API key
Sign up for a WaveSpeedAI account and copy your API key from the dashboard. New accounts come with free starter credits, enough to run the playground a few dozen times before billing kicks in.
2. Submit a prediction
POST your input as JSON to https://api.wavespeed.ai/api/v3/google/veo3.1/text-to-video. The endpoint returns a prediction id immediately; generations are asynchronous, so you don't hold an open connection during inference.
3. Poll for completion
GET https://api.wavespeed.ai/api/v3/predictions/{request_id}/result every 1-2 seconds, where {request_id} is the prediction id from step 2. The response includes a status field; keep polling until it flips from "queued" or "processing" to "completed".
4. Read the output URL
Once status is "completed", read the URL from data.outputs[0]. The URL points to your generated video on the WaveSpeedAI CDN.
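
The four steps above can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions, not the official client: the helper names (http_json, generate) are hypothetical, and the location of the prediction id (data.id), the location of the status field (data.status), and the treatment of any status other than "queued" or "processing" as a failure are assumptions. The URLs, the status values, and data.outputs[0] come from the steps above.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def http_json(method, url, api_key, body=None):
    """Send one JSON request with urllib and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def generate(model, payload, api_key, send=http_json,
             interval=2.0, timeout=600.0, sleep=time.sleep):
    """Submit a prediction, poll until it finishes, return the first output URL."""
    submitted = send("POST", f"{API_BASE}/{model}", api_key, payload)
    # Assumption: the prediction id is returned as data.id.
    request_id = submitted["data"]["id"]
    result_url = f"{API_BASE}/predictions/{request_id}/result"

    waited = 0.0
    while waited <= timeout:
        # Assumption: status and outputs both live under the "data" key.
        data = send("GET", result_url, api_key)["data"]
        status = data["status"]
        if status == "completed":
            return data["outputs"][0]
        if status not in ("queued", "processing"):
            # Assumption: any other status is terminal and means failure.
            raise RuntimeError(f"prediction {request_id} ended with status {status!r}")
        sleep(interval)
        waited += interval
    raise TimeoutError(f"prediction {request_id} not completed after {timeout} seconds")
```

Passing the transport as the send argument keeps the polling logic testable without network access; in real use the default http_json is enough, for example generate("google/veo3.1/text-to-video", payload, api_key).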

What you can build with Veo 3.1

Common workflows developers and creators use the Veo 3.1 API for.

Text-to-video with native audio at 1080p

google/veo3.1/text-to-video converts text prompts into videos with synchronized audio at native 1080p — delivery-ready resolution without an upscaling pass. Same prompt format works across Standard, Fast, and Lite tiers.

text-to-video · audio · 1080p

Reference-to-video for identity preservation

google/veo3.1/reference-to-video performs image-to-video while preserving a specific subject's appearance from provided reference images. Fast-tier reference-to-video uses Veo's official predictLongRunning endpoint and supports up to three reference images for 8-second outputs.

reference · identity · character
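
As a rough illustration of the three-image limit, the sketch below validates a reference-to-video request body before it is sent. The function name build_reference_payload and the field names "prompt" and "images" are assumptions made for illustration; the actual input schema is shown on each variant's playground page.

```python
MAX_REFERENCE_IMAGES = 3  # limit stated above for the Fast-tier endpoint


def build_reference_payload(prompt, image_urls):
    """Return a request body for reference-to-video, or raise ValueError."""
    cleaned = prompt.strip()
    if not cleaned:
        raise ValueError("prompt must not be empty")
    if not 1 <= len(image_urls) <= MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"expected 1 to {MAX_REFERENCE_IMAGES} reference images, got {len(image_urls)}"
        )
    # Assumption: "prompt" and "images" are illustrative field names.
    return {"prompt": cleaned, "images": list(image_urls)}
```

Rejecting an oversized list locally avoids paying for a request that the endpoint would refuse or silently truncate.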

Tiered pricing: Standard / Fast / Lite

Standard for delivery, Fast for iteration, Lite for previz and high-volume work. Lite ships quality trade-offs; switch tiers via the endpoint URL without re-writing prompts.

tiers · pricing · iteration
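
Switching tiers by URL can be sketched as a small lookup. The tier prefixes and task names below are taken from the endpoint list on this page; the function name endpoint_url and the tier labels "standard", "fast", and "lite" used as dictionary keys are hypothetical.

```python
API_BASE = "https://api.wavespeed.ai/api/v3"

# Tier -> (model prefix, tasks listed for that tier on this page).
ENDPOINTS = {
    "standard": ("google/veo3.1",
                 {"text-to-video", "image-to-video", "reference-to-video", "video-extend"}),
    "fast": ("google/veo3.1-fast",
             {"text-to-video", "image-to-video", "reference-to-video", "video-extend"}),
    "lite": ("google/veo3.1-lite",
             {"text-to-video", "image-to-video", "start-end-to-video"}),
}


def endpoint_url(tier, task):
    """Return the REST URL for a tier/task pair, or raise ValueError if not listed."""
    if tier not in ENDPOINTS:
        raise ValueError(f"unknown tier: {tier!r}")
    prefix, tasks = ENDPOINTS[tier]
    if task not in tasks:
        raise ValueError(f"{task!r} is not listed for the {tier} tier")
    return f"{API_BASE}/{prefix}/{task}"
```

With this shape, moving a prompt from Lite previz to Standard delivery is a one-argument change: endpoint_url("lite", "text-to-video") versus endpoint_url("standard", "text-to-video").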

Video-extend in 7-second steps

google/veo3.1/video-extend continues existing Veo clips with smooth motion and preserved style. The output is a single merged clip rather than separate segments to stitch. The Fast-tier endpoint (google/veo3.1-fast/video-extend) is the cheaper option for iterative extension.

video-extend · long-form · 7-second
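
To plan a longer clip, it helps to work out how many extend calls a target length needs. The sketch below assumes each video-extend call adds exactly 7 seconds to the merged clip and that lengths are whole seconds; the function names are hypothetical.

```python
import math

EXTEND_STEP_SECONDS = 7  # step size stated above


def extend_calls_needed(base_seconds, target_seconds):
    """Return how many 7-second extend calls reach at least target_seconds."""
    if base_seconds <= 0 or target_seconds <= 0:
        raise ValueError("lengths must be positive")
    missing = target_seconds - base_seconds
    if missing <= 0:
        return 0
    return math.ceil(missing / EXTEND_STEP_SECONDS)


def final_length(base_seconds, calls):
    """Return the merged clip length after the given number of extend calls."""
    return base_seconds + calls * EXTEND_STEP_SECONDS
```

For example, starting from an 8-second clip and aiming for 30 seconds, extend_calls_needed(8, 30) gives 4 calls and final_length(8, 4) gives 36 seconds, so the result overshoots the target and would need trimming in an editor.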

Start-end interpolation (Lite tier)

google/veo3.1-lite/start-end-to-video interpolates between a start image and an optional end image to generate the connecting motion. Supports 720p and 1080p, landscape and portrait aspect ratios. Useful for animatic and keyframe-driven workflows at the Lite price point.

interpolation · keyframes · animatic

Image-to-video at native 1080p

google/veo3.1/image-to-video converts images into 1080p videos with enhanced detail and creative flexibility — the right pick when starting from a key still rather than a text-only brief.

image-to-video · 1080p · image-driven

Tips for prompting Veo 3.1

Practical advice for getting better outputs from Veo 3.1 — drawn from the patterns that work across video models in production pipelines.

Be specific about camera moves

Mention concrete cinematography vocabulary — orbit, dolly-in, push-in, pan-left, crane shot, handheld follow. Generic prompts produce static or arbitrary camera choices; named camera moves map directly to motion intent in the model's training data and dramatically improve shot quality.

Anchor character identity with reference images

If your prompt depends on a specific person, character, or product, upload a reference image alongside the prompt. Without a reference, identity drifts across frames and across shots — the same character ends up looking like a slightly different person each generation.

Describe lighting and time of day

Lighting cues like 'golden hour, soft warm directional light' or 'overcast diffused light, slate-grey sky' improve quality and consistency far more than vague quality modifiers. Lighting is one of the strongest priors the model conditions on.

Use negative prompts to suppress common failure modes

Useful negatives for video: 'frame flicker, motion blur, watermark, text artifacts, distorted hands, low resolution, jpeg compression'. Negative prompts cost nothing and noticeably reduce the rate of generations you'd otherwise re-roll.

Pick the shortest duration that captures your beat

Most prompts work best at 5-8 seconds. Longer clips amplify temporal inconsistencies (subject morphing, environment drift). If you need a 20-second sequence, generate three 6-8 second clips and edit them together — quality stays higher than one long generation.
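
The split described above is simple arithmetic, sketched here for whole-second durations. The function name plan_clips and its 6-8 second defaults are illustrative choices based on the range mentioned in this tip, not limits of the API.

```python
import math


def plan_clips(total_seconds, min_len=6, max_len=8):
    """Split a sequence into the fewest clips whose lengths fall in [min_len, max_len]."""
    if total_seconds <= 0 or not 0 < min_len <= max_len:
        raise ValueError("durations must be positive and min_len <= max_len")
    count = math.ceil(total_seconds / max_len)  # fewest clips that can cover the total
    base, extra = divmod(total_seconds, count)
    if base < min_len:
        raise ValueError(f"{total_seconds}s cannot be split into {min_len}-{max_len}s clips")
    # The first `extra` clips get one additional second so the lengths sum to the total.
    return [base + 1 if i < extra else base for i in range(count)]
```

For the 20-second example, plan_clips(20) returns [7, 7, 6]: three clips, each inside the 6-8 second range.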

Match aspect ratio to platform up front

9:16 for TikTok / Reels / Shorts, 16:9 for landscape feeds and YouTube, 1:1 for post grids. Models train slightly differently per aspect ratio — cropping a 16:9 to 9:16 after the fact loses both fidelity and the composition the model intended.
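
The tips in this section can be folded into one request-building helper. This is a sketch under assumptions: the field names "prompt", "negative_prompt", "aspect_ratio", and "duration" are illustrative and may not match a given variant's schema, and a variant may accept only some of the aspect ratios listed (the endpoint descriptions on this page mention landscape and portrait). The negatives and platform mappings are copied from the tips above.

```python
# Platform -> aspect ratio, as listed in the tip above.
ASPECT_RATIOS = {
    "tiktok": "9:16",
    "reels": "9:16",
    "shorts": "9:16",
    "youtube": "16:9",
    "grid": "1:1",
}

# Negatives suggested in the negative-prompt tip.
DEFAULT_NEGATIVES = (
    "frame flicker", "motion blur", "watermark", "text artifacts",
    "distorted hands", "low resolution", "jpeg compression",
)


def build_payload(subject, camera_move, lighting, platform,
                  duration=8, negatives=DEFAULT_NEGATIVES):
    """Combine subject, camera move, and lighting into one request body."""
    if platform not in ASPECT_RATIOS:
        raise ValueError(f"unknown platform: {platform!r}")
    # Assumption: all four field names below are illustrative, not confirmed.
    return {
        "prompt": f"{subject}, {camera_move}, {lighting}",
        "negative_prompt": ", ".join(negatives),
        "aspect_ratio": ASPECT_RATIOS[platform],
        "duration": duration,
    }
```

A call such as build_payload("a lighthouse on a cliff", "slow dolly-in", "golden hour, soft warm directional light", "tiktok") yields a 9:16 request with a named camera move and an explicit lighting cue in the prompt.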

Veo 3.1 API pricing

Pricing is per-output. The final charge scales with the parameters you set in each variant's playground (resolution, duration, output count, references).

Endpoint	Type	Starting price
google/veo3.1-lite/text-to-video	text-to-video	$0.30
google/veo3.1/video-extend	video-extend	$2.80
google/veo3.1-fast/video-extend	video-extend	$1.05
google/veo3.1-fast/text-to-video	text-to-video	$1.20
google/veo3.1/reference-to-video	image-to-video	$3.20
google/veo3.1-lite/start-end-to-video	image-to-video	$0.40
google/veo3.1/text-to-video	text-to-video	$3.20
google/veo3.1-lite/image-to-video	image-to-video	$0.30
google/veo3.1-fast/image-to-video	image-to-video	$1.20
google/veo3.1/image-to-video	image-to-video	$3.20
google/veo3.1-fast/reference-to-video	image-to-video	$0.64
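
Because every price above is a starting price, the table gives a lower bound on the cost of a multi-step pipeline rather than an exact figure. The sketch below sums starting prices with Decimal to avoid floating-point rounding; the function name minimum_cost is hypothetical, and the real charge depends on resolution, duration, output count, and references.

```python
from decimal import Decimal

# Starting prices in USD, copied from the table above.
STARTING_PRICES = {
    "google/veo3.1-lite/text-to-video": Decimal("0.30"),
    "google/veo3.1/video-extend": Decimal("2.80"),
    "google/veo3.1-fast/video-extend": Decimal("1.05"),
    "google/veo3.1-fast/text-to-video": Decimal("1.20"),
    "google/veo3.1/reference-to-video": Decimal("3.20"),
    "google/veo3.1-lite/start-end-to-video": Decimal("0.40"),
    "google/veo3.1/text-to-video": Decimal("3.20"),
    "google/veo3.1-lite/image-to-video": Decimal("0.30"),
    "google/veo3.1-fast/image-to-video": Decimal("1.20"),
    "google/veo3.1/image-to-video": Decimal("3.20"),
    "google/veo3.1-fast/reference-to-video": Decimal("0.64"),
}


def minimum_cost(endpoints):
    """Return the sum of starting prices for a sequence of endpoint calls."""
    return sum((STARTING_PRICES[name] for name in endpoints), Decimal("0"))
```

For instance, one Lite text-to-video call followed by two Fast video-extend calls costs at least minimum_cost(["google/veo3.1-lite/text-to-video", "google/veo3.1-fast/video-extend", "google/veo3.1-fast/video-extend"]), which is $2.40.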

Call the Veo 3.1 API

Sign up for an API key at wavespeed.ai/accesskey, then submit a prediction via REST. The playground generates ready-to-paste samples for any combination of inputs.

HTTP example

# 1. Submit a prediction
#    ("prompt" is an assumed input field; copy the exact schema from the playground)
curl -X POST "https://api.wavespeed.ai/api/v3/google/veo3.1/text-to-video" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{"prompt": "A lighthouse at golden hour, slow dolly-in"}'

# 2. Poll the result until status = "completed"
#    (set REQUEST_ID to the prediction id returned by step 1)
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/$REQUEST_ID/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# Read the output URL from data.outputs[0].

Node.js example

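// npm install wavespeed
const WaveSpeed = require('wavespeed');
const client = new WaveSpeed(); // reads WAVESPEED_API_KEY

// CommonJS has no top-level await, so the call runs inside an async function.
async function main() {
  // "prompt" is an assumed input field; check the playground for the exact schema.
  const result = await client.run("google/veo3.1/text-to-video", { prompt: "A lighthouse at golden hour, slow dolly-in" });
  console.log(result.outputs[0]); // → URL of the generated output
}
main().catch(console.error);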

Python example

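# pip install wavespeed
import wavespeed

# "prompt" is an assumed input field; check the playground for the exact schema.
output = wavespeed.run(
    "google/veo3.1/text-to-video",
    {"prompt": "A lighthouse at golden hour, slow dolly-in"}
)
print(output["outputs"][0])  # → URL of the generated output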

Veo 3.1 vs alternatives

When to pick Veo 3.1 over similar models on WaveSpeedAI.

Veo 3.1 vs Seedance 2.0

Seedance 2.0 ships native audio across every tier and adds a Turbo tier (1080p at near-480p speed). Veo 3.1 is more expensive at the top tier, but its Lite tier is competitive on cost, and Veo has the stronger reputation for photorealistic human faces.

Veo 3.1 vs Kling 3.0

Kling 3.0 has Pro and 4K tiers plus a motion-control endpoint. Veo 3.1 ships three pricing tiers (Standard / Fast / Lite) with reference-to-video and start-end interpolation as first-class endpoints, so the two families cover different feature surfaces.

Veo 3.1 vs Wan 2.7

Wan 2.7 has reference-to-video, image-edit, and text-to-image variants in one family, which makes it the broader cross-modal toolkit. Veo 3.1 offers tiered cost optimization (Lite through Standard) and 7-second video-extend, which Wan handles differently.

Veo 3.1 API — Frequently asked questions

Pricing, license, integration — common questions about running Veo 3.1 on WaveSpeedAI.

What is the Veo 3.1 API?

Veo 3.1 is a Google video generation model exposed as a REST API on WaveSpeedAI. It generates video with synchronized native audio at 1080p and comes in three tiers (Standard, Fast, Lite) covering text-to-video, image-to-video, reference-to-video, and video-extend, plus start-end-to-video on the Lite tier. You can call it programmatically or try it in the playground.

How do I call the Veo 3.1 API?

Sign up for a WaveSpeedAI account, copy your API key from wavespeed.ai/accesskey, then POST to https://api.wavespeed.ai/api/v3/google/veo3.1/text-to-video with your input as JSON. The endpoint returns a prediction id; poll https://api.wavespeed.ai/api/v3/predictions/{request_id}/result until status flips to "completed", then read the output URL from data.outputs[0]. Full Python / Node.js / cURL examples are above.

How much does the Veo 3.1 API cost?

Veo 3.1 starts at $0.30 per run. The exact cost scales with the parameters you set (resolution, duration, output count, references). The live cost preview next to the Generate button in the playground shows the exact price for your current input.

Which Veo 3.1 variants are available?

WaveSpeedAI hosts 11 Veo 3.1 endpoints: google/veo3.1-lite/text-to-video, google/veo3.1/video-extend, google/veo3.1-fast/video-extend, google/veo3.1-fast/text-to-video, google/veo3.1/reference-to-video, google/veo3.1-lite/start-end-to-video, google/veo3.1/text-to-video, google/veo3.1-lite/image-to-video, google/veo3.1-fast/image-to-video, google/veo3.1/image-to-video, and google/veo3.1-fast/reference-to-video. Each variant has its own playground page and pricing.

Can I use Veo 3.1 outputs commercially?

Commercial usage rights follow the Google model license. Most Google models permit commercial output use; see each model's playground page for the specific license summary, and WaveSpeedAI's Terms of Service for platform-level conditions.

Why use Veo 3.1 on WaveSpeedAI instead of going direct?

One API key + one billing account across Veo 3.1 AND 1,000+ other AI models from other providers. No per-vendor SDK setup, no separate rate-limit envelopes, no rewrite-per-vendor integration code. Pricing is typically at parity with or below Google's direct API.

About Google

The team behind Veo 3.1 and the broader Google model lineup on WaveSpeedAI.

Google's AI work happens primarily at Google DeepMind and Google Research. Its image and video models — Imagen, Veo, and Gemini-family multimodal models like Nano Banana (Gemini 3 Image) — share architecture and training infrastructure with the broader Gemini lineup. Outputs are noted for accurate text rendering, broad style coverage, and commercial-grade licensing.

Related model APIs on WaveSpeedAI

Other AI APIs from Google and the rest of the video model lineup — one API key, one billing account.

Nano Banana Pro API

Google

Google Nano Banana Pro (Gemini 3.0 Pro Image) — high-res 4K text-to-image and image editing optimized for phones. Standard, Ultra (higher-res), and Multi (multi-output) variants for both generation and edit.

Nano Banana 2 API

Google

Google Nano Banana 2 (Gemini 3.1 Flash Image) — Pro-quality image generation at Flash speed. 512px to 4K resolution, improved text rendering, character consistency for up to 5 characters, and real-world knowledge integration.

Seedance 2.0 API

ByteDance

ByteDance Seedance 2.0 — Hollywood-grade cinematic video with native audio-visual synchronization, director-level camera and lighting control, and exceptional motion stability. Built on Seed's unified multimodal architecture.

Seedance 1.5 Pro API

ByteDance

ByteDance Seedance 1.5 Pro — cinematic, live-action-leaning clips with strong prompt adherence, expressive motion, and stable aesthetics. 4-12s duration with Smart Duration, multiple aspect ratios, reproducible generation via seeds.

Wan 2.7 API

Alibaba

Alibaba WAN 2.7 — coherent cinematic video with crisp detail, stable motion, and strong instruction-following. Separate endpoints for text-to-video, image-to-video, reference-to-video, video-edit, video-extend, plus image-edit and text-to-image variants in the same family.

Happy Horse 1.0 API

Alibaba

Alibaba Happy Horse 1.0 — cinematic 720p / 1080p video with smooth camera movement, expressive motion, and strong prompt fidelity. Includes reference-to-video for consistent character/style identity across generations.

Start building with Veo 3.1 on WaveSpeedAI

Free starter credits on signup. One API key across 1,000+ AI models from Google and every other provider.
