Veo 3.1 API
Google Veo 3.1 — text-to-video with synchronized native audio at 1080p. Three tiers (Standard, Fast, Lite) with text-to-video, image-to-video, reference-to-video, and video-extend, plus start-end-to-video on the Lite tier.
Standard tier delivers Veo's full quality; Fast tier for iteration; Lite tier for previz and high-volume work. Reference-to-video accepts reference images for identity-preserving generation; video-extend works in 7-second steps producing a single merged clip.
About the Veo 3.1 API
What Veo 3.1 does, how it fits in the Google model lineup, and why teams reach for it.
Veo 3.1 is a video generation model from Google, available through the WaveSpeedAI REST API.
The Veo 3.1 family on WaveSpeedAI ships 11 REST endpoints covering text-to-video, image-to-video, and video-extend workflows. Each variant carries its own pricing, parameters, and example outputs. Pick the one that matches your input modality and production constraints, or call several with the same API key to compose multi-step pipelines.
Run Veo 3.1 through the same API key, billing account, and rate-limit envelope you use for the other 1,000+ AI models on WaveSpeedAI. No separate vendor setup, no per-provider SDKs, no per-vendor rate-limit envelopes — one integration covers everything from text-to-image and text-to-video through audio synthesis, 3D generation, upscaling, and editing.
All Veo 3.1 API endpoints
11 endpoints available now on WaveSpeedAI — pick the variant that matches your workflow.

Text To Video (Lite)
Google Veo 3.1 Lite Text-to-Video generates high-fidelity 720p or 1080p videos with natively generated audio from text prompts. Lightweight variant optimized for cost efficiency. Supports landscape and portrait aspect ratios, dialogue with lip-sync, and customizable duration. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

Video Extend
Extend and continue Veo 3.1 videos with smooth motion, preserved style, and strong scene coherence. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

Video Extend (Fast)
Extend Veo 3.1 videos in 7-second steps with the Fast endpoint—quick, coherent continuation that preserves style and motion, output as a single merged clip. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

Text To Video (Fast)
Google Veo 3.1 Fast creates text-to-video with native 1080p and synchronized audio, delivering high-quality videos for creators. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

Reference To Video
Google Veo 3.1 Reference-to-Video performs image-to-video generation that preserves a specific subject's appearance and identity from provided reference images, enabling consistent character or product motion across frames. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

Start End To Video (Lite)
Google Veo 3.1 Lite Start-End-to-Video generates high-fidelity videos by interpolating between a start image and an optional end image. Supports 720p and 1080p resolutions, landscape and portrait aspect ratios, and native audio generation. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

Text To Video
Google Veo 3.1 converts text prompts into videos with synchronized audio at native 1080p for high-quality outputs. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

Image To Video (Lite)
Google Veo 3.1 Lite Image-to-Video transforms static images into high-fidelity 720p or 1080p videos with natively generated audio. Supports many interpolation use cases, landscape and portrait aspect ratios, and customizable duration. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

Image To Video (Fast)
Google Veo 3.1 Fast is an Image-to-Video model with native 1080p output for high-detail videos from images and fast performance. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

Image To Video
Google Veo 3.1 is an Image-to-Video model that converts images into high-quality videos with native 1080p output for enhanced detail and creative flexibility. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

Reference To Video (Fast)
Google Veo 3.1 Fast Reference to Video is a fast AI reference-to-video generation model that creates 8-second videos from up to three reference images using the official Veo predictLongRunning endpoint with referenceImages assets. Ready-to-use REST inference API for product videos, character consistency, branded visual storytelling, social media clips, advertising creatives, and professional reference-based video generation workflows with simple integration, no coldstarts, and affordable pricing.
How to use the Veo 3.1 API
Four steps from signup to a finished generation. Full Python, Node.js, and cURL examples are in the API section below.
1. Get an API key
   Sign up for a WaveSpeedAI account and copy your API key from the dashboard. New accounts come with free starter credits, enough to run the playground a few dozen times before billing kicks in.
2. Submit a prediction
   POST your input as JSON to https://api.wavespeed.ai/api/v3/google/veo3.1/text-to-video. The endpoint returns a prediction id immediately; generations are asynchronous, so you don't hold a connection open during inference.
3. Poll for completion
   GET https://api.wavespeed.ai/api/v3/predictions/{request_id}/result every 1-2 seconds, where {request_id} is the prediction id from step 2. The response includes a status field; keep polling until it changes from "queued" or "processing" to "completed".
4. Read the output URL
   Once status is "completed", read the URL from data.outputs[0]. The URL points to your generated video on the WaveSpeedAI CDN.
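The four steps above can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions, not the official client: the helper names `http_json` and `generate` are made up for illustration, the response paths `data.id` and `data.status` are assumed (the page only confirms a status field and `data.outputs[0]`), and any status other than "queued", "processing", or "completed" is treated as a failure.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def http_json(method, url, api_key, payload=None):
    """Send one JSON request and return the decoded JSON response."""
    body = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(url, data=body, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")
    req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def generate(api_key, model, payload, send=http_json, interval=2.0, timeout=600.0):
    """Submit a prediction, poll until it completes, return the first output URL."""
    submitted = send("POST", f"{API_BASE}/{model}", api_key, payload)
    request_id = submitted["data"]["id"]  # assumed location of the prediction id
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = send("GET", f"{API_BASE}/predictions/{request_id}/result", api_key)
        status = result["data"]["status"]  # assumed location of the status field
        if status == "completed":
            return result["data"]["outputs"][0]
        if status not in ("queued", "processing"):
            # Assumption: anything else is a terminal failure state.
            raise RuntimeError(f"prediction ended with status {status!r}")
        time.sleep(interval)
    raise TimeoutError(f"prediction {request_id} did not complete in {timeout}s")
```

The `send` parameter exists so the polling logic can be exercised with a stub instead of a live network call; in normal use you would pass only the API key, the model path such as "google/veo3.1/text-to-video", and the input dictionary.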
What you can build with Veo 3.1
Common workflows developers and creators use the Veo 3.1 API for.
Text-to-video at native 1080p with synchronized audio
google/veo3.1/text-to-video converts text prompts into videos with synchronized audio at native 1080p — delivery-ready resolution without an upscaling pass. Same prompt format works across Standard, Fast, and Lite tiers.
Reference-to-video for identity preservation
google/veo3.1/reference-to-video performs image-to-video while preserving a specific subject's appearance from provided reference images. Fast-tier reference-to-video uses Veo's official predictLongRunning endpoint and supports up to three reference images for 8-second outputs.
Tiered pricing: Standard / Fast / Lite
Standard for delivery, Fast for iteration, Lite for previz and high-volume work. Lite trades some output quality for a lower price; switch tiers by changing the endpoint URL, without rewriting prompts.
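Switching tiers by URL can be illustrated with a small lookup. This is a sketch only: `endpoint_url` is a hypothetical helper, and the tier prefixes and the tasks available per tier are read off the pricing table on this page, so they may lag behind the live catalog.

```python
API_BASE = "https://api.wavespeed.ai/api/v3"

# Path prefix per tier, as the endpoints appear in the pricing table.
TIER_PREFIX = {
    "standard": "google/veo3.1",
    "fast": "google/veo3.1-fast",
    "lite": "google/veo3.1-lite",
}

# Tasks listed per tier in the pricing table (11 endpoints in total).
TIER_TASKS = {
    "standard": {"text-to-video", "image-to-video", "reference-to-video", "video-extend"},
    "fast": {"text-to-video", "image-to-video", "reference-to-video", "video-extend"},
    "lite": {"text-to-video", "image-to-video", "start-end-to-video"},
}


def endpoint_url(tier, task):
    """Build the submit URL for a tier/task pair, rejecting unlisted combinations."""
    if task not in TIER_TASKS.get(tier, set()):
        raise ValueError(f"{task!r} is not listed for the {tier!r} tier")
    return f"{API_BASE}/{TIER_PREFIX[tier]}/{task}"
```

With this shape, moving a prompt from iteration to delivery is a change of one argument, for example from `endpoint_url("fast", "text-to-video")` to `endpoint_url("standard", "text-to-video")`.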
Video-extend in 7-second steps
google/veo3.1/video-extend continues existing Veo clips with smooth motion and preserved style. Extension works in 7-second steps, and the output is a single merged clip rather than separate segments to stitch. The Fast-tier endpoint, google/veo3.1-fast/video-extend, is the cheaper option for iterative extension.
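The arithmetic behind 7-second steps is easy to get wrong when planning a longer clip, so here is a small sketch. It assumes each extend call adds exactly 7 seconds to the merged clip; the function names are illustrative, and the 8-second starting length in the usage note is only an example.

```python
import math

EXTEND_STEP_SECONDS = 7  # step size stated on this page


def extension_calls(base_seconds, target_seconds, step=EXTEND_STEP_SECONDS):
    """Number of extend calls needed to reach at least target_seconds."""
    if target_seconds <= base_seconds:
        return 0
    return math.ceil((target_seconds - base_seconds) / step)


def merged_length(base_seconds, calls, step=EXTEND_STEP_SECONDS):
    """Length of the merged clip after the given number of extend calls."""
    return base_seconds + calls * step
```

Starting from an 8-second clip, reaching 30 seconds takes `extension_calls(8, 30)` = 4 calls, and the merged clip then runs `merged_length(8, 4)` = 36 seconds, so expect to trim the tail.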
Start-end interpolation (Lite tier)
google/veo3.1-lite/start-end-to-video interpolates between a start image and an optional end image to generate the connecting motion. Supports 720p and 1080p, landscape and portrait aspect ratios. Useful for animatic and keyframe-driven workflows at the Lite price point.
Image-to-video at native 1080p
google/veo3.1/image-to-video converts images into 1080p videos with enhanced detail and creative flexibility — the right pick when starting from a key still rather than a text-only brief.
Tips for prompting Veo 3.1
Practical advice for getting better outputs from Veo 3.1 — drawn from the patterns that work across video models in production pipelines.
Be specific about camera moves
Mention concrete cinematography vocabulary — orbit, dolly-in, push-in, pan-left, crane shot, handheld follow. Generic prompts produce static or arbitrary camera choices; named camera moves map directly to motion intent in the model's training data and dramatically improve shot quality.
Anchor character identity with reference images
If your prompt depends on a specific person, character, or product, upload a reference image alongside the prompt. Without a reference, identity drifts across frames and across shots — the same character ends up looking like a slightly different person each generation.
Describe lighting and time of day
Lighting cues like 'golden hour, soft warm directional light' or 'overcast diffused light, slate-grey sky' improve quality and consistency far more than vague quality modifiers. Lighting is one of the strongest priors the model conditions on.
Use negative prompts to suppress common failure modes
Useful negatives for video: 'frame flicker, motion blur, watermark, text artifacts, distorted hands, low resolution, jpeg compression'. Negative prompts cost nothing and noticeably reduce the rate of generations you'd otherwise re-roll.
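The camera, lighting, and negative-prompt tips above can be combined into one input builder. This is a hedged sketch: `build_input` is a made-up helper, and the field names `prompt` and `negative_prompt` are assumptions, so confirm the exact parameter names in the playground for the variant you call.

```python
# Negative terms quoted from the tip above.
DEFAULT_NEGATIVES = (
    "frame flicker",
    "motion blur",
    "watermark",
    "text artifacts",
    "distorted hands",
    "low resolution",
    "jpeg compression",
)


def build_input(subject, camera=None, lighting=None, negatives=DEFAULT_NEGATIVES):
    """Compose a prompt from subject, camera move, and lighting, plus negatives."""
    parts = [subject.strip()]
    if camera:
        parts.append(camera.strip())
    if lighting:
        parts.append(lighting.strip())
    return {
        "prompt": ", ".join(parts),
        # Assumed field name; not confirmed by this page.
        "negative_prompt": ", ".join(negatives),
    }
```

Keeping the subject, camera move, and lighting as separate arguments makes it easy to vary one of them between generations while holding the others fixed.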
Pick the shortest duration that captures your beat
Most prompts work best at 5-8 seconds. Longer clips amplify temporal inconsistencies (subject morphing, environment drift). If you need a 20-second sequence, generate three 6-8 second clips and edit them together — quality stays higher than one long generation.
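Splitting a long sequence into short clips can be planned up front. The sketch below balances a target duration across the fewest clips of at most 8 seconds; `split_sequence` is a hypothetical helper, and it assumes the endpoint accepts the resulting whole-second durations, which you should check in the playground.

```python
import math


def split_sequence(total_seconds, max_clip=8):
    """Split a duration into the fewest clips of at most max_clip seconds each."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    count = math.ceil(total_seconds / max_clip)
    base, extra = divmod(total_seconds, count)
    # The first `extra` clips get one additional second so the total is exact.
    return [base + 1] * extra + [base] * (count - extra)
```

For the 20-second example above, `split_sequence(20)` gives three clips of 7, 7, and 6 seconds.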
Match aspect ratio to platform up front
9:16 for TikTok / Reels / Shorts, 16:9 for landscape feeds and YouTube, 1:1 for post grids. Generating at the target ratio lets the model compose for that frame; cropping a 16:9 clip to 9:16 after the fact discards most of the pixels along with the composition the model intended.
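To put a number on the cost of cropping, the sketch below maps platforms to the ratios named above and computes how much of a frame survives a center crop. The platform keys and function names are illustrative, and whether a given Veo 3.1 variant accepts each ratio is not stated here.

```python
from fractions import Fraction

# Platform-to-ratio mapping taken from the tip above.
PLATFORM_ASPECT = {
    "tiktok": "9:16",
    "reels": "9:16",
    "shorts": "9:16",
    "youtube": "16:9",
    "grid": "1:1",
}


def parse_ratio(text):
    """Turn a 'W:H' string into an exact width/height fraction."""
    width, height = text.split(":")
    return Fraction(int(width), int(height))


def retained_after_crop(source, target):
    """Fraction of source pixels kept by a center crop to the target ratio."""
    s, t = parse_ratio(source), parse_ratio(target)
    return min(s / t, t / s)
```

Cropping 16:9 footage to 9:16 keeps `retained_after_crop("16:9", "9:16")` = 81/256 of the frame, a little under a third, which is why choosing the ratio before generation matters.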
Veo 3.1 API pricing
Pricing is per-output. The final charge scales with the parameters you set in each variant's playground (resolution, duration, output count, references).
| Endpoint | Type | Starting price |
|---|---|---|
| google/veo3.1-lite/text-to-video | text-to-video | $0.30 |
| google/veo3.1/video-extend | video-extend | $2.80 |
| google/veo3.1-fast/video-extend | video-extend | $1.05 |
| google/veo3.1-fast/text-to-video | text-to-video | $1.20 |
| google/veo3.1/reference-to-video | image-to-video | $3.20 |
| google/veo3.1-lite/start-end-to-video | image-to-video | $0.40 |
| google/veo3.1/text-to-video | text-to-video | $3.20 |
| google/veo3.1-lite/image-to-video | image-to-video | $0.30 |
| google/veo3.1-fast/image-to-video | image-to-video | $1.20 |
| google/veo3.1/image-to-video | image-to-video | $3.20 |
| google/veo3.1-fast/reference-to-video | image-to-video | $0.64 |
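
For budgeting a multi-step pipeline, the starting prices in the table can be summed into a lower bound. This is a sketch with a hypothetical `pipeline_floor` helper: the figures are copied from the table above, and because the real charge scales with resolution, duration, output count, and references, the result is a floor rather than a quote.

```python
from decimal import Decimal

# Starting prices in USD, copied from the pricing table above.
STARTING_PRICE_USD = {
    "google/veo3.1-lite/text-to-video": Decimal("0.30"),
    "google/veo3.1/video-extend": Decimal("2.80"),
    "google/veo3.1-fast/video-extend": Decimal("1.05"),
    "google/veo3.1-fast/text-to-video": Decimal("1.20"),
    "google/veo3.1/reference-to-video": Decimal("3.20"),
    "google/veo3.1-lite/start-end-to-video": Decimal("0.40"),
    "google/veo3.1/text-to-video": Decimal("3.20"),
    "google/veo3.1-lite/image-to-video": Decimal("0.30"),
    "google/veo3.1-fast/image-to-video": Decimal("1.20"),
    "google/veo3.1/image-to-video": Decimal("3.20"),
    "google/veo3.1-fast/reference-to-video": Decimal("0.64"),
}


def pipeline_floor(endpoints):
    """Sum the starting prices of the endpoints a pipeline calls, in order."""
    return sum((STARTING_PRICE_USD[e] for e in endpoints), Decimal("0"))
```

A Fast text-to-video generation followed by two Fast video-extend calls has a floor of 1.20 + 1.05 + 1.05 = 3.30 USD. `Decimal` is used instead of floats so the sums stay exact to the cent.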
Call the Veo 3.1 API
Sign up for an API key at wavespeed.ai/accesskey, then submit a prediction via REST. The playground generates ready-to-paste samples for any combination of inputs.
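HTTP example

    # 1. Submit a prediction. The "prompt" value is a sample input; copy the exact fields from the playground.
    curl -X POST "https://api.wavespeed.ai/api/v3/google/veo3.1/text-to-video" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $WAVESPEED_API_KEY" \
      -d '{"prompt": "A red fox crossing a snowy field at golden hour, slow dolly-in"}'
    # 2. Poll the result until status = "completed" (replace {request_id} with the id from step 1)
    curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
      -H "Authorization: Bearer $WAVESPEED_API_KEY"
    # 3. Read the output URL from data.outputs[0]

Node.js example

    // npm install wavespeed
    const WaveSpeed = require('wavespeed');

    async function main() {
      const client = new WaveSpeed(); // reads WAVESPEED_API_KEY
      // Sample input; copy the exact fields from the playground.
      const result = await client.run("google/veo3.1/text-to-video", { prompt: "A red fox crossing a snowy field" });
      console.log(result.outputs[0]); // → URL of the generated output
    }

    main().catch(console.error);

Python example

    # pip install wavespeed
    import wavespeed

    # Sample input; copy the exact fields from the playground.
    output = wavespeed.run(
        "google/veo3.1/text-to-video",
        {"prompt": "A red fox crossing a snowy field"},
    )
    print(output["outputs"][0])  # → URL of the generated output

Veo 3.1 vs alternatives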
When to pick Veo 3.1 over similar models on WaveSpeedAI.
Veo 3.1 vs Seedance 2.0
Seedance 2.0 ships native audio across every tier and offers a Turbo tier (1080p at near-480p speed). Veo 3.1 is more expensive at the top tier, but its Lite tier is competitive on cost, and Veo has the stronger reputation for photorealistic human faces.
Veo 3.1 vs Kling 3.0
Kling 3.0 has Pro and 4K tiers plus a motion-control endpoint. Veo 3.1 ships three pricing tiers (Standard / Fast / Lite) with reference-to-video and start-end interpolation as first-class endpoints. The two families cover different feature surfaces.
Veo 3.1 vs Wan 2.7
Wan 2.7 has reference-to-video, image-edit, and text-to-image variants in one family, which makes it the broader cross-modal toolkit. Veo 3.1 offers tiered cost optimization (Lite through Standard) and 7-second video-extend, which Wan handles differently.
Veo 3.1 API — Frequently asked questions
Pricing, license, integration — common questions about running Veo 3.1 on WaveSpeedAI.
What is the Veo 3.1 API?
Veo 3.1 is a Google video generation model exposed as a REST API on WaveSpeedAI. It generates video with synchronized native audio at 1080p and comes in three tiers (Standard, Fast, Lite) covering text-to-video, image-to-video, reference-to-video, and video-extend, plus start-end-to-video on the Lite tier. You can call it programmatically or try it in the playground.
How do I call the Veo 3.1 API?
Sign up for a WaveSpeedAI account, copy your API key from /accesskey, then POST to https://api.wavespeed.ai/api/v3/google/veo3.1/text-to-video with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status flips to "completed", then read the output URL from data.outputs[0]. Full Python / Node.js / cURL examples are above.
How much does the Veo 3.1 API cost?
Veo 3.1 starts at $0.30 per run. The exact cost scales with the parameters you set (resolution, duration, output count, references). The live cost preview next to the Generate button in the playground shows the exact price for your current input.
Which Veo 3.1 variants are available?
WaveSpeedAI hosts 11 Veo 3.1 endpoints: google/veo3.1-lite/text-to-video, google/veo3.1/video-extend, google/veo3.1-fast/video-extend, google/veo3.1-fast/text-to-video, google/veo3.1/reference-to-video, google/veo3.1-lite/start-end-to-video, google/veo3.1/text-to-video, google/veo3.1-lite/image-to-video, google/veo3.1-fast/image-to-video, google/veo3.1/image-to-video, and google/veo3.1-fast/reference-to-video. Each variant has its own playground page and pricing.
Can I use Veo 3.1 outputs commercially?
Commercial usage rights follow the Google model license. Most Google models permit commercial output use; see each model's playground page for the specific license summary, and WaveSpeedAI's Terms of Service for platform-level conditions.
Why use Veo 3.1 on WaveSpeedAI instead of going direct?
One API key and one billing account cover Veo 3.1 and 1,000+ other AI models from other providers. There is no per-vendor SDK setup, no separate rate-limit envelope, and no integration code to rewrite per vendor. Pricing is typically at parity with or below Google's direct API.
About Google
The team behind Veo 3.1 and the broader Google model lineup on WaveSpeedAI.
Google's AI work happens primarily at Google DeepMind and Google Research. Its image and video models — Imagen, Veo, and Gemini-family multimodal models like Nano Banana (Gemini 3 Image) — share architecture and training infrastructure with the broader Gemini lineup. Outputs are noted for accurate text rendering, broad style coverage, and commercial-grade licensing.
Start building with Veo 3.1 on WaveSpeedAI
Free starter credits on signup. One API key across 1,000+ AI models from Google and every other provider.