Google Veo 3 is Google's flagship image-to-video model; it generates videos with audio from still images. It is available as a ready-to-use REST inference API with strong performance, no cold starts, and affordable pricing.
$3.20 per run
News anchor mid-action, looking straight at the camera. A vintage 1950s black-and-white television broadcast. A serious female news presenter sits at a desk, facing directly toward the audience, with a large old-school microphone in front. She wears a crisp suit, narrow tie, side-parted hair, and wireframe glasses. The presenter moves naturally: leans slightly forward, gestures with one hand, and maintains eye contact with the camera. Her lips are synced to say, "Breaking news: Google Veo 3 is now available on WaveSpeedAI." Contrast, sharp shadows, authentic grainy texture, classic black-and-white 1950s broadcast aesthetic. Vintage TV atmosphere.
A cinematic close-up of a barista crafting latte art in a bustling coffee shop. The scene alternates between her focused, skilled hands and customers watching appreciatively, highlighting the artistry and dedication in everyday routines.
A young woman standing on a balcony at sunrise, overlooking a quiet city. Wind gently rustles her hair. She speaks softly into a mic: 'Another day begins... I wonder what today will bring.'
A man walking alone on a forest trail in autumn, leaves crunching underfoot. Birds chirp in the distance. He murmurs: 'Feels good to be away from everything for a while.'
A close-up of a pair of hands carefully slicing a ripe mango on a wooden cutting board, golden sunlight streaming through a nearby window. The camera slowly zooms in as the knife glides smoothly through the juicy flesh, juice glistening. Soft lo-fi music plays in the background. Natural lighting, ASMR-style, cinematic depth of field.
A young woman walks alone under a transparent umbrella in a quiet alley during light rain, soft city lights reflecting on the wet pavement. Her pace is calm and thoughtful. The camera follows slowly behind her, occasional droplets hitting the lens. Subtle piano music plays, evoking a melancholic but peaceful mood. Dreamy, cinematic, slightly slow motion.
Static shot. Video. 90s sitcom living room scene. Two people mid-conversation in a colorful, cozy set. The woman smiles and gestures animatedly as she speaks, lips synced to: "Veo 3 generates sound. Dialogues, music, everything!" The man listens attentively, nodding slightly, holding a coffee mug.
Natural light. Field reporter mid-action in an open field, looking directly at the camera, tornado in the background. A reporter, in a muted dark raincoat (gray or navy), stands firmly in a wide, grassy field. The wind pulls at his coat and hair, but he keeps his gaze steady, looking directly into the camera. Behind him, a tornado swirls menacingly under an overcast sky. He speaks clearly, lips synced to: "A tornado is coming, please be safe." Slight handheld movement, unsteady framing, and minor shakes typical of field news footage. Natural, flat daylight with no stylization.
A cinematic documentary-style interview scene. An elderly Asian woman in a warmly lit study full of books and vintage lamps. The woman turns slightly to face the camera directly and says with a voice full of awe and sincerity, "I miss my husband so much." Her voice is aged, raspy yet gentle, filled with emotion and wonder. The lighting is soft and moody, with a shallow depth of field. No subtitles, no titles or overlays. The atmosphere is quiet, respectful, and emotionally powerful, like a heartfelt moment in a serious documentary.
A vibrant 2D animation of a young skateboarder in a colorful outfit performing tricks through a lively city park. Bold lines and bright hues create an energetic, playful atmosphere as the skateboarder maneuvers around obstacles.
Veo 3 I2V is the standard image-to-video version of Google DeepMind’s Veo 3 generative model. It brings still images to life, creating cinematic 1080p videos with smooth, realistic motion, consistent lighting, and synchronized native audio.
- From Image to Motion: Transform a single image into a natural, dynamic video sequence while preserving its original composition and style.
- Cinematic Realism: Produces high-fidelity motion with natural lighting, accurate perspective, and fluid camera transitions.
- Native Audio Generation: Automatically generates synchronized sound, including ambient noise, effects, and light music, perfectly aligned with the visuals.
- Dialogue & Lip-Sync: Enables speaking characters or realistic expressions, ideal for storytelling, marketing, and short-form content.
- Consistent Subject & Style: Retains the identity, color tone, and visual integrity of your input image throughout the motion sequence.
| Property | Description |
|---|---|
| Input | Single image + text prompt |
| Max Duration | 8 seconds |
| Resolution | Up to 1080p |
| Audio | Native synchronized dialogue, ambient sound, and music |
| Output Format | MP4 with stereo audio |
Each run with audio costs $3.20 (the same at 720p and 1080p).
Each run without audio costs $1.20.
✅ Commercial use allowed
1. Upload an Image: Choose a clear, high-quality still image. It defines the subject, framing, and overall style.
2. Write a Prompt: Describe the desired motion, mood, and camera movement. Example: “Slow cinematic zoom out as wind moves through the trees and sunlight flickers across the leaves.”
3. Adjust Settings: Select the video duration (up to 8 seconds) and output resolution (up to 1080p).
4. Generate the Video: Submit your prompt and image; Veo 3 I2V automatically creates motion, lighting, and audio.
5. Preview & Download: Review the result, refine the prompt if needed, and download the final MP4.
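The settings in the steps above can be checked before anything is submitted. The following is a minimal sketch, not an official client: `build_payload` is a hypothetical helper, the field names are taken from the request examples on this page, and the limits (8 seconds, 720p or 1080p) are the ones this page lists. The real API may accept other values or enforce additional rules.

```python
from typing import Any, Dict, Optional

MAX_DURATION_SECONDS = 8                 # "Max Duration" listed on this page
ALLOWED_RESOLUTIONS = {"720p", "1080p"}  # the two resolutions this page prices


def build_payload(
    prompt: str,
    image: str,
    duration: int = 8,
    resolution: str = "720p",
    generate_audio: bool = True,
    aspect_ratio: str = "16:9",
    negative_prompt: Optional[str] = None,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Return a JSON-ready request body, rejecting values this page rules out."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not image.strip():
        raise ValueError("image must not be empty")
    if not 0 < duration <= MAX_DURATION_SECONDS:
        raise ValueError(f"duration must be 1..{MAX_DURATION_SECONDS} seconds")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")

    payload: Dict[str, Any] = {
        "prompt": prompt,
        "image": image,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "resolution": resolution,
        "generate_audio": generate_audio,
    }
    # Optional fields are only sent when the caller sets them.
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    if seed is not None:
        payload["seed"] = seed
    return payload


payload = build_payload(
    prompt="Slow cinematic zoom out as wind moves through the trees",
    image="https://example.com/your-input.jpg",
    resolution="1080p",
)
```

Validating locally avoids paying for a run that the service would reject or silently adjust; the authoritative list of allowed values is the JSON schema in the API reference.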
Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/google/veo3/image-to-video with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status flips to completed, then read the output URL from data.outputs[0]. Examples for Veo3 Image To Video below.
# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/google/veo3/image-to-video" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "image": "https://example.com/your-input.jpg",
    "aspect_ratio": "16:9",
    "duration": 8,
    "resolution": "720p",
    "generate_audio": true,
    "negative_prompt": "blurry, low quality, distorted",
    "seed": 0
  }'
# The response includes a prediction id. Substitute it for {request_id} and poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"
# When status is "completed", read the output from data.outputs[0].

// npm install wavespeed
const WaveSpeed = require('wavespeed');

const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// CommonJS modules do not allow top-level await, so run inside an async function.
async function main() {
  const result = await client.run("google/veo3/image-to-video", {
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "image": "https://example.com/your-input.jpg",
    "aspect_ratio": "16:9",
    "duration": 8,
    "resolution": "720p",
    "generate_audio": true,
    "negative_prompt": "blurry, low quality, distorted",
    "seed": 0
  });
  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);

# pip install wavespeed
import wavespeed
output = wavespeed.run(
    "google/veo3/image-to-video",
    {
        "prompt": "A cinematic shot of a city at sunset, soft golden light",
        "image": "https://example.com/your-input.jpg",
        "aspect_ratio": "16:9",
        "duration": 8,
        "resolution": "720p",
        "generate_audio": True,
        "negative_prompt": "blurry, low quality, distorted",
        "seed": 0,
    },
)
print(output["outputs"][0])  # → URL of the generated output

Veo3 Image To Video is a Google model for video generation from images, exposed as a REST API on WaveSpeedAI. You can call it programmatically or try it from the playground above.
POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/google/google-veo3-image-to-video.
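The submit-then-poll flow described here can be sketched in plain Python without the SDK. This is an illustration under stated assumptions, not the documented client: the two URLs and the `data.outputs[0]` field come from the examples on this page, while the location of the prediction id (`data.id`), the `"failed"` status name, and the helper names `make_http` and `run_prediction` are assumptions to check against the API reference.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Optional

API_BASE = "https://api.wavespeed.ai/api/v3"
MODEL_PATH = "google/veo3/image-to-video"

JsonDict = Dict[str, Any]
HttpFn = Callable[[str, str, Optional[JsonDict]], JsonDict]


def make_http(api_key: str) -> HttpFn:
    """Return a function that sends one authenticated JSON request."""
    def http(method: str, url: str, body: Optional[JsonDict] = None) -> JsonDict:
        data = json.dumps(body).encode("utf-8") if body is not None else None
        req = urllib.request.Request(url, data=data, method=method, headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        })
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.load(resp)
    return http


def run_prediction(
    http: HttpFn,
    payload: JsonDict,
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Submit a prediction, poll until it completes, and return the output URL."""
    submitted = http("POST", f"{API_BASE}/{MODEL_PATH}", payload)
    request_id = submitted["data"]["id"]  # ASSUMPTION: id is returned at data.id
    result_url = f"{API_BASE}/predictions/{request_id}/result"

    deadline = time.monotonic() + timeout
    while True:
        data = http("GET", result_url, None)["data"]
        status = data.get("status")
        if status == "completed":
            return data["outputs"][0]
        if status == "failed":  # ASSUMPTION: name of the failure status
            raise RuntimeError(f"prediction {request_id} failed: {data}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction {request_id} still '{status}' after {timeout}s")
        sleep(poll_interval)
```

A real call would look like `run_prediction(make_http(os.environ["WAVESPEED_API_KEY"]), payload)`. Passing the HTTP function in as an argument keeps the polling logic testable with a stub, and the explicit deadline prevents a stuck prediction from blocking the caller indefinitely.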
Veo3 Image To Video starts at $3.20 per run with audio. The final charge depends on the parameters you set: this page lists the same $3.20 for 720p and 1080p, and $1.20 per run without audio. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.
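For budgeting a batch, the per-run prices listed on this page ($3.20 with audio, $1.20 without) can be turned into a rough estimate. This sketch assumes those are flat per-run prices and that setting `generate_audio` to false selects the lower one; `estimate_cost` is a hypothetical helper, and the live price next to the Generate button remains the authoritative figure.

```python
from decimal import Decimal

# Per-run prices as listed on this page (USD); assumed flat per run.
PRICE_WITH_AUDIO = Decimal("3.20")
PRICE_WITHOUT_AUDIO = Decimal("1.20")


def estimate_cost(runs: int, generate_audio: bool = True) -> Decimal:
    """Estimate the total charge in USD for a number of runs."""
    if runs < 0:
        raise ValueError("runs must be non-negative")
    unit_price = PRICE_WITH_AUDIO if generate_audio else PRICE_WITHOUT_AUDIO
    return unit_price * runs


print(estimate_cost(10))                        # 32.00
print(estimate_cost(10, generate_audio=False))  # 12.00
```

`Decimal` is used instead of `float` so that currency totals stay exact when summed over many runs.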
Key inputs: `prompt`, `image`, `aspect_ratio`, `resolution`, `duration`, `seed`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/google/google-veo3-image-to-video.
Average end-to-end generation time on WaveSpeedAI is around 120 seconds per request — measured across recent runs. Queue time scales with global demand; live status is visible in the prediction record.
Commercial usage rights depend on the model's license, set by its provider (Google). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.