SAM3 Video is a unified foundation model for prompt-based video segmentation. Provide text, point, box, or mask prompts and the model segments and tracks targets across frames with strong temporal consistency. Supports concept-level (“segment anything with concepts”) and multi-object masks for editing, analytics, and VFX. Ready-to-use REST inference API with fast response, no cold starts, and affordable pricing.
SAM3 Video (wavespeed-ai/sam3-video) is a prompt-based video segmentation and mask-guided editing model. You provide a video plus a short text instruction (and optionally enable mask application), and the model segments/targets the requested subject(s) across frames with strong temporal consistency.
It’s a practical fit for object-focused video edits like background cleanup, removing unwanted elements, or isolating subjects for downstream compositing—especially on short-to-medium clips with clear subjects.
Prompt-based target selection (concept prompts): Identify what to edit/segment using natural language (e.g., “the woman”, “person”, “red car”) without manually drawing masks frame by frame.
Multi-object targeting in one run: Track multiple object categories by listing them in the prompt (comma-separated), producing consistent targets across frames.
Mask-guided region control via apply_mask: Toggle whether the model applies the mask to the video output for tighter, more controllable edits.
Temporal consistency for video workflows: Designed to keep results stable across frames, reducing flicker/drift compared with per-frame processing.
Editing-oriented use cases: Works well for object removal and background cleanup when your prompt clearly specifies what should change and what should stay.
video (required): Input video file or a public URL.
prompt (required): Text instruction for segmentation/editing. Use commas to target multiple objects (e.g., person, cloth).
apply_mask: Whether to apply the mask to the video (boolean). Default: true.
Write prompts as if you are describing what to target and, if applicable, what the edit intent is.
Tips:
Use short, concrete nouns for targets: person, woman, car, dog, shirt.
Separate multiple targets with commas: person, backpack, bicycle.
Examples:
The woman
person, cloth
remove the person in the background, keep lighting unchanged
Provide video as either:
an uploaded file, or
a public URL the service can fetch.
Pricing/processing uses a billed duration clamp of 5–600 seconds, so very short clips are billed as 5s, and very long clips are treated as 600s.
apply_mask
true: apply the model’s mask to the output video (recommended when you want tighter control over the edited region).
false: run without applying the mask (useful when you want the model’s edits without explicit masking).
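The three inputs described above can be assembled into a request body along these lines. This is a minimal sketch: `build_payload` is a hypothetical helper, not part of any WaveSpeedAI SDK, and it assumes the video is passed as a public URL.

```python
from typing import Iterable, Union


def build_payload(
    video_url: str,
    targets: Union[str, Iterable[str]],
    apply_mask: bool = True,  # mirrors the documented default
) -> dict:
    """Assemble a JSON-serializable request body (hypothetical helper)."""
    if isinstance(targets, str):
        targets = [targets]
    # Trim whitespace and drop empty entries before joining with commas.
    cleaned = [t.strip() for t in targets if t and t.strip()]
    if not cleaned:
        raise ValueError("prompt is required: name at least one target")
    if not video_url:
        raise ValueError("video is required: pass a public URL")
    return {
        "video": video_url,
        "prompt": ", ".join(cleaned),
        "apply_mask": apply_mask,
    }


payload = build_payload("https://example.com/your-input.mp4", ["person", "cloth"])
print(payload)
```

Passing the targets as a list and joining them in one place keeps the comma-separated prompt format consistent across calls.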
After you finish configuring the parameters, click Run, preview the result, and iterate if needed.
Per-run cost depends on video duration (billed duration is clamped to 5–600 seconds), charged in 5-second units at $0.05 per 5s.
| Billed duration | Cost per run |
|---|---|
| 5s | $0.05 |
| 10s | $0.10 |
| 600s (max) | $6.00 |
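The pricing rule can be sketched as a small calculation. The clamp (5–600 seconds), the 5-second unit, and the $0.05 unit price come from the text above; rounding a partial unit up to the next full unit is an assumption, and `estimate_cost` is an illustrative name rather than an official function.

```python
import math

UNIT_SECONDS = 5          # billing unit length
PRICE_CENTS_PER_UNIT = 5  # $0.05 per 5-second unit
MIN_BILLED_SECONDS = 5
MAX_BILLED_SECONDS = 600


def estimate_cost(duration_seconds: float) -> float:
    """Estimate the per-run cost in USD for a clip of the given length."""
    # Clamp the billed duration to the documented 5-600 second range.
    billed = min(max(duration_seconds, MIN_BILLED_SECONDS), MAX_BILLED_SECONDS)
    # Assumption: a partially used 5-second unit is billed as a full unit.
    units = math.ceil(billed / UNIT_SECONDS)
    return units * PRICE_CENTS_PER_UNIT / 100


for seconds in (2, 5, 10, 600, 900):
    print(f"{seconds}s -> ${estimate_cost(seconds):.2f}")
```

Under these assumptions a 2-second clip is billed like a 5-second one, and anything over 600 seconds is capped at the 600-second price.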
Use apply_mask when you need more precise, localized control (especially in cluttered scenes).
Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/wavespeed-ai/sam3-video with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status flips to completed, then read the output URL from data.outputs[0]. Examples for SAM3 Video below.
# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/sam3-video" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "prompt": "person, cloth",
    "video": "https://example.com/your-input.mp4",
    "apply_mask": true
  }'
# Response includes a prediction id. Poll for the result, replacing {request_id} with that id:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"
# When status is "completed", read the output from data.outputs[0].
// npm install wavespeed
const WaveSpeed = require('wavespeed');
const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env
// Top-level await is not available in CommonJS, so wrap the call in an async function.
(async () => {
  const result = await client.run("wavespeed-ai/sam3-video", {
    "prompt": "person, cloth",
    "video": "https://example.com/your-input.mp4",
    "apply_mask": true
  });
  console.log(result.outputs[0]); // → URL of the generated output
})();
# pip install wavespeed
import wavespeed
output = wavespeed.run(
    "wavespeed-ai/sam3-video",
    {
        "prompt": "person, cloth",
        "video": "https://example.com/your-input.mp4",
        "apply_mask": True  # Python boolean; serialized as JSON true
    }
)
print(output["outputs"][0])  # → URL of the generated output
SAM3 Video is a WaveSpeedAI model for video editing, exposed as a REST API on WaveSpeedAI. You can call it programmatically or try it from the playground above.
POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/wavespeed-ai/sam3-video.
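The submit-then-poll flow described above could look roughly like the following, using only the Python standard library. The two URLs, the "completed" status, and `data.outputs[0]` come from this page; the location of the prediction id (`data.id`), the location of the status field (`data.status`), and the "failed" status name are assumptions to be checked against the API reference. `http_json` and `run_sam3_video` are hypothetical helper names.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def http_json(method, url, api_key, body=None):
    """Send a request with a Bearer token and return the decoded JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def run_sam3_video(payload, api_key, request=http_json, interval=2.0, max_polls=150):
    """Submit a prediction, poll until it completes, return the first output URL."""
    submitted = request("POST", f"{API_BASE}/wavespeed-ai/sam3-video", api_key, payload)
    request_id = submitted["data"]["id"]  # assumption: the id is at data.id

    for _ in range(max_polls):
        result = request("GET", f"{API_BASE}/predictions/{request_id}/result", api_key)
        status = result["data"]["status"]  # assumption: status is at data.status
        if status == "completed":
            return result["data"]["outputs"][0]
        if status == "failed":  # assumption: name of the failure status
            raise RuntimeError(f"prediction {request_id} failed")
        time.sleep(interval)
    raise TimeoutError(f"prediction {request_id} did not complete in time")


# Example call (needs a real API key and network access):
# import os
# url = run_sam3_video(
#     {"prompt": "person, cloth",
#      "video": "https://example.com/your-input.mp4",
#      "apply_mask": True},
#     os.environ["WAVESPEED_API_KEY"],
# )
```

The `request` parameter is injectable so the polling logic can be exercised with a stub instead of live HTTP calls.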
SAM3 Video starts at $0.05 per run. That figure is the base price; the final charge scales with the billed duration of the input video, so a longer clip costs more than a short one. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.
Key inputs: `prompt`, `video`, `apply_mask`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/wavespeed-ai/sam3-video.
Average end-to-end generation time on WaveSpeedAI is around 32 seconds per request, measured across recent runs. Queue time scales with global demand; live status is visible in the prediction record.
Commercial usage rights depend on the model's license, set by its provider (WaveSpeedAI). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.