Hailuo 2.3 T2V Pro | Powerful Text-to-Video API


MiniMax Hailuo 2.3 Pro is a text-to-video model that delivers 1080p videos, with a stated 2.5x efficiency gain and 85% accuracy on complex instructions. It is available through a ready-to-use REST inference API with no cold starts and affordable pricing.

text-to-video


$0.49 per run · ~20 runs per $10

Examples

Camera: A slow, steady wide shot (as if gently floating) that moves through a dense, lush, sun-dappled forest. The camera pauses slightly as it reveals a small, friendly forest spirit. Effect: Tiny, glowing dust motes (tree spirits / Kodama) slowly drift and sparkle through the shafts of sunlight. Leaves on the trees gently sway in a soft, visible breeze. A small, forest spirit (like a Kodama or Totoro-esque creature) blinks slowly and turns its head, then nods gently to the camera. Sounds/Voices: Soft, ambient forest sounds: the gentle chirping of unseen birds, the distant trickle of water, and the rustling of leaves in the breeze. A delicate, whimsical flute melody plays softly, accompanied by a faint, magical "tinkle" when the spirit nods. Mood: Whimsical, peaceful, magical, enchanting, and serene. A sense of wonder and gentle calm. Lighting: Warm, golden, dappled sunlight filters through the dense tree canopy, creating soft, glowing patches on the forest floor and highlighting the lush greenery. Subtle lens flares appear in the brightest areas.

Camera: A high-angle helicopter/drone shot overlooking a coastal city, shaking violently. The camera pans from the panicking crowds in the streets to the horizon, revealing the approaching wave. Effect: A colossal tsunami wave, as wide as the city itself and hundreds of feet tall, fills the entire horizon. It moves with terrifying speed, violently impacting the outermost buildings, sending water, cars, and debris exploding hundreds of feet into the air. Sounds/Voices: A deafening, low-frequency "ROAR" of the ocean. The piercing sound of city-wide emergency sirens. The massive, crunching, and crashing sounds of thousands of buildings breaking and collapsing. Mood: Utterly terrifying, apocalyptic, unstoppable, and catastrophic. Lighting: Sickly, grey, overcast daylight. The water is a dark, murky blue-green. Visibility is low due to the mist and spray kicked up by the wave.

Camera: A playful 360-degree orbit shot (medium shot) around three dancers in a bright, candy-themed, pastel-colored set. They are smiling and laughing. Effect: As they perform their signature "heart-hands" point dance (a key move), cartoon-style sparkles and small, colorful hearts pop and animate around their hands. Sounds/Voices: Upbeat, bubbly, fast-paced K-pop or J-pop music. A cute "chime" or "boing" sound effect when the sparkles appear. Audible, light giggles from the members. Mood: Joyful, energetic, sweet, playful, and infectious. Lighting: Extremely bright, high-key, shadowless studio lighting. Soft pink, lavender, and mint-green colors flood the set. Warm, glowing lens flares.

Camera: First-person perspective (POV), the beam of a flashlight is the only viewpoint. The camera moves tensely and slowly down a pitch-black, decaying hospital corridor. The camera suddenly jerks to the right. Effect: The flashlight beam only illuminates a few feet ahead, catching dust motes in the air. As the camera jerks right, the beam briefly illuminates a pale face that vanishes in less than half a second. Voices/Sounds: Only the character's shaky, shallow breathing and the distant echo of a single water drop. A short, sharp violin screech (stinger) hits the moment the face appears. Mood: Extreme tension, claustrophobic, jump-scare, deep unease. Lighting: Total darkness, punctuated only by the narrow, cold-white beam of the unstable handheld flashlight.

A detective stands on a rainy street corner, looking down at a mysterious brass compass in his palm. The needle is spinning wildly. Camera pulls back from a close-up of the compass to reveal the detective's puzzled face. Film noir, neon reflections on wet streets, heavy shadows.

Related Models

video-01 – image-to-video
voice-design – text-to-audio
voice-clone – audio-to-audio
speech-02-turbo – text-to-audio
speech-02-hd – text-to-audio
hailuo-2.3/fast-pro – image-to-video

README

MiniMax Hailuo 2.3 — Text-to-Video (T2V) Pro

Hailuo 2.3 Pro is the premium text-to-video model from MiniMax, engineered for creators who demand cinematic realism, dynamic motion, and superior visual coherence. It transforms text prompts into richly detailed 5-second 1080p videos — merging professional-grade quality with cutting-edge physical simulation.

🎬 Why It Looks Great

Cinematic Fidelity – Generates ultra-smooth motion, realistic lighting, and lifelike shadows in every frame.
Advanced Physics & Scene Logic – Accurately models object dynamics, reflections, and camera movement.
High Prompt Accuracy – Faithfully interprets natural-language descriptions with exceptional semantic precision.
Consistent Characters – Maintains subject identity and spatial layout throughout the clip.
Refined Aesthetic – Tuned for film-like color grading, depth, and atmosphere.

⚙️ Limits and Performance

Input: text prompt only
Output duration: fixed — 5 seconds
Resolution: up to 1080p
Processing time: approximately 40–70 seconds per job (depending on complexity and queue load)

💰 Pricing

Duration	Resolution	Cost per Job
5 seconds	1080p	$0.49
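
To budget a batch of jobs, the flat per-job price can be turned into a small calculation. The sketch below assumes the only cost is the $0.49 per-job price listed above; the function names are illustrative and not part of any API.

```python
from decimal import Decimal

# Price taken from the pricing table: one 5-second 1080p job.
PRICE_PER_RUN = Decimal("0.49")


def total_cost(runs: int) -> Decimal:
    """Return the cost in dollars of the given number of jobs."""
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return PRICE_PER_RUN * runs


def runs_within_budget(budget: Decimal) -> int:
    """Return how many whole jobs fit inside a dollar budget."""
    if budget < 0:
        raise ValueError("budget must be non-negative")
    return int(budget // PRICE_PER_RUN)


if __name__ == "__main__":
    print(total_cost(12))                     # 5.88
    print(runs_within_budget(Decimal("10")))  # 20
```

Decimal is used instead of float so that totals stay exact to the cent.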

🚀 How to Use

Write a clear text prompt describing your scene, characters, lighting, and motion. Example: “A traveler walks through a neon-lit rainy street at night, reflections glowing on wet pavement.”
Submit your job — no reference image required.
Wait for processing (approximately 40–70 seconds, depending on complexity and queue load).
Download your completed 5-second cinematic video.
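
The submit, wait, and download steps above can be sketched as a submit-and-poll loop. This page does not specify the REST endpoints or response fields, so everything API-specific here is an assumption: `submit` and `get_status` are hypothetical callables you would implement against the real API, and the `status`, `video_url`, and `error` field names and the status values are placeholders.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical status values; the real API may name these differently.
DONE = "completed"
FAILED = "failed"


def generate_video(
    prompt: str,
    submit: Callable[[Dict[str, Any]], str],
    get_status: Callable[[str], Dict[str, Any]],
    poll_interval: float = 5.0,
    timeout: float = 180.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Submit a text prompt, poll until the job ends, and return the video URL."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")

    # Text prompt only: no reference image is sent.
    job_id = submit({"prompt": prompt})

    waited = 0.0
    while waited <= timeout:
        job = get_status(job_id)
        if job["status"] == DONE:
            return job["video_url"]
        if job["status"] == FAILED:
            raise RuntimeError(job.get("error", "generation failed"))
        sleep(poll_interval)
        waited += poll_interval

    raise TimeoutError(f"job {job_id} not finished after {timeout} seconds")
```

The default timeout of 180 seconds is an arbitrary margin over the 40–70 second processing time quoted above. Passing `submit`, `get_status`, and `sleep` as arguments keeps the loop independent of any particular HTTP client and makes it easy to exercise with stubs.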

💡 Pro Tips

Use film-style language — include camera direction (wide shot, slow zoom, tracking).
Mention lighting type (sunset glow, neon reflections, soft cinematic light).
Keep prompts concise (1–2 sentences) for best fidelity.
For stable subjects, include descriptors such as “same person” or “consistent background”.
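
The tips above can be applied mechanically when prompts are assembled in code. The following is a minimal sketch, not an official helper: `build_prompt` and `is_concise` are hypothetical names, and the sentence count is a rough punctuation-based heuristic.

```python
import re


def build_prompt(subject: str, camera: str = "", lighting: str = "", stable: str = "") -> str:
    """Join the subject with optional camera, lighting, and stability cues."""
    if not subject.strip():
        raise ValueError("subject must not be empty")
    parts = [camera, subject, lighting, stable]
    # Drop empty cues and trailing periods so the result is one sentence.
    cleaned = [p.strip().rstrip(".") for p in parts if p.strip()]
    return ", ".join(cleaned) + "."


def is_concise(prompt: str, max_sentences: int = 2) -> bool:
    """Check that the prompt has at most max_sentences sentences (heuristic)."""
    # Count runs of terminal punctuation followed by whitespace or end of text.
    sentences = len(re.findall(r"[.!?]+(?:\s|$)", prompt))
    return sentences <= max_sentences


if __name__ == "__main__":
    prompt = build_prompt(
        subject="a traveler walks through a rainy street at night",
        camera="Slow tracking shot",
        lighting="neon reflections on wet pavement",
        stable="same person throughout",
    )
    print(prompt)
    print(is_concise(prompt))
```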

