MiniMax Hailuo Models

MiniMax Hailuo 2.3 for professional video generation, plus speech, voice, music, and image models.

All Models

33 models
image-to-video

minimax/video-01

MiniMax Video-01 is a text-to-video model offering high compression, strong responsiveness to text prompts, cinematic styles, and native HD output. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

image-to-video

minimax/video-02

Hailuo 02 is an AI video-generation model fine-tuned for ultra-clear 1080p output and complex physics-driven scenes.

image-to-video

minimax/hailuo-02/standard

Hailuo 02 Standard is an AI video-generation model delivering 768p output with fast responsiveness and strong handling of complex physics scenes.

image-to-video

minimax/hailuo-02/pro

MiniMax Hailuo 02 Pro produces ultra-clear 1080p videos with responsive, physics-aware rendering of complex scenes.

text-to-video

minimax/hailuo-02/t2v-standard

Hailuo 02 T2V Standard is MiniMax's text-to-video model, fine-tuned to produce responsive 768p videos even for complex physics-driven scenes.

image-to-video

minimax/hailuo-02/i2v-standard

Hailuo 02 by Hailuo AI is an image-to-video model delivering ultra-clear 768p video with responsive handling of physics-driven scenes.

text-to-video

minimax/hailuo-02/t2v-pro

Hailuo 02 T2V-Pro is a text-to-video model fine-tuned for ultra-clear 1080p video and responsive handling of physics-driven scenes.

image-to-video

minimax/hailuo-02/i2v-pro

MiniMax Hailuo 02 I2V Pro is an image-to-video model tuned for clear 1080p output and responsive handling of complex physics-driven scenes.

text-to-audio

minimax/speech-02-hd

MiniMax Speech-02 HD is MiniMax's high-definition text-to-speech model, delivering clear HD voices. Pricing: $0.05 per 1,000 characters.

text-to-audio

minimax/speech-02-turbo

MiniMax Speech-02 Turbo is a high-definition text-to-speech model delivering natural voice output. Pricing: $0.03 per 1,000 characters.

text-to-audio

minimax/voice-clone

MiniMax Voice Clone creates high-quality voice clones from short reference clips, closely matching tone, accent, and speaking style.

text-to-audio

minimax/voice-design

MiniMax Voice Design generates natural voices from text descriptions, without cloning an existing voice, and lets you set tone, accent, and personality.

image-to-video

minimax/hailuo-02/fast

Hailuo 02 Fast is a MiniMax image-to-video model that creates high-quality 6s and 10s clips at 512p for creators and marketers.

text-to-audio

minimax/speech-2.5-hd-preview

MiniMax Speech 2.5 HD Preview offers HD text-to-speech with enhanced multilingual expressiveness, accurate voice cloning, and support for 40 languages.

text-to-audio

minimax/speech-2.5-turbo-preview

MiniMax Speech 2.5 Turbo Preview is an HD text-to-speech model with multilingual support and accurate voice replication across 40 languages. Pricing: $0.04 per 1,000 characters.

text-to-audio

minimax/music-v1.5

MiniMax Music v1.5 is a text-to-audio model that turns text prompts into high-quality music across a diverse range of styles.

text-to-audio

minimax/music-01

MiniMax Music-01 synthesizes accompaniment and vocals simultaneously to produce complete songs across diverse styles.

text-to-video

minimax/hailuo-2.3/t2v-pro

MiniMax Hailuo 2.3 Pro is a text-to-video model delivering 1080p videos, with a 2.5x efficiency improvement and 85% accuracy on complex instructions.

text-to-video

minimax/hailuo-2.3/t2v-standard

MiniMax Hailuo 2.3 Standard is a text-to-video model creating physics-aware 768p videos, with a 2.5x efficiency improvement and an 85% response rate on complex instructions.

image-to-video

minimax/hailuo-2.3/i2v-standard

MiniMax Hailuo 2.3 Standard is an image-to-video model producing physics-aware 768p output with a 2.5x efficiency improvement.

image-to-video

minimax/hailuo-2.3/i2v-pro

MiniMax Hailuo 2.3 Pro is an image-to-video model producing ultra-clear 1080p output with responsive, physics-aware rendering.

image-to-video

minimax/hailuo-2.3/fast

MiniMax Hailuo 2.3 Fast generates high-quality 6s and 10s image-to-video clips at 768p, optimized for creators and marketers.

text-to-audio

minimax/speech-2.6-hd

MiniMax Speech 2.6 HD is an ultra-human, low-latency (under 250 ms) text-to-speech model with voice cloning, text normalization, and support for 40+ languages.

text-to-audio

minimax/speech-2.6-turbo

MiniMax Speech 2.6 Turbo is a text-to-speech model offering ultra-human voice cloning, industry-leading text normalization, sub-250 ms latency, and support for 40+ languages. Pricing: $0.06 per 1,000 characters.
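Several speech models on this page quote a price per 1,000 characters, so budgeting a script is simple arithmetic: characters / 1,000 × rate. The sketch below encodes the four list prices quoted here; it assumes billing counts characters the way Python's `len()` does and applies no rounding to whole 1,000-character blocks. Both are assumptions, and the prices may change.

```python
# List prices in USD per 1,000 input characters, as quoted on this page.
# Assumption: billed characters == len(text), with no block rounding.
PRICE_PER_1K_CHARS = {
    "minimax/speech-02-hd": 0.05,
    "minimax/speech-02-turbo": 0.03,
    "minimax/speech-2.5-turbo-preview": 0.04,
    "minimax/speech-2.6-turbo": 0.06,
}


def estimate_tts_cost(model: str, text: str) -> float:
    """Estimate the USD cost of synthesizing `text` with `model`."""
    try:
        rate = PRICE_PER_1K_CHARS[model]
    except KeyError:
        raise ValueError(f"no listed price for {model!r}") from None
    return len(text) / 1000 * rate


if __name__ == "__main__":
    script = "Hello from the narrator. " * 400  # 10,000 characters
    for name in PRICE_PER_1K_CHARS:
        print(f"{name}: ${estimate_tts_cost(name, script):.2f}")
```

Comparing models on the same script makes the HD versus Turbo trade-off concrete before any audio is generated.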

image-to-video

minimax/hailuo-2.3/fast-pro

Hailuo 2.3 Fast Pro converts images into high-quality 6s 1080p videos, delivering fast, affordable results for creators and marketers.

text-to-audio

minimax/music-02

MiniMax Music-02 is a compact, fast, cost-effective mixture-of-experts (MoE) music generator (230B total parameters, 10B active) for high-quality music production.

image-to-image

minimax/image-01/image-to-image

MiniMax Image-01 image-to-image model transforms existing images using text prompts. Generate variations, apply style transfers, or modify images with character references. Supports multiple aspect ratios and custom dimensions.

text-to-image

minimax/image-01/text-to-image

MiniMax Image-01 text-to-image model generates high-quality images from text descriptions. Create diverse visuals across multiple styles and scenarios with natural language prompts. Supports multiple aspect ratios and custom dimensions.
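Both Image-01 entries mention aspect ratios and custom dimensions. When you need explicit pixel sizes, a ratio such as "16:9" can be converted by fixing the long side and deriving the short side. This is a minimal sketch; the rounding to a multiple of 8 is an assumed constraint chosen for illustration, not a documented Image-01 rule, and `dimensions_for_ratio` is a hypothetical helper.

```python
def dimensions_for_ratio(aspect: str, long_side: int, multiple: int = 8) -> tuple[int, int]:
    """Turn an aspect ratio like "16:9" into (width, height).

    The long side is kept as given; the short side is rounded to the nearest
    `multiple` (an assumed constraint, not a documented Image-01 requirement).
    """
    w_part, h_part = (int(p) for p in aspect.split(":"))
    if w_part <= 0 or h_part <= 0 or long_side <= 0:
        raise ValueError("ratio parts and long_side must be positive")
    if w_part >= h_part:
        # Landscape or square: width is the long side.
        short = round(long_side * h_part / w_part / multiple) * multiple
        return long_side, max(multiple, short)
    # Portrait: height is the long side.
    short = round(long_side * w_part / h_part / multiple) * multiple
    return max(multiple, short), long_side
```

For example, `dimensions_for_ratio("16:9", 1280)` returns `(1280, 720)`, and `"9:16"` with the same long side returns `(720, 1280)`.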

text-to-audio

minimax/speech-2.8-turbo

MiniMax Speech 2.8 Turbo is a high-definition text-to-speech model with natural and expressive voice synthesis.

text-to-audio

minimax/speech-2.8-hd

MiniMax Speech 2.8 HD is a high-definition text-to-speech model with natural and expressive voice synthesis for premium audio quality.

text-to-audio

minimax/music-2.5

MiniMax Music 2.5 is an AI music-generation model with high-fidelity audio, humanized vocals, and precise creative control.

text-to-audio

minimax/music-2.6

MiniMax Music 2.6 generates complete songs with vocals and instrumentals from text prompts and lyrics. Supports instrumental-only mode, auto lyrics generation, structure tags for song arrangement, and configurable audio quality.
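Music 2.6 is described as accepting structure tags for song arrangement. The sketch below assembles lyrics from (section, text) pairs with one bracketed tag per section. The tag names and the `[Tag]` syntax are assumptions made for illustration; the exact tags the model accepts should be checked against its own documentation.

```python
# Hypothetical tag vocabulary; the real set accepted by Music 2.6 may differ.
ALLOWED_TAGS = {"Intro", "Verse", "Pre-Chorus", "Chorus", "Bridge", "Outro"}


def build_tagged_lyrics(sections: list[tuple[str, str]]) -> str:
    """Join (tag, lyrics) pairs into one string with a [Tag] line per section."""
    blocks = []
    for tag, text in sections:
        if tag not in ALLOWED_TAGS:
            raise ValueError(f"unsupported section tag: {tag!r}")
        # Strip stray whitespace so each section body starts right after its tag.
        blocks.append(f"[{tag}]\n{text.strip()}")
    # A blank line between sections keeps the arrangement readable.
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(build_tagged_lyrics([
        ("Verse", "City lights are fading out"),
        ("Chorus", "We keep on running"),
    ]))
```

Validating tags locally catches typos such as "Refrain" before a generation request is spent on them.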

text-to-audio

minimax/music-cover

MiniMax Music Cover transforms existing songs into completely different styles, with a new arrangement and vocal character while keeping the original melody.

MiniMax Hailuo Models

Series Advantages

  1. Native 1080p clarity (Pro tiers): output is not upscaled, giving cleaner detail and steadier temporal coherence.
  2. Strong instruction following: reliable execution of camera moves, lighting, and motion cues.
  3. Physics realism: debris, cloth, water, collisions, and handheld shake look believable.
  4. Clip-first workflow: 6s and 10s clip lengths enable fast iteration and easy sequencing.
  5. Two creation modes: pure Text-to-Video (T2V) and Image-to-Video (I2V) across the lineup.
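The clip-first workflow means a longer sequence is built from fixed-length clips. As a rough planning aid, the sketch below picks how many 10s and 6s clips cover a target duration with the least total footage, preferring fewer clips on ties. It assumes both lengths are available; not every model offers both (Hailuo 2.3 Fast Pro, for instance, is listed at 6s only). `plan_clips` is a hypothetical helper.

```python
import math


def plan_clips(target_s: int) -> tuple[int, int]:
    """Return (ten_second_clips, six_second_clips) covering target_s seconds.

    Chooses the combination with the least total footage; ties go to fewer clips.
    """
    if target_s < 0:
        raise ValueError("target_s must be non-negative")
    best_key, best_plan = None, (0, 0)
    for tens in range(math.ceil(target_s / 10) + 1):
        remaining = max(0, target_s - 10 * tens)
        sixes = math.ceil(remaining / 6)
        key = (10 * tens + 6 * sixes, tens + sixes)  # (total seconds, clip count)
        if best_key is None or key < best_key:
            best_key, best_plan = key, (tens, sixes)
    return best_plan
```

For a 22-second spot, `plan_clips(22)` returns `(1, 2)`: one 10s clip and two 6s clips, exactly 22 seconds with no trimming.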

Model Lineup

  1. hailuo-2.3/t2v-standard

Text-to-Video (standard); upgraded from Hailuo 02 with smoother motion, cleaner faces, and more stable scene dynamics.

  2. hailuo-2.3/i2v-standard

Image-to-Video (standard); smoother transitions and stronger visual consistency than Hailuo 02.

  3. hailuo-2.3/t2v-pro

Text-to-Video (pro); higher fidelity and motion realism than standard, with richer detail and better expression control.

  4. hailuo-2.3/i2v-pro

Image-to-Video (pro); greater texture depth and temporal coherence for premium production needs.

  5. hailuo-2.3/fast

Fast mode (I2V); optimized for quick generation and batch testing, using the same model core with faster output.

  6. hailuo-02/standard

Unified endpoint for T2V + I2V; clean visuals and stable timing for everyday production.

  7. hailuo-02/t2v-standard

Text-to-Video (standard); dependable camera motion and physics for scripts, shorts, and explainers.

  8. hailuo-02/i2v-standard

Image-to-Video (standard); locks composition and style with a start image (and an optional end image) for smooth, guided transitions.

  9. hailuo-02/t2v-pro

Text-to-Video (pro); stronger physics, cleaner temporal flow, and higher fidelity for hero shots.

  10. hailuo-02/i2v-pro

Image-to-Video (pro); richer micro-detail and color depth, ideal for animating key visuals and poster-grade stills.

  11. hailuo-02/fast

Fast iteration (T2V/I2V); built for rapid drafts, batch A/B testing, and high-throughput pipelines.

  12. minimax/speech-2.8-turbo

Real-time synthesis (T2A); optimized for ultra-low latency and cost efficiency, built for conversational AI, live streaming, and instant interaction loops.

  13. minimax/speech-2.8-hd

Studio-grade fidelity (T2A); wide dynamic range and emotional nuance, ideal for audiobooks, cinematic narration, and professional content creation.
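The hailuo-02/i2v-standard entry describes guiding a clip with a start image and an optional end image. The sketch below builds, but does not send, a JSON POST request for such a job using only the standard library. The base URL, bearer-token header, and the field names `image`, `end_image`, `prompt`, and `duration` are all hypothetical placeholders; this page does not document the real request schema.

```python
import json
import urllib.request
from typing import Optional

# Placeholder: the real endpoint and auth scheme are not given on this page.
BASE_URL = "https://api.example.com/v1"


def build_i2v_request(api_key: str, model: str, start_image_url: str, prompt: str,
                      end_image_url: Optional[str] = None,
                      duration_s: int = 6) -> urllib.request.Request:
    """Build (without sending) a POST request for an image-to-video job.

    All payload field names are hypothetical, for illustration only.
    """
    if duration_s not in (6, 10):
        raise ValueError("the Hailuo lineup advertises 6s and 10s clips")
    payload = {"image": start_image_url, "prompt": prompt, "duration": duration_s}
    if end_image_url is not None:
        payload["end_image"] = end_image_url  # optional last frame for a guided transition
    return urllib.request.Request(
        url=f"{BASE_URL}/{model}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Keeping request construction separate from sending makes it easy to inspect or log the payload, then pass the request to `urllib.request.urlopen` once the real endpoint and fields are confirmed.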



Quick guidance: Standard covers most day-to-day needs; choose Pro for hero-quality shots; use Fast for speed and volume.
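The guidance above can be expressed as a lookup from creation mode and tier to a Hailuo 2.3 model ID. The IDs below are copied from the Hailuo 2.3 entries on this page; the 2.3 listings include no Fast text-to-video model, so that combination raises an error. `pick_hailuo_23` is a hypothetical helper, not part of any published SDK.

```python
# Model IDs taken from the Hailuo 2.3 catalog entries on this page.
HAILUO_23 = {
    ("t2v", "standard"): "minimax/hailuo-2.3/t2v-standard",
    ("t2v", "pro"): "minimax/hailuo-2.3/t2v-pro",
    ("i2v", "standard"): "minimax/hailuo-2.3/i2v-standard",
    ("i2v", "pro"): "minimax/hailuo-2.3/i2v-pro",
    ("i2v", "fast"): "minimax/hailuo-2.3/fast",
    ("i2v", "fast-pro"): "minimax/hailuo-2.3/fast-pro",
}


def pick_hailuo_23(mode: str, tier: str = "standard") -> str:
    """Map a creation mode ("t2v" or "i2v") and a tier to a Hailuo 2.3 model ID."""
    try:
        return HAILUO_23[(mode, tier)]
    except KeyError:
        options = sorted(t for m, t in HAILUO_23 if m == mode)
        raise ValueError(
            f"no Hailuo 2.3 {tier!r} model for mode {mode!r}; available: {options}"
        ) from None
```

Defaulting `tier` to "standard" mirrors the advice that Standard covers most day-to-day needs, while Pro and Fast are explicit opt-ins.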