Qwen AI Models

Qwen multimodal models for image and video generation

All Models

33 models

text-to-image

wavespeed-ai/qwen-image/text-to-image

Qwen-Image is a 20B MMDiT next-gen text-to-image model that generates images from text prompts. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

lora-support

wavespeed-ai/qwen-image/text-to-image-lora

Qwen-Image LoRA is a 20B MMDiT next-gen text-to-image model with LoRA support for fast customization and refined image generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

training

wavespeed-ai/qwen-image-lora-trainer

Train custom Qwen-Image LoRA models 10x faster. Style training, character training, object training. From concept to model in minutes, not hours. Upload a ZIP file containing images to start!

image-to-image

wavespeed-ai/qwen-image/edit

Qwen-Image-Edit is a 20B MMDiT image-to-image model offering precise bilingual (Chinese & English) text edits while preserving style. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

lora-support

wavespeed-ai/qwen-image/edit-lora

Qwen-Image-Edit LoRA (20B) enables bilingual Chinese/English image-to-image editing with style preservation, covering both semantic and appearance edits. Ready-to-use REST API, best performance, no cold starts, affordable pricing.

image-to-image

wavespeed-ai/qwen-image/edit-plus

Qwen-Image-Edit-Plus (2509) is a 20B MMDiT image editor with multi-image editing, single-image consistency, and native ControlNet support. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

lora-support

wavespeed-ai/qwen-image/edit-plus-lora

Qwen-Image-Edit-Plus LoRA (2509) is a 20B MMDiT image-to-image editor with LoRA support, multi-image edits, single-image consistency, and native ControlNet. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

image-to-image

alibaba/qwen-image/translate

Qwen Vision Translate offers OCR-based image understanding and multilingual in-image text translation for context-aware results. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

text-to-audio

alibaba/qwen3-tts-flash

Qwen3 TTS Flash: Low-latency text-to-speech for English and Chinese with multiple voices, ideal for real-time dialogue. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

text-to-image

wavespeed-ai/jib-mix-qwen-image/text-to-image

Jib Mix Qwen is a next-gen text-to-image model optimized for producing natural, pretty faces with improved Asian facial rendering. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

lora-support

wavespeed-ai/jib-mix-qwen-image/text-to-image-lora

Jib Mix Qwen LoRA is a next-gen text-to-image model with LoRA support. It specializes in producing more natural, attractive faces and is particularly strong at rendering Asian facial features. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

text-to-image

wavespeed-ai/z-image/turbo

Z-Image-Turbo is a 6-billion-parameter text-to-image model that generates photorealistic images in sub-second time. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

lora-support

wavespeed-ai/z-image/turbo-lora

Z-Image-Turbo LoRA (6B) enables ultra-fast text-to-image generation with external LoRA support. Generate photorealistic images with sub-second latency while applying up to 3 LoRAs for custom styles. Ready-to-use REST API, best performance, no cold starts, affordable pricing.

image-to-image

wavespeed-ai/z-image/turbo-inpaint

Z-Image Turbo Inpaint delivers ultra-fast image inpainting with natural-language instructions—seamlessly fill, fix, or replace regions in your images with production-quality results. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

image-to-image

wavespeed-ai/qwen-image/layered

Qwen-Image Layered is a unified image-layer decomposition model for prompt-guided compositing. Provide points, boxes, or rough masks to isolate subjects and regions, and the model splits a single image into multiple RGBA layers with clean alpha, soft edges, and correct occlusion order. Ready-to-use REST inference API with fast response, no cold starts, and affordable pricing.

image-to-image

wavespeed-ai/qwen-image/edit-2511

Qwen Image Edit 2511 is a major upgrade over 2509 for real-world image editing and design. It delivers stronger edit consistency, robust multi-person identity/pose consistency, built-in LoRA styles, enhanced industrial/product design, and improved geometric reasoning for structure-preserving edits. Built for stable production use with a ready-to-use REST API, no cold starts, and predictable pricing.

lora-support

wavespeed-ai/qwen-image/edit-2511-lora

Qwen Image Edit 2511 LoRA is an enhanced version with custom LoRA support for personalized styles. It delivers stronger edit consistency, robust multi-person identity/pose consistency, custom LoRA styles, enhanced industrial/product design, and improved geometric reasoning for structure-preserving edits. Built for stable production use with a ready-to-use REST API, no cold starts, and predictable pricing.

text-to-image

wavespeed-ai/qwen-image/text-to-image-2512

Qwen Image 2512 is Qwen's latest text-to-image model with enhanced prompt understanding, superior text rendering, and versatile aspect ratio support. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

lora-support

wavespeed-ai/qwen-image/text-to-image-2512-lora

Qwen-Image-2512 LoRA is an enhanced 20B MMDiT text-to-image model with LoRA support for fast customization and refined image generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

image-to-image

wavespeed-ai/z-image-turbo/image-to-image

Z-Image-Turbo Image-to-Image is a 6-billion-parameter model that enhances the quality of reference images (similar to upscaling) in sub-second time. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

lora-support

wavespeed-ai/z-image-turbo/image-to-image-lora

Z-Image-Turbo Image-to-Image LoRA transforms reference images with custom LoRA styles in sub-second time. Apply up to 3 LoRAs for personalized image transformation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

training

wavespeed-ai/qwen-image-2512-lora-trainer

Qwen-Image-2512 LoRA Trainer lets you train custom LoRA models 10x faster with style, character, and object training. From concept to model in minutes, not hours—upload a ZIP file containing images to start. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

text-to-image

wavespeed-ai/z-image-turbo/controlnet

Z-Image-Turbo ControlNet generates images guided by structural control signals (depth, canny edge, pose) for precise composition control. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

image-to-image

wavespeed-ai/qwen-image/edit-multiple-angles

Generate specific camera angles from a single image using a 96-pose camera system. Control horizontal rotation, vertical tilt, and zoom to create front, side, and back views and more. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

text-to-audio

wavespeed-ai/qwen3-tts/text-to-speech

Qwen3 TTS: Multi-language, multi-voice text-to-speech synthesis with style control. Supports 11 languages and 9 voice characters. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

audio-to-audio

wavespeed-ai/qwen3-tts/voice-clone

Qwen3 TTS Voice Clone: Clone any voice from a reference audio and generate speech in that voice. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

text-to-audio

wavespeed-ai/qwen3-tts/voice-design

Qwen3 TTS Voice Design: Generate speech with custom voice characteristics described in natural language. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

text-to-image

wavespeed-ai/z-image/base

Z-Image-Base is a 6-billion-parameter text-to-image model with full CFG support. It supports negative prompting and fine-tuning for maximum control over image generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

lora-support

wavespeed-ai/z-image/base-lora

Z-Image-Base LoRA (6B) enables high-quality text-to-image generation with full CFG support and external LoRA support. It supports negative prompting while applying up to 3 LoRAs for custom styles. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

training

wavespeed-ai/z-image/base-lora-trainer

Z-Image Base LoRA Trainer: train custom image LoRA models from your own dataset, with ZIP uploads, auto-tuned defaults, and fast iteration for brand, character, or IP looks. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

text-to-image

wavespeed-ai/qwen-image-max/text-to-image

Qwen Image Max is a text-to-image model for high-quality image generation that supports Chinese and English prompts. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

image-to-image

wavespeed-ai/qwen-image-max/edit

Qwen Image Max Edit is an AI model for editing images with text prompts in both Chinese and English. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

image-to-image

wavespeed-ai/qwen-image/edit-2509-multiple-angles

Qwen Image Edit 2509 Multiple Angles is an AI image editing model that generates multiple-angle views of objects or scenes from a single image. Transform perspectives and create diverse viewpoints with text prompts. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
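
Every entry above is described as a REST inference API addressed by a model ID. As a rough illustration only, the Python sketch below builds (but does not send) a JSON POST request for one of those IDs. The base URL, the bearer-token header, and the `prompt` and `loras` field names are all assumptions made for this example, since the listing does not document the endpoint or its parameters. The three-LoRA cap mirrors the "up to 3 LoRAs" wording in the Z-Image LoRA entries and may not apply to other models.

```python
import json
import urllib.request

# Hypothetical placeholders: the listing does not document the real endpoint,
# auth scheme, or request fields.
BASE_URL = "https://api.example.com/v1"
MAX_LORAS = 3  # taken from the "up to 3 LoRAs" wording in the Z-Image LoRA entries


def build_request(model_id, prompt, api_key, loras=None):
    """Build (but do not send) a JSON POST request for one model ID."""
    loras = list(loras or [])
    if len(loras) > MAX_LORAS:
        raise ValueError(f"at most {MAX_LORAS} LoRAs are supported, got {len(loras)}")

    payload = {"prompt": prompt}  # assumed field name
    if loras:
        payload["loras"] = loras  # assumed field name

    return urllib.request.Request(
        url=f"{BASE_URL}/{model_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


req = build_request(
    "wavespeed-ai/qwen-image/text-to-image",
    "a red bicycle leaning against a brick wall",
    "YOUR_API_KEY",
)
```

Passing `req` to `urllib.request.urlopen` would send it, once the placeholder URL and field names are replaced with the values from the provider's API reference.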

Qwen AI Models

Qwen multimodal models developed by Alibaba Cloud offer advanced capabilities in image and video generation. These models excel at creating high-quality visual content from text descriptions with a strong understanding of both Chinese and English prompts.

LoRA-ready Image Editing & Generation

  1. qwen-image/edit-plus-lora

Advanced image editing model with LoRA support, enabling precise style transfer, character customization, and high-fidelity local edits driven by text prompts.

  2. qwen-image/edit-lora

Lightweight edit model for LoRA-based style and character control, ideal for quick retouching, outfit changes, and consistent persona updates.

  3. qwen-image/text-to-image-lora

LoRA-enabled text-to-image generation that supports custom styles and characters while keeping strong prompt adherence and clean composition.

  4. jib-mix-qwen-image/text-to-image-lora

Mixed-style LoRA T2I model tuned for vivid anime and illustration aesthetics, combining sharp linework with rich color and expressive characters.

  5. qwen-image-lora-trainer

Training endpoint for building your own Qwen Image LoRA adapters from reference images, enabling personalized styles and characters across all LoRA-capable Qwen models.
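
The trainer entries on this page ask for a ZIP file containing images. A minimal Python sketch of preparing such an archive with the standard library is shown below. The `zip_images` helper and the set of accepted file extensions are assumptions for illustration, because the page does not say which image formats or folder layout the trainer expects.

```python
import zipfile
from pathlib import Path

# Assumed extensions; the page does not list the formats the trainer accepts.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def zip_images(image_dir, zip_path):
    """Pack the image files found directly in image_dir into zip_path.

    Returns the file names written to the archive, in sorted order.
    """
    image_dir = Path(image_dir)
    names = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(image_dir.iterdir()):
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
                # Store files flat, without the parent directory in the name.
                archive.write(path, arcname=path.name)
                names.append(path.name)
    if not names:
        raise ValueError(f"no images found in {image_dir}")
    return names
```

For example, `zip_images("my_style_photos", "dataset.zip")` would collect the images in a local `my_style_photos` folder (a hypothetical name) into `dataset.zip`, ready to upload.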

Base Image Editing

  1. qwen-image/edit-plus

Enhanced image editing model for high-quality global and local edits, improving lighting, realism, and detail while preserving subject identity.

  2. qwen-image/edit

General-purpose edit model for everyday photo and artwork adjustments, ideal for quick fixes, background tweaks, and light retouching.

  3. qwen-image/edit-2511

High-consistency image editing model for reliable multi-subject, identity-preserving edits, delivering reduced drift, stronger geometric control, and cleaner, product-grade results for iterative production workflows.

  4. qwen-image/edit-2511-lora

LoRA-enhanced editing model built on the 2511 backbone. It enables style injection, character customization, and fine-tuned aesthetic control while preserving the core stability of production-grade edits.

  5. qwen-image-max/edit

Advanced image editing model offering precise object manipulation, seamless background replacement, and intelligent style transfer, while preserving high-fidelity details and natural lighting.

Base Text-to-Image Generation

  1. qwen-image/text-to-image

Core T2I model that generates clean, realistic images from text prompts, suitable for product shots, portraits, and general creative use.

  2. jib-mix-qwen-image/text-to-image

Stylized T2I variant blending anime and illustration styles, producing vibrant, character-focused art with strong visual appeal.

  3. qwen-image/text-to-image-2512

Next-generation text-to-image model with enhanced prompt adherence, refined detail rendering, and improved compositional accuracy, engineered for photorealistic outputs and complex multi-element scene generation.

  4. qwen-image-max/text-to-image

Premium text-to-image model delivering exceptional detail, superior photorealism, and complex scene coherence. Designed for professional-grade generation with advanced lighting, texture rendering, and precise compositional control.

Utilities & Audio

  1. qwen-image/translate

Image translation utility that reads charts, UI screenshots, and text-heavy graphics, then outputs translated content while preserving layout semantics.

  2. qwen3-tts family

Fast text-to-speech models for natural-sounding voice previews, optimized for low latency in assistants, demos, and real-time applications.