Qwen
Released October 2025
Qwen3 VL 32B
Vision-language, OCR, document analysis, video
Capabilities: Tool Use, Vision
About this model
Qwen3 VL 32B is a 32-billion-parameter multimodal vision-language model designed for understanding across text, images, and video. It offers fine-grained spatial reasoning, document and scene analysis, long-horizon video understanding, and OCR in 32 languages. Its architecture is designed to fuse visual and textual information more tightly than earlier models in the series.
Context Window: 262K tokens
Input Cost: $0.50 per million tokens
Output Cost: $1.50 per million tokens
Input Types: Text, Image
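To turn the listed per-million-token prices into a per-request estimate, a minimal sketch is shown below. It assumes that "/M" means per million tokens and that the 262K context window is roughly 262,000 tokens shared by input and output; both the helper names (`estimate_cost`, `fits_in_context`) and the exact token limit are hypothetical, not part of any provider API. Token counts for images depend on the provider's tokenizer and are not modeled here.

```python
# Hypothetical cost helper for Qwen3 VL 32B, using the prices from this listing.
# Assumption: "$0.50/M" and "$1.50/M" are USD per million tokens.
INPUT_PRICE_PER_M = 0.50
OUTPUT_PRICE_PER_M = 1.50

# Assumption: "262K" is treated as ~262,000 tokens covering input plus output.
CONTEXT_WINDOW = 262_000


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Price each side separately, then convert from per-million to per-token.
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def fits_in_context(input_tokens: int, max_output_tokens: int) -> bool:
    """Check (under the shared-window assumption) that a request fits."""
    return input_tokens + max_output_tokens <= CONTEXT_WINDOW


if __name__ == "__main__":
    # A request with 10,000 input tokens and 2,000 output tokens.
    print(f"${estimate_cost(10_000, 2_000):.4f}")  # prints "$0.0080"
    print(fits_in_context(250_000, 20_000))        # prints "False"
```

Because output tokens cost three times as much as input tokens here, long generated answers dominate the bill faster than long prompts or documents do.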