A multimodal, instruction-tuned model from the Llama 4 family, built on a mixture-of-experts architecture with 17B active parameters (16 experts; 109B total parameters) and designed for text and image input.