A next-generation multimodal “thinking” model that accepts text, images, audio, and video as input and produces text output. Optimized for deep reasoning, coding, and conversational tasks.