Gemini 3 Launches Agentic Vision in Response to DeepSeek-OCR2
Google DeepMind's Agentic Vision in Gemini 3 Flash revolutionizes image understanding with a Think-Act-Observe loop and Python code execution for zooming, annotating, and visual analysis.
“AI Disruption” Publication 8600 Subscriptions 20% Discount Offer Link.
You didn’t expect this, did you? Google DeepMind has just rolled out a heavyweight new capability for Gemini 3 Flash: Agentic Vision.
Could it be that they were provoked by DeepSeek-OCR2?
As you can see, this technology has completely transformed the way large language models understand the world:
From the past method of “guessing” to today’s “in-depth investigation.”
This capability was launched by the Google DeepMind team. Core product manager Rohan Doshi stated that traditional AI models, when processing images, usually just take a static glance.
If the details in the image are too small—like a serial number on a microchip or a blurry road sign in the distance—the model often has no choice but to “guess.”
But Agentic Vision introduces a “Think-Act-Observe” closed loop:
The model is no longer passively receiving pixels; instead, it actively writes Python code to manipulate the image based on the user’s needs.
This capability has directly enabled Gemini 3 Flash to achieve a 5% to 10% performance leap across various vision benchmarks.




