Wednesday, October 8, 2025

Gemini 2.5 Computer Use Model


Earlier this year, Google announced that it was bringing computer use capabilities to developers via the Gemini API.

Google is now introducing the Gemini 2.5 Computer Use model, its latest specialized model built on the visual understanding and reasoning capabilities of Gemini 2.5 Pro, which enables agents to interact with user interfaces (UIs).


The model outperforms leading alternatives on several web and mobile control benchmarks, with lower latency. Developers can access these capabilities through the Gemini API in Google AI Studio and Vertex AI.


How AI Models Work

While AI models can interact with software through structured APIs, many digital tasks still require direct interaction with graphical user interfaces, such as filling out and submitting forms.


To complete these tasks, agents need to navigate web pages and applications the way people do: by clicking, typing, and scrolling.


The ability to natively fill out forms, manipulate interactive elements like dropdowns and filters, and operate behind logins is a significant step toward building powerful, general-purpose agents.


Operational Overview 

The model’s core capabilities are exposed through the new `computer_use` tool in the Gemini API and are meant to be run within a loop.


Inputs to the tool consist of the user request, a screenshot of the environment, and a log of recent actions. 


The input can also indicate whether to omit specific functions from the comprehensive list of supported UI actions or to include supplemental custom functions. 
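As a rough illustration of the inputs described above, the sketch below bundles them into a single object. It does not use the Gemini API: the class name, the field names, and the action names in `SUPPORTED_ACTIONS` are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field

# Hypothetical action names for illustration only; not the API's actual list.
SUPPORTED_ACTIONS = ("click", "type_text", "scroll", "navigate")


@dataclass
class ComputerUseRequest:
    """Hypothetical container for the three inputs plus optional overrides."""

    user_request: str                       # what the user wants done
    screenshot_png: bytes                   # current screenshot of the environment
    recent_actions: list[str] = field(default_factory=list)   # log of recent actions
    excluded_actions: set[str] = field(default_factory=set)   # built-in actions to omit
    custom_functions: list[str] = field(default_factory=list)  # extra functions to add

    def allowed_actions(self) -> list[str]:
        """Built-in actions minus exclusions, followed by any custom functions."""
        built_in = [a for a in SUPPORTED_ACTIONS if a not in self.excluded_actions]
        return built_in + list(self.custom_functions)
```

For example, a request that excludes the hypothetical `navigate` action and adds a custom `open_app` function would expose `click`, `type_text`, `scroll`, and `open_app` to the model.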


[Figure: Flow of the Gemini 2.5 Computer Use model. In the agent loop, an initial task yields a screenshot and context, which are sent to the model; the model then returns a response to the computer environment, which executes the action.]

The model then examines these inputs and generates a response, generally a function call corresponding to one of the UI actions, such as clicking or typing. 


This response may also include a request for end-user confirmation, which is required for certain actions, such as making a purchase. The client-side code then executes the received action.


After the action is executed, a new screenshot of the GUI and the current URL are sent back to the Computer Use model as a function response, restarting the loop.


This iterative process persists until the task reaches completion, an error arises, or the interaction is halted by a safety response or user decision. 
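The loop described above can be sketched in a few lines of client-side code. This is a minimal sketch under stated assumptions, not the actual Gemini API: `Observation`, `ModelResponse`, and the four callbacks (`observe`, `ask_model`, `execute`, `confirm`) are hypothetical stand-ins for the screenshot capture, the model call, the action executor, and the end-user confirmation prompt. The `max_steps` cap is an added safeguard that the article does not mention.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Observation:
    screenshot: bytes  # new screenshot of the GUI
    url: str           # current URL


@dataclass
class ModelResponse:
    action: Optional[str]            # None means the model considers the task done
    args: dict
    needs_confirmation: bool = False  # e.g. before making a purchase
    blocked_by_safety: bool = False   # safety response halts the interaction


def run_agent_loop(
    task: str,
    observe: Callable[[], Observation],
    ask_model: Callable[[str, Observation, list], ModelResponse],
    execute: Callable[[ModelResponse], None],
    confirm: Callable[[ModelResponse], bool],
    max_steps: int = 20,  # assumption: a cap so a stuck agent cannot loop forever
) -> str:
    history: list = []
    obs = observe()
    for _ in range(max_steps):
        response = ask_model(task, obs, history)
        if response.blocked_by_safety:
            return "halted_by_safety"
        if response.action is None:
            return "completed"
        if response.needs_confirmation and not confirm(response):
            return "halted_by_user"
        try:
            execute(response)  # client-side code performs the UI action
        except Exception:
            return "error"
        history.append(response.action)
        obs = observe()        # fresh screenshot and URL restart the loop
    return "step_limit_reached"
```

Each return value corresponds to one of the exit conditions named above: task completion, an error, a safety response, or a user decision.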


The Gemini 2.5 Computer Use model is primarily optimized for web browsers but also shows strong promise for mobile UI control tasks. It is not yet optimized for desktop OS-level control.


Performance Overview 

The Gemini 2.5 Computer Use model performs strongly across a range of web and mobile control benchmarks.


The table below includes results from self-reported numbers, evaluations run by Browserbase, and evaluations run by Google.


Detailed evaluation information can be found in the Gemini 2.5 Computer Use evaluation documentation and Browserbase's blog article. Unless stated otherwise, the scores shown are for computer use tools exposed via an API.


Benchmark performance overview: Gemini 2.5 Computer Use excels in Online-Mind2Web, WebVoyager, and AndroidWorld benchmarks. 


