Google Launches Gemini 3.5 Flash Low: New AI Model Reduces Token Usage

Google has introduced the Gemini 3.5 Flash Low model for its Antigravity AI coding platform. Following user complaints about rate limits, the new variant aims to ease token consumption by using 45 percent fewer tokens while maintaining strong performance on software engineering tasks.

Google launched the Gemini 3.5 Flash Low model on Tuesday, May 26. The release is aimed specifically at users of its AI coding platform, Antigravity, and comes as a direct response to growing concerns about token consumption and rate limits that have recently affected the developer community.

Addressing Token Consumption and Rate Limits

The purpose of the Gemini 3.5 Flash Low model is to provide a more efficient experience for developers. According to the company, the new variant is designed to consume notably fewer tokens during operation. This matters because it helps users avoid hitting their rate limits prematurely, a problem that became more prominent after Google changed its usage system.

Recently, Google shifted from a message-based system to a compute-based usage system. Following this change, many users reported that even simple, routine tasks were exhausting their token quotas at an alarming rate. The Gemini 3.5 Flash Low model has been engineered to address this specific issue, so that developers can maintain their workflow without constant interruptions.

Insights from Google DeepMind

Varun Mohan, who manages the Antigravity platform at Google DeepMind, shared insights about the launch on the social media platform X. He acknowledged that the company had received numerous complaints from users who felt that basic tasks were consuming an excessive number of tokens. This feedback led to the Gemini 3.5 Flash Low variant.

Google has stated that this new model utilizes approximately 45 percent fewer tokens in its output compared to previous versions. Despite this reduction in resource consumption, the company emphasizes that there has been no compromise on performance. In fact, Google claims that the model delivers superior results in software engineering tasks, making it a highly efficient tool for professional developers.
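To put the "approximately 45 percent fewer tokens" figure in perspective, the short calculation below estimates how many more tasks would fit into a fixed quota. It is a hypothetical illustration only: the quota size and per-task token count are made-up numbers, and it assumes the quota is charged purely on output tokens, which Google has not stated (input tokens may also count under a compute-based system).

```python
# Hypothetical illustration: how a ~45% cut in output tokens stretches a fixed quota.
# Assumptions (NOT from Google): the quota counts only output tokens, and every
# task's output shrinks by exactly 45%. All numbers below are invented.

def tasks_per_quota(quota_tokens: int, tokens_per_task: int) -> int:
    """Number of whole tasks that fit inside the quota."""
    return quota_tokens // tokens_per_task

QUOTA = 1_000_000            # hypothetical quota, in output tokens
BASELINE_PER_TASK = 2_000    # hypothetical output tokens per task on a previous model
REDUCTION_PERCENT = 45       # "approximately 45 percent fewer tokens"

# Integer arithmetic avoids floating-point rounding surprises.
low_per_task = BASELINE_PER_TASK * (100 - REDUCTION_PERCENT) // 100  # 1,100 tokens

baseline_tasks = tasks_per_quota(QUOTA, BASELINE_PER_TASK)
low_tasks = tasks_per_quota(QUOTA, low_per_task)
ratio = round(low_tasks / baseline_tasks, 2)

print(baseline_tasks, low_tasks, ratio)  # 500 909 1.82
```

Under these assumptions, a 45 percent reduction lets roughly 1.8 times as many tasks fit in the same budget (1 / 0.55 ≈ 1.82), rather than merely 1.45 times as many.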

Restructuring the Flash Model Lineup

With the arrival of the Low variant, Google now offers the Gemini 3.5 Flash model in three distinct versions to suit different user needs:

  • Gemini 3.5 Flash Low: Optimized for simple, everyday tasks and long-term project sustainability.
  • Gemini 3.5 Flash Medium: The original model, now repositioned for standard workloads.
  • Gemini 3.5 Flash High: Designed for complex, heavy-duty tasks that require maximum processing power.

This restructuring allows users to choose the most appropriate model for their specific project requirements, ensuring that they don't waste compute resources on minor tasks.

Benefits for Free Users and Quota Resets

In addition to the model launch, Google has taken steps to support its entire user base by resetting the Gemini quota for all plans this week. This reset includes free users, allowing everyone to start fresh and explore the capabilities of the new AI tools without immediate restrictions. The company aims to ensure that users can complete their AI projects without unnecessary hurdles.

Concerns Over Image Generation Limits

While the token issue is being addressed, some users have raised concerns about image generation limits on the platform. One user pointed out that Codex allows 1,000 image generations, whereas the Antigravity Ultra plan allows only 24. Varun Mohan admitted that this limit is indeed quite low and said that it needs to be increased. However, no official announcement of a new image generation limit has been made yet.