Updates to the Google Gemini AI foundation model, along with new AI Studio and Vertex AI features expected to reach general availability next month, aim to support advanced application workflows more effectively than existing versions.
The updated Google Gemini 1.5 Pro large language model (LLM), now in preview in 200 countries across various Google consumer and developer services, should be generally available in June. It will support a context window of up to 1 million tokens, according to company officials during keynote presentations on Tuesday.
At its initial introduction in February, Gemini 1.5 Pro supported 128,000 tokens in practice, with 1 million tokens available as an experimental feature. The context window refers to the amount of data (text, images, audio or video) that an LLM can reason about at once. Users of the Google AI Studio and Vertex AI developer tools can also join a waitlist this week to preview support for up to 2 million tokens, planned for later this year. One million tokens equals approximately one hour of video, 11 hours of audio, 30,000 lines of code or 750,000 words.
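For developers sizing prompts against those limits, the Gemini API itself can report how many tokens a given input consumes before it is sent. The following is a minimal sketch assuming the google-generativeai Python SDK; the model name and file path are illustrative placeholders, not a prescribed setup.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumes an AI Studio API key

# Model name is illustrative; check the SDK for currently available versions.
model = genai.GenerativeModel("gemini-1.5-pro-latest")

# Load a large document (hypothetical path) and measure it in tokens
# first, to see how much of the 1M-token window it would consume.
with open("war_and_peace.txt", encoding="utf-8") as f:
    novel = f.read()

token_count = model.count_tokens(novel)
print(f"Prompt size: {token_count.total_tokens} tokens")

# A single long-context request: the whole text plus a question about it.
response = model.generate_content([novel, "Summarize the major themes."])
print(response.text)
```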
“A large token window is like the working memory of AI, and this is one of the frontiers in terms of AI’s usefulness for advanced, highly contextual tasks,” said David Strauss, co-founder and CTO of WebOps service provider Pantheon, who has used Google’s Vertex AI machine learning platform for production and experimental projects. “This is shifting more and more tasks to ones that AI can accomplish on the fly, rather than with extensive training or even fine-tuning.”
1 million tokens – so what?
Major LLM vendors have engaged in an arms race to expand multiple attributes of their models over the last year, and particularly in recent months, said Ian Beaver, chief scientist at Verint Systems, a contact center as a service provider in Melville, New York. He cited examples such as Anthropic’s Claude 3 Opus launch two months ago, which outperformed OpenAI’s GPT-4 on LLM benchmarks; Meta’s higher benchmark performance in April for Llama 3 compared with the pre-release version of Gemini 1.5 Pro; and, just yesterday, OpenAI’s announcement of GPT-4o, an update to ChatGPT supporting text, audio and image input, with higher benchmarks than Llama 3 and Gemini 1.5 Pro.
All of these models also made big strides in input token limits, Beaver said: GPT-4 went from 16,000 to 128,000 tokens; Claude went from 100,000 to 200,000; and Gemini went from 32,000 to 1 million.
Larger context windows can be useful for certain applications, such as video prompts and generation. Still, Beaver said he’s not sure how useful a million tokens are.
“The fact that you can now comfortably send the entire text of War and Peace can be useful for generating reviews of great novels, but it remains to be seen how effective these models are at maintaining long-range dependencies in contextual data across such a large search space,” he said. “In our experience, once you get past a few hundred tokens, it is generally not helpful for the quality of the model response to include more, as there is usually a selection pipeline running before the LLM, such as a database query or a search.”
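The selection pipeline Beaver describes is commonly implemented as retrieval before generation: rank stored passages against the query and pass only the top few to the model. The sketch below is a minimal, self-contained illustration using TF-IDF similarity from scikit-learn; the corpus and function names are hypothetical, and production systems would typically use a vector database and learned embeddings instead.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical document store; real pipelines would query a database
# or vector index rather than holding passages in memory.
passages = [
    "Gemini 1.5 Pro supports a 1 million token context window.",
    "Claude 3 Opus expanded its context window to 200,000 tokens.",
    "Pierre Bezukhov is a central character in War and Peace.",
]

def select_context(query: str, top_k: int = 2) -> list[str]:
    """Rank passages against the query and keep only the best few,
    so the LLM prompt stays small regardless of corpus size."""
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(passages + [query])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    ranked = scores.argsort()[::-1][:top_k]
    return [passages[i] for i in ranked]

print(select_context("How big is Gemini's context window?"))
```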
Bigger isn’t necessarily better, Torsten Volk, an analyst at Enterprise Management Associates, wrote in a blog post last month.
“While the impressive million-token context window of Google’s Gemini 1.5 Pro provides a theoretical advantage in handling big data, the practical effectiveness of a language model like GPT-4 often surpasses it due to more sophisticated mechanisms… (that) efficiently manage a smaller context window by focusing computing resources on the most relevant information, thereby optimizing performance,” Volk wrote in the post.
Google AI Studio, Vertex AI updates
Meanwhile, updates to the Google Gemini API and services such as Google AI Studio and Vertex AI added new features specifically for developers this week. The first, context caching, could be more effective than large context windows, according to Volk. The feature, touted by Google as a way to make model tuning and prompts more efficient by not having to resend large data sets repeatedly, can also make recurring queries against large sets of documents easier.
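In rough terms, context caching uploads a large body of content once and lets subsequent prompts reference it server-side. The sketch below illustrates that pattern, assuming the google-generativeai Python package's caching module; the model name, TTL and file contents are placeholders, and the exact API surface may differ from what was announced at I/O.

```python
import datetime
import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")  # assumes an AI Studio API key

with open("annual_report.txt", encoding="utf-8") as f:  # hypothetical corpus
    report = f.read()

# Upload the large context once and keep it server-side for an hour,
# instead of resending it with every prompt.
cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",  # illustrative model name
    contents=[report],
    ttl=datetime.timedelta(hours=1),
)

# Subsequent queries reference the cache rather than the raw document.
model = genai.GenerativeModel.from_cached_content(cached_content=cache)
for question in ["Summarize Q1 revenue.", "List the stated risks."]:
    print(model.generate_content(question).text)
```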
“Coincidentally, OpenAI said that GPT-4o now also has context caching in conversations,” Volk said in an online interview this week, referring to OpenAI news the day before Google I/O.
Another Google Gemini developer update unveiled this week concerns parallel function calling, meaning the model can call multiple functions at once.
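In practice, parallel function calling means the model can return several function call requests in a single response, which the application executes and feeds back. Here is a minimal sketch assuming the google-generativeai Python SDK's automatic tool handling; the two functions are hypothetical stand-ins for real backend calls.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

def get_weather(city: str) -> str:
    """Return current weather for a city (hypothetical backend call)."""
    return f"Sunny in {city}"

def get_flights(origin: str, destination: str) -> str:
    """Return flight options between two cities (hypothetical backend call)."""
    return f"3 flights found from {origin} to {destination}"

# Passing plain Python functions as tools lets the SDK build the schema;
# with parallel function calling, one prompt can trigger both at once.
model = genai.GenerativeModel(
    "gemini-1.5-pro-latest",
    tools=[get_weather, get_flights],
)
chat = model.start_chat(enable_automatic_function_calling=True)
reply = chat.send_message(
    "What's the weather in Paris, and are there flights from Boston to Paris?"
)
print(reply.text)
```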
This will power an emerging trend toward deploying AI agents that execute multi-step workflows; Google’s Vertex AI added an Agent Builder tool, while Atlassian added support for AI agents, or virtual teammates, with its Atlassian Rovo product.
Gemini 1.5 Flash and Gemma add cost flexibility
A new version of Gemini rolling out this week, Gemini 1.5 Flash, uses a technique called distillation to pass the data analysis capabilities of the larger Pro model to a lighter, less expensive LLM, optimized to give faster answers than the larger version.
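Distillation, in broad strokes, trains a small student model to match the output distribution of a large teacher. Google has not published the recipe behind Flash, so the sketch below is only a generic illustration of the technique in PyTorch; the temperature, models and data are placeholder assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """Soften both distributions with a temperature and push the student's
    predictions toward the teacher's (standard knowledge distillation)."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # batchmean KL divergence, rescaled by T^2 to keep gradients comparable
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Placeholder logits standing in for teacher (e.g., the larger Pro model)
# and student (e.g., Flash) outputs over a vocabulary of 8 tokens.
teacher = torch.randn(4, 8)
student = torch.randn(4, 8, requires_grad=True)
loss = distillation_loss(student, teacher)
loss.backward()
print(float(loss))
```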
With Flash, Google added a new pay-per-use price tier for AI Studio and Vertex AI. Gemini 1.5 Flash is priced at $0.35 per million tokens for prompts up to 128,000 tokens and $0.70 per million tokens for larger prompts. For comparison, Gemini 1.5 Pro costs $3.50 per million tokens for prompts up to 128,000 tokens and $7.00 per million tokens for larger prompts. In general, early adopters of hosted LLM services have said controlling cloud costs is one of their biggest challenges so far.
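As a worked example of those list prices, the snippet below estimates the input cost of a prompt under each tier; the tiering rule (the rate chosen by whether the prompt exceeds 128,000 tokens) is an assumption based on the figures above.

```python
# List prices above, in dollars per million input tokens.
PRICING = {
    "gemini-1.5-flash": (0.35, 0.70),  # (<=128K prompt, >128K prompt)
    "gemini-1.5-pro": (3.50, 7.00),
}

def input_cost(model: str, prompt_tokens: int) -> float:
    """Estimate input cost, assuming the rate is picked by prompt size."""
    small, large = PRICING[model]
    rate = small if prompt_tokens <= 128_000 else large
    return prompt_tokens / 1_000_000 * rate

# A 1M-token prompt: $0.70 on Flash vs. $7.00 on Pro.
print(input_cost("gemini-1.5-flash", 1_000_000))  # 0.70
print(input_cost("gemini-1.5-pro", 1_000_000))    # 7.00
```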
“We haven’t done anything at a large enough scale on Vertex that (cost) becomes a priority, but I will say that a lot of Vertex products seem to have real utility billing,” Strauss said. “I like this because it means we can potentially provide it by default in isolation for customers and only pay for actual usage.”
In the rapidly growing open source AI world, two new permutations of Google’s Gemma will significantly increase the size of the open source LLM with the 27 billion parameter Gemma 2, and add a fine-tuned model for vision-language tasks with PaliGemma, Google’s first open vision language model.
As with benchmarks and input token limits, all major model providers have launched cheaper and faster versions of their flagship models, according to Verint’s Beaver.
“What previously required the largest, most expensive model can now be achieved by a smaller, cost-effective model,” he said. “The AI arms race is also rapidly driving down the cost of entry for high-performing LLMs. It’s only getting cheaper to deploy applications using generative AI.”
Multimodal support for a wider range of models will also reduce the cost of producing various types of media content, Beaver predicted.
Trust, security and quality remain top concerns for AI
Natively multimodal Gemini models are capable of processing various forms of data, such as images and video as well as text, and of producing multi-format output, but they do not yet operate this way in production-ready form.
Google is working on a new version of Imagen, “rebuilt from the ground up,” according to a keynote presentation by Douglas Eck, senior research director at Google, following a controversy that forced Google to suspend the image generation tool in February. Imagen 3 is now available for trial in Google’s AI Test Kitchen ImageFX tool and will soon be available on Vertex AI. Truly multimodal features will be more widely available later this year, I/O keynote speakers said.
Several keynote speakers also highlighted the work Google is doing on AI trust and safety to avoid further controversial outcomes, including updated model red-teaming, consultation with a panel of human experts from multiple fields of study, and a watermarking tool called SynthID.
However, adoption of generative AI tools such as Ansible Lightspeed so far indicates that companies have not yet embraced production use with enthusiasm. Strauss said early Vertex projects had mixed results, although he attributed this in part to data sets not being properly integrated.
“We used Vertex AI in prototypes to make recommendations for tagging written content and for the Vertex AI search system,” he said. “The former is actively in production, and we’ve seen poor results with the latter, but we need to put more effort into the integration to truly test it.”
Beth Pariseau, senior editor for TechTarget Editorial, is an award-winning veteran of IT journalism covering DevOps. Have a tip? Email her or reach out @PariseauTT.