Google is trying to make waves with Gemini, its flagship suite of generative AI models, apps and services.
So what is Google Gemini, exactly? How can you use it? And how does Gemini stack up against the competition?
To make it easier to keep up with the latest Gemini developments, we’ve put together this handy guide, which we’ll keep updated as new Gemini models, features and news about Google’s plans for Gemini are released.
What’s Gemini?
Gemini is Google’s long-promised, next-gen generative AI model family, developed by Google’s AI research labs DeepMind and Google Research. It comes in four flavors:
- Gemini Ultra, the most performant Gemini model.
- Gemini Pro, a lightweight alternative to Ultra.
- Gemini Flash, a speedier, “distilled” version of Pro.
- Gemini Nano, two small models — Nano-1 and the more capable Nano-2 — meant to run offline on mobile devices.
All Gemini models were trained to be natively multimodal — in other words, able to work with and analyze more than just text. Google says that they were pre-trained and fine-tuned on a wide range of public, proprietary and licensed audio, images and videos, a large set of codebases, and text in different languages.
This sets Gemini apart from models such as Google’s own LaMDA, which was trained exclusively on text data. LaMDA can’t understand or generate anything beyond text (e.g., essays, email drafts), but that isn’t necessarily the case with Gemini models.
We’ll note here that the ethics and legality of training models on public data, in some cases without the data owners’ knowledge or consent, are murky indeed. Google has an AI indemnification policy to shield certain Google Cloud customers from lawsuits should they face them, but this policy contains carve-outs. Proceed with caution, particularly if you’re intending to use Gemini commercially.
What’s the difference between the Gemini apps and Gemini models?
Google, proving once again that it lacks a knack for branding, didn’t make it clear from the outset that Gemini is separate and distinct from the Gemini apps on the web and mobile (formerly Bard).
The Gemini apps are clients that connect to various Gemini models — Gemini Ultra (with Gemini Advanced, see below) and Gemini Pro so far — and layer chatbot-like interfaces on top. Think of them as front ends for Google’s generative AI, analogous to OpenAI’s ChatGPT and Anthropic’s Claude family of apps.
Gemini on the web lives at gemini.google.com. On Android, the Gemini app replaces the existing Google Assistant app. And on iOS, the Google and Google Search apps serve as that platform’s Gemini clients.
Gemini apps can accept images as well as voice commands and text — including files like PDFs and, soon, videos, either uploaded or imported from Google Drive — and can generate images. As you’d expect, conversations with Gemini apps on mobile carry over to Gemini on the web and vice versa if you’re signed in to the same Google Account in both places.
The Gemini apps aren’t the only way to recruit Gemini models’ assistance with tasks. Slowly but surely, Gemini-imbued features are making their way into staple Google apps and services like Gmail and Google Docs.
To take advantage of most of these, you’ll need the Google One AI Premium Plan. Technically a part of Google One, the AI Premium Plan costs $20 per month and provides access to Gemini in Google Workspace apps like Docs, Slides, Sheets and Meet. It also enables what Google calls Gemini Advanced, which brings Gemini Ultra to the Gemini apps along with support for analyzing and answering questions about uploaded files.
Gemini Advanced users get extras here and there, too, like trip planning in Google Search, which creates custom travel itineraries from prompts. Taking into account things like flight times (from emails in a user’s Gmail inbox), meal preferences and information about local attractions (from Google Search and Maps data), as well as the distances between those attractions, Gemini will generate an itinerary that updates automatically to reflect any changes.
In Gmail, Gemini lives in a side panel that can write emails and summarize message threads. You’ll find the same panel in Docs, where it helps you write and refine your content and brainstorm new ideas. Gemini in Slides generates slides and custom images. And Gemini in Google Sheets tracks and organizes data, creating tables and formulas.
Gemini’s reach extends to Drive as well, where it can summarize files and give quick facts about a project. In Meet, meanwhile, Gemini translates captions into additional languages.
Gemini recently came to Google’s Chrome browser in the form of an AI writing tool. You can use it to write something completely new or rewrite existing text; Google says it’ll take into account the webpage you’re on to make recommendations.
Elsewhere, you’ll find hints of Gemini in Google’s database products, cloud security tools and app development platforms (including Firebase and Project IDX), not to mention apps like Google TV (where Gemini generates descriptions for movies and TV shows), Google Photos (where it handles natural language search queries) and the NotebookLM note-taking assistant.
Code Assist (formerly Duet AI for Developers), Google’s suite of AI-powered assistance tools for code completion and generation, is offloading heavy computational lifting to Gemini. So are Google’s security products underpinned by Gemini, like Gemini in Threat Intelligence, which can analyze large portions of potentially malicious code and lets users perform natural language searches for ongoing threats or indicators of compromise.
Gemini Gems custom chatbots
Announced at Google I/O 2024, Gems are custom chatbots powered by Gemini models that Gemini Advanced users will be able to create in the future. Gems can be generated from natural language descriptions — for example, “You’re my running coach. Give me a daily running plan” — and shared with others or kept private.
Eventually, Gems will be able to tap an expanded set of integrations with Google services, including Google Calendar, Tasks, Keep and YouTube Music, to complete various tasks.
Gemini Live in-depth voice chats
A new experience called Gemini Live, exclusive to Gemini Advanced subscribers, will arrive soon in the Gemini apps on mobile, letting users have “in-depth” voice chats with Gemini.
With Gemini Live enabled, users will be able to interrupt Gemini while the chatbot’s speaking to ask clarifying questions, and it’ll adapt to their speech patterns in real time. Gemini will also be able to see and respond to users’ surroundings, via photos or video captured by their smartphones’ cameras.
Live is also designed to serve as a virtual coach of sorts, helping users rehearse for events, brainstorm ideas and so on. For instance, Live can suggest which skills to highlight in an upcoming job or internship interview, and it can give public speaking advice.
What can the Gemini models do?
Because Gemini models are multimodal, they can perform a range of multimodal tasks, from transcribing speech to captioning images and videos in real time. Many of these capabilities have reached the product stage (as alluded to in the previous section), and Google is promising much more in the not-too-distant future.
Of course, it’s a bit hard to take the company at its word.
Google seriously underdelivered with the original Bard launch. More recently, it ruffled feathers with a video purporting to show Gemini’s capabilities that was more or less aspirational, not live, and with an image generation feature that turned out to be offensively inaccurate.
Also, Google offers no fix for some of the underlying problems with generative AI tech today, like its encoded biases and tendency to make things up (i.e., hallucinate). Neither do its rivals, but it’s something to keep in mind when considering using or paying for Gemini.
Assuming for the purposes of this article that Google is being truthful with its recent claims, here’s what the different tiers of Gemini can do now and what they’ll be able to do once they reach their full potential:
What you can do with Gemini Ultra
Google says that Gemini Ultra — thanks to its multimodality — can be used to help with things like physics homework, solving problems step by step on a worksheet and pointing out possible mistakes in already filled-in answers.
Ultra can also be applied to tasks such as identifying scientific papers relevant to a problem, Google says. The model could extract information from several papers, for instance, and update a chart from one by generating the formulas necessary to re-create the chart with more timely data.
Gemini Ultra technically supports image generation. But that capability hasn’t made its way into the productized version of the model yet — perhaps because the mechanism is more complex than how apps such as ChatGPT generate images. Rather than feed prompts to an image generator (like DALL-E 3, in ChatGPT’s case), Gemini outputs images “natively,” without an intermediary step.
Ultra is available as an API through Vertex AI, Google’s fully managed AI dev platform, and AI Studio, Google’s web-based tool for app and platform developers. It also powers the Gemini apps, but not for free. Once again, access to Ultra through any Gemini app requires subscribing to the AI Premium Plan.
Gemini Pro’s capabilities
Google says that Gemini Pro is an improvement over LaMDA in its reasoning, planning and understanding capabilities. The latest version, Gemini 1.5 Pro, exceeds even Ultra’s performance in some areas, Google claims.
Gemini 1.5 Pro is improved in a number of areas compared with its predecessor, Gemini 1.0 Pro, perhaps most obviously in the amount of data it can process. Gemini 1.5 Pro can take in up to 1.4 million words, two hours of video or 22 hours of audio, and can reason across or answer questions about all of that data.
1.5 Pro became generally available on Vertex AI and AI Studio in June alongside a feature called code execution, which aims to reduce bugs in code the model generates by iteratively refining that code over several steps. (Code execution is also supported in Gemini Flash.)
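For the curious, here’s a minimal sketch of what calling 1.5 Pro with code execution enabled looks like, assuming the google-generativeai Python SDK and an AI Studio API key (the key and prompt are placeholders):
```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key from AI Studio

# Passing tools="code_execution" lets the model write and run Python
# behind the scenes to check its own work before answering.
model = genai.GenerativeModel("gemini-1.5-pro", tools="code_execution")

response = model.generate_content(
    "Compute the sum of the first 50 prime numbers. "
    "Generate and run code for the calculation."
)
print(response.text)
```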
Within Vertex AI, developers can customize Gemini Pro to specific contexts and use cases via a fine-tuning or “grounding” process. For example, Pro (along with other Gemini models) can be instructed to use data from third-party providers like Moody’s, Thomson Reuters, ZoomInfo and MSCI, or to source information from corporate data sets or Google Search instead of its wider knowledge bank. Gemini Pro can also be connected to external, third-party APIs to perform particular actions, like automating a workflow.
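As a rough illustration of the grounding flow, here’s a sketch using the Vertex AI Python SDK to point Gemini Pro at Google Search; the project ID, region and prompt are stand-ins:
```python
import vertexai
from vertexai.generative_models import GenerativeModel, Tool, grounding

vertexai.init(project="my-gcp-project", location="us-central1")  # placeholders

# Grounding tool that tells the model to source answers from Google Search
# rather than relying solely on its training data.
search_tool = Tool.from_google_search_retrieval(grounding.GoogleSearchRetrieval())

model = GenerativeModel("gemini-1.5-pro")
response = model.generate_content(
    "What did Google announce at I/O 2024?",
    tools=[search_tool],
)
print(response.text)
```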
AI Studio offers templates for creating structured chat prompts with Pro. Developers can control the model’s creative range and supply examples to give tone and style instructions — and also tune Pro’s safety settings.
Vertex AI Agent Builder lets people build Gemini-powered “agents” within Vertex AI. For example, a company could create an agent that analyzes previous marketing campaigns to understand a brand style, and then apply that knowledge to help generate new ideas consistent with that style.
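Those same controls are exposed programmatically. Here’s a hedged example, via the google-generativeai SDK, of setting a system instruction, dialing down the model’s creative range and tuning safety settings; the instruction text and thresholds are illustrative:
```python
import google.generativeai as genai
from google.generativeai.types import HarmCategory, HarmBlockThreshold

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

model = genai.GenerativeModel(
    "gemini-1.5-pro",
    # System instruction sets the tone and style, much as AI Studio's templates do.
    system_instruction="You are a concise, friendly product-copy editor.",
    # Per-category safety thresholds; values here are illustrative choices.
    safety_settings={
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
        HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    },
    # Lower temperature narrows the model's "creative range."
    generation_config=genai.GenerationConfig(temperature=0.2),
)
print(model.generate_content("Rewrite this tagline: 'our app is good'").text)
```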
Gemini Flash is for less demanding work
For less demanding applications, there’s Gemini Flash. The latest version is 1.5 Flash.
An offshoot of Gemini Pro that’s small and efficient, built for narrow, high-frequency generative AI workloads, Flash is multimodal like Gemini Pro, meaning it can analyze audio, video and images as well as text (though it can only generate text).
Flash is particularly well suited for tasks such as summarization, chat apps, image and video captioning and data extraction from long documents and tables, Google says. It’ll be generally available via Vertex AI and AI Studio by mid-July.
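To make the captioning use case concrete, here’s an illustrative multimodal call to 1.5 Flash through the google-generativeai SDK; the image file name is a stand-in:
```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Upload an image via the File API, then mix it with a text prompt.
image = genai.upload_file("conference_photo.jpg")  # hypothetical file

model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content([image, "Write a one-sentence caption."])
print(response.text)
```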
Devs using Flash and Pro can optionally leverage context caching, which lets them store large amounts of information (say, a knowledge base or database of research papers) in a cache that Gemini models can access quickly and relatively cheaply. Context caching carries an additional fee on top of other Gemini model usage fees, however.
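Here’s a sketch of how context caching works in practice with the google-generativeai SDK; the document name and one-hour TTL are illustrative assumptions:
```python
import datetime
import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Upload a large document once and cache it server-side.
doc = genai.upload_file("research_papers.pdf")  # hypothetical file
cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",  # caching requires a pinned model version
    contents=[doc],
    ttl=datetime.timedelta(hours=1),  # how long the cache lives
)

# Subsequent prompts reuse the cached tokens at a lower per-request cost.
model = genai.GenerativeModel.from_cached_content(cached_content=cache)
print(model.generate_content("Summarize the key findings.").text)
```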
Gemini Nano can run on your phone
Gemini Nano is a much smaller version of the Gemini Pro and Ultra models, and it’s efficient enough to run directly on (some) phones instead of sending the task to a server somewhere. So far, Nano powers a couple of features on the Pixel 8 Pro, Pixel 8 and Samsung Galaxy S24, including Summarize in Recorder and Smart Reply in Gboard.
The Recorder app, which lets users push a button to record and transcribe audio, includes a Gemini-powered summary of recorded conversations, interviews, presentations and other audio snippets. Users get summaries even if they don’t have a signal or Wi-Fi connection — and in a nod to privacy, no data leaves their phone in the process.
Nano is also in Gboard, Google’s keyboard replacement app. There, it powers a feature called Smart Reply, which helps suggest the next thing you’ll want to say when having a conversation in a messaging app. The feature initially only works with WhatsApp but will come to more apps over time, Google says.
In the Google Messages app on supported devices, Nano drives Magic Compose, which can craft messages in styles like “excited,” “formal” and “lyrical.”
Google says that a future version of Android will tap Nano to alert users to potential scams during calls. And soon, TalkBack, Google’s accessibility service, will employ Nano to create aural descriptions of objects for low-vision and blind users.
Is Gemini better than OpenAI’s GPT-4?
Google has several times touted Gemini’s superiority on benchmarks, claiming that Gemini Ultra exceeds current state-of-the-art results on “30 of the 32 widely used academic benchmarks used in large language model research and development.” But leaving aside the question of whether benchmarks really indicate a better model, the scores Google points to appear to be only marginally better than those of OpenAI’s corresponding GPT-4 models.
Meanwhile, OpenAI’s latest flagship model, GPT-4o, pulls ahead of 1.5 Pro pretty substantially on text evaluation, visual understanding and audio translation performance. Anthropic’s Claude 3.5 Sonnet beats them both — but perhaps not for long, given the AI industry’s breakneck pace.
How much do the Gemini models cost?
Gemini 1.0 Pro (the first version of Gemini Pro), 1.5 Pro and Flash are available through Google’s Gemini API for building apps and services, all with free options. But the free options impose usage limits and omit some features, like context caching.
Otherwise, Gemini models are pay-as-you-go. Here’s the base pricing (not including add-ons like context caching) as of June 2024:
- Gemini 1.0 Pro: 50 cents per 1 million input tokens, $1.50 per 1 million output tokens
- Gemini 1.5 Pro: $3.50 per 1 million input tokens (for prompts up to 128,000 tokens) or $7 per 1 million input tokens (for prompts longer than 128,000 tokens); $10.50 per 1 million output tokens (for prompts up to 128,000 tokens) or $21 per 1 million output tokens (for prompts longer than 128,000 tokens)
- Gemini 1.5 Flash: 35 cents per 1 million input tokens (for prompts up to 128K tokens) or 70 cents per 1 million input tokens (for prompts longer than 128K); $1.05 per 1 million output tokens (for prompts up to 128K tokens) or $2.10 per 1 million output tokens (for prompts longer than 128K)
Tokens are subdivided bits of raw data, like the syllables “fan,” “tas” and “tic” in the word “fantastic”; 1 million tokens is equivalent to about 700,000 words. “Input” refers to tokens fed into the model, while “output” refers to tokens that the model generates.
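As a sanity check on those list prices, here’s a quick back-of-the-envelope cost calculation in Python; the token counts are made-up illustrative values:
```python
# Rough cost estimate at the June 2024 list prices above, for a
# Gemini 1.5 Flash request under the 128K-token prompt threshold.
INPUT_PER_M = 0.35   # dollars per 1M input tokens (prompts <= 128K)
OUTPUT_PER_M = 1.05  # dollars per 1M output tokens (prompts <= 128K)

input_tokens = 120_000  # e.g., a long report fed to the model
output_tokens = 2_000   # a short summary generated in return

cost = (input_tokens / 1_000_000) * INPUT_PER_M \
     + (output_tokens / 1_000_000) * OUTPUT_PER_M
print(f"${cost:.4f}")  # -> $0.0441
```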
Ultra pricing has yet to be announced, and Nano is still in early access.
Is Gemini coming to the iPhone?
It might! Apple and Google are reportedly in talks to put Gemini to use for a number of features to be included in an upcoming iOS update later this year. Nothing’s definitive, as Apple is also said to be in talks with OpenAI and has been working on developing its own generative AI capabilities.
Following a keynote presentation at WWDC 2024, Apple SVP Craig Federighi confirmed plans to work with third-party models including Gemini, but didn’t divulge further details.
This post was originally published Feb. 16, 2024, and has since been updated to include new information about Gemini and Google’s plans for it.