Don’t type anything into Gemini, Google’s family of GenAI apps, that’s incriminating — or that you just wouldn’t want another person to see.
That’s the PSA (of sorts) today from Google, which in a new support document outlines the ways in which it collects data from users of its Gemini chatbot apps for the web, Android and iOS.
Google notes that human annotators routinely read, label and process conversations with Gemini to improve the service, albeit conversations “disconnected” from Google Accounts. (It’s not clear whether these annotators are in-house or outsourced, which might matter when it comes to data security; Google doesn’t say.) These conversations are retained for up to three years, along with “related data” such as the languages and devices the user used and their location.
Now, Google does give users some control over which Gemini-related data is retained, and how.
Switching off Gemini Apps Activity in Google’s My Activity dashboard (it’s enabled by default) prevents future conversations with Gemini from being saved to a Google Account for review (meaning the three-year window won’t apply). Individual prompts and conversations with Gemini, meanwhile, may be deleted from the Gemini Apps Activity screen.
But Google says that even when Gemini Apps Activity is off, Gemini conversations will be saved to a Google Account for up to 72 hours to “maintain the safety and security of Gemini apps and improve Gemini apps.”
“Please don’t enter confidential information in your conversations or any data you wouldn’t want a reviewer to see or Google to use to improve our products, services, and machine-learning technologies,” Google writes.
To be fair, Google’s GenAI data collection and retention policies don’t differ all that much from those of its rivals. OpenAI, for instance, saves all chats with ChatGPT for 30 days regardless of whether ChatGPT’s conversation history feature is switched off, except in cases where a user is subscribed to an enterprise-level plan with a custom data retention policy.
But Google’s policy illustrates the challenges inherent in balancing privacy with developing GenAI models that feed on user data to self-improve.
Liberal GenAI data retention policies have landed vendors in hot water with regulators in the recent past.
Last summer, the FTC requested detailed information from OpenAI on how the company vets data used for training its models, including consumer data, and how that data’s protected when accessed by third parties. Overseas, Italy’s data privacy regulator, the Italian Data Protection Authority, said that OpenAI lacked a “legal basis” for the mass collection and storage of personal data to train its GenAI models.
As GenAI tools proliferate, organizations are growing increasingly wary of the privacy risks.
A recent survey from Cisco found that 63% of companies have established limitations on what data can be entered into GenAI tools, while 27% have banned GenAI altogether. The same survey revealed that 45% of employees have entered “problematic” data into GenAI tools, including employee information and non-public files about their employer.
OpenAI, Microsoft, Amazon, Google and others offer GenAI products geared toward enterprises that explicitly don’t retain data for any length of time, whether for model training or any other purpose. Consumers, though, as is often the case, get the short end of the stick.