In a never-ending saga between Google and France's competition authority over copyright protections for news snippets, the Autorité de la Concurrence announced a €250 million fine against the tech giant on Wednesday (around $270 million at today's exchange rate).
According to the competition watchdog, Google disregarded some of its previous commitments with news publishers. But the decision is especially notable because it throws in something else that's bang up to date: it latches onto Google's use of news publishers' content to train its generative AI model Bard/Gemini.
The competition authority has found fault with Google for failing to notify news publishers of this GenAI use of their copyrighted content. This matters in light of earlier commitments Google made that are aimed at ensuring it holds fair payment talks with publishers over reuse of their content.
Copyright and competition wrongs
In 2019, the European Union passed a pan-EU digital copyright reform that extended copyright protections to news headlines and snippets. News aggregators, such as Google News, Discover and the "Top Stories" feature box on search results pages, had previously scraped and displayed these news stories in their products without any financial compensation.
Google initially sought to evade the law by switching off Google News in France. However, the competition authority quickly stepped in, finding its unilateral move an abuse of a dominant market position that risked harming publishers. The intervention essentially forced Google to cut deals with local publishers over content reuse. But in 2021, Google was hit with a $592 million fine after the competition authority found major breaches in its negotiations with local publishers and agencies.
The tech giant called the sanction "disproportionate" and said it would appeal. But it subsequently sought to settle the dispute, offering a series of pledges and withdrawing its appeal. The commitments, which were accepted by the French Autorité, include passing key information to publishers and negotiating in good faith.
Google has signed copyright agreements with hundreds of publishers in France, which fall under the remit of its agreement with the Autorité. So its business in this area is very tightly regulated.
No appeal
Google has agreed not to contest the Autorité's latest findings, in exchange for a fast-tracked process, and to make a monetary payment.
However, its managing director for news and publishing partnerships, Sulina Connal, struck a peeved tone, writing in a lengthy blog post that "the fine is not proportionate to issues raised" by the authority.
The blog post suggests Google really wants to draw a line under the saga this time, with Connal also writing: "We've settled since it's time to move on and, as our many agreements with publishers show, we want to focus on the larger goal of sustainable approaches to connecting people with quality content and on working constructively with French publishers."
With generative AI in the frame, and the competitive scramble to launch tools, Google's calculus on how to approach the content reuse issue looks different.
GenAI training in the frame
Today's enforcement by France's competition authority shows it homed in on Google's use of content from news publishers and agencies to train its AI foundation model and its related AI chatbot service Bard (now called Gemini).
It found Google used content from publishers and press agencies for training Bard, its generative AI tool which launched in July 2023, “without notifying the copyright holders or the Authority,” per its press release.
On this point, Google's defense is twofold. In its blog post it writes that the competition authority "doesn't challenge the way web content is used to improve newer products like generative AI, which is already addressed in Article 4 of the EUCD" [EU Copyright Directive].
Article 4 of the Copyright Directive sets out an "exception or limitation for text and data mining", specifically for "reproductions and extractions of lawfully accessible works and other subject matter for the purposes of text and data mining".
However, in its press release the Autorité argues it has not yet been determined whether the exemption applies here. (It's worth noting the relevant clause refers to "lawfully accessible works", while Google is under a legally binding commitment to the competition authority to notify copyright holders about uses of their protected works and apparently didn't do so in this case.)
"On the subject of whether the use of news content to train an artificial intelligence service falls under neighboring rights protection, this question has not yet been answered," the competition authority wrote. "However, the Autorité considers that Google has breached its commitment #1 by failing to inform publishers that their content had been used to train Bard."
Google's blog post also makes passing mention of the EU AI Act, suggesting it's of relevance. However, the law is not yet in force, as it is pending final adoption by the European Council.
The incoming AI law will also require developers to abide by the bloc's copyright rules. And it introduces transparency requirements with that goal in mind, requiring them to put in place a policy to respect EU copyright law and to make publicly available a "sufficiently detailed summary" of the content used for training general purpose AI models (such as Gemini/Bard).
This incoming requirement for model makers to publish a training data summary may, in the longer term, make it easier for news publishers whose protected content has been ingested for GenAI training to obtain fair remuneration under EU copyright law.
No technical opt out
The Autorité also points out that, until at least September 28, 2023, Google failed to offer a technical solution allowing publishers and press agencies to opt out of their content being used to train Bard without that choice affecting the display of their content on other Google services.
"Until this date, publishers and news agencies that wanted to opt out of this use case had to insert an instruction that blocks all content indexation from Google, including for Search, Discover and Google News services. Those services are specifically part of the negotiation for revenue related to neighboring rights," it wrote, adding: "In the future, the Autorité will closely examine the effectiveness of Google's opt-out processes."
In more technical terms, between July and September 2023, news publishers had to use a blanket instruction, such as a "noindex" directive or a robots.txt rule disallowing Google's crawler, to be certain that their content wasn't used to train Google's AI model. The robots.txt file is placed at the root folder of a web server and contains various instructions for search engines. Google's web crawler checks the instructions in this file before crawling and indexing a website.
But such a blanket block means your website disappears from Google altogether. In September 2023, Google added more granularity by creating a separate "Google-Extended" user-agent token that is distinct from the rules governing Search. By disallowing Google-Extended in their robots.txt file, web publishers indicate that they don't want their content to help improve Gemini's current and future models.
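As a minimal sketch of what that more granular opt-out looks like, assume a hypothetical news site at news.example.com whose robots.txt keeps Googlebot (which serves Search, Discover and Google News) allowed while disallowing the Google-Extended token. Python's standard-library `urllib.robotparser` can then report how each token is treated. The site, the article URL and the helper name `check_access` are illustrative assumptions, and `robotparser` uses simpler matching rules than Google's own robots.txt parser, so this only illustrates the idea rather than how Google evaluates the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example news site: Googlebot may crawl
# everything (Search, Discover, News), while the Google-Extended token is
# disallowed site-wide to signal an opt-out from Gemini training.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Disallow: /
"""


def check_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Return whether each user-agent token may fetch `url` under `robots_txt`.

    `check_access` is a hypothetical helper written for this illustration.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {
        agent: parser.can_fetch(agent, url)
        for agent in ("Googlebot", "Google-Extended")
    }


if __name__ == "__main__":
    print(check_access(ROBOTS_TXT, "https://news.example.com/2024/03/article.html"))
```

Running it prints `{'Googlebot': True, 'Google-Extended': False}`: the page stays crawlable for Search while the training opt-out is signaled. Google-Extended is a product token used in robots.txt rather than a separate crawler, which is why disallowing it does not have to affect a site's appearance in Search.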
Other shortcomings
The Autorité is also sanctioning Google for a raft of other issues related to the way it negotiates with French news publishers, finding it failed to provide them with all the information needed to ensure fair bargaining over remuneration for their content.
In its press release, it wrote that the information Google gave publishers about its methodology for calculating how much they should be paid was "particularly opaque."
It also found Google failed to meet non-discrimination criteria aimed at ensuring publishers get equal treatment. And it called out a decision by Google to impose a "minimum threshold" for remuneration (i.e. a level below which it would not make any pay-outs to publishers), with the Autorité describing this as introducing discrimination between publishers "in its very principle". Below that threshold, all publishers are "arbitrarily allocated zero remuneration, no matter their respective situation", its press release also noted.
Moreover, the Autorité found fault with Google's calculations regarding so-called "indirect income", saying the "package" it proposed was not in line with previous decisions or with the Paris Court of Appeal's judgment of October 2020.
It also said Google didn’t act on its commitment to update remuneration contracts consistent with its pledges.