France's privacy watchdog eyes protection against data scraping in AI action plan

Natasha Lomas

Updated 17 May 2023 at 7:35 am·10-min read

France's privacy watchdog, the CNIL, has published an action plan for artificial intelligence which gives a snapshot of where it will be focusing its attention, including on generative AI technologies like OpenAI's ChatGPT, in the coming months and beyond.

A dedicated Artificial Intelligence Service has been set up within the CNIL to work on scoping the tech and producing recommendations for "privacy-friendly AI systems".

A key stated goal for the regulator is to steer the development of AI "that respects personal data", such as by developing the means to audit and control AI systems to "protect people".

Understanding how AI systems impact people is another main focus, along with support for innovative players in the local AI ecosystem which apply the CNIL's best practice.

"The CNIL wants to establish clear rules protecting the personal data of European citizens in order to contribute to the development of privacy-friendly AI systems," it writes.

Barely a week goes by without another bunch of high profile calls from technologists asking regulators to get to grips with AI. And just yesterday, during testimony in the US Senate, OpenAI's CEO Sam Altman called for lawmakers to regulate the technology, suggesting a licensing and testing regime.

However, data protection regulators in Europe are far down the road already -- with the likes of Clearview AI already widely sanctioned across the bloc for misuse of people's data, for example, while the AI chatbot Replika has faced recent enforcement in Italy.

OpenAI's ChatGPT also attracted a very public intervention by the Italian DPA at the end of March which led to the company rushing out with new disclosures and controls for users, letting them apply some limits on how it can use their information.

At the same time, EU lawmakers are in the process of hammering out agreement on a risk-based framework for regulating applications of AI which the bloc proposed back in April 2021.

This framework, the EU AI Act, could be adopted by the end of the year and the planned regulation is another reason the CNIL highlights for preparing its AI action plan, saying the work will "also make it possible to prepare for the entry into application of the draft European AI Regulation, which is currently under discussion".

Existing data protection authorities (DPAs) are likely to play a role in enforcement of the AI Act, so regulators building up AI understanding and expertise will be crucial for the regime to function effectively. Meanwhile, the topics and details EU DPAs choose to focus their attention on are set to weigh on the operational parameters of AI in the future -- certainly in Europe and, potentially, further afield given how far ahead the bloc is when it comes to digital rule-making.

Data scraping in the frame

On generative AI, the French privacy regulator is paying special attention to the practice by certain AI model makers of scraping data off the Internet to build data-sets for training AI systems like large language models (LLMs) which can, for example, parse natural language and respond in a human-like way to communications.

It says a priority area for its AI service will be "the protection of publicly available data on the web against the use of scraping, or scraping, of data for the design of tools".

This is an uncomfortable area for makers of LLMs like ChatGPT that have relied upon quietly scraping vast amounts of web data to repurpose as training fodder. Those that have hoovered up web information which contains personal data face a specific legal challenge in Europe -- where the General Data Protection Regulation (GDPR), in application since May 2018, requires them to have a legal basis for such processing.

There are a number of legal bases set out in the GDPR; however, the possible options for a technology like ChatGPT are limited.

In the Italian DPA's view, there are just two possibilities: consent or legitimate interests. And since OpenAI did not ask individual web users for their permission before ingesting their data, the company is now relying on a claim of legitimate interests in Italy for the processing; a claim that remains under investigation by the local regulator, the Garante. (Reminder: GDPR penalties can scale up to 4% of global annual turnover in addition to any corrective orders.)

The pan-EU regulation contains further requirements for entities processing personal data -- such as that the processing must be fair and transparent. So there are additional legal challenges for tools like ChatGPT to avoid falling foul of the law.

And -- notably -- in its action plan, France's CNIL highlights the "fairness and transparency of the data processing underlying the operation of [AI tools]" as a particular question of interest that it says its Artificial Intelligence Service and another internal unit, the CNIL Digital Innovation Laboratory, will prioritize for scrutiny in the coming months.

Other stated priority areas the CNIL flags for its AI scoping are:

the protection of data transmitted by users when they use these tools, ranging from their collection (via an interface) to their possible re-use and processing through machine learning algorithms;
the consequences for the rights of individuals to their data, both in relation to those collected for the learning of models and those which may be provided by those systems, such as content created in the case of generative AI;
the protection against bias and discrimination that may occur;
the unprecedented security challenges of these tools.

Giving testimony to a US Senate committee yesterday, Altman was questioned by US lawmakers about the company's approach to protecting privacy, and the OpenAI CEO sought to narrowly frame the topic as referring only to information actively provided by users of the AI chatbot -- noting, for example, that ChatGPT lets users specify they don't want their conversational history used as training data. (A feature it did not offer initially, however.)

Asked what specific steps it's taken to protect privacy, Altman told the Senate committee: "We don't train on any data submitted to our API. So if you're a business customer of ours and submit data, we don't train on it at all... If you use ChatGPT you can opt out of us training on your data. You can also delete your conversation history or your whole account."

But he had nothing to say about the data used to train the model in the first place.

Altman's narrow framing of what privacy means sidestepped the foundational question of the legality of training data. Call it the 'original privacy sin' of generative AI, if you will. But it's clear that eliding this topic is going to get increasingly difficult for OpenAI and its data-scraping ilk as regulators in Europe get on with enforcing the region's existing privacy laws on powerful AI systems.

In OpenAI's case, it will continue to be subject to a patchwork of enforcement approaches across Europe as it does not have an established base in the region -- which means the GDPR's one-stop-shop mechanism does not apply (as it typically does for Big Tech), so any DPA is competent to regulate if it believes local users' data is being processed and their rights are at risk. So while Italy went in hard earlier this year with an intervention on ChatGPT that imposed a stop-processing order in parallel with opening an investigation of the tool, France's watchdog only announced an investigation back in April, in response to complaints. (Spain has also said it's probing the tech, again without any additional actions as yet.)

In another difference between EU DPAs, the CNIL appears to be concerned about interrogating a wider array of issues than Italy's preliminary list -- including considering how the GDPR's purpose limitation principle should apply to large language models like ChatGPT. Which suggests it could end up ordering a more expansive array of operational changes if it concludes the GDPR is being breached.

"The CNIL will soon submit to a consultation a guide on the rules applicable to the sharing and re-use of data," it writes. "This work will include the issue of re-use of freely accessible data on the internet and now used for learning many AI models. This guide will therefore be relevant for some of the data processing necessary for the design of AI systems, including generative AIs.

"It will also continue its work on designing AI systems and building databases for machine learning. These will give rise to several publications starting in the summer of 2023, following the consultation which has already been organised with several actors, in order to provide concrete recommendations, in particular as regards the design of AI systems such as ChatGPT."

Here are the remaining topics the CNIL says will be "gradually" addressed via future publications and AI guidance it produces:

the use of the system of scientific research for the establishment and re-use of training databases;
the application of the purpose principle to general purpose AIs and foundation models such as large language models;
the explanation of the sharing of responsibilities between the entities which make up the databases, those which draw up models from that data and those which use those models;
the rules and best practices applicable to the selection of data for training, having regard to the principles of data accuracy and minimisation;
the management of the rights of individuals, in particular the rights of access, rectification and opposition;
the applicable rules on shelf life, in particular for the training bases and the most complex models to be used;
finally, aware that the issues raised by artificial intelligence systems do not stop at their conception, the CNIL is also pursuing its ethical reflections [following a report it published back in 2017] on the use and sharing of machine learning models, the prevention and correction of biases and discrimination, or the certification of AI systems.

On audit and control of AI systems, the French regulator stipulates that its actions this year will focus on three areas: Compliance with an existing position on the use of ‘enhanced’ video surveillance, which it published in 2022; the use of AI to fight fraud (such as social insurance fraud); and on investigating complaints.

It also confirms it has already received complaints about the legal framework for the training and use of generative AIs -- and says it's working on clarifications there.

"The CNIL has, in particular, received several complaints against the company OpenAI which manages the ChatGPT service, and has opened a control procedure," it adds, noting the existence of a dedicated working group that was recently set up within the European Data Protection Board to try to coordinate how different European authorities approach regulating the AI chatbot (and produce what it bills as a "harmonised analysis of the data processing implemented by the OpenAI tool").

In further words of warning for AI systems makers who never asked people's permission to use their data, and may be hoping for future forgiveness, the CNIL notes that it'll be paying particular attention to whether entities processing personal data to develop, train or use AI systems have:

carried out a Data Protection Impact Assessment to document risks and take measures to reduce them;
taken measures to inform people;
planned measures for the exercise of the rights of persons adapted to this particular context.

So, er, don't say you weren't warned!

As for support for innovative AI players that want to be compliant with European rules (and values), the CNIL has had a regulatory sandbox up and running for a couple of years -- and it's encouraging AI companies and researchers working on developing AI systems that play nice with personal data protection rules to get in touch (via ia@cnil.fr).
