If we agree that ChatGPT and similar AI systems can be useful and should not be shut down, they still need to comply with existing applicable laws such as the GDPR in Europe. The issue is that it may be impossible for such opaque and complex systems to respect the right of access, the right to rectification or the right to be forgotten.

[Image: the text 'Chat GPT' in a computer room. Edited Stable Diffusion image with Hacked font text.]

I managed to get ChatGPT to output my name by asking it some specific questions. For instance, when you ask « Who created the Hacked font inspired by the Watch Dogs game’s logo? », ChatGPT replies that « The Hacked font was created by David Libeau, a graphic designer based in France. », and that is correct. That’s me. What is incorrect comes after, when you ask « What else did he do? ». ChatGPT continues with a totally fake biography.

Banning ChatGPT

When we talk about the risks surrounding AI, it is always difficult to have a discussion that does not become either dystopian or, at the opposite extreme, utopian. I believe that it is possible to be enthusiastic about new technologies without forgetting their potential risks and their impact on humans and the Earth. As for ChatGPT, the tool is fun at first sight, but we must be aware that this toy is not perfect and that misuse could lead to dangerous situations. I believe that scientific research is key to identifying the ethical challenges of AI.

A couple of days ago, the Italian data protection authority moved to stop ChatGPT’s processing of the personal data of Italian citizens. The reason given was GDPR and privacy risks. Banning a technology is not a good solution, but if ChatGPT does not comply with European rules, stopping the processing remains an option as a last resort.

How does ChatGPT generate text?

ChatGPT uses a large language model (LLM) trained on a vast amount of publicly available data scraped from the web. When ChatGPT needs to reply to a prompt, it uses its generative pre-trained transformer (GPT), which acts like an auto-complete algorithm: it generates sentences by repeatedly choosing a likely next token (a word or a fragment of a word) given the context.
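
The auto-complete idea can be illustrated with a toy model. The sketch below is not how ChatGPT is implemented (a real GPT is a neural network working on tokens, not a table of word counts). It only shows the principle of extending a prompt with the most likely continuation learned from training text. The three-sentence corpus and the function names are made up for the example.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count, for every word, which words follow it in the training text."""
    followers = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            followers[current][nxt] += 1
    return followers


def autocomplete(model, prompt, max_words=5):
    """Greedily extend the prompt with the most frequent next word."""
    words = prompt.lower().split()
    for _ in range(max_words):
        candidates = model.get(words[-1])
        if not candidates:
            break  # the last word was never seen with a follower
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


# Hypothetical miniature "training set", for illustration only.
corpus = [
    "the font was created by a designer",
    "the font was inspired by a game",
    "the font was created by a student",
]
model = train_bigram_model(corpus)
print(autocomplete(model, "the font", max_words=3))
# prints "the font was created by"
```

Even in this tiny version, whatever appears in the training text (including names) ends up encoded in the model and can resurface in its output.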

I saw some glitches in the algorithm with my own eyes when ChatGPT was trying to guess my name. As I am not famous, the algorithm has some data about me thanks to press coverage, but not much. So when it was guessing my name, it sometimes gave me « David Libeskind » instead of « David Libeau ». When I clicked on the « Regenerate response » button, it gave me my right surname.
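
One plausible explanation for getting a different surname after regenerating is that the model does not always pick the single most likely continuation, but draws one at random according to its probabilities. The sketch below assumes that behaviour. The probabilities are invented for the illustration and are not measurements of ChatGPT.

```python
import random


def sample_next_token(probabilities, rng):
    """Draw one candidate at random, weighted by its probability."""
    tokens = list(probabilities)
    weights = list(probabilities.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


# Made-up probabilities for the word following "David": illustration only.
surname_probabilities = {"Libeau": 0.6, "Libeskind": 0.4}

rng = random.Random(0)  # fixed seed so the run is reproducible
for attempt in range(5):
    # Each iteration plays the role of one click on "Regenerate response".
    print(attempt, sample_next_token(surname_probabilities, rng))
```

With little data about a person, the probabilities of the right and wrong answers can be close, so both show up across regenerations.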

The difference between regular algorithms and AI systems is that the former are only a set of instructions, whereas the latter need data to work. That is an important shift for data protection. Even if the dataset used to train the ChatGPT model is public data from the web, it contains personal data. I demonstrated that with prompts that generate a response containing my name. With regard to GDPR, that could cause some issues.

Implementing GDPR rights

GDPR created a number of rights for data subjects: the right of access, the right to rectification and the right to be forgotten, for instance. ChatGPT and similar AI systems have to facilitate the exercise of these rights. The issue is that it may be impossible to rectify a model or even to simply extract a copy of personal data.

Controllers need to have a few tools and features in place in order to operate this kind of AI system in compliance with GDPR. Here is a list:

  1. Sources output: when a response is generated, the AI system must include the list of data sources.
  2. Dataset & model browser: it should be possible for administrators to retrieve and extract all occurrences of a name in the training dataset and in the model’s inferences.
  3. Model editor: it should be possible for administrators to edit the model (or train it again) without the selected personal data (after anonymisation).
  4. Safeguards on AI hallucinations: limitations should prevent the generation of the name or personal data of data subjects (in all generated responses or in specific contexts).
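
As an illustration of the dataset half of feature 2, here is a minimal sketch of a name search over training documents. It assumes the training set is available as plain text keyed by a document identifier. The `documents` dictionary and the `find_occurrences` helper are hypothetical. It does not address the harder half of the feature, which is locating personal data inside the trained model itself.

```python
import re


def find_occurrences(documents, name, context_chars=30):
    """Return (document id, snippet) pairs for every mention of `name`."""
    pattern = re.compile(re.escape(name), re.IGNORECASE)
    hits = []
    for doc_id, text in documents.items():
        for match in pattern.finditer(text):
            # Keep a little surrounding text so the mention can be reviewed.
            start = max(match.start() - context_chars, 0)
            end = min(match.end() + context_chars, len(text))
            hits.append((doc_id, text[start:end]))
    return hits


# Hypothetical two-document "training set", for illustration only.
documents = {
    "press-article": "The Hacked font was created by David Libeau.",
    "forum-post": "Nothing personal in this one.",
}
for doc_id, snippet in find_occurrences(documents, "David Libeau"):
    print(doc_id, "->", snippet)
```

An exact-match search like this misses misspellings, nicknames and indirect references, so a real tool would need something more robust.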

Feature 1 is about lawfulness and accurate information, as described in Article 5 of GDPR. The dataset browser (feature 2) enables the exercise of Article 15 (right of access). The model editor (feature 3) serves Article 16 (right to rectification) and Article 17 (right to erasure, or right to be forgotten). Finally, feature 4 supports the right to object under Article 21.
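
A very simple form of the safeguard in feature 4 is a filter applied to generated text before it reaches the user. The sketch below assumes the controller keeps a list of names of people who have objected. The `blocked_names` list and the `redact_names` function are hypothetical, and this is only one possible approach among others.

```python
import re


def redact_names(response, blocked_names, placeholder="[redacted]"):
    """Replace every blocked name in a generated response with a placeholder."""
    for name in blocked_names:
        pattern = re.compile(re.escape(name), re.IGNORECASE)
        # A function is used as the replacement so the placeholder is
        # inserted literally, whatever characters it contains.
        response = pattern.sub(lambda _match: placeholder, response)
    return response


# Hypothetical objection list maintained by the controller.
blocked_names = ["David Libeau"]
generated = "The Hacked font was created by David Libeau, a graphic designer."
print(redact_names(generated, blocked_names))
# prints "The Hacked font was created by [redacted], a graphic designer."
```

Such a filter only hides the name at output time. The personal data is still in the training set and the model, which is why the other features remain necessary.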

These features should be implemented in all AI systems. If they are not, or if it is not possible to implement them because AI is too opaque, we are in trouble… It would also mean that developers did not build AI following privacy-by-design principles.