Is ChatGPT's training data compatible with the right to data erasure under Article 17 GDPR? Jorie Corten, paralegal at Watsonlaw and master's student in International Technology Law, dives into this issue. She concludes that the right to data erasure probably also applies to ChatGPT's training data. In practice, though, this is complicated: "While it is technically possible to erase a data subject's personal data from ChatGPT, this has little impact on the patterns the model has already learned."
Artificial intelligence has revolutionized the way we interact with technology, and one of the most recent and impressive examples of this is ChatGPT.[1] This model can generate human-like text, making it useful for a wide range of applications, from customer service chatbots to content creation.
However, due to its wide applicability and rapid emergence, ChatGPT has created many legal uncertainties. This article focuses on the General Data Protection Regulation (GDPR).[2]
ChatGPT is a Large Generative Artificial Intelligence Model (LGAIM), a technology developed using a large amount of data.[3] ChatGPT was trained in two phases: (i) training on a large text dataset ("training data"), and (ii) refinement through human input ("human input data").[4]
Training data
ChatGPT was trained on a dataset of more than 45 terabytes of text from the Internet, including books, papers, web pages and other text-based content.[5] Its output may be biased or inaccurate, due to the probabilistic nature of its responses and the imperfect quality of the training data (texts from the Internet).[6]
Human input data
After being trained on text from the Internet, ChatGPT was refined with human input. For example, prompts such as "write a research paper on ChatGPT's compatibility with the right to data erasure", or feedback from users clicking a "thumbs up" button, help ChatGPT train itself.[7]
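To make this feedback loop concrete, here is a minimal sketch of how rated exchanges could be collected into a dataset for a later fine-tuning round. It is purely illustrative: the names (FeedbackStore, record) are hypothetical and do not reflect OpenAI's actual systems.

```python
from dataclasses import dataclass, field

@dataclass
class FeedbackStore:
    """Hypothetical store for user-rated exchanges (not OpenAI's API)."""
    examples: list = field(default_factory=list)

    def record(self, prompt: str, response: str, thumbs_up: bool) -> None:
        # Each rated exchange becomes a labelled example that a later
        # fine-tuning round can learn from.
        self.examples.append(
            {"prompt": prompt, "response": response, "label": int(thumbs_up)}
        )

store = FeedbackStore()
store.record(
    prompt="write a research paper on ChatGPT's compatibility "
           "with the right to data erasure",
    response="(model output)",
    thumbs_up=True,
)
# store.examples now forms a small preference dataset for fine-tuning.
```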
The datasets used to train ChatGPT are typically drawn from a wide range of texts available on the Internet, and such texts contain personal data.[8] I asked ChatGPT whether it excluded these personal data from its dataset during training.
ChatGPT replied that "(...) ChatGPT itself did not exclude personal data from its dataset during the training, (...)". This article therefore assumes that ChatGPT's training data includes personal data within the meaning of Article 4(1) GDPR.
The GDPR provides a legal framework for data protection and applies "to the processing of personal data..."[9] Data protection rights, such as the right to data erasure, apply only when personal data are processed.[10]
The right to erasure
The right to data erasure is an essential online right of data subjects.[11] A data subject is a natural person who is identified or identifiable through personal data.[12] According to Article 17(1) GDPR, data subjects have "the right to obtain from the controller erasure of personal data relating to them (...)". This right to data erasure was first recognised by the Court of Justice of the European Union (CJEU) in the landmark Google Spain judgment.[13]
The CJEU ruled that a "fair balance" must be sought between data subjects' rights to data protection and the legitimate interests of search engines.[14] The CJEU continued that these rights of data subjects override, "as a rule, not only the economic interest of the operator of the search engine but also the interest of the general public in having access to that information upon a search relating to the data subject's name".[15]
But when the interference is justified by the general public's interest in having access to the information in question, the data subject's rights may not prevail.[16] The GDPR codified Google Spain and elaborated on the right to data erasure.[17]
ChatGPT is an LGAIM, not an Internet search engine. However, ChatGPT is commonly used as a search engine and is trained on data that Internet search engines make available to the general public. I would therefore argue that the same rules apply.
Personal data can be removed from the trained ChatGPT model in two ways: (i) retraining the model on a modified dataset, or (ii) machine unlearning.
Retraining the dataset
The ChatGPT model can be retrained on a modified training dataset from which the personal data have been removed.[18] A significant drawback is that retraining is computationally very intensive, making it expensive and time-consuming.[19]
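By way of illustration only, the sketch below uses a toy word-frequency "model" as a stand-in for an LGAIM. Honouring an erasure request means filtering the data subject's documents out of the corpus and rebuilding the model from scratch; for a model of ChatGPT's scale, that rebuild is the expensive step.

```python
from collections import Counter

def train_unigram_model(corpus):
    """Toy stand-in for training: count word frequencies across documents."""
    model = Counter()
    for doc in corpus:
        model.update(doc["text"].split())
    return model

def erase_and_retrain(corpus, subject_id):
    """Honour an erasure request by dropping the subject's documents
    and retraining from scratch on whatever remains."""
    filtered = [doc for doc in corpus if doc["subject_id"] != subject_id]
    return train_unigram_model(filtered)  # the expensive step for an LGAIM

corpus = [
    {"subject_id": "A", "text": "alice lives in amsterdam"},
    {"subject_id": "B", "text": "bob writes about amsterdam"},
]
retrained = erase_and_retrain(corpus, "A")
assert "alice" not in retrained  # the erased subject's data is gone
```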
Machine unlearning
Another option is to modify the model itself after it has been trained ("machine unlearning").[20] However, this is very complicated and almost never feasible with existing systems.[21] Techniques for machine unlearning are only now emerging and have not yet been sufficiently researched.[22]
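The sketch below illustrates, again with a toy model, the "summation form" idea behind early unlearning work such as Cao & Yang (2015): if the model state is an aggregate of per-document contributions, one document's contribution can simply be subtracted. A deep neural network like ChatGPT does not decompose this cleanly, which is precisely why unlearning is so hard in practice.

```python
from collections import Counter

class UnlearnableUnigramModel:
    """Toy model in 'summation form': the state is a sum of per-document
    contributions, so a single document can be subtracted back out."""

    def __init__(self):
        self.counts = Counter()

    def learn(self, doc_text: str) -> None:
        self.counts.update(doc_text.split())

    def unlearn(self, doc_text: str) -> None:
        # Subtract exactly what this document contributed, then drop
        # entries whose count has fallen to zero.
        self.counts.subtract(doc_text.split())
        self.counts += Counter()

model = UnlearnableUnigramModel()
model.learn("alice lives in amsterdam")
model.learn("bob writes about amsterdam")
model.unlearn("alice lives in amsterdam")
assert "alice" not in model.counts   # unlearned without full retraining
assert model.counts["amsterdam"] == 1
```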
Erasing the personal data of a single data subject from the training data usually has little impact on the patterns the model has already learned.[23] Erasing larger amounts of personal data can affect those patterns more noticeably, which makes the right to data erasure particularly interesting in the context of a collective action.[24]
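This scale effect can be shown with the same toy setup and hypothetical numbers: erasing one data subject among thousands barely moves the learned statistics, whereas erasing many at once does.

```python
from collections import Counter

def train(corpus):
    model = Counter()
    for text in corpus:
        model.update(text.split())
    return model

# A hypothetical corpus in which 10,000 documents mention "amsterdam".
corpus = ["alice visited amsterdam"] + ["amsterdam canal tour"] * 9_999

full = train(corpus)
one_erased = train(corpus[1:])       # one data subject erased
many_erased = train(corpus[:5_000])  # thousands of documents erased

# One erasure barely shifts the learned pattern; mass erasure does.
print(full["amsterdam"], one_erased["amsterdam"], many_erased["amsterdam"])
# -> 10000 9999 5000
```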
The first dilemma is that, following Google Spain, the general public's interest in having access to the information in question may override the right to data erasure.[25]
The Article 29 Working Party, an independent European advisory body on data protection and privacy, stated that "Internet users have an interest in receiving information through search engines."[26] In this context, the fundamental right to freedom of expression, enshrined in Article 11 of the Charter of Fundamental Rights of the European Union, must be taken into account.[27]
A second dilemma arises because ChatGPT's servers are located in the United States (U.S.), while the model is trained on data from around the world.[28] In its landmark Google v CNIL judgment, the CJEU ruled that search engines are required to erase personal data only within the EU.[29] However, the CJEU added that EU law does not prohibit the erasure of personal data from all servers worldwide.[30]
This raises the key questions of this article: (i) Is the general public's interest in access to ChatGPT's training data outweighed by data subjects' right to data erasure, so that for ChatGPT the balance struck by the CJEU in Google Spain tips the other way?
And, if so, (ii) should these data be erased from ChatGPT's servers in the U.S., from the model accessible in Europe, or only from the model accessible in the applicant's country?
This article has shown that ChatGPT's training data is likely to include personal data within the meaning of Article 4(1) GDPR, and that the right to data erasure therefore applies.
While it is technically possible to erase a data subject's personal data from ChatGPT, this has little impact on the patterns the model has already learned. It is therefore unclear whether ChatGPT's training data is fully compatible with the right to data erasure under Article 17 GDPR.
Further research is needed to clarify this issue. In the meantime, one possible avenue is for data subjects to initiate a collective action.
Sources
1. Hacker, P., Engel, A., & Mauer, M. (2023). Regulating ChatGPT and other large generative AI models. arXiv preprint arXiv:2302.02337, pp. 2-3. (Hereinafter: Hacker et al (2023)); Hacker, P. (2023). Understanding and regulating ChatGPT, and other large generative AI models. Verfassungsblog: On Matters Constitutional, p. 2. (Hereinafter: Hacker (2023)).
2. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016, on the protection of natural persons with regard to the processing of personal data and on the free movement of such data and repealing Directive 95/46/EC (General Data Protection Regulation).
3. Hacker et al (2023), pp. 2-3; Sarel, R. (2023). Restraining ChatGPT, pp. 8-9. (Hereinafter: Sarel (2023)).
4. Europol. (2023). ChatGPT: The impact of Large Language Models on Law Enforcement. Accessed April 7, 2023, from https://www.europol.europa.eu/cms/sites/default/files/documents/Tech%20Watch%20Flash%20-%20The%20Impact%20of%20Large%20Language%20Models%20on%20Law%20Enforcement.pdf, pp. 3-4. (Hereinafter: Europol (2023)).
5. Europol (2023), pp. 3-4; Sarel (2023), p. 9.
6. Sarel (2023), pp. 8-9.
7. Hacker (2023), p. 2.
8. Protection of personal data and privacy. Council of Europe. Accessed April 8, 2023, from https://www.coe.int/en/web/portal/personal-data-protection-and-privacy.
9. Purtova, N. (2018). The law of everything. Broad concept of personal data and future of EU data protection law. Law, Innovation and Technology, 10(1). Accessed March 29, 2023, from https://doi.org/10.1080/17579961.2018.1452176, pp. 43-44.
10. Ibid.
11. Tzanou, M. (2020). The unexpected consequences of the EU Right to Be Forgotten: Internet search engines as fundamental rights adjudicators. In Personal Data Protection and Legal Developments in the European Union (pp. 279-301). IGI Global, pp. 1-2 of the electronic copy. (Hereinafter: Tzanou (2020)).
12. Article 4(1) GDPR.
13. CJEU [GC] 13 May 2014, Google Spain, C-131/12, ECLI:EU:C:2014:317, para. 99 and the operative part. (Hereinafter: Google Spain).
14. Google Spain, para. 81.
15. Google Spain, para. 99 and the operative part.
16. Ibid.
17. Tzanou (2020), pp. 1-2.
18. Veale, M., Binns, R., & Edwards, L. (2018). Algorithms that remember: model inversion attacks and data protection law. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 376(2133), 20180083, p. 9. (Hereinafter: Veale et al (2018)).
19. Ibid.
20. Veale et al (2018), p. 9.
21. Ibid.
22. Cao, Y., & Yang, J. (2015, May). Towards making systems forget with machine unlearning. In 2015 IEEE symposium on security and privacy (pp. 463-480). IEEE.
23. Veale et al (2018), p. 10.
24. See, e.g., Ausloos, J., Toh, J., & Giannopoulou, A. (Nov. 23, 2022). The case for collective action against the harms of data-driven technologies. Ada Lovelace Institute. Accessed April 7, 2023, from https://www.adalovelaceinstitute.org/blog/collective-action-harms/.
25. Google Spain, para. 99.
26. Article 29 Working Party. (2014). Guidelines on the implementation of the Court of Justice of the European Union judgment on "Google Spain and Inc. v. Agencia Española de Protección de Datos (AEPD) and Mario Costeja González" C-131/12. Accessed April 8, 2023, from https://ec.europa.eu/newsroom/article29/tems/667236/en, p. 6.
27. Ibid.
28. OpenAI. (2023). GPT-4. Accessed April 7, 2023, from https://openai.com/research/gpt-4.
29. CJEU [GC] 24 September 2019, Google v CNIL, C-507/17, ECLI:EU:C:2019:772, para. 63. (Hereinafter: Google v CNIL).
30. Google v CNIL, para. 72.