The European Commission confirms: scope to train and use AI models with personal data without consent under the GDPR

AI models operate on the basis of data. This applies both to extremely large, often generative foundation models (such as those underlying ChatGPT) and to smaller, specialized (sector-specific) models designed for a specific purpose. It is not uncommon for this data to contain information about individuals, which means it falls under the General Data Protection Regulation (GDPR). For example, to train foundation models, huge amounts of data, often personal data, are 'scraped' from the (public) internet. Earlier this year, tech companies Meta and LinkedIn announced that they would use the personal data of users of their social media platforms to develop and train their own (foundation) models. Strikingly, this processing does not take place on the basis of user consent; the legal basis invoked is the 'legitimate interest' provision of the GDPR.

18 December 2025

Although the authoritative European Data Protection Board ("EDPB") already laid the foundation in 2024 [1] for invoking legitimate interest as a possible legal basis for processing personal data to train AI models, various data protection supervisory authorities in the European Union appeared less unanimous on this point. The Dutch Data Protection Authority (Autoriteit Persoonsgegevens), for example, publicly called on users to invoke their right to object to the processing of their data for AI training as proposed by Meta and LinkedIn.[2] And the Hamburg supervisory authority (Hamburgische Beauftragte für Datenschutz und Informationsfreiheit, "HmbBfDI"), after initially labeling this activity as unlawful, ultimately decided not to pursue enforcement proceedings against one of these big tech parties, partly in anticipation of a clear EU framework on this issue. With the European Commission's recent legislative proposal to amend the GDPR, part of the Digital Omnibus Package, a serious step towards this clear EU framework appears to have been taken.

The Digital Omnibus Package: overview and objectives

On November 19, 2025, the European Commission announced the Digital Omnibus Package, which aims to streamline the application of the digital regulatory framework in the EU.[3] The proposal was drafted to make compliance with these regulations more cost-efficient, thereby creating competitive advantages (including in the field of AI solutions). The starting point is that the level of protection of (EU) citizens and the intended objectives of each of these digital laws are not compromised. The proposal is aimed primarily at EU organizations bound by this tangle of EU digital regulations, but big tech players that fall under the EU legal regime can also enjoy these benefits. Specifically, to achieve its goal, the European Commission has proposed a bill that introduces a series of amendments to EU digital legislation, including the GDPR.

Clarification of the concept of ‘personal data’ and pseudonymization

Under the current version of the GDPR, personal data is defined as "any information relating to an identified or identifiable natural person." Whether certain information can lead to the identification of a person depends on the means available to the party concerned or to another person, provided those means are not unlawful and do not require disproportionate effort. This interpretation of the concept is also referred to as the "absolute" doctrine.

In the Digital Omnibus, the European Commission takes a different approach. In line with recent case law of the Court of Justice of the European Union,[4] a context-dependent approach has been chosen: the European Commission clarifies that information does not automatically have to be considered personal data by every party simply because another party is able to identify the person concerned. In the context of AI training, this means that a party that receives datasets containing pseudonymized personal data for this purpose, and does not hold additional information about these individuals, is not subject to the GDPR, because for that party these datasets do not qualify as personal data. The question of whether there is a legal basis for this processing, and to what extent that may be a legitimate interest, is then no longer relevant.

Expansion and clarification of the basis of "legitimate interest" for AI training and use

However, the European Commission goes one step further in the Digital Omnibus: a new article is included in the GDPR which leaves no doubt that the processing of personal data for the training and use of AI models and AI systems can be based on legitimate interest. The controller must, however, comply with all the conditions for invoking this basis.[5] Specifically, this means that a balancing of interests must be carried out to determine whether the controller's interest in training its AI model or using its AI system with personal data outweighs the interests of the data subjects.

In addition, strict security conditions must be observed, such as applying data minimization when selecting the sources from which data is collected, but also during the training and testing of the AI system or model, and upholding the absolute, unconditional right of the data subject to object to the processing of their personal data in the context of AI.

That the European Commission has also taken the criticism of the Hamburg supervisory authority (partly) into account is evident from its emphasis that the above situation does not apply if Union or Member State law stipulates that personal data may only be processed for AI on the basis of consent. In other words, at Member State level, there is scope for the national legislator to restrict legitimate interest as a basis for the processing of personal data for AI.

It is also worth noting that the European Commission has included a new, albeit limited, exception to the prohibition on processing special categories of personal data (such as data concerning a person's health).[6] Specifically in the context of the development and use of an AI system or AI model, the processing of special categories of personal data is permitted under certain conditions. These conditions require that appropriate technical and organizational measures be taken to prevent the collection of special categories of personal data. If special categories of personal data are nevertheless found in the datasets used for training, testing, or validating the AI system or model, and their removal would require a disproportionate effort, their processing is permitted, and it is not necessary to obtain consent. The condition is that the special categories of data may not be used to create output and may not be disclosed in any other way.

Practical implications for organizations that process personal data for training AI models or using AI systems

The Digital Omnibus proposal provides clarity on the processing of personal data in an AI context. Under certain conditions, the training of AI models can be based on legitimate interest under the GDPR, so that obtaining consent, which is not always practical or feasible, can be omitted. The same applies to the 'residual' special categories of personal data that turn out to have crept into the datasets used.

A direct disadvantage of this approach is that individuals whose personal data ends up in an AI dataset as a result of scraping will generally not be aware of this. Organizations are required to provide information about this, for example via the privacy statement on their website, but the likelihood that someone will independently or proactively consult this statement is limited. The strong right of objection granted to individuals by the Digital Omnibus proposal to stop this processing therefore exists only on paper.

Finally, the question is whether, and if so to what extent, the European Commission's legislative proposal will remain intact: the European Parliament and the Council are currently deliberating on its content. Nevertheless, the current proposal shows a significant shift in the European Commission's approach to digital regulation, one that will affect all organizations involved in training and using AI with personal data.

This article was written for deLex.

[1] Guidelines 1/2024 on the processing of personal data based on legitimate interest. Although the public consultation on these guidelines, adopted on October 8, 2024, has been closed, the final version has not yet been published by the EDPB. The content may therefore still be subject to change (https://www.edpb.europa.eu/our-work-tools/documents/public-consultations/2024/guidelines-12024-processing-personal-data-based_nl); Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models, adopted on December 17, 2024 (https://www.edpb.europa.eu/our-work-tools/our-documents/opinion-board-art-64/opinion-282024-certain-data-protection-aspects_en). 

[2] https://www.autoriteitpersoonsgegevens.nl/actueel/ap-kom-nu-in-actie-als-je-niet-wil-dat-meta-ai-traint-met-jouw-data; https://www.autoriteitpersoonsgegevens.nl/actueel/ap-bezorgd-over-ai-training-linkedin-en-roept-gebruikers-op-om-instellingen-aan-te-passen.

[3] https://digital-strategy.ec.europa.eu/en/library/digital-omnibus-regulation-proposal.

[4] CJEU September 4, 2025, case C-413/23 P, ECLI:EU:C:2025:645 (EDPS v. SRB).

[5] New Article 88c GDPR.

[6] New Article 9(2)(k) and 9(5) GDPR.
