On January 28, the annual Dutch Privacy Awards will be presented during the National Privacy Conference, organized by ECP and Privacy First. The awards offer a platform to organizations that treat privacy as an opportunity to distinguish themselves, and that show how business and innovation can be done in a privacy-friendly way.

Literally “Privacy First”
The GPT-NL model is the first Large Language Model (LLM) to take a radically different approach to privacy and data protection, making it the first LLM that demonstrably complies with the GDPR. Big Tech claims that an LLM cannot be trained without scraping content from the internet, without the knowledge or consent of the individuals or rights holders involved. These providers leave personal data in the training content (at most removing a few categories of personal data in a static manner) and only afterwards place safeguards (filters) on the output of their LLMs to suppress unwanted content about individuals. As a result, personal data can leak from the model (for example via extraction attacks, model inversion attacks, or membership inference attacks), and the output often contains incorrect information about individuals. These LLMs do not comply in any way with the privacy guidelines for training LLMs issued by the EDPB and the French supervisory authority (CNIL).
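For readers unfamiliar with these attacks, the sketch below illustrates the simplest variant, a loss-threshold membership inference test: a sequence the model memorized during training tends to receive an unusually low loss. The model name and the threshold are hypothetical placeholders chosen only to make the example runnable; they have nothing to do with GPT-NL or any specific published attack.

```python
# Minimal sketch of a loss-threshold membership inference test.
# Model name and threshold are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"   # stand-in public model, purely for illustration
THRESHOLD = 3.0       # hypothetical cutoff; real attacks calibrate this

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def sequence_loss(text: str) -> float:
    """Average per-token negative log-likelihood the model assigns to text."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

def likely_in_training_data(text: str) -> bool:
    """A suspiciously low loss suggests the model memorized this sequence."""
    return sequence_loss(text) < THRESHOLD

print(likely_in_training_data("Jane Doe, born 12 May 1980, lives at 1 Example Street."))
```

If a model has memorized someone's details, a test like this can confirm their presence in the training set, which is exactly the kind of leakage that output filters cannot prevent.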
One might think that training LLMs on publicly available internet data reduces the privacy issues. However, publicly accessible does not mean that the personal data in it also belongs to the public domain: data shared on social media, for instance, is usually shared in a specific context. The GPT-NL model is the first model to address this issue by training only on data that is intended for the public domain. This is also what scientists recommend; see for example "What Does it Mean for a Language Model to Preserve Privacy?"
The GPT-NL model is also the first model to apply the principles of data minimization and privacy-by-design at the source. The innovation lies not only in the anonymization technologies used (which are substantial), but also in the overall design of the GPT-NL ecosystem, which is built around privacy protection.
In short: no untargeted scraping of the Internet, complete transparency of all data sources, and a radically new form of contextual anonymization before LLM training begins.
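To give a rough idea of what contextual anonymization at the source can look like, here is a minimal sketch that redacts person names from training text unless they appear on an allowlist of public figures. The spaCy model, the allowlist, and the placeholder token are all assumptions made for illustration; GPT-NL's actual anonymization pipeline is not described here and is considerably more sophisticated.

```python
# Minimal sketch of contextual anonymization before training.
# Assumptions: spaCy's Dutch NER model and a hand-made public-figure
# allowlist; this is not GPT-NL's actual pipeline.
import spacy

# Dutch NER pipeline; assumed installed via `python -m spacy download nl_core_news_sm`.
nlp = spacy.load("nl_core_news_sm")

# Hypothetical allowlist: names allowed to remain because the person plays
# a role in public life. The "contextual" part of the decision lives here.
PUBLIC_FIGURES = {"Mark Rutte"}

def anonymize(text: str) -> str:
    """Replace detected person names with a placeholder, except public figures."""
    doc = nlp(text)
    out = text
    # Work backwards so earlier character offsets stay valid after replacement.
    for ent in reversed(doc.ents):
        if ent.label_ in {"PER", "PERSON"} and ent.text not in PUBLIC_FIGURES:
            out = out[:ent.start_char] + "[PERSOON]" + out[ent.end_char:]
    return out

print(anonymize("Jan Jansen sprak gisteren met Mark Rutte over de begroting."))
```

The key difference from output filtering is that the redaction happens before the text ever reaches the model, so the names never end up in the weights in the first place.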
The GPT-NL project was prompted by growing unease about our dependence on Big Tech and foreign LLMs. The Dutch government awarded a grant to TNO to train, in the national interest, a sovereign Dutch-language large language model that fully complies with European and Dutch rules and public values. Dutch is spoken in a relatively small language area and is therefore not a priority for foreign providers. Because the GPT-NL model is trained on original, high-quality Dutch-language texts, the subtleties of the Dutch language are better preserved.
Although the first working version of the GPT-NL model has not yet been completed (the first release will be available to a select group of launching customers in February 2026), the GPT-NL project has already made a major impact. The Netherlands is the first country to successfully enter into a partnership with a coalition of publishers (such as NDP Nieuwsmedia, see here) for the use of their collective content to train GPT-NL. The agreement with the publishers' organization NDP in particular has received a lot of attention at home and abroad, because it makes GPT-NL the first LLM party worldwide to collaborate with publishers on a large scale and to let them share in the benefits realized through GPT-NL.
This media attention has in turn led to even more parties wanting to contribute content to the GPT-NL project. The news of the nomination for the Dutch Privacy Awards also played a role in this: immediately after the announcement, a new group of parties expressed interest in contributing their content to the project. Several potential customers have also indicated that the nomination further underscores that GPT-NL is a good choice for anyone looking for a compliant LLM.
In doing so, we encountered the guidelines issued by the EDPB and other privacy regulators such as the CNIL and the AP, which require that all special categories of personal data be removed from the training content. Through publications (see the IAPP articles here, here, here, and here) and panel discussions with regulators at IAPP Brussels 2025, we were able to change the regulators' minds, and they have since amended their guidelines accordingly. See specifically the CNIL guidelines, which state: "For example: a large language model is fed with a large number of public sources and correctly informs users about the functions of certain public figures. A public figure asks to be erased from the model. Retraining the model has a very high cost. The request for erasure can in principle be rejected."
Here too, the approach is fundamentally different from that of existing LLMs, which limit themselves to filtering model outputs relating to non-public figures, using a very broad definition of "public figure." As a result, LLM outputs contain personal data of a large group of people, including individuals who play no substantial role in public life.
GPT-NL believes in the possibility of a fair value chain for data, the most important raw material for LLMs, in which copyright holders are compensated for contributing their data. A licensing model has therefore been chosen in which part of the license revenue flows back to the content providers, while the remainder is invested in maintaining the GPT-NL model. In addition, a governance model has been set up (see the Governance Charter of the GPT-NL project here). This provides future-proof governance, ensuring that the further development of Dutch AI models within the GPT-NL project is guaranteed on a permanent basis.
Further plans to increase the impact of GPT-NL include:
