Nominated for the Dutch Privacy Awards: TNO in collaboration with NFI & SURF

On January 28, the annual Dutch Privacy Awards will be presented during the National Privacy Conference organized by ECP and Privacy First. These awards provide a platform for organizations that view privacy as an opportunity to distinguish themselves positively and that demonstrate how to do business and innovate in a privacy-friendly manner.

PONT Editorial Team | Data & Privacy | December 16, 2025

Interview

What exactly does your product/service/project do, and what privacy issue does it solve?

Literally “Privacy First”

The GPT-NL model is the first Large Language Model (LLM) to take a radically different approach to privacy and data protection, making it the first LLM to demonstrably comply with the GDPR. Big Tech claims that training an LLM is impossible without scraping content from the internet (without the knowledge or consent of data subjects or rights holders). These providers leave personal data in the training content (at most removing limited categories of personal data in a static manner), only to later place safeguards (filters) on unwanted content about individuals in the output of their LLMs. As a result, personal data can leak from the model (for example, via extraction attacks, model inversion attacks, or membership inference attacks), and output often contains incorrect information about individuals. These LLMs do not comply in any way with the privacy guidelines for training LLMs issued by the EDPB and the French supervisory authority (CNIL).

One might think that LLMs trained on publicly available data on the internet reduce privacy issues. However, publicly accessible does not mean that the personal data it contains also belongs to the public domain. Data shared on social media is usually shared in a specific context. The GPT-NL model is the first model to address this issue by training only on data intended for the public domain. This is also what scientists recommend; see, for example, "What Does it Mean for a Language Model to Preserve Privacy?"

The GPT-NL model is also the first model to apply the principles of data minimization and privacy-by-design at the source. The innovation here lies not only in the anonymization technologies used (which are substantial), but also in the overall design of the GPT-NL ecosystem, which centers on privacy protection.

In short: no untargeted scraping of the internet, complete transparency about all data sources, and a radically new form of contextual anonymization before LLM training begins.

What prompted you to develop this?

The GPT-NL project was prompted by growing unease about our dependence on Big Tech and foreign LLMs. The Dutch government has awarded a grant to TNO to train, in the national interest, a sovereign Dutch-language large language model that fully complies with European and Dutch rules and public values. Dutch is a relatively small language area and therefore not a priority for foreign providers. Because the GPT-NL model is trained on original, high-quality Dutch-language texts, the subtleties of the Dutch language can be better preserved.

What impact has your submission made so far?

Although the first working version of the GPT-NL model has not yet been completed (the first release will be ready for a select group of launching customers in February 2026), the GPT-NL project has already made a major impact. The Netherlands is the first country to successfully enter into a partnership with a coalition of publishers (such as NDP Nieuwsmedia, see here) for the use of their collective content to train GPT-NL. The agreement with the publishers' organization NDP in particular has received a lot of attention both at home and abroad, because it makes GPT-NL the first LLM party worldwide to collaborate with publishers on a large scale and to let them share in the benefits realized through GPT-NL.

This media attention has subsequently led to even more parties wanting to contribute content to the GPT-NL project. The news about the nomination for the Dutch Privacy Awards also played a role in this! After this news, we immediately saw a new group of parties expressing interest in contributing their content to the GPT-NL project. Several potential customers have also indicated that this nomination further underscores that GPT-NL is a good choice when looking for a compliant LLM.

What privacy challenges have you encountered, and what has your team learned from them?

  1. Non-public persons. The basic principle is that the GPT-NL model does not need to be able to answer questions about non-public persons in order to function properly. During the project, it quickly became apparent that the EDPB guidelines for AI models were not at all useful here. They provide standard anonymization solutions, in which certain data is removed or randomized, but these do not work for anonymizing content for LLM training: context and sentence structure are lost, preventing the LLM from "learning" properly. Instead of leaving the personal data in the training content (as other LLMs do), TNO implemented contextual anonymization, which preserves sentence structures and allows the LLM to learn. This means material protection of data subjects was chosen over merely granting procedural rights, such as the opt-out that Big Tech offers users for the use of their content in training its LLMs.
  2. Public figures. For the GPT-NL model to function properly, it must be able to answer questions about public figures. If all information about public figures were filtered out, the model would no longer be able to answer questions such as "Who is the king of the Netherlands?" This information may also include special categories of data, such as political preference, health, and religion: for example, that the Pope is Catholic or that Mark Rutte is a member of the VVD. TNO applies a very strict definition of "public figure": only people who have their own Wikipedia page qualify. To also guarantee the privacy of public figures, all data that is not relevant to their public role is removed, including all sensitive and/or irrelevant personal data, such as contact details or bank account numbers (where these were present at all).

Here we ran up against the guidelines issued by the EDPB and other privacy regulators such as the CNIL and the AP, which required that all special categories of personal data be removed from training content. Through publications (see the IAPP articles here, here, here, and here) and panel discussions with regulators at IAPP Brussels 2025, we were able to change the regulators' minds, and they have since amended their guidelines accordingly. See specifically the CNIL guidelines, which state: "For example: a large language model is fed with a large number of public sources and correctly informs users about the functions of certain public figures. A public figure asks to be erased from the model. Retraining the model has a very high cost. The request for erasure can in principle be rejected."

Here too, this is a fundamentally different approach than existing LLMs, which limit themselves to filtering their model outputs relating to non-public figures, using a very broad definition of "public figure." As a result, LLM outputs contain personal data from a large group of people, including individuals who do not play an extensive role in public life.
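The two-track approach above can be sketched in a few lines of code. This is an illustrative sketch only, not the actual GPT-NL pipeline: the detected names, the pseudonym scheme, and the `PUBLIC_FIGURES` set (standing in for the "has a Wikipedia page" check) are all assumptions for the sake of the example.

```python
# Illustrative sketch -- NOT the GPT-NL implementation. Assumes person names
# have already been detected upstream (e.g. by an NER model), and uses a
# small set as a stand-in for the Wikipedia-page public-figure check.

PUBLIC_FIGURES = {"Mark Rutte"}  # stand-in for a Wikipedia-page lookup


def contextual_anonymize(text: str, detected_names: list[str]) -> str:
    """Replace non-public names with consistent pseudonyms.

    Each person maps to the same pseudonym throughout the text, so
    sentence structure and cross-sentence references are preserved,
    which is what still lets the LLM "learn" from the content.
    """
    mapping: dict[str, str] = {}
    for name in detected_names:
        if name in PUBLIC_FIGURES or name in mapping:
            continue  # information about a public role is retained as-is
        mapping[name] = f"Person {chr(ord('A') + len(mapping))}"
    for name, pseudonym in mapping.items():
        text = text.replace(name, pseudonym)
    return text


sample = "Jan Jansen met Mark Rutte. Later, Jan Jansen wrote about the meeting."
print(contextual_anonymize(sample, ["Jan Jansen", "Mark Rutte"]))
# → Person A met Mark Rutte. Later, Person A wrote about the meeting.
```

The key design point is consistency: because "Jan Jansen" always becomes the same pseudonym, grammar and coreference survive, unlike with removal or randomization, which is exactly the shortcoming of standard anonymization described above.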

What are your plans for the future, and how do you intend to further increase your impact?

GPT-NL believes in the possibility of a fair value chain for data, the most important raw material for LLMs, in which rights holders are compensated for contributing their content. A licensing model has therefore been chosen in which part of the license revenue flows back to content providers, while the remainder is invested in maintaining the GPT-NL model. In addition, a governance model has been set up (see the Governance Charter of the GPT-NL project here). This provides for future-proof governance, ensuring that the further development of Dutch AI models within the GPT-NL project is guaranteed on a permanent basis.

Further plans for the future to increase the impact of GPT-NL are:

  • Expansion of GPT-NL to other European languages, as well as languages spoken by large minority groups in Europe, in order to achieve broader applicability.
  • Addition of multiple modalities, including speech, so that the model can also be used in a variety of speech-to-text applications, such as automatic note-taking in sensitive or business-critical contexts.
  • Further optimization of the model and integration into public and business applications.
  • Active contribution to European AI sovereignty through collaboration with media partners and governments.
