A nonprofit consortium will work on an open GPT language model for the Netherlands. The goal is to promote the country's digital sovereignty, provide transparent AI and ensure privacy. GPT-NL will be free for academic and public entities and is being developed with ethical values in mind. The first version is expected by the end of 2024.

On Thursday, a consortium of the non-profit organizations TNO, SURF and the Netherlands Forensic Institute (NFI) announced plans to develop an open language model. 'GPT-NL,' as the three organizations call it, will serve as the Netherlands' own language model. It will allow academic institutions to use the model in research, and government agencies to apply it in areas such as health, defense and security.
GPT-NL aims to strengthen the digital sovereignty of the Netherlands through the deployment of transparent, fair and verifiable AI. "This will not only contribute content to the work of Dutch organizations, but will also use technology and data to support decision-making around that work, in line with Dutch and European values," Selmar Smit, chief scientist at TNO, told PONT | Data & Privacy.
Following the Swedish example (1), the plan began to take shape at the end of last year, when the three partner organizations investigated the need for, and feasibility of, an open language model that could be applied at scale in Dutch society. "TNO has years of experience in developing language models, while SURF brings the necessary hardware expertise. NFI also develops its own models in-house and would like to start applying GPT-NL to real cases. So it's a perfect combination." The NFI indicated that its staff will apply the model primarily to forensic work, such as speech search for voice identification and author recognition in online messages.
The €13.5 million in funding for GPT-NL was made available by the Ministry of Economic Affairs and Climate Policy. Smit admits that GPT-NL cannot truly compete with the U.S. tech giants, given the large difference in budgets. "Nevertheless, we can still offer a comparable solution in a reliable way. GPT-NL will work well in both Dutch and English."
GPT-NL will be released so that anyone can inspect its source code and use it for their own purposes under license conditions. This could enable more collaborative forms of software development, benefiting competition and innovation. ChatGPT, by contrast, is proprietary software: OpenAI holds the rights to it and does not provide free access to its source code. In addition, the U.S. company offers ChatGPT on a "freemium" basis, with a free basic version alongside a paid, more advanced version. The consortium plans to offer GPT-NL completely free of charge to academics and public organizations. For the private sector, Smit has not yet settled on a commercial licensing model. "I don't rule out that companies will have to pay, but we still have to coordinate that internally, especially with the legal experts."
GPT-NL will be trained on data that has been published publicly. Smit acknowledges that this data may include personal data. To protect citizens' privacy, the consortium partners have agreed to exclude non-anonymized data from the training process. "That is not going to work 100 percent, which is why we are also devising ways to offer citizens an 'opt-out' option, so that their data definitely does not end up in our datasets."
The NFI considers it very important that, when models are used in forensic investigations, the other parties in the criminal proceedings can receive an explanation of how the model works and how it was trained. "It is important to minimize the chance of bias (unconscious bias) in the conclusions of NFI investigations."
Smit expects to have the first version of GPT-NL ready by the end of 2024.
(1) https://www.ai.se/en/node/81535/language-models-swedish-authorities
