Artificial Intelligence (AI)

In today's information society, artificial intelligence (AI) plays an important role. AI is an umbrella term for machines, software and devices that can carry out complex tasks independently, without requiring human thought. Applications mimic the knowledge of the human brain to make decisions on their own. These self-learning systems are used in many sectors, from business and government to healthcare and the judiciary.

Protect your online content from web scraping by AI providers

Worldwide, lawsuits are currently pending against providers and developers of (mostly general purpose) AI tools that have trained their systems with large amounts of data subject to third-party copyright or database rights.

Lesley Broos
27 September 2024

Blog

Ongoing proceedings

For example, in December 2023, The New York Times filed a lawsuit against OpenAI and its partner Microsoft, alleging that they used millions of New York Times news articles to train their AI systems without permission. Similar lawsuits are pending over Google / Alphabet's competing AI tools (Bard, Imagen, MusicLM, Duet AI and Gemini).

How do you prevent your online content from being used, against your will, for third-party AI training purposes? And is such use simply allowed?

Applicable law

The new European AI Act emphasizes that it is without prejudice to the enforcement of copyright rules under Union law (recital 108). On this basis, one might think that copyrighted works or databases, even if published online, are protected against reproduction by AI developers who "scrape" content from the Internet, as long as you, the right holder, have not given permission (a "license") to copy those works or databases as training material for AI tools.

However, this is a misconception. In 2019, the European directive on copyright and related rights in the Digital Single Market introduced an important exception to this long-standing intellectual property (IP) law principle: in short, text and data mining of protected material for commercial purposes is permitted unless the right holder has expressly reserved that use in an appropriate manner. Machine-readable means (for example, lines in a robots.txt file that scraping tools can interpret) are considered "appropriate" in this regard. If you do not make such a reservation, or do not make it appropriately, you run the risk of being unable to act successfully against third parties who lawfully access your online content and reproduce it for text and data mining purposes.
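To illustrate what a machine-readable reservation in a robots.txt file could look like, here is a minimal Python sketch. It generates robots.txt lines that disallow a few AI crawlers and then checks the result with the standard library's robots.txt parser. The crawler names (GPTBot, Google-Extended, CCBot), the helper functions and the example.com URL are assumptions chosen for illustration; check each provider's documentation for the user-agent tokens it actually honors, and note that this sketch says nothing about whether a given reservation is legally sufficient.

```python
from urllib.robotparser import RobotFileParser

# Illustrative (assumed) list of AI-crawler user-agent tokens.
# Verify the current tokens in each provider's own documentation.
AI_CRAWLERS = ["GPTBot", "Google-Extended", "CCBot"]


def build_robots_txt(crawlers):
    """Return robots.txt text that blocks the given crawlers site-wide
    and leaves the site open to every other user agent."""
    blocks = [f"User-agent: {name}\nDisallow: /" for name in crawlers]
    blocks.append("User-agent: *\nAllow: /")
    return "\n\n".join(blocks) + "\n"


def is_allowed(robots_txt, user_agent, url):
    """Check whether a crawler that respects robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    robots_txt = build_robots_txt(AI_CRAWLERS)
    print(robots_txt)
    # Hypothetical page on a hypothetical site.
    page = "https://example.com/articles/my-article"
    for agent in AI_CRAWLERS + ["Mozilla/5.0"]:
        print(agent, "allowed:", is_allowed(robots_txt, agent, page))
```

The generated text would be published as /robots.txt at the root of your site. Keep in mind that robots.txt only expresses your reservation; it does not technically stop a crawler that chooses to ignore it.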

Web scraping of personal data as well?

Incidentally, IP is not the only relevant perspective when training AI systems; privacy law restrictions must also be taken into account. If the online content in question also contains personal data, web scraping is often problematic from that perspective as well. It is not for nothing that the Autoriteit Persoonsgegevens (the Dutch Data Protection Authority) wrote earlier this year that scraping (read: of personal data) is 'almost always illegal'.
