
How to comply with copyright under the European AI Act when placing a general-purpose AI model on the European market

Specialised Lawyer for Copyright and Media Law
The European AI Act is about to become reality. After the final signatures of the Presidents of the European Parliament and of the Council on May 21, 2024, the Act is expected to be published within the next couple of days and to enter into force 20 days later. The final wording and numbering of the Act have been fixed in the corrected version of April 17, 2024.
The AI Act also includes rules on respecting European copyright, after the release of large language models like ChatGPT, image generators like Midjourney and movie generators like Sora led to highly controversial copyright discussions and a large number of copyright infringement cases in the US as well as some in Europe. This blog aims to shed some light on the copyright obligations, including for providers that are not based in the EU.
Who has to comply with EU copyright law?
According to Article 2 (1) a) the AI Act “applies to all providers placing on the market or putting into service AI systems or placing on the market general-purpose AI models in the Union, irrespective of whether those providers are established or located within the Union or in a third country”. This means that the obligations to comply with European copyright law apply not only to AI models developed in Europe, but to all so-called ‘general purpose models’ (GPAI) as defined in Article 3 (63) of the AI Act, which are offered in the EU. Recital 107 further explains that “any provider placing a general-purpose AI model on the Union market should comply with this obligation, regardless of the jurisdiction in which the copyright-relevant acts underpinning the training of those general-purpose AI models take place. This is necessary to ensure a level playing field among providers of general-purpose AI models where no provider should be able to gain a competitive advantage in the Union market by applying lower copyright standards than those provided in the Union.”
General-purpose models only?
The AI Act refers to the copyright obligations only in the context of ‘general-purpose AI models’ in Article 53 of the AI Act. AI systems as defined in Article 3 (1) do not appear to be covered by those obligations. However, although the AI Act only imposes obligations on GPAI models, any AI system must of course comply with EU copyright law when used in the EU. The distinctive feature of the AI Act is its attempt to apply EU law to the use of copyrighted material outside the EU. This includes reproductions made for the training of the GPAI model.
What are the obligations regarding copyrighted material?
Article 53 (1) c) and d) of the AI Act sets out two main obligations regarding copyrighted material:
- put in place a policy to comply with Union law on copyright and related rights, and in particular to identify and comply with, including through state-of-the-art technologies, a reservation of rights expressed pursuant to Article 4(3) of Directive (EU) 2019/790;
- draw up and make publicly available a sufficiently detailed summary about the content used for training of the general-purpose AI model, according to a template provided by the AI Office.
The obligation to establish a policy for complying with Union law on copyright and related rights emphasizes the obligation to identify and comply with “a reservation of rights expressed pursuant to Article 4(3) of Directive (EU) 2019/790”, which refers to a reservation of rights regarding reproductions for text and data mining (TDM) purposes. Under Article 4(3) of Directive (EU) 2019/790 (DSM Directive), the exception for text and data mining cannot be invoked if the rightsholder has reserved those rights. For works accessible online, the reservation must be made in an appropriate manner, such as by machine-readable means. Various technical standards have been published by rightsholders, standardization organisations and GPAI model providers to express such a ‘machine-readable’ reservation.
Does the text and data mining exception cover any type of AI training?
No. The TDM exception applies only under very specific conditions. It only covers reproductions made for the purposes of text and data mining, which is an automated analytical technique aimed at analysing text and data in digital form to generate information including, but not limited to, patterns, trends and correlations. Furthermore, when text and data mining is intended for commercial use, the reservation of rights must be respected, and the reproductions may only be retained for as long as is necessary for the purposes of text and data mining (see Article 4 (2) and (3) of the DSM Directive).
If AI training involves any other kind of use reserved to the rightsholder, such as a communication to the public of works (or parts of them) or their distribution, or if the reproductions are made for a purpose which goes beyond text and data mining, the content can only be used with the prior consent of the rightsholder. The recitals of the AI Act confirm this view. Recital 105 specifically states: “The development and training of such models require access to vast amounts of text, images, videos, and other data. Text and data mining techniques may be used extensively in this context for the retrieval and analysis of such content, which may be protected by copyright and related rights. Any use of copyright protected content requires the authorisation of the rightsholder concerned unless relevant copyright exceptions and limitations apply”.
It will therefore be highly relevant to know exactly what kind of uses are actually being made in connection with a particular AI training, whether rights reservations have been respected, and whether all the conditions of the exceptions invoked are actually met. This is especially relevant as the term ‘AI training’ per se remains undefined by the AI Act and is used in various ways throughout the industry. As a limit on the exceptions themselves (a so-called ‘exception to the exception’), Article 5 (5) of Directive 2001/29/EC (InfoSoc Directive), which also applies to the TDM exception (via the reference in Article 7 (2) of the DSM Directive), requires that “the exceptions […] shall be only applied in certain special cases which do not conflict with a normal exploitation of the work or other subject-matter and do not unreasonably prejudice the legitimate interests of the rightsholder”. The conditions of this so-called ‘three-step test’ have to be evaluated on a case-by-case basis.
What is a sufficiently detailed summary of the content used to train the general-purpose AI model?
The amount of detail that is considered ‘sufficient’ under Article 53 (1) d) is not defined in the AI Act. Recital 107 states that “While taking into due account the need to protect trade secrets and confidential business information, this summary should be generally comprehensive in its scope instead of technically detailed to facilitate parties with legitimate interests, including copyright holders, to exercise and enforce their rights under Union law, for example by listing the main data collections or sets that went into training the model, such as large private or public databases or data archives, and by providing a narrative explanation about other data sources used”.
In order to exercise and enforce their rights under Union law, the rightsholders will indeed need the following details:
- identifiability of individual works used in connection with AI training
- an explanation of the processes of use during training
- the circumstances of use (time/place)
- and, insofar as the AI provider relies on an exception, a description of the provider’s compliance with the legal conditions of that exception.
Pursuant to Article 53 (1) d), the AI Office will provide a template that can be used to identify the ‘sufficient details’ of the summary that needs to be provided and rightsholders will hope that the template will include the details listed above.
What are the consequences if a model does not comply with the AI Act?
The fines for infringements of the AI Act are set as a percentage of the offending company’s global annual turnover in the previous financial year or a predetermined amount, whichever is higher. For a breach of the obligations for providers of GPAI models, Article 101 (1) provides for fines of up to 3% of the annual turnover or EUR 15 million (whichever is higher).
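The “whichever is higher” mechanism can be illustrated with a short calculation; the turnover figures below are made up for illustration only, and the actual fine in any case is set by the Commission and may be lower than this ceiling:

```python
def max_fine_eur(annual_turnover_eur: float) -> float:
    """Upper bound of the fine for GPAI-provider breaches under Article 101(1):
    the higher of 3% of global annual turnover or EUR 15 million."""
    return max(0.03 * annual_turnover_eur, 15_000_000)

# Hypothetical provider with EUR 100 million turnover: 3% would be EUR 3 million,
# so the EUR 15 million floor applies.
print(max_fine_eur(100_000_000))

# Hypothetical provider with EUR 2 billion turnover: 3% (EUR 60 million)
# exceeds the floor and becomes the ceiling.
print(max_fine_eur(2_000_000_000))
```

For smaller providers the fixed amount therefore dominates, while for large providers the turnover-based percentage sets the ceiling.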
Apart from the statutory fines, cease-and-desist claims as well as claims for damages may arise if copyright infringements can be proven by a rightsholder.
When does this obligation begin?
The transition period is quite tight. Although there is a transition period of 24 months for the AI Act in general, the obligations for providers of GPAI models will already become mandatory 12 months after entry into force (expected June 2025). However, for models that were already placed on the market before that date of application, there is a transition period of 36 months after entry into force (Article 113 b), Article 111 (3) of the AI Act).
What are the implications of this for rightsholders?
When it comes to text and data mining activities, which will certainly play a role in the context of specific AI training techniques, rightsholders will have to become active and cannot expect to be asked for a license as they would usually be when someone wants to use their work. The AI Act delineates this in Recital 105: “Where the rights to opt-out have been expressly reserved in an appropriate manner, providers of general-purpose AI models need to obtain an authorisation from rightsholders if they want to carry out text and data mining over such works.”
As those rights reservations must also be respected by providers of AI models trained outside the EU but placed on the European market, it may be relevant for international rightsholders, too, to expressly reserve their rights in a legally appropriate manner.
How do rightsholders have to reserve their rights?
Currently, various technical standards are being developed, such as the URL-based W3C TDM Reservation Protocol or the robots.txt Robots Exclusion Standard, but also new asset-based approaches like the ISCC. Both the URL-based and the asset-based approaches seek to provide ‘machine-readable’ rights reservations as required under Article 4 (3) of the DSM Directive. Various AI providers, as well as providers of training data, have published specifications that allow rightsholders to explicitly authorize or block their AI crawlers, such as OpenAI’s crawler for ChatGPT or the Common Crawl CCBot.
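To make the robots.txt mechanism concrete, here is a minimal Python sketch of how a crawler operator might check such an exclusion before fetching content. The robots.txt content, crawler names and URL are illustrative assumptions; honouring robots.txt is only one of the emerging mechanisms and is not, by itself, a guarantee of compliance with Article 4 (3) of the DSM Directive:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt in which a site operator blocks two AI crawlers
# while allowing all other user agents.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(may_crawl(ROBOTS_TXT, "GPTBot", "https://example.org/article"))        # blocked
print(may_crawl(ROBOTS_TXT, "SomeOtherBot", "https://example.org/article"))  # allowed
```

A real compliance policy under Article 53 (1) c) would of course go further, for example by also evaluating TDM Reservation Protocol metadata and documenting each decision, but the basic pattern of checking a machine-readable reservation before reproduction is the same.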
Do rightsholders have claims based on the AI Act if a model does not comply?
The AI Act does not provide for direct claims by rightsholders whose copyrighted works have been infringed. However, since the decision QB v. Mercedes-Benz Group AG (CJEU, 21 March 2023, C-100/21), individual claims are not excluded if a law also seeks to protect individuals.
The judgment says: “Article 18(1), Article 26(1) and Article 46 of the Framework Directive, read in conjunction with Article 5(2) of Regulation No 715/2007, must be interpreted as protecting, in addition to public interests, the specific interests of the individual purchaser of a motor vehicle. …
Thus, it is apparent from those provisions that an individual purchaser of a motor vehicle has, vis-à-vis the manufacturer of that vehicle, the right that that vehicle not be fitted with a prohibited defeat device, within the meaning of Article 5(2) of that regulation.
Consequently, … EU law must be interpreted as meaning that, in the absence of provisions of EU law governing the matter, it is for the law of the Member State concerned to determine the rules concerning compensation for damage actually caused to the purchaser of a vehicle equipped with a prohibited defeat device, within the meaning of Article 5(2) of Regulation No 715/2007, provided that that compensation is adequate with respect to the damage suffered.”
A similar approach could be taken when AI providers do not respect the legal limits set by the AI Act and thereby cause damage to an individual.