Aalto AI Assistant
Your personal AI assistant.
The user is best positioned to assess the potential for copyright infringement with the material they input. Therefore, regarding input data for AI systems, the user should ensure they do not ask the AI to create a work that is a derivative of input data containing works owned by a third party. Altering a work for the purpose of making it available to the public requires the consent of the copyright holder. Using AI to, for example, remove, add, or change parts of a work owned by a third party constitutes creating a derivative work. An example of such use would be "increasing the mermaid's tail by 30% in this image." Despite these restrictions, copyright law allows the publishing of works for parody, caricature, or pastiche purposes. Deepfakes must be labelled accordingly.
The outputs produced by an AI system are not considered original works. Only a work reflecting the personality of a natural person, expressed through that person's independent and creative choices, is considered original. Therefore, the outputs of an AI system can be protected only by contractual terms.
Another frequently raised question is whether AI-produced outputs can infringe on the copyright of existing works. Infringement may occur if AI produces a variation or a direct copy of an existing work used as training data or input for the AI model.
To reduce the risk of copyright infringement, the end user of an AI system should avoid asking the system to produce a result that closely resembles a copyrighted work. For instance, OpenAI's DALL-E is designed to reject user requests to mimic the style of living artists. Providers are also required to prohibit copyright-infringing uses in their terms of use.
It is also in the interest of AI system providers to ensure that users feel confident in using the service without the risk of copyright infringement lawsuits. Microsoft accepts responsibility for copyright infringement claims resulting from the use of Copilot AI (when used according to Copilot's guidelines).
If a work input into an AI system is retained by the system for future use, this involves copying, which requires the copyright holder's permission. Check the AI system's terms of use and disable the use of input material as training data unless the input is a work you own, a work that is out of copyright (more than 70 years have passed since the year the author died), or a work released under the CC0 waiver. If the terms of use require allowing input works, such as images, to be used as AI training data, you cannot use works owned by a third party in the system.
In addition to the copyright perspective, university guidelines on the use of AI in research and education, the lawful use of personal data, and the requirements of good scientific practice must be considered, such as transparent reporting and referencing concerning the use of AI systems.
Exceptions facilitating the use of works in AI systems, such as the text and data mining exception, do not allow the outputs to be made publicly available. Therefore, outputs produced by an AI system using third-party materials, such as those from publishers, cannot be used as teaching materials or in publications. Works protected by copyright and owned by third parties cannot be used as input data in the production of images used in educational materials or publications. Your own works, works that are no longer protected by copyright (the author died more than 70 years ago), and works released under the CC0 waiver can be used as input data.
You may use text or images produced by an AI system if the terms of use of the AI system allow it, and if they are cited in accordance with good scientific practice.
Below is an example of citing images that are output of an AI system using the APA referencing style.
In-text citation: (OpenAI, 2023)
Figure title above image: Figure 1. AI generated image of x
Caption note below the image: Note: Image generated by OpenAI, DALL-E, 2023, using the prompt "x"
Reference format in APA-style:
Provider of the AI Model. (Year of the version). Name of the AI model in italics (Version number) [Description of the AI model used]. URL Link to AI provider
Example:
OpenAI. (2023). DALL-E (Version 2) [Artificial intelligence system].
Adobe Firefly. (2024). Image 3 Model (Version 3) [Artificial intelligence system].
Guidance for the use of artificial intelligence in teaching and learning at Aalto University.
The use of text and data in AI systems and models is regulated within the EU by the text and data mining exception. The EU Directive on Copyright in the Digital Single Market (Directive 2019/790), known as the DSM Directive, includes provisions on text and data mining. The DSM Directive defines text and data mining as an automatic analytical technique aimed at analysing digital text and data to produce information such as patterns, trends, or correlations.
Text and Data Mining for Scientific Research at Universities
The text and data mining exception in the DSM Directive and the Finnish Copyright Act allows the use of works as training, validation, and testing data for AI models, as well as the use of works as input data for AI systems. Text and data mining can be conducted unless the authors have explicitly and appropriately reserved this right. If text and data mining is conducted for scientific research at a university or in other research or cultural heritage institutions, rightsholders within the EU/EEA cannot contractually prohibit the university's right to text and data mining. An AI system can be used for text and data mining, and the system used for the analysis can also be a service provided by a commercial provider. However, the mining exception cannot be used to grant a third-party company providing the AI system the right to use the mined works to train its AI model. The text and data mining exception does not include the right to publish the mined dataset; however, the dataset may be stored for scientific research and verification purposes. The mandatory exception for text and data mining applies to employees of Aalto University and researchers affiliated with the university, so it benefits not only university employees but also students, emeritus/emerita researchers, and grant-funded researchers. The research must be conducted in collaboration with Aalto University and meet the criteria for scientific research.
General Data Mining Can Be Prohibited
In addition to the exception for text and data mining for scientific research, the DSM Directive and the Finnish Copyright Act include a general exception that allows the use of material as training data and for analysis using an AI system. Rightsholders can prohibit this data mining. For a website, the prohibition on using the site's material for data mining must be expressed by machine-readable means. The European Commission has published a draft code of practice, which explains on a practical level how data miners should comply with reservations against text and data mining. Data miners must comply with prohibitions expressed in machine-readable form; for example, a rightsholder can express the reservation of rights using robots.txt on their website.
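As an illustration, a website could reserve rights against data mining by disallowing known AI-related crawlers in its robots.txt file. This is a sketch only: the crawler names below (GPTBot, Google-Extended, CCBot) are real AI-related user agents, but the exact set to address depends on the rightsholder's needs, and robots.txt is just one of several machine-readable reservation mechanisms.

```text
# robots.txt — a sketch of a machine-readable reservation against data mining.
# Each User-agent block addresses one crawler; "Disallow: /" bars it
# from the entire site.

User-agent: GPTBot            # OpenAI's web crawler
Disallow: /

User-agent: Google-Extended   # Controls use of site content for Google AI training
Disallow: /

User-agent: CCBot             # Common Crawl, a common source of AI training data
Disallow: /

# Other visitors (e.g. ordinary search-engine indexing) remain unaffected.
User-agent: *
Allow: /
```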
If a general-purpose AI model developed in a research project is placed on the EU market in the course of commercial activities, the provider must create and make publicly available a sufficiently detailed summary of the content used to train the model, following the template provided by the AI Office. This obligation is set out in Article 53 of the EU AI Act. The EU Commission has published a draft of the template, which will be finalised before the Article 53 obligations for general-purpose AI models begin to apply on 2 August 2025.
A general-purpose AI model is defined in the AI Act as "an AI model, including where such an AI model is trained with a large amount of data using self-supervision at scale, that displays significant generality and is capable of competently performing a wide range of distinct tasks regardless of the way the model is placed on the market and that can be integrated into a variety of downstream systems or applications, except AI models that are used for research, development or prototyping activities before they are placed on the market".
The AI Act does not apply to AI systems or models developed and deployed exclusively for scientific research. Nor does it apply to research, testing, and development activities concerning AI systems or models before they are placed on the EU market or put into service in the EU. However, the AI Act does apply to the testing of AI systems in real-world conditions.
Copyright legislation varies somewhat between the United States, the United Kingdom, and the EU. The use of works in Finland, as part of the European Union, is governed by the Finnish Copyright Act, which implements EU copyright directives.
In the United States, the Copyright Act provides for the fair use exception. The interpretation of fair use in relation to training AI models on material from internet sites is currently being reviewed by US courts, so the interpretation of US copyright legislation remains unclear in this regard.
The United Kingdom has an exception for text and data mining similar to the EU's.
The guidance presented here regarding copyright and AI follows EU legislation. The AI Act and the General Data Protection Regulation (GDPR) apply to providers and deployers of AI models and systems when they offer AI models and systems in the EU, even when these companies are based outside the EU. The text and data mining exception in the DSM Directive and the national copyright legislation implementing it is not intended to limit the freedom of contract of rightsholders outside the EU. However, the effect of the Berne Convention and the AI Act's requirement to comply with EU copyright legislation are themes in the ongoing discussion of this question.
Works related to individuals, such as a person's written test answer or a person's photograph, are personal data, and their use is regulated by the GDPR and national data protection legislation. Uses that may be permitted from a copyright perspective may be restricted based on personal data legislation.
The EU AI Act came into force on August 1, 2024, and its obligations apply according to the following transition periods: February 2, 2025 (prohibited AI practices and AI literacy obligations); August 2, 2025 (obligations for providers of general-purpose AI models, such as generative AI models); and August 2, 2026 (full application of the AI Act). Obligations regarding high-risk AI systems embedded in regulated products come into force on August 2, 2027. You can read more in Artificial Intelligence (AI): the AI Act and AI literacy at Aalto University.
The AI Act includes transparency requirements regarding the use of works as training data in general-purpose AI models, as well as the obligation to create a copyright policy describing how the provider of a general-purpose AI model complies with copyright protection as defined in EU copyright directives when providing such models in the course of commercial activities. The EU Commission's AI Office facilitates a code of practice on how to fulfil these requirements, in cooperation with leading AI developers, the scientific community, and other experts.
The definitions in the AI Act and its references to EU copyright legislation facilitate the assessment of copyright issues when using AI. The primary focus of the AI Act is the "AI system," defined as a machine-based system designed to operate with varying levels of autonomy, that may exhibit adaptiveness after deployment, and that infers from the input it receives how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. In addition to AI systems, general-purpose AI models are also defined and regulated.
The AI Act addresses different use cases where copyright-protected material is used as data, and this guideline examines these uses in research and education. According to Article 3 of the AI Act, "training data" refers to data used to train an AI system by adjusting its parameters, and "input data" refers to data provided to or acquired directly by an AI system, on the basis of which the system produces an output. The AI Act also includes definitions for validation data, validation datasets, and testing data.