Viewing AI as Practices: A Practical Example of Assessing and Managing the Implications of AI for Copyright
In the previous issue of this newsletter, I argued in favor of conceptualizing artificial intelligence (AI) not merely as a product but as a set of business and organizational practices of deploying software and data to automate task execution and decision-making. Viewing AI in this more holistic way adequately distills AI’s rationale and social imprint, and surfaces its broader legal and ethical implications. This holistic approach in turn supports better risk management, sustainable use and responsible innovation.
These claims probably sounded abstract and might have benefited from a practical example. This issue provides one and illustrates the claims further. A suitable candidate for this purpose is the interaction between AI and copyright, as it spans a variety of legal and ethical issues. In this way, I hope to kill two birds with one stone by both providing practical guidance and introducing you to another aspect of AI (as a practice) that requires careful risk assessment and management.
What is copyright?
A brief primer on copyright will help set the scene. Copyright is a suite of rights that authors (such as writers, photographers, coders) of an intellectual creation (such as text, images, code, or a database) hold to authorship, attribution, and use of and control over their works. The analysis here will focus on copyright in creative works in particular, as European Union case-law has set a high bar for – and thus limited the scope of – legal protection of databases.
Copyright in the European Union (EU) includes two broader categories of rights. The first one is moral rights. They may vary across EU member states but typically include the creator’s rights to be recognized as an author (i.e. to authorship) and to be credited for respective parts of or the entire work (i.e. to attribution). The second category of rights includes the so-called “economic” ones, i.e. the prerogatives of creators or other rightholders (such as creators’ employers, publishers, record labels) to control the use and monetization of the copyrighted work, for example by allowing or prohibiting the reproduction, communication, and/or distribution of the copyrighted work.
These moral and economic rights are exclusive. That means that only the creators and, in certain cases, the other rightholders may use, reproduce, communicate, etc., a copyrighted work. However, third parties may engage in such activities in two broader categories of instances. The first one is when the creator or the other rightholder (collectively referred to as “rightholders”) transfers, assigns or otherwise grants the respective economic right(s) to the third party, for example via a contractual license. The second category of instances is when a statutory exemption or limitation allows third parties to reproduce, communicate or otherwise use parts of or the entire copyrighted work.
The affordances of EU copyright law
Of particular relevance to AI is the statutory exemption under EU copyright laws for text and data mining. It entails “any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations” (TDM). Both the creation of datasets for AI training and AI model development are considered to constitute forms of TDM.
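To make this statutory definition more concrete, here is a deliberately toy sketch of “automated analysis of text to generate patterns and correlations” using only the Python standard library. The corpus, terms and counts are all invented for illustration; real-world TDM pipelines operate at vastly larger scale, but the legal definition captures even simple frequency and co-occurrence mining of this kind.

```python
from collections import Counter
from itertools import combinations

# A toy "corpus" of short texts standing in for mined documents (invented data).
corpus = [
    "copyright protects creative works",
    "ai models learn patterns from works",
    "patterns and trends emerge from data",
]

# Frequency mining: how often does each term occur across the corpus?
term_counts = Counter(word for doc in corpus for word in doc.split())
print(term_counts["works"])  # the term "works" appears in two documents

# Co-occurrence mining: which term pairs appear together within a document?
pair_counts = Counter(
    pair
    for doc in corpus
    for pair in combinations(sorted(set(doc.split())), 2)
)
print(pair_counts[("patterns", "works")])  # co-occur in one document
```

Both steps are “automated analytical techniques” generating information (frequencies, co-occurrences) from text in digital form, which is precisely what brings such processing within the TDM definition.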
Third parties may engage in TDM of copyrighted materials without rightholders’ explicit authorization and compensation in two scenarios. First, when a research organization or a cultural heritage institution engages in TDM for scientific research purposes and has lawful access to the copyrighted material (for example, based on a subscription, another form of license, or open access policies). The second scenario is when any other entity (including a commercial one) carries out TDM, has lawful access to the copyrighted material as described above, and the rightholders have not opted out of TDM. Such opt-outs are often made in machine-readable form (for example, via robots.txt files) on websites or in licenses, terms of use, or access policies.
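A machine-readable opt-out of the robots.txt kind can be checked programmatically. The sketch below uses Python’s standard `urllib.robotparser`; the robots.txt content and the crawler name `ExampleAIBot` are hypothetical, and real opt-outs may use other mechanisms (licenses, terms of use, metadata) that this module does not cover.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a rightholder might publish: it opts the site
# out of crawling by a (hypothetical) AI bot while allowing other agents.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The AI crawler is opted out from the whole site...
print(parser.can_fetch("ExampleAIBot", "https://example.com/article.html"))  # False
# ...while other agents remain free to fetch.
print(parser.can_fetch("OtherBot", "https://example.com/article.html"))  # True
```

A crawler that honors this signal would skip the site entirely for TDM purposes; one that ignores it risks falling outside the commercial TDM exemption described above.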
Even if TDM meets all the conditions above, rightholders may still challenge its lawfulness if TDM conflicts “with the normal exploitation of the works” and “unreasonably prejudice[s] the legitimate interests of the rightholders”. See the second paragraph of the next section for concerns to that effect.
To become binding on private parties, this TDM exemption and its further modalities must be transposed into the national laws of the various EU member states. No official information on the transposition status is publicly available as of this writing. However, media reports from 2021 suggest significant delays in the process.
Copyright concerns with regards to AI
AI models and systems process vast amounts of data (the so-called “data corpora”) to identify patterns and correlations in them and, on that basis, to generate predictions, recommendations or decisions. Data corpora often contain copyrighted works, such as articles, blogs, books, photographs, etc. If these works are scraped or otherwise downloaded without lawful access or despite rightholders’ opt-out, such actions potentially constitute unauthorized extraction and reproduction. Those actions and, under certain circumstances, subsequent AI training and use may infringe EU copyright law. If you are interested in learning how exactly, please email me to receive a detailed briefing paper.
In addition, rightholders have on various occasions (here, here and in a recent lawsuit) aired concerns about the over-exploitation of their creative work and the possible substitution of human authors by AI-based productivity tools.
How viewing AI as practices helps address its implications for copyright
My (abstract) claims from last time had several facets. Here is how this practical example supports them:
Various software tools may not qualify as AI systems under AI regulations but nonetheless automate task execution at scale and thus have a significant broader impact. Web crawlers – bots deployed to automatically access website content and download data – appear to fall into this category if they exhibit neither autonomy nor adaptiveness.
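To see why such a crawler exhibits neither autonomy nor adaptiveness, here is a minimal sketch. The in-memory “website” stands in for real HTTP requests, and all URLs and page texts are invented for illustration; the point is that the loop is entirely rule-determined and nothing in it learns between runs.

```python
# A hypothetical in-memory "web": URL -> (page text, outgoing links).
SITE = {
    "https://example.com/": ("home page", ["https://example.com/a", "https://example.com/b"]),
    "https://example.com/a": ("article a", ["https://example.com/b"]),
    "https://example.com/b": ("article b", []),
}

def crawl(start_url: str) -> dict:
    """Breadth-first crawl that downloads every reachable page.

    The same input always yields the same output, and nothing here adapts
    between runs: the bot automates data collection at scale, but it is
    neither autonomous nor adaptive in the sense of AI regulations.
    """
    seen, queue, downloaded = set(), [start_url], {}
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        text, links = SITE[url]    # stands in for an HTTP GET
        downloaded[url] = text     # the large-scale data collection step
        queue.extend(links)        # follow links mechanically
    return downloaded

pages = crawl("https://example.com/")
print(len(pages))  # all 3 reachable pages are downloaded
```

Despite this simplicity, deploying such a loop against real websites is exactly the data collection step that feeds dataset creation, which is why its regulatory treatment matters.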
As such non-autonomous and non-adaptive software tools do not qualify as AI systems in the meaning of the EU AI Act in particular, their developers and users are not bound by any specific risk assessment, risk mitigation or quality assurance requirements under the Act. Developers need not meet any mandatory technical or legal specifications under the Act. If a corporate user would like to procure and use such bots, it should itself evaluate, manage and monitor the operational and legal risks associated with their in-licensing and ongoing application.
If viewed as part of a broader practice of task automation, however, the use of bots constitutes a salient feature of large-scale automatic data analysis and collection in the broader lifecycle of dataset creation and AI development.
Automatic data analysis and collection of copyrighted works fit neatly within the legal definition of TDM under EU copyright law, and arguably must comply with the limited affordances and prohibitions of the TDM regime under the implementing national copyright laws of the EU member states. The EU AI Act reinforces these obligations specifically with regard to the developers of general-purpose AI. Yet these obligations and prohibitions equally bind all other AI developers and deployers engaging in TDM by virtue of the applicable national copyright laws. Even if AI development and use formally meet their requirements, rightholders could arguably still challenge them as conflicting with the normal exploitation of their works and prejudicial to their legitimate interests. In that case, AI developers and deployers (e.g. corporate users) would likely need to seek a bespoke authorization from rightholders on appropriate financial terms. Some corporate users deploy AI without themselves engaging in TDM or training the AI system further. To manage the risk of possible copyright infringements, such corporate users should seek assurances from AI developers that the procured AI systems are trained on authorized copyrighted materials. These assurances may take the form of, for example, technical and legal audits and contractual representations, warranties and indemnities.
Beyond questions of legal conformity, TDM and generative AI in general appear to raise broader ethical concerns about over-exploitation and potential “hollowing-out” of individual creators’ “agency and autonomy”. To garner social acceptability for TDM and AI, AI developers, corporate users (e.g. a large publisher) and creators would need to resolve these concerns based on a common conception of fairness of use, of compensation and of wider dissemination of the benefits of AI to the society at large. Corporate users of AI, for their part, would likely have to commit to productivity optimization and employment practices that enable and augment — rather than replace — human creativity.
Technical standards might help set a common benchmark for machine-readable opt-outs from TDM and AI training. Beyond that, technical standards hold no relevance to ensuring the legal and ethical conformity and social acceptability of TDM specifically.
If all this still sounds a bit abstract, please email me to chat or receive a detailed briefing paper on AI’s implications for copyright.
This material is for informational purposes only and does not constitute legal advice. Viewing or using this content does not establish an attorney-client relationship. If you need legal assistance, you should seek advice from a qualified attorney.