Opinion

Data scraping for AI development – still no progress in the TDM debate

Published: 21 October 2024
It appears that the debate surrounding the UK’s Text and Data Mining (TDM) exception is about to reignite. 

TDM exceptions are key to assessing the legality of training generative AI models, and to resolving the ongoing tension between AI developers’ access to training data and the protection of creative content. In general, these exceptions allow the reproduction of copyrighted content for the purposes of computational analysis, and the Hamburg Regional Court recently held that this covers the creation of an AI training dataset.

The current UK TDM exception (s. 29A CDPA 1988) is more restrictive than those in other jurisdictions because it only covers TDM conducted for the sole purpose of non-commercial research. In order to encourage AI and wider innovation in the UK, the previous government announced its intention to introduce a broader “any purpose” TDM exception. However, it later abandoned this plan, following objections from the creative industries. Subsequent stakeholder dialogues to try to agree a voluntary code of conduct also failed.

Conservative MP Dame Caroline Dinenage has now published a letter she has written to Lisa Nandy, the Secretary of State for Culture, Media and Sport, and Peter Kyle, the Secretary of State for Science, Innovation and Technology, expressing her concerns about the government apparently revisiting this topic.

She explains how the Culture, Media and Sport Committee, which she chairs, looked into the issue in some depth during the last Parliament. The 2023 Committee report “Connected Tech: AI and creative technology” concluded that the government should not pursue plans for a broader TDM exception and should, instead, require AI developers to license copyrighted content, which would provide effective reward for creators. Dinenage’s concern is that the new government is seeking to resurrect the old policy, which she says would remove any incentive for AI companies to devise commercial models that safeguard the incentives and reward for human creativity.

These concerns have been raised after Lisa Nandy recently indicated that a new law may be needed to end AI copyright disputes. Feryal Clark, the Parliamentary Under-Secretary of State for AI and Digital Government, also told a recent Times Tech Summit that the government was working through what needs to be done to resolve the issue and to bring clarity to both the AI sector and the creative industries. She added that she expects a solution to be proposed by the end of this year. The current speculation is that the government is set to consult on a new TDM scheme that would cover commercial activities but allow content owners to “opt out”, i.e. expressly reserve their rights in certain works.

This regime is similar to that already in place in the EU and is something of a compromise: it covers commercial TDM (not just non-commercial research) but allows rights holders to reserve their rights, a safeguard that was absent from the previous proposal. However, publishers have argued that these opt-outs are impracticable and administratively burdensome because it is not always possible to know which AI companies are trying to mine their content. They would prefer to retain the status quo, requiring commercial AI companies to agree a licence and pay to use their content.

Dinenage agrees and states that the ultimate solution is not a wide TDM exception but an obligation on AI companies to be transparent about the creative works used to train their systems, so that appropriate commercial arrangements can be put in place. Such transparency is already mandated by the EU AI Act, which requires providers of certain AI models and systems to provide a publicly available summary of their use of copyrighted training data. California has taken a similar step in enacting recent legislation that requires developers of generative AI, in certain circumstances, to post information about their training data on their websites. This must indicate, among other things, whether that data includes copyrighted content and whether it has been purchased or licensed.

In the meantime, the debate has not yet (at least publicly) broached the interplay of TDM exceptions with the IP infringement risk that arises for deployers of AI systems. Deployers may be exposed to infringement risk where they use AI-generated outputs that reproduce a copyright-protected work from the training data, and the AI developer did not obtain the relevant rightsholder’s consent for the deployer’s use of that work.

TDM exceptions would not extend to deployers’ downstream use of these AI-generated outputs. This feeds into wider industry interest not only in the outcome of consultations on TDM exceptions but also in the various copyright infringement proceedings currently being brought against AI developers globally.

We will continue to monitor and to report on these developments.