On December 12, 2024, the ICO published an outcomes report on its 2024 generative AI consultation series (the Report).
The Report addresses five key areas concerning generative AI and its relationship with data protection:
- Purpose limitation in the generative AI lifecycle;
- Accuracy of training data and model outputs;
- Allocating controllership across the generative AI supply chain;
- Engineering individual rights into generative AI models; and
- The lawful basis for web scraping to train generative AI models.
The ICO's positions on purpose limitation, accuracy and controllership, as expressed in its calls for evidence, remain the same.
In contrast, the consultation responses have informed a clarification of the ICO's approach to the lawful basis for web-scraped data used to train AI models and to the exercise of individual rights in relation to generative AI.
Regarding web scraping, the ICO retains its view that legitimate interest is the only lawful basis on which a developer can rely when collecting personal data to train generative AI models in this way, and the Report explains why the other lawful bases are unlikely to be appropriate. To rely on the legitimate interest basis, the purpose, necessity and balancing tests must each be met.
However, the consultation revealed potential challenges that controllers must address:
- Whilst respondents proposed a variety of interests, the ICO is clear that specific and clear interests should be identified, even where a model may be put to a range of downstream uses;
- Responses suggested that collecting training data for generative AI through means other than web scraping is potentially viable (e.g. collecting data directly from individuals or licensing it directly from publishers). As such, the ICO will look to developers to demonstrate that web scraping is in fact necessary; and
- Developers may fall foul of the balancing test if data subjects do not have adequate transparency. If data subjects are not aware that their personal data is being collected and used, they may not be able to effectively exercise their rights. The consultation revealed that lack of transparency is a live issue, with mitigating mechanisms and safeguards often remaining theoretical and Article 14 UK GDPR requirements not being met. The ICO therefore expects developers to address this concern, for example by providing accessible and specific information as to what personal data has been collected. The ICO also raised the need to consider the financial impact on individuals as part of the balancing test.
The ICO did not specifically consider special category data in the consultation but is currently scrutinising its use by generative AI developers in line with its existing positions.
Regarding engineering individual rights into generative AI models, the ICO’s position again remains largely as indicated in the call for evidence. However:
- Whilst data protection by design and by default is a legal requirement, the ICO is concerned that organisations developing and deploying generative AI models and systems do not enable the effective exercise of information rights requests. This is a particular issue for requests relating to web-scraped personal data. Respondents' suggestions did not always address this concern; for example, the ICO flagged that using output filters does not equate to deleting data from the model. More broadly, the ICO calls on organisations acting as controllers in this context to ensure that systems are designed to implement the data protection principles and to integrate the necessary safeguards into the processing, with a view to enabling the effective exercise of individual rights;
- The Report emphasised the need for transparency multiple times, and whilst the ICO states that it will “continue to engage with stakeholders on promoting effective transparency measures”, it also warned that it will not shy away from “taking action when [its] regulatory expectations are ignored”; and
- The ICO clarifies that whilst it referred to Article 11 UK GDPR in the call for evidence (regarding processing that does not require identification of an individual, such that certain UK GDPR obligations do not apply), organisations can only rely on this provision when justified. For example, organisations must demonstrate that they are not able to identify people and must also give people the opportunity to provide more information to enable identification. The ICO considers that Article 11 should not be relied upon “so broadly”.
Whilst the Report sets out the ICO's thinking on key areas, the ICO is clear that it is not intended to be a comprehensive assessment of all data protection issues regarding generative AI. More information is available in the core AI materials section of the ICO's website, and upcoming legislation such as the Data (Use and Access) Bill may further affect its approach. The ICO will update, and consult on, its guidance to reflect changes in the law.
The Report also takes the opportunity to address various misconceptions regarding generative AI, AI and data protection that surfaced in the consultation responses. The ICO clarifies its view in relation to those misconceptions, noting that:
- The “incidental” or “agnostic” processing of personal data still constitutes processing of personal data;
- Common practice does not equate to meeting people’s reasonable expectations;
- “Personally identifiable information” (PII) is different to the legal definition of “personal data”;
- Organisations should not assume that they can rely on the outcome of case law about search engine data protection compliance when considering generative AI compliance;
- Generative AI models can themselves have data protection implications;
- The ICO cannot determine or provide guidance on compliance with legal requirements that fall outside the scope of its remit (i.e. data protection and information law); and
- There is no ‘AI exemption’ to data protection law.
The Report is available here.