AI - The New Frontier: Through the lens of Data Protection with Liferay
Introduction
"Space: the final frontier..." These words, famously spoken by Captain Kirk in Star Trek, encapsulate humanity's endless pursuit of exploration and discovery. Today, as we stand on the brink of a new technological era, artificial intelligence (AI) represents our current "new frontier." This frontier, much like the uncharted territories of space, offers boundless opportunities but also poses significant challenges, particularly in data protection and risk management. As businesses integrate AI into their systems, it is essential to approach this new frontier with caution, awareness, and a strong commitment to responsible use.
Liferay’s Role in AI Integration
Liferay empowers businesses by enabling seamless integration of AI services. Customers have the flexibility to choose the AI solutions that best fit their needs and to determine how to use them within their business, ensuring alignment with their specific requirements and risk management practices. They must also carefully consider the implications of AI, particularly concerning data protection, security, and compliance with emerging regulations like the EU AI Act.
Understanding Generative AI
"Generative" artificial intelligence refers to systems capable of creating content such as text, code, images, music, audio, and videos. These systems, especially those incorporating large language models (LLMs), are often classified as general-purpose AI systems due to their wide range of applications. Generative AI aims to boost creativity and productivity by generating new content or modifying existing content (e.g., offering summaries, corrections, or translations).
However, these systems are not without their risks. Due to their probabilistic nature, generative AI systems might produce inaccurate results that appear plausible, potentially leading to misinformation. Additionally, the development of these systems typically involves training on large volumes of data, which often include personal data. This raises significant data protection concerns that businesses must address when deploying such technologies.
Pre-Trained Models vs. User-Trained Models: Key Milestones in Selection, Training, and Deployment
AI models are offered in two forms: pre-trained models, which are ready to use and usually hosted by the vendor, and user-trained models, where businesses take responsibility for training and, in some cases, hosting the AI. Each model type has different requirements and risks under data protection regimes (such as the GDPR and LGPD) for personal data, and also raises distinct intellectual property (IP) considerations, such as ownership of AI-generated outputs and of training data. In the following sections, we will explore the key milestones (selection, training, and deployment) for each model type, highlighting the associated challenges and considerations at each phase.
1. Vendor Selection and Initial Risk Assessment
| Pre-Trained Models | User-Trained Models |
|---|---|
| Vendor Compliance: Vendors ensure AI models are trained in compliance with relevant regulations like the GDPR, CCPA, LGPD, and PDPA (Singapore). They often use legitimate interest as a legal basis for training models, particularly when using public data. | Compliance Responsibility: The business plays a key role in ensuring compliance during training. It's important to select a suitable legal basis, such as legitimate interest or contractual necessity, for processing data, and to align this approach with global data protection regulations. |
| Security Measures: Vendors should offer security assurances for the handling of training data, adhering to guidance such as that provided by the PIPC (South Korea)¹ on data minimization and encryption. In addition, authorities like the ICO (UK)² ³ and CNIL (France)⁴ emphasize the importance of encryption and access controls to safeguard the confidentiality and integrity of personal data throughout the AI training process. They further stress the need for anonymization where applicable and secure data storage to ensure ongoing compliance with global data protection standards. | Security Measures: It is essential that platforms used for AI training implement robust data protection measures, such as encryption to secure data during both transmission and storage, and data isolation to prevent unauthorized access. Platforms should also follow established global standards, including those from the ICO (UK) and PIPC (South Korea), by enforcing strict access controls and secure handling of personal data. Ensuring that the platform's data protection practices align with relevant regulations, such as the GDPR and LGPD, will help safeguard sensitive and proprietary information throughout the training process (a sketch of encryption at rest follows this table). |
| Data Processing Agreement (DPA): DPAs are mandatory under regimes like the GDPR and LGPD, but not always required under laws like the CCPA (USA). Still, it's a best practice to have a DPA to clarify data protection obligations, especially for cross-border data transfers. Vendors acting as data controllers (for instance, if they use customer data for purposes like improving AI) should detail these roles in the DPA. | Data Processing Agreement (DPA): Where the model is hosted by a third party, the DPA becomes particularly important. Though not always mandatory (e.g., in the USA or Japan), a DPA is recommended to define retention, security, and deletion policies. In GDPR and LGPD jurisdictions, DPAs are compulsory when vendors act as data processors. For user-trained models, vendors typically follow the customer's instructions, but DPAs are crucial to clarify compliance, especially for cross-border transfers. |
| IP Ownership and Liability: Vendors typically retain ownership of the model but must clarify the legal basis of the training data, ensuring compliance with IP laws. Contracts should contain indemnity clauses to protect businesses against potential IP violations. | IP Ownership and Liability: For businesses training their own AI models, it's important to clarify the ownership of IP rights related to the training data and outputs. It's advisable to consider including indemnity clauses in the license terms for the training data to help mitigate the risks of potential IP disputes. |
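To ground the security guidance above, the following is a minimal sketch, written in Java (the language of the Liferay platform), of one way training records could be encrypted at rest with AES-256-GCM. The class and method names are illustrative assumptions, not part of any Liferay or vendor API, and a production deployment would obtain the key from a managed key store rather than generating it in application code.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.security.SecureRandom;

/** Illustrative sketch: encrypting a training record at rest with AES-256-GCM. */
public final class TrainingDataVault {

    private static final int GCM_TAG_BITS = 128; // authentication tag length
    private static final int IV_BYTES = 12;      // recommended IV size for GCM

    private final SecretKey key;
    private final SecureRandom random = new SecureRandom();

    public TrainingDataVault(SecretKey key) {
        this.key = key; // in practice, fetched from a key management service
    }

    /** Returns IV || ciphertext so the record can be decrypted later. */
    public byte[] encrypt(byte[] plaintextRecord) throws Exception {
        byte[] iv = new byte[IV_BYTES];
        random.nextBytes(iv);                    // fresh IV for every record
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(GCM_TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(plaintextRecord);
        byte[] out = new byte[iv.length + ciphertext.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ciphertext, 0, out, iv.length, ciphertext.length);
        return out;
    }

    /** Key generation shown for completeness; real keys belong in a KMS or HSM. */
    public static SecretKey newKey() throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        return kg.generateKey();
    }
}
```

GCM is used here because it provides integrity as well as confidentiality, which matches the regulators' emphasis on both; the equivalent control in transit would be TLS.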
2. Training (Data Minimization and Compliance Setup)
| Pre-Trained Models | User-Trained Models |
|---|---|
| Data Minimization: Vendors should ensure data minimization during training. This means using only the data necessary for AI model training, in compliance with applicable data protection laws such as the GDPR and LGPD. The PIPC (Korea)⁵ emphasizes minimizing sensitive data use to reduce risk. | Data Minimization: Businesses should minimize the collection of personal data for training, using only what is necessary. Mitigating measures such as anonymization or pseudonymization should be considered (a sketch follows this table). This is particularly critical in sensitive use cases such as HR. Compliance with the GDPR and other local data protection laws (e.g., the PDPA (Singapore)) is required. |
| Legal Basis Selection: Legitimate interest is often used for processing large datasets like scraped data, but different jurisdictions handle this differently. For example, the ANPD (Brazil)⁶ ⁷ required Meta to obtain explicit consent for using public data for training, while the ICO (UK)⁸ ⁹ and CNIL (France)¹⁰ permit legitimate interest for public data under specific conditions, as long as the principles of transparency and purpose limitation are respected. | Legal Basis Selection: In many jurisdictions, consent is viewed as generally unsuitable in employer-employee relationships due to the power imbalance, which can make it involuntary. Instead, businesses typically rely on legitimate interest or contractual necessity for processing employee data, but this requires a Legitimate Interest Assessment (LIA) to ensure that the processing doesn't unfairly impact employee rights. Transparency is still necessary. |
| Scraped Data and Diverging Views: There are significant differences globally on the legality of using scraped public data for AI training. The ANPD (Brazil) is stricter, requiring consent, while in Europe, legitimate interest is often relied upon, with conditions on transparency and purpose alignment. | DPIA and Risk Management: For user-trained models, the business has to conduct a DPIA when handling high-risk data processing, such as profiling or automated decision-making, which is likely in an HR context. Risks such as bias, discrimination, and privacy breaches should be identified and mitigated. The CNIL (France)¹¹ ¹² and PIPC (South Korea)¹³ ¹⁴ require DPIAs for such high-risk activities to ensure data subject rights are respected. |
| Retention and Archiving: Vendors need to provide clear data retention policies. For example, under FTC (USA)¹⁵ ¹⁶ ¹⁷ ¹⁸ guidance, transparency about retention periods is essential. Vendors may keep training data for audits before deletion, but compliance with the GDPR's storage limitation principle must be ensured. | Retention and Archiving: For user-trained models, the business defines retention policies. Data should be deleted or anonymized after the training process, unless it is necessary to retain the data for audits or retraining purposes (see the second sketch after this table). The GDPR mandates minimizing retention to reduce the risk of data breaches. |
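To illustrate the minimization and pseudonymization measures above, here is a hedged sketch that keeps only the fields a model plausibly needs and replaces the direct identifier with a keyed pseudonym. The field names (email, role, tenure) are hypothetical, and the HMAC key must be stored separately from the training set, or the pseudonymization offers little protection.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;
import java.util.Map;

/** Illustrative sketch: data minimization plus keyed pseudonymization. */
public final class TrainingRecordMinimizer {

    private final SecretKeySpec pseudonymKey; // kept apart from the training data

    public TrainingRecordMinimizer(byte[] secret) {
        this.pseudonymKey = new SecretKeySpec(secret, "HmacSHA256");
    }

    /** HMAC rather than a bare hash, so pseudonyms cannot be reversed by
     *  hashing a list of known identifiers without the key. */
    public String pseudonymize(String identifier) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(pseudonymKey);
        byte[] digest = mac.doFinal(identifier.getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(digest);
    }

    /** Copies only the fields the model needs; everything else never leaves
     *  the source system. Field names here are hypothetical. */
    public Map<String, String> minimize(Map<String, String> rawRecord) throws Exception {
        return Map.of(
            "subject", pseudonymize(rawRecord.get("email")), // pseudonym, not the email
            "role", rawRecord.get("role"),
            "tenure", rawRecord.get("tenure"));
    }
}
```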
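Likewise, the retention rules above can be enforced mechanically rather than by convention. This sketch, again purely illustrative, deletes file-based training snapshots older than a cutoff; the 90-day figure is an assumed placeholder, since the real period must come from the business's documented retention policy, and audit obligations may require anonymization instead of deletion.

```java
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

/** Illustrative sketch: enforcing a storage-limitation policy on training snapshots. */
public final class RetentionSweeper {

    private static final Duration RETENTION = Duration.ofDays(90); // placeholder, policy-driven

    /** Deletes every file in the directory whose last modification predates the cutoff. */
    public static void sweep(Path trainingSnapshotDir) throws Exception {
        Instant cutoff = Instant.now().minus(RETENTION);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(trainingSnapshotDir)) {
            for (Path file : files) {
                Instant modified = Files.getLastModifiedTime(file).toInstant();
                if (modified.isBefore(cutoff)) {
                    Files.delete(file); // or anonymize first, if audits require keeping a trace
                }
            }
        }
    }
}
```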
3. Deployment (Monitoring and Ongoing Compliance)
| Pre-Trained Models | User-Trained Models |
|---|---|
| Legal Basis for Deployment: The vendor typically defines the legal basis for data used as input and output, often relying on legitimate interest or performance of a contract. However, it remains important for the business to ensure that the AI's use complies with the GDPR and other regulations, particularly regarding transparency and data protection. In particular, the business's data protection notices might require an update. | Legal Basis for Deployment: It's important for the business to select an appropriate legal basis, such as legitimate interest or contractual necessity, for using personal data as input and for the AI outputs. Additionally, reviewing and updating privacy policies, while ensuring transparency for data subjects, can help maintain compliance with relevant regulations. |
| Transparency and Data Subject Rights: It's important for the business to maintain transparency about how personal data is processed by the AI. Handling Data Subject Requests (DSRs) effectively, particularly in HR or customer-facing AI applications, can help ensure compliance with standards from authorities like the CNIL (France)¹⁹ ²⁰ and FTC (USA)²¹. | Transparency and Data Subject Rights: Data subjects should be clearly informed about how their personal data is being processed. Organizations should establish clear processes for handling Data Subject Requests (DSRs), ensuring that individuals can exercise their rights to access, delete, or modify their data in accordance with applicable data protection regulations (a sketch of such a process follows this table). |
| Monitoring and Bias Management: Pre-trained models require periodic auditing to ensure the outputs remain compliant with the GDPR's fairness and transparency principles. Vendors should also be audited for ongoing compliance with PIPC (Korea) and ANPD (Brazil) standards. | Monitoring and Retraining: It's important for businesses to continually monitor the performance of the model and consider retraining when new data is introduced. Regular monitoring can help ensure that the model avoids introducing biases or producing discriminatory outcomes (see the second sketch after this table). |
| IP Ownership and Outputs: Vendors typically own the AI model, but businesses should clarify ownership of the AI's outputs (e.g., insights, reports). Indemnity clauses protect against IP claims related to the model's use. | IP Ownership and Outputs: The business owns the trained model and any outputs it generates. It's important that contracts clearly define ownership rights over both the model and its outputs, especially if third-party platforms were used during training. |
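To make the DSR handling above concrete, here is a minimal sketch of how such requests might be routed across every store that can hold personal data around an AI system (input logs, conversation histories, vector indexes, and so on). The interface and class names are invented for illustration; a real process would add identity verification, statutory response deadlines, and audit logging.

```java
import java.util.List;

/** Illustrative sketch: routing Data Subject Requests across all personal-data stores. */
public final class DsrHandler {

    /** Abstracts any store that may hold personal data: prompt logs, caches, indexes. */
    public interface PersonalDataStore {
        List<String> findRecords(String subjectId);
        void deleteRecords(String subjectId);
    }

    private final List<PersonalDataStore> stores;

    public DsrHandler(List<PersonalDataStore> stores) {
        this.stores = stores;
    }

    /** Access request: gather everything held about the subject, from every store. */
    public List<String> handleAccess(String subjectId) {
        return stores.stream()
            .flatMap(store -> store.findRecords(subjectId).stream())
            .toList();
    }

    /** Erasure request: delete from every store, not just the primary database. */
    public void handleErasure(String subjectId) {
        stores.forEach(store -> store.deleteRecords(subjectId));
    }
}
```

The list of stores is the point of the sketch: AI integrations tend to scatter personal data beyond the obvious database, and an erasure request is only satisfied once every copy is covered.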
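And as one possible shape for the monitoring obligation, the sketch below flags a model for review when the rate of favorable outcomes diverges too widely between groups. Both the metric (a simple rate gap, in the spirit of demographic parity) and the 0.2 threshold are illustrative assumptions, not a standard prescribed by any of the regulators cited above.

```java
import java.util.Map;

/** Illustrative sketch: flagging outcome disparities between groups for human review. */
public final class OutcomeParityMonitor {

    private static final double MAX_DISPARITY = 0.2; // assumed threshold, a policy choice

    /** positiveRates maps each group (e.g., an age band) to its share of
     *  favorable model outcomes over the monitoring window. */
    public static boolean needsReview(Map<String, Double> positiveRates) {
        double min = positiveRates.values().stream()
            .mapToDouble(Double::doubleValue).min().orElse(0.0);
        double max = positiveRates.values().stream()
            .mapToDouble(Double::doubleValue).max().orElse(0.0);
        return (max - min) > MAX_DISPARITY; // a wide gap warrants an audit before retraining
    }
}
```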
Conclusion
Just as Starfleet is committed to exploring space responsibly, Liferay is dedicated to fostering a responsible AI ecosystem. Our Responsible AI program focuses on ensuring that AI technologies within the Liferay platform are developed and maintained ethically and in compliance with regulatory standards. While this program aims to promote responsible AI practices, its scope is centered on Liferay’s own AI initiatives. Although we don’t provide specific guidance, checklists, or ongoing support for customers' AI integrations, we remain committed to delivering a robust and secure platform that empowers customers to make informed decisions about their AI implementations.
AI is indeed the new frontier—a vast, uncharted territory filled with potential but also fraught with risks. As Liferay customers embrace the potential of AI, they are also encouraged to actively manage the associated responsibilities, ensuring a balanced and compliant approach. By conducting thorough risk evaluations, establishing a lawful basis for data processing, performing DPIAs, and carefully crafting contractual agreements, businesses can harness the power of AI while safeguarding personal data and complying with regulations.
Much like the voyages of the USS Enterprise, this journey into AI requires courage, foresight, and a commitment to ethical exploration. Liferay’s role is to empower customers with the tools they need, but the responsibility for ethical AI use ultimately lies with those who choose to integrate and control it. As we venture into this new frontier, remember that the key to successful AI integration lies not just in the technology but in the governance that surrounds it.
References
1. https://iapp.org/news/b/south-koreas-pipc-helps-conduct-personal-info-impact-assessments
2. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/guidance-on-ai-and-data-protection/how-should-we-assess-security-and-data-minimisation-in-ai/
3. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/guidance-on-ai-and-data-protection/
4. https://www.dataguidance.com/opinion/france-cnils-ai-how-sheets-overview
5. https://securiti.ai/south-korea-safe-use-of-personal-information-in-ai/
6. https://www.dataguidance.com/news/brazil-anpd-temporarily-suspends-metas-use-personal7
7. https://brazilian.report/liveblog/politics-insider/2024/07/02/meta-stop-mining-brazilians-data-for-ai/
8. https://www.pinsentmasons.com/out-law/news/generative-ai-underlines-data-scraping-compliance-risks
9. https://www.itpro.com/technology/artificial-intelligence/generative-ai-training-in-the-crosshairs-as-ico-set-to-examine-legality-of-personal-data-use
10. https://www.cnil.fr/en/legal-basis-legitimate-interests-focus-sheet-measures-implement-case-data-collection-web-scraping
11. https://mgsi.lu/language/en/cnil/cnil-list-of-processing-for-which-a-dpia-is-required/
12. https://www.twobirds.com/en/insights/2018/france/la-cnil-vient-de-publier-au-jorf-la-liste-des-traitements
13. https://iapp.org/news/b/south-koreas-pipc-helps-conduct-personal-info-impact-assessments
14. https://iapp.org/news/b/south-koreas-pipc-offers-guidelines-on-personal-data-for-ai
15. https://www.hoschmorris.com/privacy-plus-news/data-minimization-and-retention
16. https://www.ftc.gov/news-events/news/press-releases/2024/03/ftc-releases-2023-privacy-data-security-update
17. https://www.ftc.gov/policy/advocacy-research/tech-at-ftc/2024/01/ai-companies-uphold-your-privacy-confidentiality-commitments
18. https://www.dataguidance.com/opinion/usa-how-ftc-influencing-ai-regulation
19. https://www.cnil.fr/en/topics/artificial-intelligence-ai
20. https://www.cnil.fr/en/ai-cnil-publishes-its-first-recommendations-development-artificial-intelligence-systems
21. https://www.ftc.gov/business-guidance/blog/2021/04/aiming-truth-fairness-equity-your-companys-use-ai