AI Privacy Risks

By Collin Connors, PhD, Senior Cybersecurity Consultant & AI Practice Lead

Artificial Intelligence has rapidly expanded into every facet of our daily lives. What was once a foreign concept is now a practical tool used across the globe. Companies are encouraging employees to integrate AI into their workflows, students are turning to chatbots for homework help, and many people now use AI to simplify everyday tasks, such as drafting messages, organizing calendars, and planning vacations.

Despite its prominence, AI represents a privacy threat unlike any seen before. Its ability to integrate seamlessly and autonomously into personal and professional spaces while collecting and processing vast amounts of data in real time opens the door to unprecedented privacy risks. Users face a range of privacy concerns with AI - from models training on sensitive user data to attackers using AI tools to study their targets. These rapid technological advancements are outpacing existing legal frameworks, and few legal requirements exist at the state or federal level to protect consumers' privacy as it relates to AI.

Training Data Can Leak

One significant way AI disrupts user privacy is through the way it “learns.” Large Language Models (LLMs) such as ChatGPT are statistical models that guess the next word in a sentence based on probabilities. To make a good guess, these models require vast amounts of example data to support their prediction. We call this process “training” the model.

This is similar to a student learning calculus for the first time. Before taking a test on derivatives, the student practices by solving sample problems. When faced with a test question, they draw on those examples to find the answer. Likewise, an AI model uses its training data to generate responses.
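
To make the idea of next-word prediction concrete, the sketch below shows a toy version in Python. The context-to-probability table is invented purely for illustration; a real LLM learns probabilities over tens of thousands of tokens from enormous training sets, but the basic operation, picking the most likely next word given the words so far, is the same.

    # Toy illustration of next-word prediction.
    # The probabilities below are invented for illustration; a real LLM
    # learns them from billions of training examples.
    next_word_probs = {
        ("our", "api"): {"key": 0.85, "endpoint": 0.10, "docs": 0.05},
        ("api", "key"): {"is": 0.90, "was": 0.07, "expired": 0.03},
    }

    def predict_next(context):
        """Return the most probable next word for the last two words of context."""
        candidates = next_word_probs.get(tuple(context[-2:]), {})
        if not candidates:
            return None
        return max(candidates, key=candidates.get)

    print(predict_next(["our", "api"]))         # -> key
    print(predict_next(["our", "api", "key"]))  # -> is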

Because thoroughly training an LLM requires a substantial amount of data, many model providers use user prompts as part of the training process, allowing the model to learn from the questions people ask. In the student analogy, this is like the exact practice problem appearing on the test, so the student recalls the answer from memory rather than working it out.

While this may seem harmless at first, it introduces serious risks. If a model trains on sensitive user data, it may memorize the information and reveal it in other unrelated conversations. For example, if an LLM were to train on the prompt “I work for ABC and our API key is XYZ,” another user could ask the question “What is ABC’s API key,” and the model might respond with “XYZ.”
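
To see how memorization can surface later, consider the minimal sketch below. The “model” here is just retrieval over its training prompts, a deliberately crude stand-in for an over-fit LLM, and the company name and API key are fabricated. The point is that a later, unrelated user’s query can pull the memorized secret back out.

    # Minimal illustration of training-data memorization.
    # This "model" simply returns the stored training prompt that shares
    # the most words with a query: a crude stand-in for how an LLM that
    # memorized its training data can regurgitate it. Values are made up.
    training_prompts = [
        "Please summarize this quarterly report for me",
        "I work for ABC and our API key is XYZ-12345",
        "Draft a polite reply to this customer email",
    ]

    def overlap(a, b):
        """Count lowercase words shared by two strings."""
        return len(set(a.lower().split()) & set(b.lower().split()))

    def answer(query):
        """Return the memorized prompt most similar to the query."""
        return max(training_prompts, key=lambda p: overlap(query, p))

    # Another user's unrelated question surfaces the memorized secret.
    print(answer("What is the API key for ABC"))
    # -> "I work for ABC and our API key is XYZ-12345"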

This privacy risk is not just academic, as there have been several instances of models training on sensitive data. In 2023, Amazon believed that employees had uploaded sensitive internal data to ChatGPT after noticing that some of ChatGPT’s responses closely resembled internal Amazon data. It was estimated that this privacy leak cost Amazon $1 million. Samsung faced a similar situation when employees uploaded sensitive source code to an LLM. This resulted in Samsung issuing a blanket ban on all LLMs to prevent any further data leaks.

Limiting Data Exposure through Opt Outs

Although models training on user data pose significant privacy concerns, organizations can adopt several security measures to reduce the likelihood of inadvertent data disclosure.

For instance, many enterprise versions of large language models, notably Microsoft Copilot and Google Gemini, are designed to prevent user-provided data from being incorporated into subsequent training cycles.

Implementing such technologies, in tandem with comprehensive and enforceable organizational policies on AI usage, can substantially reduce the risk of employees exposing sensitive information. Furthermore, some language models offer users the ability to opt out of having their data included in future training sets.

Below is practical guidance on how individuals can exercise these opt-out options on widely used LLM platforms.

  • ChatGPT: Turn off “Improve the model for everyone”
  • Claude: Disable “Model Improvement” in your Privacy Settings
  • Gemini: Turn off “Save your Gemini Apps activity to your Google Account”
  • Copilot: Turn off “Model Training on Text”
  • Grok: Turn off “Improve the model” and “Personalize the model with X”
  • Perplexity: Turn off “Data Retention”

Preventing Toxic Accumulation of User Data

Even when LLM providers are not training on user prompts, they are still collecting large amounts of user data. The rapid adoption of LLMs has led to these platforms having massive user bases. OpenAI reports having 800 million weekly active users. In addition, users often freely give away sensitive information to these platforms. A study by Harmonic found that 8.5% of prompts to LLMs contained sensitive information. The combination of scale and valuable data makes LLM providers like OpenAI a gold mine for attackers.

Even when malicious actors are not attacking, mistakes by LLM developers can lead to significant data breaches. In March 2023, ChatGPT users reported seeing the chat history of other people. This was caused by a bug in the software, leading ChatGPT to go offline for a few hours while OpenAI rushed to resolve the issue. Because these models often collect and process so much sensitive information, even a minor error by the developers can lead to a breach of users’ privacy.

To mitigate these risks, organizations using enterprise LLMs should ensure that data retention policies are in place to limit the amount of data that the LLM has access to. Additionally, the use of Data Loss Prevention rules can prevent users from accidentally uploading sensitive information into their prompts.
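
As a rough sketch of what a Data Loss Prevention check can look like in practice, the Python example below screens a prompt for a few common sensitive patterns (email addresses, U.S. Social Security numbers, and API-key-like strings) before it is ever sent to an LLM. The patterns and the blocking behavior are illustrative assumptions; commercial DLP tools combine many more detectors, such as keyword lists, classifiers, and document fingerprints.

    import re

    # Simple DLP-style screen run before a prompt leaves the organization.
    # The patterns are illustrative, not exhaustive.
    SENSITIVE_PATTERNS = {
        "email address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
        "US Social Security number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "API key-like string": re.compile(r"\b(?:sk|key|token)[-_][A-Za-z0-9]{8,}\b", re.I),
    }

    def scan_prompt(prompt):
        """Return the names of any sensitive patterns found in the prompt."""
        return [name for name, pattern in SENSITIVE_PATTERNS.items()
                if pattern.search(prompt)]

    prompt = "Summarize this incident: our service token is key_9f3b2a7c1d"
    findings = scan_prompt(prompt)
    if findings:
        print("Blocked: prompt appears to contain", ", ".join(findings))
    else:
        print("Prompt passed DLP screening")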

Users should also be encouraged to regularly delete their chat histories, while understanding that deletion on the user’s end does not always mean immediate deletion by the provider. For example, OpenAI’s policy states that deleted chats are retained for up to 30 days unless there is a litigation hold.

Be Aware: Hackers Use AI for Intelligence Gathering

AI has gained popularity, in part, because of its ability to help users complete tasks more quickly and efficiently. While this productivity boost has been beneficial for businesses and employees, it has also made the job of the hacker more efficient: attackers can use publicly available LLMs to conduct Open-Source Intelligence (OSINT) gathering on their targets.

During the OSINT phase of an attack, threat actors aim to gather critical information about their targets. In the past, this phase involved manually combing through Google search results and social media posts to find publicly available information about the target — a time-consuming and labor-intensive task. With LLMs, attackers can now automate tedious tasks like OSINT, allowing them to peer into a target’s digital footprint more easily, increasing the risk of privacy breaches.

While users cannot prevent attackers from using LLMs to conduct OSINT, they can reduce their digital footprints by limiting the information made available online. Attackers often leverage OSINT to conduct personalized phishing attacks, so users should be aware of the information available about them online and understand how it can be leveraged to compromise their privacy or security.

AI Privacy Laws in Infancy

While AI has heightened concerns around user privacy, legislative action at both the state and federal levels remains limited. Overall, the current federal government aims to promote AI innovation, leaving user privacy largely up to the states.

As of October 2025, only two states have passed AI Privacy legislation, according to the Orrick AI Law Tracker, highlighting the growing gap between technological advancement and AI regulation.

California has amended the California Consumer Privacy Act (CCPA) to specify that personal information includes information stored in formats such as “artificial intelligence systems that are capable of outputting personal information.” This means that LLM providers who do business in the state of California need to ensure they comply with the privacy restrictions of the CCPA.

Similarly, the state of Utah passed the Utah Artificial Intelligence Policy Act, which states that synthetic data generated by a computer algorithm or statistical model does not count as personal information. For most individuals, this law has a limited effect on their privacy; however, organizations training models gain more flexibility if they use synthetic data for training.

At the federal level, no comprehensive AI privacy laws have been passed. Through executive orders, however, the federal government has given guidance on AI. America’s AI Action Plan encourages protecting users’ privacy when creating AI training data sets.

The plan aims to create an AI procurement toolbox allowing agencies to pick “among multiple models in a manner compliant with relevant privacy, data governance, and transparency laws.” This means that LLM providers who wish to work with the federal government will be required to meet privacy standards. The plan also recommends that the DoD, in collaboration with NIST, refine the AI and Generative AI Frameworks, Roadmaps, and Toolkits to mitigate risks such as data poisoning and privacy attacks.

How ERMProtect Can Help

ERMProtect’s AI Consulting practice can help you manage the various AI risks your organization faces. As a full-service cybersecurity consulting firm, we understand cyber risk from every angle. With the rise of AI, many organizations are concerned that their risk management strategies are not keeping pace with advances in the technology. Schedule a free consultation with Dr. Collin Connors to discuss how ERMProtect can help you gain control over your AI risk. He can be reached at [email protected].

About the Author

Collin is a Senior Cybersecurity Consultant at ERMProtect, where he leads the AI Consulting practice, assisting clients with AI risk management, governance, and implementation strategy. He has published cutting-edge research on using AI to detect malware and speaks regularly at national conferences on managing AI risk and AI implementation strategy. He holds undergraduate degrees in Mathematics and Computer Science and a PhD in Computer Science, with research focused on AI and blockchain. In addition to specializing in AI solutions, he has performed penetration testing, risk assessments, training, and compliance reviews in his six years at ERMProtect.
