AI data privacy

Helping organizations prepare and use data for AI while managing privacy and re-identification risks.

Artificial intelligence depends on data. I help organizations determine how personal data can be used responsibly for AI through anonymization, privacy expertise, and independent regulatory guidance.

When you need this

You want to use data for AI

Prepare datasets for machine learning or other AI applications while protecting personal data through appropriate anonymization.

You need datasets suitable for AI

Determine whether your datasets can be considered anonymous before using them to train, validate, or evaluate AI systems and machine learning models.

You need to comply with the AI Act

Understand how the GDPR and the AI Act affect the use of personal data in AI systems, and make defensible compliance decisions.

You want to share AI datasets safely

Evaluate whether data can be shared with AI vendors, researchers, or development partners while minimizing re-identification risks.

You need independent privacy expertise

Receive independent advice on anonymization, privacy risks, and responsible data use throughout your AI project.

Many organizations do not have in-house expertise to perform this type of analysis and need an independent, objective evaluation.

How I support your AI project

I help organizations prepare and use data for AI while balancing privacy, regulatory requirements, and data utility.

Assess AI datasets

Review datasets intended for AI to determine whether personal data is involved and identify the appropriate privacy approach before development begins.

Evaluate re-identification risk

Assess whether the re-identification risk of a dataset is acceptable for the intended AI use case and data-sharing context, and therefore whether the dataset can be deemed anonymous.

Design anonymization approaches

Select and apply anonymization techniques that reduce privacy risks while preserving the information needed for AI systems and machine learning models.

Support AI compliance

Provide independent guidance on applying the AI Act, the GDPR, HIPAA, and other applicable regulations to the responsible use of data in AI.

Independent expert advice

Support privacy, legal, security, AI, and data teams with objective advice on complex anonymization and data protection questions throughout the AI lifecycle.

What you get

Depending on your project, you receive practical and defensible guidance that enables you to:

  • prepare datasets for AI responsibly
  • reduce re-identification risks before AI use
  • understand whether data can be deemed anonymous
  • support AI governance and documentation
  • demonstrate compliance with applicable regulations
  • make informed and defensible privacy decisions


The outcome is designed to be:

  • technically robust
  • understandable for technical and non-technical stakeholders
  • practical and reusable across future AI projects

Why this matters

Artificial intelligence increasingly depends on large volumes of data, much of which may contain personal information.

Using data without fully understanding the associated privacy risks can lead to:

  • regulatory non-compliance
  • increased re-identification risks
  • unnecessary limitations on AI innovation
  • delays in AI projects
  • loss of stakeholder trust

Independent expert guidance helps organizations prepare data responsibly while balancing privacy, compliance, and AI performance.

Common questions

Can we use this dataset for AI?

That depends on whether the dataset contains personal data and how it will be used. Before data is used to train, validate, or evaluate AI systems, it should be determined whether the data can be considered anonymous or whether additional anonymization is required.

An independent assessment helps establish whether the dataset can be used responsibly while balancing privacy, regulatory requirements, and data utility.

Do we need anonymization before using data for AI?

In many AI projects, the answer is yes.

Training datasets frequently contain personal data or combinations of variables that may still enable individuals to be identified. Before these datasets are used for AI, it is important to determine whether anonymization is required and, if so, whether the chosen approach sufficiently reduces the risk of re-identification while preserving the information needed for the intended AI application.
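
As a rough illustration of how a combination of variables can single out individuals even when names are absent, the sketch below counts how many records share each combination of indirect identifiers. The records, the column names, and the choice of quasi-identifiers are hypothetical and only serve the example.

```python
from collections import Counter

# Hypothetical records: no names, but three indirect identifiers remain.
records = [
    {"birth_year": 1984, "postcode": "1011", "gender": "F"},
    {"birth_year": 1984, "postcode": "1011", "gender": "F"},
    {"birth_year": 1984, "postcode": "1012", "gender": "M"},
    {"birth_year": 1991, "postcode": "1011", "gender": "F"},
    {"birth_year": 1991, "postcode": "1011", "gender": "F"},
    {"birth_year": 1957, "postcode": "1013", "gender": "M"},
]

# Assumed set of quasi-identifiers for this example.
QUASI_IDENTIFIERS = ("birth_year", "postcode", "gender")


def group_sizes(rows, columns):
    """Count how many rows share each combination of values in `columns`."""
    return Counter(tuple(row[c] for c in columns) for row in rows)


def unique_fraction(rows, columns):
    """Fraction of rows whose combination of values occurs exactly once."""
    sizes = group_sizes(rows, columns)
    unique_rows = sum(
        1 for row in rows if sizes[tuple(row[c] for c in columns)] == 1
    )
    return unique_rows / len(rows)


sizes = group_sizes(records, QUASI_IDENTIFIERS)
smallest_group = min(sizes.values())
share_unique = unique_fraction(records, QUASI_IDENTIFIERS)
print(f"smallest group: {smallest_group}, unique records: {share_unique:.0%}")
```

A smallest group of 1 means at least one record is unique on these variables. Group size is only one indicator; it does not by itself show whether a dataset is anonymous.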


Will anonymization reduce AI model performance?

Not necessarily.

The objective is not to remove as much information as possible, but to remove or transform the information that contributes to re-identification risk while preserving the characteristics required for the AI model.

Finding the right balance between privacy protection and data utility is one of the key challenges in AI data preparation.
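
The trade-off can be made concrete with a small sketch. It generalizes exact ages into ten-year bands, one of several possible transformations, and compares group sizes and a simple utility measure before and after. The ages, the band width, and the use of the mean as a utility measure are assumptions chosen for illustration.

```python
from collections import Counter
from statistics import mean

ages = [23, 27, 31, 34, 38, 42, 45, 47, 52, 58]  # hypothetical exact ages


def to_band(age, width=10):
    """Replace an exact age with the lower bound of its `width`-year band."""
    return (age // width) * width


def smallest_group(values):
    """Size of the rarest value; 1 means at least one record is unique."""
    return min(Counter(values).values())


bands = [to_band(a) for a in ages]
# Band midpoints stand in for the exact ages when checking utility.
midpoints = [b + 5 for b in bands]

print("smallest group before:", smallest_group(ages))
print("smallest group after: ", smallest_group(bands))
print("mean age before:", mean(ages), "after:", mean(midpoints))
```

In this toy example every exact age is unique, while each ten-year band contains at least two records and the mean age shifts only slightly. Whether such a loss of detail is acceptable depends on what the model needs to learn.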


Does the AI Act require anonymization?

The AI Act does not explicitly require datasets to be anonymized. However, it places strong emphasis on data governance, data quality, and compliance with applicable privacy legislation such as the GDPR.

Where personal data is used, organizations should understand whether anonymization is appropriate and how privacy risks are managed throughout the AI lifecycle.


Can we share data with an AI vendor?

Before sharing data with external AI providers, organizations should understand whether the dataset contains personal data, whether anonymization is required, and what re-identification risks remain after transformation.

An independent assessment helps determine whether data can be shared responsibly while supporting contractual, regulatory, and governance requirements.


How do we know whether our data is anonymous?

Simply removing names or replacing identifiers is often not enough. Whether a dataset can be considered anonymous depends on the remaining information, the likelihood of re-identification, the available external data, and the context in which the data will be used.

A structured assessment evaluates these factors and determines whether the dataset can reasonably be considered anonymous for the intended use.
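
To illustrate why available external data matters, the sketch below links a released dataset without names to a second source that does contain names. All records, field names, and the choice of linking variables are hypothetical; it shows one simple linkage check, not a complete assessment.

```python
from collections import Counter

# Hypothetical released dataset: direct identifiers already removed.
released = [
    {"birth_year": 1984, "postcode": "1011", "diagnosis": "asthma"},
    {"birth_year": 1991, "postcode": "1011", "diagnosis": "diabetes"},
    {"birth_year": 1957, "postcode": "1013", "diagnosis": "hypertension"},
]

# Hypothetical external source that someone else might hold.
external = [
    {"name": "Person A", "birth_year": 1984, "postcode": "1011"},
    {"name": "Person B", "birth_year": 1991, "postcode": "1011"},
    {"name": "Person C", "birth_year": 1991, "postcode": "1011"},
    {"name": "Person D", "birth_year": 1957, "postcode": "1013"},
]

# Assumed variables present in both sources.
LINK_KEYS = ("birth_year", "postcode")


def key_of(row):
    """Build the linking key for one row."""
    return tuple(row[k] for k in LINK_KEYS)


def uniquely_linked(released_rows, external_rows):
    """Return released rows that match exactly one person in the external data."""
    candidates = Counter(key_of(r) for r in external_rows)
    return [r for r in released_rows if candidates[key_of(r)] == 1]


at_risk = uniquely_linked(released, external)
print(f"{len(at_risk)} of {len(released)} released records link to exactly one person")
```

Here two of the three released records match a single named person in the external source, even though the released data contains no names.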


Can pseudonymized data be used for AI?

Pseudonymized data is still personal data under the GDPR. While it may reduce certain privacy risks, it does not automatically remove regulatory obligations. Depending on the intended AI application, additional anonymization or other safeguards may be required before data can be shared or reused.
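
A minimal sketch of one common pseudonymization technique, keyed hashing, shows why this is the case. The key, the identifier format, and the function name are hypothetical and chosen only for illustration.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would be stored separately and never hard-coded.
SECRET_KEY = b"example-key-kept-separately"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


token_a = pseudonymize("patient-0001")
token_b = pseudonymize("patient-0001")
token_c = pseudonymize("patient-0002")

# The same person always receives the same token, so records stay linkable.
print(token_a == token_b)
# Different people receive different tokens.
print(token_a != token_c)
# Anyone holding the key can recompute the token for a known identifier.
print(pseudonymize("patient-0001") == token_a)
```

The token hides the original identifier, but records remain linked to one individual, and whoever holds the key can reconnect tokens to known identifiers. The other variables in each record are also left untouched.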

Ready to use data for AI?

Let’s determine whether your data can be considered anonymous and how to prepare it responsibly for AI.