Frequently asked questions and common misconceptions

Questions and misunderstandings about personal data in research are common. This page highlights some of the most frequent questions and clarifies common misconceptions.

Frequently asked questions

When do research data contain personal data?

Research data contain personal data when they include information that can directly or indirectly be linked to a living individual. Direct personal information might include names or personal identity numbers. Indirect personal data – information that cannot identify a person in itself but may do so when combined with additional information from other sources – might include date of birth, place of residence, or occupation. Such additional information can be found in records held by another authority, registry holder, company, or private individual. A code key also counts as additional information.

Note that how difficult it is to access potentially identifying additional information or external data sources does not affect whether the data are considered personal data.

When do research data no longer contain personal data?

When all links to living individuals have been removed.

This may be difficult to achieve retrospectively, once the personal data have been collected. It is hard to determine whether data are no longer personal, as there may be additional information in data sources elsewhere that could reveal identities – such as public documents at other government agencies, in public registers, or content online. Laws and regulations may also require that documents that could be used to identify individuals are preserved.

What is the difference between anonymized and pseudonymized data?

Anonymized data are data from which all personal identifiers have been removed so that no data can be linked to a unique individual. These data are considered irreversibly de-identified and thus no longer personal data.

Pseudonymized data, on the other hand, are those where direct identifiers in the material have been replaced by a pseudonym or code. These can only be linked to a unique individual by someone who has access to additional information, for example, a code key. Because that link exists and could be used to re-identify an individual, pseudonymized data are still considered personal data.

While pseudonymization reduces the risk of re-identification, data are not considered anonymized until all links to identifying information have been irreversibly removed – for example, if the code key is permanently destroyed. Aggregation may also make pseudonymized research data anonymous if the categories are broad enough (e.g., by grouping exact ages into wide age ranges) to prevent re-identification using additional data sources.
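As a minimal sketch of the age-grouping example, the following Python snippet replaces exact ages with wide ranges. The helper name age_band, the 20-year band width, and the sample ages are illustrative assumptions. Broad bands alone do not guarantee that a dataset is anonymous.

```python
def age_band(age: int, width: int = 20) -> str:
    """Map an exact age to a wide range such as '40-59' (hypothetical helper)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


# Invented example ages; real values would come from the study.
exact_ages = [23, 37, 41, 58, 64]
banded_ages = [age_band(a) for a in exact_ages]
print(banded_ages)
```

Wider bands lower the chance that a combination of values points to one person, at the cost of detail in the analysis.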

What counts as pseudonymized research data can vary between quantitative and qualitative research. In quantitative studies, pseudonymization typically means replacing names or personal identity numbers with codes and a code key, stored separately from the data. In qualitative studies, like interviews, it might involve replacing names with pseudonyms or using more general descriptions for specific job titles or workplaces to reduce identifiability.
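The quantitative case can be sketched in Python as follows. This is an illustration under assumptions, not a recommended tool: the function name pseudonymize, the field names, the code format, and the sample records are all invented for the example.

```python
import secrets


def pseudonymize(records, id_field="name"):
    """Replace a direct identifier with a random code.

    Returns the pseudonymized records and the code key (code -> identifier).
    The code key is the "additional information" and should be stored
    separately from the data.
    """
    code_for = {}  # identifier -> code, so repeated identifiers share a code
    pseudonymized = []
    for record in records:
        identifier = record[id_field]
        if identifier not in code_for:
            code = "P" + secrets.token_hex(4)
            while code in code_for.values():  # guard against a rare collision
                code = "P" + secrets.token_hex(4)
            code_for[identifier] = code
        # Copy every field except the direct identifier, then add the code.
        new_record = {k: v for k, v in record.items() if k != id_field}
        new_record["code"] = code_for[identifier]
        pseudonymized.append(new_record)
    code_key = {code: identifier for identifier, code in code_for.items()}
    return pseudonymized, code_key


# Invented example records.
records = [
    {"name": "Anna Andersson", "answer": 4},
    {"name": "Bo Berg", "answer": 2},
    {"name": "Anna Andersson", "answer": 5},
]
data, code_key = pseudonymize(records)
```

Anyone who holds both data and code_key can look up who is behind each code, which is why the result is still personal data.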

Note that laws and regulations may differ between countries, so it is important to consider the legal and institutional context when managing research data. Contact your institution’s research data support services or Data Protection Officer for advice on how to handle personal data in research.

Can I delete the original data to enable anonymization?

The General Data Protection Regulation (GDPR) includes a principle of storage limitation, meaning that personal data should not be kept longer than necessary for the original purpose. Once that purpose has been fulfilled, personal data should, in theory, be deleted. In practice, this principle is often overridden by archival requirements, which require data from publicly funded research to be preserved.

If you work at a Swedish university or another public research organization, the Swedish Archives Act applies to your material. You may be allowed to delete data if there has been a formal decision that allows disposal (gallringsbeslut), often after a retention period of at least 10 years. Some research data must be preserved unchanged for the future. What can be deleted or preserved is governed by the Swedish National Archives’ regulations and your institution’s local policies. Contact your organization’s archivists for advice.

In summary: If you have collected personal data for research, it is rarely possible to fully anonymize them in the short term, as the original data and any code keys usually need to be retained unless a formal disposal decision has been made and the retention period has passed.

Are research data considered official documents?

Yes, if your research is conducted at a Swedish public authority or another organization subject to the principle of public access to information, research data are typically official documents (allmänna handlingar). Data become official documents if they are held at a public authority, or if they are received, sent, or finalized by the authority. Examples include survey responses, interview recordings, output from laboratory instruments, or register extracts.

What you may or may not do with such research data is governed by laws such as the Public Access to Information and Secrecy Act, the Data Protection Act, the Archives Act, and rules from the Swedish National Archives. You can normally find guidance on how to apply these regulations to your work in your organization’s internal policy documents, for example in its document management plan.

As a general rule, raw data collected by, produced by, or received in a Swedish research project must be retained and preserved because they are official documents. There are additional legal requirements for preserving research data, for example for audits or investigations into research misconduct. See the question about deleting original data, above.

Research data that are official documents may only be deleted after the mandatory retention period has expired and with a formal disposal decision. Contact your research data support or your organization's archivists to find out what applies to your material.

SND has more information about research data as official documents.

Can I promise research subjects/participants that their data will not be shared?

Not unconditionally. As research data from public authorities are generally official documents, they can be requested under the principle of public access to official documents. Even if research participants have been informed otherwise, each request for access to the research data undergoes a secrecy assessment. If the data are not protected by a secrecy provision, they must be disclosed. The principle of public access to official documents is mandatory and non-negotiable, so you cannot promise that personal data will never be shared.

This does not mean the data will be openly shared; official documents are not automatically public and openly accessible. Data with personal information are often subject to a secrecy provision under the Public Access to Information and Secrecy Act, so any request for disclosure will be reviewed. However, it is neither the researcher nor the research participant who decides if data are confidential – that decision is based on a legal assessment.

Do I need consent from the research participants?

Different types of consent apply to different situations in research. They serve different purposes, so it is important to know which type of consent you must obtain from participants and what it means if that consent is withdrawn.

Ethical research consent: Most research involving human participants requires voluntary, informed consent in line with research ethics guidelines and good research practice.
Informed consent under law: Required by legislation in certain cases, for example research covered by section 4 of the Ethics Review Act, clinical trials, or use of biological samples under the Biobanks Act.
GDPR consent: Although consent can be a legal basis for processing personal data under the GDPR, research usually relies on public interest as the legal basis – not consent. Therefore, you rarely need GDPR consent to process personal data in research, but you usually do need ethical consent from participants.

SND has more information about lawfulness and legal basis for processing personal data in research.

What information do I need to provide to the research participants?

In many cases, the data controller must inform participants about which of their personal data will be processed and how. This right to be informed is a fundamental right under the GDPR. The information must include who the data controller is, the legal basis, and the purpose of the processing.

There are exceptions to the right to be informed – for example, if it is impossible or would require disproportionate effort to inform research participants. This may be the case in register-based research where the researcher has no access to identifiable data and cannot contact the research subjects.

Who can I share my research data with?

It depends on what you want to share and why. Do you plan to share research data with a collaborator outside of your organization or will you deposit data in a repository? The legal considerations depend on your purpose for sharing the data and, generally speaking, public authorities should assess each request for disclosure of the data individually, in line with the Public Access to Information and Secrecy Act.

Research data that contain personal information may not be published openly, unless specific legal exceptions apply. Contact your local research data support team, legal adviser, or Data Protection Officer for advice.

A journal wants access to data supporting my publication – what should I do?

Sharing data that contain personal information with a journal requires the same type of legal assessment as any other request for disclosure of official documents. The request must be reviewed in accordance with the Public Access to Information and Secrecy Act. Your organization's registrar, archivist, research data support team, legal advisers, or Data Protection Officer can help with the process.

Can I share data with a third country outside the EU?

Yes, but a few extra steps are required. First, the same legal assessment must be conducted as for any other data sharing. If the data can be shared, the transfer itself must be secure – for example, not by regular e-mail. Examples of transfers of personal data to a third country include:

E-mailing documents with personal data to recipients outside the EU/EEA;
Using a data processor based outside the EU/EEA;
Giving non-EU/EEA users access, for example reading rights, to personal data stored in the EU/EEA;
Storing personal data in a cloud service based outside the EU/EEA.

Chapter V of the GDPR governs transfers of data to third countries. Always consult a legal adviser or Data Protection Officer to clarify what is permitted.

Does the GDPR apply to data collected outside the EU?

Yes, the GDPR applies if the data controller or processor is based in the EU/EEA, or if the research is intended for individuals within the EU/EEA – even if the data were collected outside the EU/EEA.

If the personal information in my research data is already published, do the data still count as personal data?

Yes. When you process personal data for research purposes, the processing is assessed based on the individual research context. This means that your research counts as new processing under the GDPR. You must have a valid legal basis and specify the purpose of the processing, regardless of whether the data were previously published.

My research data contain personal information, but only about the creators of other works. Can I publish them openly?

Yes, because naming the creators of a work is a legal obligation under the Swedish Copyright Act (SFS 1960:729). This provides a legal basis (legal obligation) and purpose for the data processing involved in publishing the information.

Common misconceptions

There is no risk – no-one is interested in finding out who is part of my study

Re-identification can be deliberate or accidental, and it can follow a data breach or a public disclosure. It can happen out of curiosity, by coincidence, or because of a vested interest – for example, in research, journalism, or criminal activity.

How serious the consequences may become depends on the situation. Seemingly innocent information, like what car someone drives, can lead to indirect re-identification and disclose sensitive information about someone – for example, their political opinions or sexual orientation – especially if those questions were part of the same survey.

Information that does not appear sensitive may still contribute to identifying an individual, and re-identification can be deliberate or accidental, regardless of whether anyone has a particular interest in the research participants.

Pseudonymization is the same as anonymization

Pseudonymization involves processing personal data in such a way that they can no longer be linked to a specific individual without the use of additional information stored separately from the original dataset. This means that with access to supplementary information (such as a code key), individuals could potentially be identified. Therefore, pseudonymized data are still considered personal data.

Anonymization, on the other hand, involves permanently removing all identifying information from a dataset and irreversibly breaking any link to additional data sources that could potentially identify an individual. Once research data have been fully anonymized, they can no longer be traced back to any specific person and are no longer classified as personal data.

Encryption is anonymization

Encryption uses cryptographic keys – either a shared secret key or a combination of private and public keys – to transform information in a way that reduces the risk of misuse while maintaining confidentiality over a limited period. However, because the transformation is reversible (the data must be decryptable), encryption is not the same as anonymization.

The keys used for decryption are a form of “additional information” (see above) that can make personal data readable and thereby re-identifiable. In theory, you might think that encrypted research data would become anonymous if you erase the decryption key, but this is not necessarily the case. You cannot assume that encrypted data are undecryptable just because the key is said to be “erased” or “unknown”.
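The reversibility described above can be illustrated with a toy example in Python. The XOR scheme below is an assumption chosen only because it runs with the standard library and makes the role of the key visible. It is not a recommendation, and real projects should use vetted encryption software. The helper name xor_bytes and the sample text are invented.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the corresponding byte of key."""
    if len(key) != len(data):
        raise ValueError("key must be as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


# Invented example value containing personal information.
plaintext = "Anna Andersson, 1980-01-01".encode("utf-8")

key = secrets.token_bytes(len(plaintext))  # the "additional information"
ciphertext = xor_bytes(plaintext, key)     # unreadable without the key
recovered = xor_bytes(ciphertext, key)     # the same key reverses the step

print(recovered.decode("utf-8"))
```

Whoever holds the key can restore the original text exactly, so the encrypted data remain linked to the individual for as long as the key, or a copy of it, exists.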

Many factors affect the long-term confidentiality of encrypted data, including the strength of the encryption algorithm and key, possible data leaks, implementation flaws, the volume of encrypted data, and future technological advancements. While encryption is not anonymization, it is a useful tool for pseudonymizing research data containing personal information.

Research data can always be anonymized

No. It is not always possible to reduce or eliminate the risk of re-identification to an acceptable level while preserving the usefulness of the dataset for the intended research purpose. Anonymization is about finding a balance between reducing the risk of re-identification and maintaining data utility.

Some properties of research data, types of data, or specific research contexts make it difficult or impossible to achieve sufficient anonymization – for example, when very few individuals share a particular characteristic or variable value; when the data are so distinctive and vary so much between individuals that a person can be identified from them; or when the dataset contains many demographic variables or geographic information.

Anonymization is forever

No. How anonymization is implemented affects the risk of re-identification. While 100% anonymization may be ideal from a data protection perspective, it is not always achievable, so there is often a residual risk of re-identification.

Anonymization is not only about removing direct identifiers from a dataset, but also about breaking links to additional data sources that might enable re-identification. However, contexts change over time. New knowledge, advances in AI, increased computational power, or new applications of existing technologies could make it possible to re-identify individuals from datasets previously thought to be anonymous.

Additionally, future data leaks or the release of new additional data sources could retrospectively compromise anonymity and connect previously anonymous data to identifiable individuals. For these reasons, anonymization may not remain future-proof.

There is no risk of re-identification in anonymized data

The term “anonymous data” should not be understood as a binary concept where data are either anonymous or not. Rather, it exists on a spectrum. Except in specific cases where data are extremely generalized, the risk of re-identification is never zero. Each record in a dataset has a probability of re-identification, depending on how easily it can be distinguished from others. There are methods to assess this risk, and such assessments should be conducted when anonymization is first implemented and followed up over time.
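One simple way to put a number on how easily a record can be distinguished from others is to count how many records share the same combination of indirectly identifying values, the idea behind k-anonymity. The sketch below is one possible approach, not necessarily the assessment method intended here. The function name, field names, and records are invented, and the measure ignores additional data sources outside the dataset.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Return one risk value per record: 1 divided by the number of records
    that share the same combination of quasi-identifier values."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    group_sizes = Counter(keys)
    return [1 / group_sizes[k] for k in keys]


# Invented example records.
records = [
    {"age_band": "20-39", "municipality": "A"},
    {"age_band": "20-39", "municipality": "A"},
    {"age_band": "40-59", "municipality": "A"},
    {"age_band": "40-59", "municipality": "B"},
]
risks = reidentification_risk(records, ["age_band", "municipality"])
print(risks)  # [0.5, 0.5, 1.0, 1.0]
```

A value of 1.0 marks a record that is unique on the chosen variables. Such records are the first candidates for further generalization or suppression.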

Read more about methods to reduce the risk of re-identification.

Anonymization makes research data useless

The goal of anonymization is to prevent identification of individuals within a dataset. While anonymization techniques may limit how the resulting dataset can be used, this does not make research data useless. How useful the data are depends more on the research purpose and what level of re-identification risk is considered acceptable.

In some cases, anonymization may not be possible due to the research purpose. In such situations, researchers may have to choose between either working with personal data under appropriate safeguards (e.g., by pseudonymizing them), or refraining from processing the data altogether.

An anonymization process that worked for one research project will work for mine

Anonymization processes must be tailored to the nature, scope, and context of the data, as well as to the objectives of the research project. There is no universal, one-size-fits-all solution.

A needle in a haystack or not as difficult as you may think?

It is a common misconception that it is difficult to distinguish a single individual from a large population. As a matter of fact, it takes surprisingly few pieces of information to identify someone, especially if the information can be combined with additional data sources.
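A back-of-the-envelope calculation shows why. The sketch below assumes, unrealistically, that every attribute is evenly distributed and independent of the others. The population size and category counts are round illustrative numbers, not statistics about any real population, and the function name is invented.

```python
def expected_matches(population, category_counts):
    """Expected number of people still matching after each attribute is known,
    assuming uniform and independent attributes (a simplification)."""
    remaining = float(population)
    steps = []
    for attribute, n_categories in category_counts:
        remaining /= n_categories
        steps.append((attribute, remaining))
    return steps


# Illustrative round numbers only.
attributes = [
    ("sex", 2),
    ("year of birth", 80),
    ("day of birth within the year", 365),
    ("municipality", 300),
]
steps = expected_matches(10_000_000, attributes)
for attribute, remaining in steps:
    print(f"after {attribute}: about {remaining:.1f} people")
```

Under these assumptions, four everyday attributes reduce ten million people to fewer than one expected match. Real distributions are uneven, so some individuals stand out even sooner.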

Test how many steps it takes to pick out a single individual from a large population.