
Resources
Below is a collection of links to guides, handbooks, research articles, and videos that may be helpful when working with data containing personal information.
Guides and handbooks
Anonymisation and personal data
This guide, produced by the Finnish Social Science Data Archive (FSD), provides a basic introduction to anonymization and pseudonymization. It includes practical recommendations for working with both quantitative and qualitative data.
Handbok i statistisk röjandekontroll (Handbook on statistical disclosure control; in Swedish)
This handbook from Statistics Sweden (SCB) is primarily intended as a guide for statistical agencies in applying statistical disclosure control when producing and publishing official or other statistics. However, it can also be a useful resource for researchers working with microdata, for example through SCB’s microdata platform MONA. For researchers using microdata, anonymization or de-identification is typically not sufficient – statistical disclosure control is also required before data can be disclosed and shared.
Guide to de-identification methods: Opinion 05/2014 on anonymisation techniques
This opinion from the EU's Article 29 Working Party (since replaced by the European Data Protection Board) addresses the two main strategies for de-identification, or anonymization: randomization and generalization. It explores specific methods within each category and outlines their respective strengths and weaknesses.
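To make the two strategies concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the opinion: the function names, the ten-year age bands, and the uniform noise are assumptions chosen for the example. Generalization coarsens a value so that more people share it, while randomization perturbs a value so that it no longer matches the original exactly.

```python
import random


def generalize_age(age, width=10):
    """Generalization: replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def add_noise(value, scale, rng):
    """Randomization: perturb a numeric value with zero-mean uniform noise."""
    return value + rng.uniform(-scale, scale)


# Hypothetical records, invented for this example.
records = [
    {"age": 34, "income": 41000},
    {"age": 37, "income": 52500},
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
protected = [
    {
        "age": generalize_age(r["age"]),
        "income": round(add_noise(r["income"], 1000, rng)),
    }
    for r in records
]
# Both records now share the age band "30-39", and each income
# has been shifted by at most 1000 from its original value.
```

Neither step guarantees anonymity on its own. Whether the result is sufficiently protected depends on the remaining variables and on what other information is available about the individuals.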
Encryption for researchers
Encryption can provide an additional layer of access control for researchers working with sensitive data. This guide, created at Ghent University, explains what encryption is, when to consider using it, and how to apply it in practice.
Data Privacy Handbook
Developed at Utrecht University, this handbook for data containing personal information can be seen as the Dutch counterpart to SND's handbook. It focuses on both providing basic legal knowledge and offering an overview of useful methods and tools for researchers working with personal data.
Data Management Expert Guide
This general guide to data management is produced by CESSDA, a European research infrastructure working to improve access to social science research data. It is primarily intended for social science researchers and provides best practices and strategies for effective data management, with an emphasis on the FAIR principles. In other words: how can researchers make their data Findable, Accessible, Interoperable, and Reusable?
Anonymisering av personopplysninger (Anonymization of personal data; in Norwegian)
This guide from the Norwegian Data Protection Authority (Datatilsynet) is for individuals and organizations seeking support in anonymizing personal data. It covers key legal principles, highlights risk factors, and discusses the pros and cons of various anonymization techniques.
Videos
Practical introduction to the sdcMicro tool
A walkthrough demonstrating the sdcMicro tool in RStudio from a CESSDA Train the Trainer Workshop, “Anonymisation for data sharing in practice”. The video shows how to identify variables, or combinations of variables, that pose a risk for re-identification and demonstrates how various aggregations affect that risk.
Data Anonymization Workshop Series
A series of four workshops from McGill University introducing and exploring anonymization of research data. The first two workshops focus on quantitative research; the latter two focus on qualitative research.
Workshop 1: Reducing Risk: An Introduction to Data Anonymization
Workshop 2: ARX – Anonymizing data in theory and practice
Workshop 3: Ethically sharing qualitative data
Workshop 4: Qualitative data sharing: A roadmap and resources to facilitate responsible and ethical data sharing
Amnesia: High-accuracy Data Anonymization
This webinar from OpenAIRE serves as both an introduction to anonymizing research data and a demonstration of the Amnesia tool. Amnesia transforms research data containing personal information to provide k-anonymity and k^m-anonymity.
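As a rough illustration of what k-anonymity means, the following Python sketch computes k for a small table. It does not show how Amnesia works internally; the function name, the column names, and the records are hypothetical. A dataset is k-anonymous with respect to a set of quasi-identifiers if every combination of their values is shared by at least k records.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group of rows that share the
    same values for all of the given quasi-identifiers."""
    if not rows:
        raise ValueError("k is undefined for an empty dataset")
    groups = Counter(
        tuple(row[q] for q in quasi_identifiers) for row in rows
    )
    return min(groups.values())


# Hypothetical records, invented for this example.
rows = [
    {"age_band": "30-39", "postcode": "411", "diagnosis": "A"},
    {"age_band": "30-39", "postcode": "411", "diagnosis": "B"},
    {"age_band": "40-49", "postcode": "412", "diagnosis": "A"},
    {"age_band": "40-49", "postcode": "412", "diagnosis": "C"},
]

k = k_anonymity(rows, ["age_band", "postcode"])
# k == 2: each (age_band, postcode) combination appears twice.
```

The choice of quasi-identifiers matters. If diagnosis were also treated as a quasi-identifier here, every row would be unique and k would drop to 1.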
Anonymisation in theory and practice
Three videos from the British National Centre for Research Methods (NCRM) that offer an introduction and practical guide to anonymization, statistical disclosure control, and how to assess disclosure risk in research data.
Five-Step Guide to Statistical Disclosure Control
A series of five videos from the United Nations OCHA Centre for Humanitarian Data. The videos outline the process of assessing disclosure risk in research data.
Step 1: Prepare the Disclosure Risk Assessment
Step 2: Selecting Your Key Variables
Step 3: Run the Assessment
Step 4: Read the Assessment Results
Step 5: Manage Data Responsibly
Research articles
A tutorial in assessing disclosure risk in microdata
Statistical agencies and other public agencies take legal and ethical measures to protect the confidentiality of data shared with researchers, but disclosure risks may remain. This tutorial provides an overview of things to consider in assessing and preventing such risks, with a particular focus on quantitative risk measures.
Taylor, L., Zhou, X.-H., & Rise, P. (2018). A tutorial in assessing disclosure risk in microdata. Statistics in Medicine, 37(25), 3693–3706. https://doi.org/10.1002/sim.7667
Factors that affect likeliness of survey participation
This study used a vignette experiment to investigate how likely research subjects were to participate in surveys with varying topic sensitivity and risk of disclosure.
Couper, M. P., Singer, E., Conrad, F. G., & Groves, R. M. (2008). Risk of disclosure, perceptions of risk, and concerns about privacy and confidentiality as factors in survey participation. Journal of Official Statistics, 24(2), 255–275. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3096944/
Anonymisation of unstructured data
Much of the literature on data anonymization has focused on structured data (such as tables) used in quantitative research. This study examines two approaches for anonymizing unstructured data – for example, text documents or images – often used in qualitative research. Using two case studies, the study illustrates the challenges encountered when trying to anonymize unstructured datasets using two methods: a risk-based approach, and a strict approach.
Weitzenboeck, E. M., Lison, P., Cyndecka, M., & Langford, M. (2022). The GDPR and unstructured data: is anonymization possible? International Data Privacy Law, 12(3), 184–206. https://doi.org/10.1093/idpl/ipac008
Anonymisation of big data
“Big data” refers to large amounts of data that can be analyzed to reveal patterns, trends, and associations. It is increasingly common in the social sciences, for example in studies of online behaviour. This study discusses challenges and issues in researching big data, including anonymization and re-identification.
Weinhardt, M. (2021). Big data: Some ethical concerns for the social sciences. Social Sciences, 10(2), 36. https://doi.org/10.3390/socsci10020036
An introduction to synthetic data
Synthetic data are artificially generated, fictive data. Instead of modifying an existing dataset to make it less identifiable, a new dataset is generated, containing fictive individuals and values. This study introduces synthetic data by explaining what they are, why they may be useful, and how to use them.
Jordon, J., Szpruch, L., Houssiau, F., Bottarelli, M., Cherubin, G., Maple, C., Cohen, S. N., & Weller, A. (2022). Synthetic Data -- what, why and how? arXiv:2205.03257. https://doi.org/10.48550/arXiv.2205.03257