
Reuse and cite data
You can reuse research data to augment your own findings, explore new research questions, or analyze data from multiple sources. Before you reuse research data, check whether you have permission to use it, and give proper credit to its creators.
What does it mean to reuse research data?
Research data reuse means using research data produced or collected by others for your own research. By reusing existing data, you can:
- build on previous research
- gain novel insights through meta-analysis, where data from various studies are synthesized to derive broader insights
- obtain reference data for your research, including for “ground-truthing” and calibration
- avoid doing unnecessary experiments and incurring unnecessary costs
- make your research more robust by aggregating results obtained from different methods or samples
- ease the burden on over-researched populations.
Another critical application of research data reuse is the replication study. Replication studies aim to reproduce prior analyses using available datasets to verify the accuracy of reported findings. Such replication efforts are essential for ensuring research reliability and are a cornerstone of open science practices.
What should be considered before reusing research data?
You have found data that are relevant to your research and you know how to access them. Before using them, consider the following points so that you do not have to discard work you have already done with the data.
Check the terms of use
Review the licence or terms-of-use for the data to ensure your intended use is permitted. Research data that have been shared via repositories or websites can usually be reused for research purposes even if they do not have a licence.
Make sure you record the licences or terms-of-use for any data you reuse. Combining licensed data from different sources can introduce challenges later, as it may result in a dataset with conflicting licence conditions. However, if you can document the licences for the data that you reuse and show how you have complied with the terms of each licence, you are likely to be able to share your data without violating any licence conditions.
For more information, see the Researchdata.se page on Licences.
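One way to keep such a record is a small licence log that you update whenever you add a data source. The sketch below is illustrative only: the `ReusedDataset` fields, the dataset titles, and the `10.1234/...` and `example.org` addresses are placeholders, not real datasets. It groups datasets by licence so that mixed conditions are easy to spot. It does not decide whether two licences are compatible; that still requires reading the terms.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReusedDataset:
    """One row in a hypothetical licence log for reused data."""
    title: str
    source: str      # DOI or URL where the data were obtained (placeholders below)
    licence: str     # licence name or terms-of-use as stated by the source
    notes: str = ""  # how you followed the terms, e.g. attribution given


def group_by_licence(records):
    """Group dataset titles by licence so mixed conditions are easy to spot."""
    groups = defaultdict(list)
    for record in records:
        groups[record.licence].append(record.title)
    return dict(groups)


# Placeholder entries for illustration only.
log = [
    ReusedDataset("Survey A", "https://doi.org/10.1234/example-a",
                  "CC BY 4.0", "Creators cited in reference list"),
    ReusedDataset("Survey B", "https://doi.org/10.1234/example-b",
                  "CC BY-NC 4.0", "Non-commercial use only"),
    ReusedDataset("Station readings", "https://example.org/readings",
                  "CC BY 4.0", "Creators cited in reference list"),
]

by_licence = group_by_licence(log)
for licence, titles in by_licence.items():
    print(f"{licence}: {', '.join(titles)}")
```

If the log shows more than one licence, check the conditions of each before you combine or redistribute the data.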
Conducting research on personal information imposes additional obligations on you as a researcher. Generally, the same rules apply regardless of whether you or another researcher collected the data. If the data you intend to reuse contain sensitive personal information, you must obtain ethical approval before you obtain and start working with the data.
Read more about research data with personal information in SND's handbook for data containing personal information.
Can you trust the data source?
Assessing whether data are trustworthy is critical. Use these questions to assess whether research data come from a trusted source:
- Is the data source clearly stated?
- Are the data hosted by a reputable agency or trusted research data repository?
- Are the data widely used by researchers?
- Have the data been reviewed or curated?
- Do the data follow established standards?
- Are the data from a dataset with a permanent identifier (e.g., a DOI)?
- Are contact details available for further inquiries?
Assessing data quality and suitability
Assessing whether research data are suitable for your purpose and of sufficient quality is one of the most challenging aspects of data reuse. Some data types, like genome data, are relatively easy to reuse. For other data types, such as gene expression experiment data, you will need access to detailed metadata or contextual information about how the data were generated before you can correctly interpret and reuse the data.
You should be able to find the following information about the data from the dataset metadata or an associated research publication:
- why the data were collected/generated
- who collected/generated the data
- how and when the data were collected
- how the data were processed
- the quality assurance procedures that were used.
This information will help you decide whether the data are suitable for your needs.
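If you screen many candidate datasets, the checklist above can be turned into a simple completeness check. This is a minimal sketch under stated assumptions: the field names in `REQUIRED_FIELDS` and the example record are hypothetical and do not follow any particular metadata standard, so you would need to map them to the fields your repository actually provides.

```python
# Hypothetical field names that mirror the checklist above.
REQUIRED_FIELDS = (
    "purpose",            # why the data were collected/generated
    "creators",           # who collected/generated the data
    "collection_method",  # how the data were collected
    "collection_date",    # when the data were collected
    "processing",         # how the data were processed
    "quality_assurance",  # the quality assurance procedures used
)


def missing_metadata(metadata):
    """Return the checklist fields that are absent or empty in a record."""
    return [
        field for field in REQUIRED_FIELDS
        if not str(metadata.get(field) or "").strip()
    ]


# Example record for illustration only.
record = {
    "purpose": "Baseline survey of lake water quality",
    "creators": ["Example Research Group"],
    "collection_method": "Monthly manual sampling",
    "collection_date": "2019-2021",
    "processing": "",
}

gaps = missing_metadata(record)
print(gaps)
```

A non-empty result does not mean the data are unusable. It tells you what to look for in the associated publication or ask the data creators about.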
Citing research data
Acknowledging the original source of any information you use in your research is one of the fundamental pillars of modern research practice, and this extends to data reuse. Citing data acknowledges the original source of the data and recognizes the contributions of the creators. It also promotes research reproducibility by making the data easier to find and access.
When citing a dataset, we generally recommend that you also cite any original research paper that describes the dataset. Not only does this acknowledge the efforts of the data creators, but it also provides a link to vital information about how the data were collected or created. In addition, the terms-of-use may require you to acknowledge the data source and creators.
Data citation in academic papers
Citations for data that you reuse should appear in the paper’s reference list. Data are usually cited in the Data or Methods section of the main manuscript and in the Data Availability Statement. You should list all data sources and state the licences or terms under which you reused the data. When using downloaded datasets (for example, files downloaded from Researchdata.se or zenodo.org), citing the dataset is sufficient to identify the resource; you do not need to provide links to the individual files in the dataset. If the dataset contains large amounts of data and you only used specific files or variables, you can explain this in the Data Availability Statement.
Data citations and the citation reference list (bibliography)
Most journals prefer you to cite data similarly to how you cite research articles. Always check the journal’s instructions to see which citation style to use, and whether there are any special instructions for citing data. A typical APA-style citation for a dataset is formatted as:
Selfridge, A. R., Spencer, B., Shiyam Sundar, L. K., Abdelhafez, Y., Nardo, L., Cherry, S. R., & Badawi, R. D. (2023). Low-Dose CT Images of Healthy Cohort (Healthy-Total-Body-CTs) (Version 2) [Dataset]. The Cancer Imaging Archive. https://doi.org/10.7937/NC7Z-4F76
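The pattern in this example is: authors, year, title, optional version, the "[Dataset]" label, publisher, and DOI link. The sketch below assembles those parts into one string. The function name `format_apa_dataset` and its parameters are illustrative, it expects author names already written as "Surname, Initials", and it does not handle special cases such as very long author lists. Always check the result against the journal's instructions.

```python
def format_apa_dataset(authors, year, title, publisher, doi, version=None):
    """Build an APA-style dataset reference from already-formatted author names."""
    if len(authors) > 1:
        # Separate authors with commas and put "&" before the last one.
        author_text = ", ".join(authors[:-1]) + ", & " + authors[-1]
    else:
        author_text = authors[0]
    version_text = f" (Version {version})" if version else ""
    return (
        f"{author_text} ({year}). {title}{version_text} [Dataset]. "
        f"{publisher}. https://doi.org/{doi}"
    )


citation = format_apa_dataset(
    authors=[
        "Selfridge, A. R.", "Spencer, B.", "Shiyam Sundar, L. K.",
        "Abdelhafez, Y.", "Nardo, L.", "Cherry, S. R.", "Badawi, R. D.",
    ],
    year=2023,
    title="Low-Dose CT Images of Healthy Cohort (Healthy-Total-Body-CTs)",
    publisher="The Cancer Imaging Archive",
    doi="10.7937/NC7Z-4F76",
    version=2,
)
print(citation)
```

In practice, a reference manager will usually do this formatting for you. The sketch is mainly useful for seeing which pieces of information a dataset citation needs.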
Online research data sources will often include a suggested citation on the web page, usually formatted as an example bibliography entry. You can copy the suggested citation or transcribe the information into your reference manager.
Many datasets have a Digital Object Identifier (DOI), a persistent identifier that offers alternative ways to create a citation reference. You can create bibliography entries for datasets simply by entering the DOIs into a DOI citation formatter, for example DOI Citation Formatter or BibGuru. If you use a reference manager (e.g. EndNote, Mendeley, Zotero, or Paperpile), you can use the associated browser add-on (e.g. Capture EndNote Reference, Mendeley Web Importer, Zotero Connector, or Paperpile extension) to import the reference information for any dataset with a DOI. Exactly as you would with a journal article, simply visit the dataset’s landing page and run the extension. You should always check the imported information before using it in your work.
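You can also request a formatted citation programmatically through DOI content negotiation, where you ask the DOI resolver for the media type `text/x-bibliography` together with a citation style. The sketch below uses only the Python standard library. The function names are illustrative, and whether a particular DOI returns a formatted citation depends on the registration agency behind it (Crossref and DataCite support this), so treat the response as a draft to verify.

```python
import urllib.request


def build_citation_request(doi, style="apa"):
    """Prepare a DOI content-negotiation request for a formatted citation."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": f"text/x-bibliography; style={style}"},
    )


def fetch_citation(doi, style="apa", timeout=10):
    """Resolve the DOI over the network and return the citation text."""
    request = build_citation_request(doi, style)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


# Building the request needs no network access; fetch_citation() does.
request = build_citation_request("10.7937/NC7Z-4F76")
print(request.full_url)
print(request.get_header("Accept"))
```

Calling `fetch_citation("10.7937/NC7Z-4F76")` sends the request. As with any imported reference, compare the returned text with the dataset's landing page before you use it.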
Citing data without a DOI
If a dataset does not have a DOI, the dataset’s website will often still have a recommended citation. If you cannot find a recommended citation, cite the dataset in the same way you would cite a generic web page. This is the least reliable way to cite a dataset; not only is information on a website subject to change, but other researchers accessing the same web page may format their citations differently.
In the case of unpublished data, you cannot supply a formal publication date or publisher. However, citing only the name of the individual who provided you with the data is unlikely to enable a secondary user to locate the resource! To provide a basis for verification, include the dataset creator’s organization as the Publisher in the bibliography.