Scientific code

This page provides information about how to publish scientific code openly. It summarizes what to consider when writing code, how to prepare the code for publication, where to publish, how to choose a licence, and how to ensure that your code can be understood, reused, and cited.

Why publish scientific code openly?

Scientific code can be used for generating, cleaning, analyzing, or visualizing data. It can also implement models or workflows, or create software packages. Text files written in programming languages are known as source code; in programming languages like R and Python, source code files are sometimes referred to as scripts.

Open-source code is an important part of good research practice and contributes to transparency, reproducibility, and long-term preservation of research results. Code that was not originally developed with the intention of being openly published can still be valuable if it can be read and used by others. In other cases, the research explicitly aims to create open-source software that can be reused and modified by anyone.

Reasons for publishing code openly

  • Transparency: By making the code associated with research data available, you make it easier for others to understand how published data were processed and analyzed. This kind of openness is part of good research practice.
  • Reproducibility: By publishing data, documentation, and code together, you enable others to recreate, verify, and build on scientific results.
  • Preservation: By publishing and archiving code properly, you ensure that it is safely and securely preserved for the future, both for yourself and for others.
  • Citation: You and others can refer to the code using persistent links and cite it in scientific publications.
  • Requirements from journals: You can meet the requirements or recommendations on open sharing of code and data from journals or funding organizations.

Things to consider when publishing scientific code

Scientific code you publish needs to be understandable and reusable by others. It is therefore important to implement good code management. The most important aspects are summarized below.

Choose where to publish your code

Choose a repository that assigns persistent identifiers, such as DOIs, when you publish code, so that you can cite and link to the published code in a sustainable way. Ideally, publish code, data, and documentation that belong together in the same repository. If they are published in different places, make sure they refer to each other with persistent links.

  • If you have a specific version of code and data that is associated with a project or scientific article, you can publish your files together in a research data repository.
    • If you use the Swedish National Data Service’s DORIS tool, published datasets are assigned a persistent identifier (DOI), and code can be published and preserved together with data and documentation in the dataset.
  • Software that you plan to further develop and maintain can be published in a code repository, such as GitLab or GitHub.
  • To make your published code more discoverable, you can register it in a subject-specific catalogue, for example, rOpenSci.
  • The Software Heritage infrastructure provides long-term, secure publication and preservation for code that you make available on GitHub or elsewhere. You get a persistent identifier, a SWHID, which can be used to cite the code.
  • You can also increase visibility for your published software or software package by describing it in a scientific article.
    • There are journals dedicated to descriptions of new software, and some discipline-specific journals where you can publish specific types of articles for this purpose (for example, “software notes” or “software articles”).
    • Writing such an article gives you the opportunity to describe your software in greater detail, and you can encourage users of the software or package to cite the article.

Create a sensible file structure
  • Give files and folders descriptive names to make it clear which part of the software does what.
  • Organize files in a logical folder structure. Keep the underlying data, code, and results separate and provide them with separate version numbers. Avoid nesting code and data in folder structures that are difficult to separate, unless required by a specific software standard.
  • Maintain separate versions for code and associated data. You can differentiate versions of files by including the version number in the file or folder name, or by using version control software such as Git.
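
The points above can be illustrated with a minimal sketch that creates a project skeleton in which code, data, and results are kept apart. The folder names and the versioned project name are hypothetical; adapt them to your own project or to any standard used in your field.

```python
from pathlib import Path
import tempfile

# Hypothetical folder names; adapt them to your project or community standard.
FOLDERS = ["code", "data/raw", "data/processed", "results", "docs"]


def create_skeleton(root: Path) -> list[Path]:
    """Create the folders under root and return the created paths."""
    created = []
    for name in FOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    # A README at the top level is the natural entry point for a reader.
    (root / "README.txt").touch()
    return created


if __name__ == "__main__":
    # Demonstrate in a temporary directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        for folder in create_skeleton(Path(tmp) / "my_project_v1.0.0"):
            print(folder.relative_to(tmp))
```

Keeping raw and processed data in separate folders is one possible choice here; it makes it easier to see which files were produced by the code and which were its input.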

Further reading

Read more: Folder structure, file names, and versioning.

Structure the code so that it is easy to understand
  • Structure your code to make it easier to read and understand.
  • Divide the code into modules or packages with clear purposes and boundaries.
  • Indent code blocks using tabs or spaces according to the conventions in the programming language you are using, or the style convention you want to follow.
  • Give descriptive names to functions, classes, and variables, to make it easier for others to understand what they contain.
  • Clean up the code before publishing. For example, remove code blocks that have been disabled with comment characters if they are no longer needed by the software.
Describe the programming environment in which the code was created

To ensure that your code remains usable over time, it is important to describe the environment in which the code was created. Describe the versions of the operating system, programming language, and any code libraries, modules, or packages used by your software. This can be done in various ways:

  • Include the information in a README file.
  • In R, you can use the sessionInfo() command to retrieve the current information after you have run your program. You can save the output as a text file and publish it together with the code.
  • Alternatively, you can include the information as a metadata file, which can be read by supporting software, such as Pipenv in Python or renv in R.
  • To simplify recreation of the environment needed to run the software, you can use so-called “container technology”. Use a tool, for example Docker, to package the software code together with associated code libraries so the software can be run in isolation in a controlled environment and thus work on different computers.
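
For Python, a rough counterpart to saving the output of sessionInfo() in R could look like the sketch below. It uses only the standard library; the function names and the output file name are illustrative assumptions, not part of any established tool.

```python
import platform
import sys
import tempfile
from importlib import metadata
from pathlib import Path


def describe_environment() -> str:
    """Return a plain-text summary of the OS, Python version, and packages."""
    lines = [
        f"Operating system: {platform.platform()}",
        f"Python version: {sys.version.split()[0]}",
        "Installed packages:",
    ]
    # Collect (name, version) pairs; a set removes duplicate entries.
    packages = sorted(
        {
            (dist.metadata["Name"] or "unknown", dist.version)
            for dist in metadata.distributions()
        },
        key=lambda item: item[0].lower(),
    )
    lines.extend(f"  {name}=={version}" for name, version in packages)
    return "\n".join(lines)


def save_environment(path: Path) -> None:
    """Write the summary to a text file that can be published with the code."""
    path.write_text(describe_environment() + "\n", encoding="utf-8")


if __name__ == "__main__":
    print(describe_environment())
```

Run the script in the same environment as the analysis, after the analysis has finished, so that the listed versions are the ones that were actually used.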

Further reading

Docker is a platform for creating containers.

Document your code and workflow

Good and thorough documentation makes it easier for others to understand how the software is used and what the code does.

  • Include a README file, a plain text file that can be read by a user.
    • In the README file, describe how the software is compiled, run, and used. Include any information that may be useful for a user to know, such as whether there are help commands available.
    • If the published code contains multiple source code files, detail what each file does. Document whether the files are intended to be run as scripts and, if so, whether they need to be run in a specific order.
    • Include references to any other scientific publications that the code and analysis methods are based on.
  • Use comments in the code to describe what the code does and why.
    Begin each source code file with a header in the form of a comment that contains:
    • Title.
    • A description of what the file does, and how it relates to other files in the publication.
    • Version.
    • Date.
    • Code publication identifier (e.g., DOI).
    • Identifiers or references to related publications (e.g., an article, report, or data publication).
    • Contact details for the creator (name and e-mail address, as well as, if relevant, ORCID and affiliation).
    • Licence information, if applicable.
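
As an illustration, a header following the list above might look like this in a Python file. Every value is a placeholder or a hypothetical example, and the small function exists only to show where the code begins after the header.

```python
# Title:        Clean survey responses (hypothetical example)
# Description:  Removes incomplete rows from the raw data file. Run this
#               file before the analysis script in the same folder.
# Version:      1.0.0
# Date:         YYYY-MM-DD
# Code DOI:     <DOI of the published code>
# Related:      <DOI or reference of the related article or dataset>
# Contact:      <name>, <e-mail address>, <ORCID>, <affiliation>
# Licence:      <name of the licence, see the licence file>


def drop_incomplete_rows(rows):
    """Return only the rows in which no value is empty or None."""
    return [
        row for row in rows
        if all(value not in ("", None) for value in row)
    ]


if __name__ == "__main__":
    example = [["a", 1], ["", 2], [None, 3]]
    print(drop_incomplete_rows(example))
```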

One way to document workflows is with so-called “literate programming”. This means that code in a programming language is interspersed with text written in natural (human) language in the file. This is often done using the Markdown markup language for the documentation, allowing code and method descriptions to be mixed in the same file.

  • Using Markdown, it is possible to create reproducible reports with text and analysis results that can then be converted into PDF, Word, or HTML format. Quarto is a popular system for combining code with Markdown, which can be used for literate programming.
  • Positron is an open-source IDE for data science. It is built on Visual Studio Code and is specifically tailored for R and Python. Positron was developed by Posit, the company that also developed RStudio. Positron itself is not based on Markdown, but it has built-in support for Jupyter Notebook, Quarto, and other tools for reproducible and publishable analyses.
  • Another way of documenting software workflows is to use a script that automates the steps in the process.
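
A script that automates the steps of a workflow can be quite small. The sketch below assumes a hypothetical three-step analysis (clean, analyse, report); the step functions are stand-ins for whatever your own workflow does.

```python
def clean(values):
    """Step 1: drop missing values."""
    return [value for value in values if value is not None]


def analyse(values):
    """Step 2: compute the mean of the cleaned values."""
    return sum(values) / len(values)


def report(mean):
    """Step 3: format the result for the reader."""
    return f"Mean value: {mean:.2f}"


# The order of this list documents the order in which the steps must run.
STEPS = [("clean", clean), ("analyse", analyse), ("report", report)]


def run_workflow(data, steps=STEPS):
    """Run every step in order, passing each result on to the next step."""
    result = data
    for name, step in steps:
        print(f"Running step: {name}")
        result = step(result)
    return result


if __name__ == "__main__":
    print(run_workflow([1.0, None, 3.0]))
```

Because the whole process is started from one entry point, the script itself records which steps exist and in which order they run.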

Further reading

  • Quarto is an open-source scientific documentation system that combines Markdown with executable code in R and Python.
  • Positron is an open-source data analysis IDE from Posit, which is built on Visual Studio Code and supports R and Python.
  • Jupyter Notebook is a commonly used solution for combining executable code and documentation, which supports multiple programming languages, including Python and R.
Show how your code can reproduce your results
  • If possible, provide examples of the output produced by your code and the data you used, from your own programming environment. Others can then compare the output they generate in their programming environment with the results you obtained in yours.
  • Output could, for example, be the figures you used in an article or data tables that form the basis for your results.
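
One simple way for others to compare their output with yours is to compare checksums of the output files. The sketch below assumes that the published reference output and the regenerated output are both available as files; the file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path: Path) -> str:
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def outputs_match(reference: Path, regenerated: Path) -> bool:
    """Report whether two output files are byte-for-byte identical."""
    return file_checksum(reference) == file_checksum(regenerated)


if __name__ == "__main__":
    # Demonstrate with two small tables written to a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        reference = Path(tmp) / "table1_reference.csv"
        regenerated = Path(tmp) / "table1_regenerated.csv"
        reference.write_text("group,mean\na,1.5\n")
        regenerated.write_text("group,mean\na,1.5\n")
        print("Outputs match:", outputs_match(reference, regenerated))
```

A byte-for-byte comparison suits plain-text tables. Figures and floating-point results can differ slightly between environments, so for those a comparison within a tolerance, or a visual check, is likely more appropriate.
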
Choose a licence that makes your source code open

Open source means that software is free to use, modify, and share. If possible, avoid making your published code dependent on non-open source software. Before publication, it is important to ensure that you hold the rights to your software in its entirety, and thereby have the right to distribute it.

  • Code that you have created yourself can be provided with a licence specifying how it may be reused. Licences that meet open source requirements are preferred.
  • Attach the licence information as a text file named LICENCE.txt (or LICENCE.md if you use Markdown). In addition, state the licence and the copyright holder in the header of each source code file. Alternatively, you can paste the full licence text into the source code file itself.
  • All persons involved in the creation of the code must consent to its publication.
  • When reusing parts of code created by someone other than yourself, you must comply with the reuse conditions specified by their licence. If the code is not provided with a licence that permits re-publication, you will need permission from the copyright holder before publishing your software.
  • Remember to cite the creators when using code that someone else has created.

Note that even code copied verbatim from online sources may be protected by copyright. Short, standardized code sections are usually exempt from copyright, but longer or more creative parts should be rewritten, used with permission, or published with restricted access.

Test and review your code

Make sure that the software runs successfully before publishing your code. It is recommended to test whether the program works as expected on a computer other than your own. If possible, ask a colleague to review and test run your code on their computer.

  • Ensure that all files needed to run the program are included in the publication or that there is information on how to download them.
  • Check that the file paths point to files within the published folder structure, and not to locations specific to your own computer. This is referred to as using relative paths instead of absolute paths.

Example of an absolute path in Windows:

"C:\Users\jdoe\research_project\analysis\article2024\mycode\data\data_file.csv"

Example of a relative path in Windows:

"..\data\data_file.csv" 
  • Some journals require that you make data and code available during the peer review process of an article.
    • If you choose to publish via the Swedish National Data Service’s tool DORIS, you can give reviewers access to data and code files without making them openly available until your article has been accepted.
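
The advice above on relative paths can be sketched in Python with pathlib, which builds paths that work on Windows as well as on other operating systems. The folder names mycode and data follow the Windows example above; the function name is a hypothetical choice.

```python
from pathlib import Path


def data_file_path(code_dir: Path) -> Path:
    """Build the path to data_file.csv relative to the folder holding the code."""
    # One level up from the code folder, then into the sibling data folder.
    return code_dir.parent / "data" / "data_file.csv"


if __name__ == "__main__":
    # In a real script, Path(__file__).resolve().parent gives the folder
    # that contains the script itself, wherever the project was unpacked.
    print(data_file_path(Path("mycode")))
```

Anchoring the path to the location of the code, not to the current working directory, means the script finds the data no matter where the user starts it from.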

Do you want to know more?

The information on this page is based on the following sources: