Researchdata.se

Text

The most common file formats for text processing are widely used, well-known, and broadly supported. As a result, the need to share text files in proprietary formats is relatively low.  

Below, we list a number of file formats, but many others exist. Contact your local research data support service for advice on which file formats are suitable for long-term preservation and sharing of the type of research data you work with.

Most text documents are created in word processors such as Microsoft Word or in OpenOffice-based programs (LibreOffice, Apache OpenOffice, NeoOffice, etc.). Their native formats, .docx (Office Open XML, ISO/IEC 29500) and .odt (OpenDocument, ISO/IEC 26300), are XML-based and are internationally recognized open standards. Both formats are also supported by many other applications, such as Google Docs.

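Many documents are saved in platform-independent formats, typically Adobe's Portable Document Format (PDF). PDF originated as a proprietary Adobe format and, although it has since been published as an ISO standard (ISO 32000), ordinary PDF files can depend on features such as non-embedded fonts or externally linked content, so they are not guaranteed to render identically across implementations. This makes ordinary PDF unsuitable for long-term preservation in archives or similar contexts.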

To improve compatibility across software applications, the open standard PDF/A (research.gov) has been developed. If your files contain embedded images or tables, you should also preserve and share these separately to ensure that details are not lost.

Considerations when creating text files and documents 

When saving documents as PDF/A, remember also to save them in their original format (e.g., Word .docx, OpenOffice .odt). Ensure that fonts and images are correctly embedded. Be sure that the version to be preserved is the final version of the document, without any notes or comments from earlier drafts. In some software, it is only possible to save documents as regular PDFs; in such cases, you will need to re-save the file as PDF/A by selecting ‘Save As’ and choosing the PDF/A format.  
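The check for leftover notes and comments can be partly automated for .docx files. The following is a minimal sketch, not a complete validator. It assumes the usual package layout, in which comments are stored in word/comments.xml and the main text in word/document.xml, and it assumes the conventional "w:" namespace prefix for tracked insertions and deletions. The function name find_draft_residue is hypothetical.

```python
import zipfile

# Assumed markers of tracked insertions and deletions in the document body.
# The "w:" prefix is conventional but not guaranteed by the standard.
TRACKED_CHANGE_TAGS = (b"<w:ins ", b"<w:del ")


def find_draft_residue(docx_path):
    """Return a list of warnings about draft material left in a .docx file."""
    warnings = []
    with zipfile.ZipFile(docx_path) as package:
        names = package.namelist()
        # Comments, when present, are usually stored in their own part.
        if "word/comments.xml" in names:
            warnings.append("comments part present (word/comments.xml)")
        # Tracked changes appear as elements inside the main document part.
        if "word/document.xml" in names:
            body = package.read("word/document.xml")
            if any(tag in body for tag in TRACKED_CHANGE_TAGS):
                warnings.append("tracked changes present in word/document.xml")
    return warnings
```

An empty list only means that none of these particular markers were found; it is still worth opening the document and reviewing it before depositing.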

Take care with the following aspects:

  • The hierarchical structure of the document (e.g., different heading levels).
  • Formatting within the document (e.g., bold, italics).
  • Page numbering. Page numbers must be correct so that users can cite and reference the document.
  • Embedded material, such as images and data tables, which should also be saved separately.
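Saving embedded images separately can be scripted, because both .docx and .odt files are ZIP packages. The sketch below assumes the conventional locations for embedded images (word/media/ in .docx and Pictures/ in .odt); the function name extract_embedded_media and the output folder are hypothetical. It copies image files only. Data tables are part of the document text and still need to be exported by hand, for example to CSV.

```python
import zipfile
from pathlib import Path

# Folder inside the package where each format conventionally stores images.
MEDIA_PREFIXES = {".docx": "word/media/", ".odt": "Pictures/"}


def extract_embedded_media(document_path, output_dir):
    """Copy embedded media files from a .docx or .odt file into output_dir."""
    document_path = Path(document_path)
    prefix = MEDIA_PREFIXES[document_path.suffix.lower()]
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(document_path) as package:
        for name in package.namelist():
            if name.startswith(prefix) and not name.endswith("/"):
                # Use only the file name so archive paths cannot leave output_dir.
                target = output_dir / Path(name).name
                target.write_bytes(package.read(name))
                saved.append(target)
    return saved
```

For example, extract_embedded_media("report.docx", "report_media") would write the images found in a file named report.docx to a folder named report_media and return the paths it created.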

Recommended file formats for sharing

  • ASCII (.txt), Unicode (.txt)
  • MS Word (.docx)
  • OpenDocument Text (.odt)
  • PDF (.pdf), PDF/A (.pdf)
  • HTML (.html)  
  • Markdown (.md)  
  • XML (.xml)
  • SGML (.sgml)
  • Rich Text Format (.rtf)

Recommended file formats for long-term preservation in archives or similar

  • ASCII (.txt), Unicode (.txt)
  • MS Word (.docx)
  • OpenDocument Text (.odt)
  • PDF/A (.pdf)
  • HTML (.html)  
  • Markdown (.md)
  • XML (.xml)
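
A .txt file does not record its own encoding, so it can be worth confirming that a file really is ASCII or Unicode before depositing it. The following sketch assumes that "Unicode" in practice means UTF-8; files in other Unicode encodings such as UTF-16, or in legacy encodings such as Latin-1, are reported as "unknown". The function name classify_text_encoding is hypothetical.

```python
def classify_text_encoding(path):
    """Return "ascii", "utf-8", or "unknown" for the file at path."""
    with open(path, "rb") as handle:
        data = handle.read()
    # Pure ASCII is a subset of UTF-8, so test for it first.
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    # Strict decoding fails on byte sequences that are not valid UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"
```

A file reported as "unknown" is not necessarily damaged; it may simply need to be re-saved as UTF-8 from the application that created it.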

For more information on file formats for text, see the ARIADNE guide Documents and digital texts: A guide to good practice. The guides have been developed by SND and translated into English in cooperation with the EU-funded infrastructure ARIADNE. ARIADNE is responsible for updating the English guides and keeping them accessible.