Guides for Researchers
Data formats for preservation
What you need to know when creating a DMP
The context
Research data and other research outputs - such as datasets, software, models, workflows and documentation - are created, processed, shared and preserved in different formats. In a DMP, file formats should be planned early because they affect interoperability, machine-readability, reuse, repository deposit and long-term preservation.
Under Horizon Europe, beneficiaries must keep an updated DMP for research data generated or collected by the project, following the principle “as open as possible, as closed as necessary”. The DMP should explain the data lifecycle, including how data will be managed, shared, preserved and deposited in a trusted repository.
A good DMP should distinguish between working formats, used during collection and analysis, and preservation formats, used for deposit, access and long-term reuse. Proprietary or software-specific formats may be necessary during research, but they can create risks such as software dependency, format obsolescence, limited interoperability and reduced accessibility. Where possible, choose open, standard, well-documented and community-accepted formats. Science Europe specifically recommends describing the kinds, formats and volumes of data, justifying format choices, and giving preference to open and standard formats because they support sharing and long-term reuse.
Preservation also requires attention to lossless formats, migration, normalisation, versioning, metadata and fixity. Lossless or non-destructive formats are preferable when data quality must be retained, while lossy formats may be suitable only for access copies, visualisation or dissemination. The Digital Preservation Handbook highlights file-format risks such as obsolescence and proliferation, and stresses that format choices should be part of a wider preservation strategy.
Finally, a file is not preserved simply because it is stored. Long-term reuse also depends on Representation Information - the documentation needed to understand the file - and Preservation Description Information, such as provenance, context, access rights and fixity information. These OAIS concepts help ensure that data remain understandable to future users, not only technically accessible.
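To make these OAIS concepts more concrete, the sketch below keeps Representation Information and Preservation Description Information (PDI) in a small JSON "sidecar" file stored next to a data file. The class name, field names and example values are hypothetical and do not follow a formal metadata standard; in practice the target repository will usually prescribe its own schema (for example DDI or DataCite).

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class PreservationRecord:
    """Hypothetical sidecar layout, loosely following OAIS terminology."""
    file_name: str
    file_format: str          # Representation Information: how to read the bytes
    documentation: list[str]  # Representation Information: files needed to interpret the data
    provenance: str           # PDI: origin and processing history
    context: str              # PDI: relation to the project and other outputs
    access_rights: str        # PDI: licence or access conditions
    fixity: dict[str, str] = field(default_factory=dict)  # PDI: algorithm -> checksum


def write_sidecar(record: PreservationRecord, directory: Path) -> Path:
    """Store the record as <file_name>.meta.json next to the data file."""
    path = directory / f"{record.file_name}.meta.json"
    path.write_text(json.dumps(asdict(record), indent=2, ensure_ascii=False), encoding="utf-8")
    return path


# Illustrative values only.
record = PreservationRecord(
    file_name="survey_wave1.csv",
    file_format="CSV, UTF-8, comma-delimited",
    documentation=["README.txt", "codebook.pdf"],
    provenance="Exported from SPSS .sav by the project data manager",
    context="Wave 1 survey responses, work package 2",
    access_rights="CC BY 4.0",
)

if __name__ == "__main__":
    print(write_sidecar(record, Path(tempfile.mkdtemp())).read_text(encoding="utf-8"))
```

Keeping this information in a plain-text, open format means it can itself be preserved alongside the data.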
Did you know?
FAIR does not automatically mean preserved.
FAIR supports findability, accessibility, interoperability and reuse, but preservation also requires active curation, trusted repositories, metadata, checksums, access policies and sustainability planning.
DMPs are living documents.
They should be updated as formats, tools, repositories, responsibilities or preservation needs change during the project.
Preservation starts at data creation.
Good preservation depends on early decisions about formats, documentation, storage, backup, version control, licences and repository requirements.
Why is it necessary?
Choosing the right file formats is essential for long-term access, interoperability, authenticity and reuse. In a DMP, this means explaining not only what formats will be used, but also why they are suitable for sharing, preservation and future re-use.
File formats can create preservation risks. Proprietary, software-specific or bespoke formats may depend on particular tools, licences, hardware or vendor support. If these environments change, files may become difficult to open, validate or interpret. Digital preservation guidance highlights format obsolescence and format proliferation as key risks: too many unsupported formats and versions increase the cost and complexity of preservation.
For this reason, DMPs should prioritise open, standard, well-documented and community-adopted formats where possible. Science Europe recommends describing the kind, format and volume of data, justifying format choices, and giving preference to open and standard formats because they support sharing and long-term reuse. CESSDA similarly recommends standard, open and widespread formats for long-term storage, such as CSV, PDF/A, TIFF, XML, PNG or ODF, depending on the data type.
Format choice also affects data quality and preservation integrity. Lossless formats are preferable for archival masters because they retain the full content, while lossy formats are better suited for access copies, visualisation or dissemination. The Digital Preservation Handbook notes that using a lossy format as both the access and archival version can lead to irretrievable data loss.
Preservation is also about proving that files remain unchanged and trustworthy. Fixity checks, such as checksums, help verify that files have not been altered or corrupted during transfer, storage or access. They support authenticity, chain of custody and data integrity over time.
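A minimal sketch of fixity checking in Python, assuming SHA-256 as the checksum algorithm and a plain-text manifest with one `<checksum>  <file name>` line per file (the layout produced by the GNU `sha256sum` utility in text mode). The function names are illustrative; repositories and packaging tools (for example BagIt-based tools) often handle this step for you.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 checksum, reading in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(files: list[Path], manifest: Path) -> None:
    """Record one '<checksum>  <file name>' line per file."""
    lines = [f"{sha256_of(p)}  {p.name}\n" for p in files]
    manifest.write_text("".join(lines), encoding="utf-8")


def verify_manifest(manifest: Path) -> dict[str, bool]:
    """Recompute each checksum and report whether it still matches the manifest."""
    results = {}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        expected, name = line.split("  ", 1)
        results[name] = sha256_of(manifest.parent / name) == expected
    return results


if __name__ == "__main__":
    tmp = Path(tempfile.mkdtemp())
    data = tmp / "data.csv"
    data.write_text("id,value\n1,42\n", encoding="utf-8")
    manifest = tmp / "manifest-sha256.txt"
    write_manifest([data], manifest)
    print(verify_manifest(manifest))  # unchanged file: True
    data.write_text("id,value\n1,43\n", encoding="utf-8")
    print(verify_manifest(manifest))  # altered file: False
```

Re-running the verification after each transfer, and at regular intervals during storage, gives a simple record that the files are still the ones that were deposited.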
Ultimately, format decisions are strategic. They help prevent data loss, reduce dependency on unstable technologies, support FAIR and Open Science expectations, and make research outputs easier to preserve, understand and reuse beyond the project.
Did you know?
In Horizon Europe, a DMP is not just an administrative template. It is a planning tool for the full research data lifecycle, including how data will be organised, curated, accessed, preserved, shared and, where relevant, deleted. This means that file formats should be planned early, justified clearly, and aligned with FAIR principles, trusted repositories and long-term reuse.
How to deal with this?
Plan file formats as part of the DMP from the start. Do not only record the format used during analysis; distinguish between working formats, exchange formats, and preservation/deposit formats.
Where possible, use formats that are open, standardised, well documented, non-proprietary, widely adopted, machine-readable and accepted by the target repository, and justify each format choice in the DMP, as Science Europe recommends.
If proprietary or software-specific formats are needed during data collection or analysis, plan an export or conversion route early. CESSDA recommends converting data into long-term preservation formats, but also warns that conversion may cause loss of information, such as missing labels, altered data types, truncated values, loss of formatting, reduced image resolution, or lower audio quality. Conversion should therefore be checked by someone who understands the data.
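Part of this check can be automated. The sketch below assumes a small hypothetical table already loaded as Python dictionaries (as it might look after reading a statistical-package file). It writes the data to UTF-8 CSV, moves value labels into a separate JSON codebook because plain CSV cannot hold them, and then re-reads the CSV to list any cells whose text changed. It compares values only as text, so it cannot catch every kind of loss; review by someone who knows the data is still needed.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical example data.
rows = [
    {"id": "007", "region": 1, "income": 2500.5},
    {"id": "012", "region": 2, "income": None},  # None marks a missing value
]
value_labels = {"region": {"1": "North", "2": "South"}}
MISSING = ""  # assumption: missing values are written as empty cells


def export_csv_with_codebook(rows, labels, csv_path: Path, codebook_path: Path) -> None:
    """Write the data as UTF-8 CSV and keep value labels in a separate JSON codebook."""
    with csv_path.open("w", encoding="utf-8", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
        writer.writeheader()
        for row in rows:
            writer.writerow({k: MISSING if v is None else v for k, v in row.items()})
    codebook = {"value_labels": labels, "missing_value": MISSING}
    codebook_path.write_text(json.dumps(codebook, indent=2), encoding="utf-8")


def check_roundtrip(rows, csv_path: Path) -> list[str]:
    """Re-read a CSV file and report every cell whose text differs from the original."""
    with csv_path.open(encoding="utf-8", newline="") as fh:
        reread = list(csv.DictReader(fh))
    problems = []
    if len(reread) != len(rows):
        problems.append(f"row count changed: {len(rows)} -> {len(reread)}")
    for i, (orig, new) in enumerate(zip(rows, reread)):
        for key, value in orig.items():
            expected = MISSING if value is None else str(value)
            if new.get(key) != expected:
                problems.append(f"row {i}, column {key!r}: {expected!r} -> {new.get(key)!r}")
    return problems


if __name__ == "__main__":
    tmp = Path(tempfile.mkdtemp())
    export_csv_with_codebook(rows, value_labels, tmp / "data.csv", tmp / "codebook.json")
    print(check_roundtrip(rows, tmp / "data.csv"))  # empty list: nothing changed
    # Simulate a file re-saved in spreadsheet software that stripped leading zeros.
    (tmp / "resaved.csv").write_text("id,region,income\n7,1,2500.5\n12,2,\n", encoding="utf-8")
    print(check_roundtrip(rows, tmp / "resaved.csv"))
```

The second check simulates a common conversion problem: identifiers such as `007` losing their leading zeros when a file is opened and re-saved in spreadsheet software.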
Use repository guidance before finalising the DMP. The UK Data Service provides recommended and acceptable formats for sharing, reuse and preservation, and notes that researchers may need to convert files to preservation formats. The UK National Archives also maintains a list of file formats for transfer, each linked to its PRONOM Unique Identifier (PUID), and recommends using PUIDs when cross-referencing the output of DROID, its file-format identification tool.
In the DMP, state clearly:
- which formats will be used during data collection and analysis
- which formats will be deposited or preserved
- whether conversion, normalisation or migration is needed
- what information could be lost during conversion
- which metadata, codebooks, README files, scripts or software are needed to understand the files
- which repository requirements or preferred-format lists have been checked
Repository guidance on file formats is also available from:
- Data description and formats. 4TU.Centre for Research Data
- File formats. DANS - Data Archiving and Networked Services
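Before completing the checklist above, it can help to know which formats a project actually holds. The sketch below counts files by extension and flags those not on a preferred-format list. `PREFERRED` is an illustrative subset loosely based on UK Data Service examples, not any repository's official list, and a file extension is only a hint: tools such as DROID identify formats from their internal signatures and report PRONOM PUIDs.

```python
import tempfile
from collections import Counter
from pathlib import Path

# Illustrative subset only: replace with the target repository's preferred-format list.
PREFERRED = {".csv", ".tab", ".txt", ".xml", ".tif", ".flac", ".wav", ".mp4", ".pdf", ".odt", ".rtf"}


def inventory(root: Path) -> tuple[Counter, list[Path]]:
    """Count files per extension and list those whose extension is not preferred.
    Extensions can be wrong or missing; signature-based identification is more reliable."""
    counts: Counter = Counter()
    to_review: list[Path] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        ext = path.suffix.lower() or "(none)"
        counts[ext] += 1
        if ext not in PREFERRED:
            to_review.append(path.relative_to(root))
    return counts, to_review


if __name__ == "__main__":
    tmp = Path(tempfile.mkdtemp())
    (tmp / "survey.csv").write_text("id,value\n1,42\n", encoding="utf-8")
    (tmp / "survey.sav").write_bytes(b"\x00")  # placeholder standing in for a proprietary file
    (tmp / "notes.TXT").write_text("field notes", encoding="utf-8")
    counts, to_review = inventory(tmp)
    print(dict(counts))
    print(to_review)
```

Files flagged for review are candidates for a documented conversion route, or for a justification in the DMP if they must stay in their original format.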
As an example, the following table lists recommended and acceptable file formats for different types of data:
| Type of data | Recommended preservation / deposit formats | Other acceptable or working formats | DMP note |
| --- | --- | --- | --- |
| Tabular data with extensive metadata | Delimited text plus command/setup file; structured metadata such as DDI XML; SPSS portable .por | SPSS .sav, Stata .dta, SAS .sas7bdat, MS Access .mdb/.accdb | Preserve variable labels, value labels, missing values and codebooks. |
| Tabular data with minimal metadata | CSV .csv; tab-delimited .tab; delimited text with SQL data definition statements | TXT .txt; Excel .xls/.xlsx; OpenDocument Spreadsheet .ods; Access .mdb/.accdb | Use clear delimiters, UTF-8 encoding and separate documentation. |
| Geospatial data | ESRI Shapefile .shp/.shx/.dbf; GeoTIFF .tif/.tfw; GML .gml; tabular GIS attributes | Geodatabase .mdb; MapInfo .mif; KML .kml; CAD .dwg/.dxf; SVG .svg | Keep projection, coordinate system and georeferencing metadata. |
| Textual / qualitative data | XML with appropriate schema; RTF .rtf; plain text .txt; HTML .html | Word .doc/.docx; QDA exports from NVivo, ATLAS.ti, MAXQDA | Export raw data, coding tree, coded segments and memos where relevant. |
| Image data | TIFF 6.0 uncompressed .tif; DICOM .dcm for CT/MRI data | JPEG .jpg/.jpeg only if originally created in that format; PNG .png; RAW .raw; Photoshop .psd; PDF/A .pdf | Prefer lossless or non-destructive formats for preservation masters. |
| Audio data | FLAC .flac; WAV .wav | MP3 .mp3 only if originally created in that format; AIFF .aif | Preserve high-quality master files; use compressed versions mainly for access. |
| Video data | MPEG-4 .mp4; OGG video .ogv/.ogg; Motion JPEG 2000 .mj2 | MOV .mov; WMV .wmv; WebM .webm; AVCHD .avchd | Preserve high-quality master files; document the codec, resolution and frame rate. |
| Documentation and scripts | RTF .rtf; PDF/A or PDF .pdf; HTML .htm; OpenDocument Text .odt; plain text | Word .doc/.docx; Excel .xls/.xlsx; XML with appropriate schema | Include README files, codebooks, workflow notes, software versions and dependencies. |
| Software, code, models and workflows | Plain text source code; open scripts; documented workflow files; container or environment documentation where appropriate | Proprietary workflow or notebook formats, if required by the research environment | Preserve enough documentation for future users to run, inspect or understand the output. |
This table is adapted from the UK Data Service’s current recommended-format guidance, which covers formats for data sharing, reuse and preservation.
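For the software and documentation rows, recording the computing environment is a cheap first step. The sketch below uses only the Python standard library to capture interpreter, operating-system and package versions for a README or deposit package; the function name is illustrative and the package names in the example are placeholders. Tools such as `pip freeze`, environment files or container definitions give a fuller picture.

```python
import json
import platform
from importlib import metadata


def environment_record(packages: list[str]) -> dict:
    """Capture interpreter, OS and package versions for a README or deposit package."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Placeholder package names: list the packages the analysis actually imports.
    print(json.dumps(environment_record(["numpy", "pandas"]), indent=2))
```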
Video:
Utrecht University. “Preserving research data in the optimal, technically correct way”. How to minimise the risk of losing data: this video explains which methods are available to preserve research data in an optimal way.
Resources
Core publications and guidance
- Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
- Boccali, T., Sølsnes, A. E., Thorley, M., Winkler-Nees, S., & Timmermann, M. (2021). Practical Guide to Sustainable Research Data. https://doi.org/10.5281/zenodo.4769703
- Science Europe. (2021). Practical Guide to the International Alignment of Research Data Management: Extended Edition with DMP Evaluation Rubric. Science Europe. https://doi.org/10.5281/zenodo.4915861
- Digital Preservation Coalition. (2014). The Open Archival Information System (OAIS) Reference Model: Introductory Guide, 2nd edition. DPC Technology Watch Report 14-02. https://doi.org/10.7207/twr14-02
- Digital Preservation Coalition. (2015). Digital Preservation Handbook. Digital Preservation Coalition.
- CESSDA Training Team. (2017–2019). CESSDA Data Management Expert Guide. CESSDA ERIC. https://doi.org/10.5281/zenodo.3820473
File formats and data preservation resources
- The National Archives. File formats for transfer. The National Archives. https://www.nationalarchives.gov.uk/information-management/manage-information/digital-records-transfer/file-formats-transfer/
- UK Data Service. Format your data: Recommended formats. UK Data Service. https://ukdataservice.ac.uk/learning-hub/research-data-management/format-your-data/recommended-formats/
- Stanford Libraries. (n.d.). Best practices for file formats. Data best practices and case studies. Stanford University.
- DANS. File formats. Data Archiving and Networked Services.
- Utrecht University. Storing and preserving data. Utrecht University.
- Utrecht University. Storing personal data. Data Privacy Handbook.
- CORDIS, EU research results. Advancing long-term digital preservation of scientific data.
- OpenLearnity. Data storage, backup and versioning [online course]. https://courses.openlearnity.org/courses/course-v1%3Astorage%2Bbackup%2Bversioning/about
- Brown University Library. Data management: Storing, backing up, and versioning data. https://libguides.brown.edu/DataManagement/storage
- Singapore Institute of Technology Library. Research data management: Data storage and preservation. https://libguides.singaporetech.edu.sg/c.php?g=925148&p=6740199