Guides for Researchers
How to deal with non-digital data
The benefits of digitising data
What is it?
Not all research data are digital. Research data include both digital and non-digital (analogue or physical) materials, such as handwritten laboratory notebooks, journals, surveys, paintings, fossils, minerals, biological samples, and other physical objects used as evidence in research. European and university guidelines (e.g. KU Leuven, Science Europe) explicitly recognise these materials as part of the research data ecosystem and subject to Research Data Management (RDM) practices.
However, non-digital data can be converted into digital form in a variety of ways. Digitisation (e.g. scanning, imaging, transcription, or 3D modelling) enables these materials to be documented, shared, and preserved more easily, supporting FAIR principles where possible.

At the same time, not all non-digital data can or should be fully digitised. In such cases, their management relies on proper documentation, metadata, and preservation strategies (e.g. storage conditions, biobanking, or physical archiving), ensuring traceability, accessibility, and long-term usability.
Within the European RDM and EOSC context, non-digital data are therefore managed through a combination of digitisation, metadata documentation, and lifecycle planning, recognising that preservation extends beyond digital files to include physical materials and their associated context.
Why digitise data
- Increased efficiency: With a well-defined digitisation and document imaging plan, stakeholders can easily share, collaborate, compare, exchange, and access documents. Digitisation supports more efficient research workflows and enables integration with digital research infrastructures and services (e.g. EOSC), reducing duplication of effort and time.
- Cost efficiency: The cost of printing and managing physical paperwork can be excessive. Digitisation reduces long-term storage, handling, and administrative costs, especially when combined with structured data management planning.
- Ease of access: Digitised objects can be easily accessed through cloud or network services using any internet-enabled device, anytime and anywhere. This enhances accessibility and supports Open Science practices, enabling broader reuse and dissemination of research outputs.
- Security: File permissions and access rights can be assigned to specific users or groups where needed, which increases security and helps maintain confidentiality. In addition, digital systems allow controlled access, versioning, and traceability, which are essential elements of responsible data management.
- Long-term preservation: Physical objects and information stored on media such as paper are subject to degradation, especially through handling. Digitisation enables the creation of preservation copies, supporting long-term access; however, European guidance emphasises that preservation requires ongoing management and is not guaranteed by digitisation alone.
- Data recovery: In the event of natural or man-made disasters, loss or damage of physical objects can occur easily. Digital copies stored in managed infrastructures, repositories, or trusted services reduce this risk through backup, redundancy, and distributed storage solutions.
- Interoperability and FAIRness: Digitised data can be described with metadata, assigned persistent identifiers, and integrated into FAIR-compliant systems, increasing their discoverability and reuse across disciplines and platforms.
Digitising versus born-digital
Golden copy
The master version of any data is typically referred to as the golden copy and represents the record of the highest quality. In Research Data Management (RDM), the golden copy is the authoritative, complete, and most reliable version of a dataset or record, from which all other versions are derived.
The golden copy should be clearly identified and maintained throughout the data lifecycle, ensuring consistency, traceability, and integrity across all stages of data creation, processing, and reuse.
Within FAIR and EOSC frameworks, the golden copy is closely linked to good practices such as version control, persistent identifiers (PIDs), and metadata documentation, which enable users to distinguish between original, processed, and derived data.
The golden copy of a record exists for all stages of its development. This implies that, at each stage, there should be a recognised “best available version” that is preserved, documented, and, where appropriate, shared or archived in a trusted repository.
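One way to make the "best available version" at each stage verifiable is to record a checksum for it and compare later copies against that value. The sketch below is an illustration only: the source does not prescribe checksums, and the function names, the stage label "raw", and the sample content are all hypothetical.

```python
import hashlib

# Hypothetical register: one recorded checksum per stage of a record's development.
golden_copies = {}


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def register_golden_copy(stage: str, data: bytes) -> str:
    """Record the checksum of the golden copy for the given stage."""
    digest = fingerprint(data)
    golden_copies[stage] = digest
    return digest


def matches_golden_copy(stage: str, data: bytes) -> bool:
    """Check whether a copy is byte-identical to the registered golden copy."""
    return golden_copies.get(stage) == fingerprint(data)


# Illustrative content standing in for a real data file.
raw = b"sample_id,value\nA1,0.42\n"
register_golden_copy("raw", raw)

print(matches_golden_copy("raw", raw))                 # True
print(matches_golden_copy("raw", raw + b"A2,0.50\n"))  # False
```

A checksum only shows that two copies are identical; which copy counts as authoritative still has to be decided and documented by the research team.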
Maintaining a clear golden copy reduces the risk of data inconsistencies, supports reproducibility, and ensures that research outputs remain reliable and reusable over time.
Retention period
Typically, basic research data and related material should be retained for a minimum of 10 years after the study has been completed. European university guidelines (e.g. KU Leuven) commonly recommend retaining research data for at least 10 years to support verification, reproducibility, and future reuse of research findings.
Clinical research data should be preserved for longer periods, often 15–25 years or more, depending on regulatory requirements, particularly when linked to health data, trials, or legal obligations.
Retention periods are not fixed and depend on multiple factors, including disciplinary norms, funder requirements, legal obligations, and the potential value of the data for reuse. For example, some universities recommend shorter minimum periods (e.g. 5 years), while others require longer retention for specific types of research data.
Please keep in mind that national legislation or codes of conduct may impose different periods. In particular, under GDPR and institutional policies, personal data should not be kept longer than necessary, and retention periods must be justified, documented, and regularly reviewed.
Therefore, retention decisions should be defined early in the Data Management Plan (DMP), ensuring compliance with legal, ethical, and disciplinary requirements, while balancing long-term preservation with data protection obligations.
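As a rough illustration of how a retention period from a DMP translates into a review date, the following sketch adds a number of years to a study's completion date. The function names and the example dates are hypothetical; the 10-year figure is the minimum mentioned above, and the applicable period should always be taken from the relevant policy or legislation.

```python
from datetime import date


def retention_end(completed: date, years: int) -> date:
    """Return the date on which the retention period has elapsed."""
    try:
        return completed.replace(year=completed.year + years)
    except ValueError:
        # Completion on 29 February and a non-leap target year:
        # fall back to 28 February of the target year.
        return completed.replace(year=completed.year + years, day=28)


def due_for_review(completed: date, years: int, today: date) -> bool:
    """True once the retention period is over and disposal may be reviewed."""
    return today >= retention_end(completed, years)


# Hypothetical study completed on 30 June 2024, retained for 10 years.
print(retention_end(date(2024, 6, 30), 10))                      # 2034-06-30
print(due_for_review(date(2024, 6, 30), 10, date(2030, 1, 1)))   # False
```

Reaching the end date does not by itself mean the data should be destroyed; it marks the point at which the documented retention decision should be reviewed.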
Text digitising
Anything stored on paper can be scanned and converted into digital form. For best results, documents should be digitised as high-quality image files (e.g. TIFF) or as PDF or PDF/A files, which are widely recommended formats for long-term access and preservation.
PDF files become searchable once a text layer has been added to the scanned images. Optical Character Recognition (OCR) is the process that converts text contained in images into editable and searchable text. OCR enhances accessibility and reuse by enabling full-text search, extraction, and integration into digital workflows, although its accuracy depends on scan quality and document condition.
To ensure high OCR accuracy, documents should be scanned at sufficient resolution (typically around 300 dpi or higher), with good contrast, proper alignment, and minimal distortion. Lossless formats (e.g. TIFF, PNG, high-quality PDF) are preferred, while lossy formats (e.g. JPEG) should be avoided as they reduce text clarity.
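To give a sense of what a 300 dpi scan means in practice, the sketch below converts a page size in millimetres into pixel dimensions and estimates the size of the uncompressed image. The A4 page size and 24-bit colour depth are assumptions chosen for illustration, and the function names are hypothetical.

```python
MM_PER_INCH = 25.4


def scan_pixels(width_mm: float, height_mm: float, dpi: int = 300) -> tuple[int, int]:
    """Pixel dimensions of a page scanned at the given resolution."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px


def uncompressed_bytes(width_px: int, height_px: int, bits_per_pixel: int = 24) -> int:
    """Size of the raw image data before any compression is applied."""
    return width_px * height_px * bits_per_pixel // 8


# Assumed example: an A4 page (210 mm x 297 mm) scanned in 24-bit colour.
width, height = scan_pixels(210, 297, dpi=300)
print(width, height)                      # 2480 3508
print(uncompressed_bytes(width, height))  # 26099520
```

At roughly 26 MB of raw data per colour page, storage needs grow quickly for large collections, which is worth accounting for when planning a digitisation project.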
If you are scanning sensitive data, it is recommended to use secure storage practices. Sensitive information should be handled in compliance with data protection requirements, ensuring secure transfer, controlled access, and appropriate deletion of temporary storage media. OCR systems can also be used to detect sensitive information within scanned documents, supporting data protection and compliance workflows.
Overall, text digitisation combines technical quality (scanning and formats), processing (OCR), and secure data management practices, ensuring that digitised documents are accurate, accessible, and compliant with research data management and FAIR principles.
Video and audio digitising
Analogue video or audio recordings can be converted into digital sound or video files using a range of hardware and software solutions. Digitisation involves capturing the original analogue signal and converting it into digital data, ensuring high fidelity and preserving the original content as accurately as possible.
For long-term preservation, best practices recommend creating high-quality (often uncompressed or lossless) master files, alongside compressed “access copies” for easier sharing and use. This approach ensures both preservation of the original quality and usability for research and dissemination.
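The difference in size between an uncompressed master and a compressed access copy can be estimated with simple arithmetic. The sketch below is illustrative only: the sample rate, bit depth, and access-copy bitrate are assumed values, not recommendations from this guide, and the function names are hypothetical.

```python
def pcm_bytes(seconds: int, sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Size of uncompressed PCM audio data, excluding file headers."""
    return seconds * sample_rate_hz * bit_depth * channels // 8


def constant_bitrate_bytes(seconds: int, bits_per_second: int) -> int:
    """Approximate size of a compressed copy encoded at a constant bitrate."""
    return seconds * bits_per_second // 8


one_hour = 3600

# Assumed master settings: 96 kHz, 24-bit, stereo.
master = pcm_bytes(one_hour, 96_000, 24, 2)
# Assumed access copy: 128 kbit/s compressed audio.
access = constant_bitrate_bytes(one_hour, 128_000)

print(master)            # 2073600000
print(access)            # 57600000
print(master // access)  # 36
```

Under these assumptions one hour of audio needs about 2 GB for the master and under 60 MB for the access copy, which illustrates why the two are kept as separate files with different purposes.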
If the actual recordings are not required and only the spoken content is needed, they can be transcribed and the original video or audio data deleted. However, current preservation guidance advises caution: original recordings often contain contextual, acoustic, or visual information that may be valuable for future research, so retaining a digitised copy is generally preferred over deletion where possible.
In addition, digitised audiovisual materials should be accompanied by appropriate metadata (e.g. format, duration, context, provenance) to ensure they remain findable, accessible, and reusable in line with FAIR principles.
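A simple way to keep such metadata with a recording is a small "sidecar" file stored next to it. The sketch below writes the four elements named above (format, duration, context, provenance) as JSON. The field names, identifier, and values are hypothetical and do not follow any particular metadata standard; a discipline-specific schema should be used where one exists.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RecordingMetadata:
    identifier: str        # hypothetical local identifier for the recording
    file_format: str       # format of the digital master file
    duration_seconds: int  # running time of the recording
    context: str           # what was recorded and why
    provenance: str        # where the digital file came from


record = RecordingMetadata(
    identifier="interview-001",
    file_format="WAV (uncompressed PCM)",
    duration_seconds=3600,
    context="Oral history interview recorded for an illustrative project",
    provenance="Digitised from an analogue cassette tape",
)

# Serialise the record; this string could be saved as interview-001.json.
sidecar = json.dumps(asdict(record), indent=2, ensure_ascii=False)
print(sidecar)
```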
Overall, video and audio digitisation is not only a technical conversion process but also a preservation strategy, requiring careful consideration of quality, formats, metadata, and long-term accessibility.
Real-world objects
Those who carry out material structure studies or restoration of museum pieces often work with real-world objects such as biological materials, fabrics, fossils, minerals, and paintings. In Research Data Management (RDM), these physical objects are recognised as research data, and their documentation and preservation are essential for ensuring long-term accessibility and reuse.
Digital technology allows these items to be observed, analysed, and preserved in multiple ways. Advanced digitisation methods, such as 3D scanning, tomography, photogrammetry, and imaging, enable the creation of accurate digital representations (“digital twins”) that support research, conservation, and remote access.
Cell and tissue specimens may exist as intact structures or as sections mounted on slides. In the former case, various scanning methods (e.g. tomography) can generate detailed 3D digital objects, which can then be virtually explored or sectioned. These techniques can reveal internal structures and properties that are not visible through direct observation, enhancing scientific analysis and reproducibility.
For sectioned samples, it is common practice to create high-quality photographic records, as is also done for paintings and similar artefacts. Image-based digitisation is widely used because it is efficient, non-invasive, and suitable for a broad range of materials and research contexts.
Other specimens, such as fossils and minerals, can also be digitised using tomographic or 3D imaging techniques to produce virtual models. In cases where the subject is very large (e.g. buildings or landscapes), more expansive methods such as SONAR, RADAR, or LiDAR can be used. The choice of method depends on the size, material, complexity, and intended use of the object, as no single digitisation approach is suitable for all cases.
If the real-world object is not easy to scan, the only remaining option may be to take high-resolution digital photographs. In such cases, it is essential to ensure that the images are of sufficient quality, accurately represent the object, and are accompanied by appropriate metadata to support future interpretation and reuse.
Overall, digitising real-world objects is not only a technical process but also a strategic one, requiring careful selection of methods, quality standards, and metadata practices to ensure long-term preservation, accessibility, and alignment with FAIR principles.
Advice for managing data in hard copy
- Appoint an archive administrator and a deputy, and define clear key management duties. Clear roles and responsibilities are essential to ensure accountability, proper handling, and long-term preservation of physical records.
- Make sure that only the archive administrator and their deputy have access to the physical part of the archive. Access to archives should be restricted to authorised personnel, with appropriate supervision and security controls to prevent loss, damage, or unauthorised use.
- Make sure that there is a clear procedure in place for archiving and reusing documents. Standardised procedures ensure consistency, traceability, and proper lifecycle management of records, including their retrieval, use, and eventual disposal.
- A good archive is well organised. Use archive boxes and label them with the project name and number. Show information such as the name of the project leader and a serial number. Archival materials should be stored in appropriate, clearly labelled containers (preferably acid-free and archival quality) to protect them from damage and facilitate retrieval.
- Create a corresponding digital database and log the most relevant information. Indicate the conditions under which the archive can be accessed and used. Keep a clear record of who has used the archive. Maintaining inventories and metadata records is a core requirement for ensuring traceability, accountability, and future reuse of physical data.
- Separate anonymised data from informed consent forms, for example by keeping the forms in a locked cabinet. Sensitive and personal data must be stored securely and separately, with controlled access, in line with data protection and ethical requirements.
- Keep the archive clean and up to date. Make sure that the contents of the physical archive match the records in the corresponding digital and hard-copy databases. Regular monitoring, inventory updates, and alignment between physical and digital records are essential to avoid data loss and ensure integrity.
- Ensure clear procedures for the removal or destruction of archive materials. Retention and disposal decisions should be documented and follow institutional policies and legal requirements.
- Records storage areas should be secure, clean, organised, safe, dry, and accessible. Environmental conditions are critical: archives should be stored in stable, cool, and dry environments with controlled temperature and humidity to prevent deterioration, mould, and damage.
- If possible, storage should include fire protection systems, humidity and temperature control, dust-free conditions, and controlled access mechanisms. Protection against risks such as fire, water, pests, and environmental fluctuations is essential for long-term preservation of physical archives.
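The advice above on keeping a digital database, logging who has used the archive, and checking that it matches the physical contents could be sketched as follows. This is a minimal illustration using an in-memory SQLite database; the table layout, column names, box serial numbers, and personal names are all hypothetical, and a real archive would use a persistent, access-controlled system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE boxes (
    serial_number     TEXT PRIMARY KEY,
    project_name      TEXT NOT NULL,
    project_number    TEXT NOT NULL,
    project_leader    TEXT NOT NULL,
    access_conditions TEXT NOT NULL
);
CREATE TABLE access_log (
    serial_number TEXT NOT NULL REFERENCES boxes(serial_number),
    used_by       TEXT NOT NULL,
    used_on       TEXT NOT NULL
);
""")

# Register two archive boxes with the information shown on their labels.
conn.executemany(
    "INSERT INTO boxes VALUES (?, ?, ?, ?, ?)",
    [
        ("BOX-001", "Example Survey", "P-2024-01", "A. Researcher", "Project team only"),
        ("BOX-002", "Example Survey", "P-2024-01", "A. Researcher", "Project team only"),
    ],
)

# Record who has used which box, and when.
conn.execute(
    "INSERT INTO access_log VALUES (?, ?, ?)",
    ("BOX-001", "B. Deputy", "2024-09-01"),
)


def reconcile(connection, boxes_on_shelf):
    """Compare the database with a list of boxes found during a shelf check."""
    logged = {row[0] for row in connection.execute("SELECT serial_number FROM boxes")}
    on_shelf = set(boxes_on_shelf)
    return {
        "missing_from_shelf": sorted(logged - on_shelf),
        "not_in_database": sorted(on_shelf - logged),
    }


# Hypothetical shelf check: BOX-002 is absent and an unregistered BOX-003 is present.
print(reconcile(conn, ["BOX-001", "BOX-003"]))
# {'missing_from_shelf': ['BOX-002'], 'not_in_database': ['BOX-003']}
```

Running such a comparison at regular intervals turns the advice to keep the archive "clean and up to date" into a repeatable check.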
References
FAIR, RDM & European Frameworks
- Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. (2016).
The FAIR Guiding Principles for scientific data management and stewardship.
https://doi.org/10.1038/sdata.2016.18
- Science Europe. (2018).
Guidance Document: Presenting a Framework for Discipline-specific Research Data Management.
https://doi.org/10.5281/zenodo.4925907
- Currie, A., & Kilbride, W. (2021).
FAIR Forever? Long Term Data Preservation Roles and Responsibilities.
Zenodo. https://doi.org/10.5281/zenodo.4574234
- van Horik, R. et al. (2025).
FAIR Research Data Management – Part 1 & 2.
https://doi.org/10.5281/zenodo.15310506
https://doi.org/10.5281/zenodo.15310556
Non-Digital Data
- University of Bath Library.
Working with data: Non-digital data
https://library.bath.ac.uk/research-data/working-with-data/non-digital-data
- University of Bath Library.
Archiving and sharing data (including non-digital data)
https://library.bath.ac.uk/research-data/archiving-and-sharing/choosing-an-archive
- University of Reading.
Where to archive data: Non-digital data
https://www.reading.ac.uk/research-services/research-data-management/preserving-and-sharing-data/where-to-archive-data
- Radboud University.
Archiving and publishing research data
https://www.ru.nl/en/staff/researchers/research-data/archiving-and-publishing-data
Digitisation & Data Capture
- Aptara Corp.
10 Advantages of Digitization and Data Capture You Must Know
- TU Delft Library.
Preparing research data for publication
https://www.tudelft.nl/en/library/data-management/research-data-management/prepare506-research-data-for-publication
- TU Wien.
File formats: preserving and publishing research data
https://www.tuwien.at/en/research/rti-support/research-data/info-and-guidelines/preserving-and-publishing/file-formats
Preservation, Archiving & Storage
- Digital Preservation Coalition
Community Archives Digital Preservation Toolkit
Preservation Policy – Research Data Archive
- University of Bath
Archiving and disposing data and information
- Röthlisberger, M. et al. (2024).
Data Life Cycle: Preservation.
https://doi.org/10.5281/zenodo.19047429