
First, be aware that backup is not the same as preservation or archiving. Once data reaches a final state, preservation keeps that data easily accessible, for example through a repository. Any data, raw or otherwise and including final products, that are deemed important enough to be stored securely for a long period should be archived; archived data can still be retrieved, but not as easily as preserved data.

Data may be preserved for several reasons: for further analysis or research; for its re-use potential; for its national or international status and quality; for its originality or uniqueness; because of its production costs or the innovative nature of the research; for its importance to the history of science; for its relevance to non-academic uses such as cultural heritage; or because a funder requires it. By contrast, the main purpose of backing up data is to prevent data loss during the active (analysis) stages of the curation lifecycle. Researchers should back up data while working with it, and repositories back up the data they preserve. We recommend that important data are copied at least three times, onto at least two storage media, with at least one copy off-site. Moreover, where available, always use your institution’s managed digital services to allow automated backups. Commercial and non-commercial third-party storage options such as Dropbox are also currently popular, but there is no guarantee that such services will exist in perpetuity, and they raise questions about data ownership.
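The recommendation above (at least three copies, at least two storage media, at least one off-site) can be expressed as a simple check. The following is a minimal sketch, not a tool from this guide: the `Copy` record, its field names and the media labels are hypothetical, and it reads "copied at least three times" as at least three copies in total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a dataset (hypothetical record, for illustration only)."""
    medium: str     # assumed label, e.g. "laptop-ssd", "network-drive"
    off_site: bool  # True if stored away from the primary work location


def meets_recommendation(copies: list[Copy]) -> bool:
    """Check for >= 3 copies, on >= 2 distinct media, with >= 1 off-site."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_off_site = any(c.off_site for c in copies)
    return enough_copies and enough_media and has_off_site


copies = [
    Copy("laptop-ssd", off_site=False),
    Copy("network-drive", off_site=False),
    Copy("external-hdd", off_site=True),
]
print(meets_recommendation(copies))  # prints True
```

A check like this only describes where copies are supposed to be; it does not confirm that each copy is intact and restorable.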

Here is a checklist to help define a strategy for creating backups:

  • 1. Firstly, find out if your institution has implemented a backup strategy. If so, find out if the backup policy meets your needs. It is also advisable to contact your IT department directly if there are any points of contention, and to receive direct advice. Your IT department should be easily contactable, and it is also in their interest to learn how their institution’s researchers are using their services. If your institution has no appropriate backup strategy, the following steps should be followed.
  • 2. Identify if there are third-party tools that can be used to automate backups. Performing automatic backups will provide a better guarantee that backups are made regularly and that they are stored in the right place, reducing the risk of human error. Microsoft and Apple have software to support automatic backups. There are also cloud storage solutions (CloudStor; Figshare; SpiderOak; StoneFly) that offer backup functionality. However, it is a good idea to create a routine to check that functional backups have actually been created.
  • 3. Identify the type of backup:
    1. Full system and file backup;
    2. Differential backups, where everything that has changed since the last full backup is recorded. If data recovery is required, you will need the last full backup and the last differential backup;
    3. Incremental backups, where only the last changes since the last backup are saved. To restore your data and/or system, you need the last full backup and the entire series of incremental backups;
    4. Incremental and differential backups greatly speed up the process because they only save the files that have changed, saving time and disk space compared with running a full backup every day.
  • 4. Plan how many backups and how often they need to be made. It is recommended that you make three backups to minimise the risk of data loss if one of your backups is damaged or lost. If file sizes are very large or if you are dealing with sensitive data, it may be appropriate to work with fewer copies, but your institution’s and/or funder’s requirements should always be consulted wherever possible.
  • 5. Define where the backups will be stored. It is recommended that you do this in physically different locations. Backups can be made to network drives, cloud storage, and local or portable devices. These options will depend on the amount of data that needs to be backed up, their frequency, the level of automation and their sensitivity.
  • 6. Forecast the amount of space that will be required for the backup of data and its documentation, and then estimate the required storage capacity on the actual backup media.
  • 7. You will need to determine how long the backups will be kept and how they will be deleted. It is not recommended that you replace one backup with another. However, if you have to back up large amounts of data frequently, it may not be feasible to keep all backups for the entire project. For sensitive data, you will need to ensure that all data has been successfully deleted and cannot be recovered in any way.
  • 8. Verify that personal data will be protected and that backups containing such information are protected from unauthorised access.
  • 9. Make a disaster recovery plan that defines the steps during and after a loss of data to make quick recovery possible. This plan should include contact persons with responsibility to provide support.
  • 10. It is essential that responsibilities are fully identified regarding the performance of both manual backups and verification of automatic backups, as well as the performance of data recovery tests and restoration of any lost data.
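To illustrate the difference between the backup types in step 3, the sketch below works out which backups are needed to restore the most recent state. It is an illustration under stated assumptions rather than the behaviour of any particular backup tool: the `Backup` record, its labels and the `restore_chain` function are hypothetical, the history is assumed to be ordered from oldest to newest, a differential backup is taken relative to the last full backup, and an incremental backup is taken relative to the last backup of any type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    """One backup in a chronological history (hypothetical record)."""
    label: str
    kind: str  # "full", "differential", or "incremental"


def restore_chain(history: list[Backup]) -> list[Backup]:
    """Return the backups needed to restore the newest state, oldest first."""
    chain = []
    skip_to_full = False  # set once a differential has been collected
    for backup in reversed(history):
        if backup.kind == "full":
            chain.append(backup)
            return list(reversed(chain))
        if skip_to_full:
            continue  # a differential already covers everything since the full
        if backup.kind == "differential":
            chain.append(backup)
            skip_to_full = True
        elif backup.kind == "incremental":
            chain.append(backup)  # every incremental in the series is needed
        else:
            raise ValueError(f"unknown backup kind: {backup.kind!r}")
    raise ValueError("history contains no full backup")


incremental_week = [
    Backup("mon-full", "full"),
    Backup("tue-incr", "incremental"),
    Backup("wed-incr", "incremental"),
]
differential_week = [
    Backup("mon-full", "full"),
    Backup("tue-diff", "differential"),
    Backup("wed-diff", "differential"),
]
print([b.label for b in restore_chain(incremental_week)])
# prints ['mon-full', 'tue-incr', 'wed-incr']
print([b.label for b in restore_chain(differential_week)])
# prints ['mon-full', 'wed-diff']
```

The incremental schedule needs the full backup plus every incremental backup since, so one damaged file in the series can block a restore. The differential schedule needs only two backups, at the cost of each differential growing larger until the next full backup.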

Data backup