Guiding data to its best home: PRIME
The last two decades have seen the development of many different kinds of data repository, resulting in a wide range of choices for researchers. Subject repositories such as GenBank, Dryad and PANGAEA offer specialised curation of data in their subject domains, while subject-neutral services such as the Dataverse Network, figshare and, most recently, Zenodo have emerged to serve the long tail of research data.
At the same time, higher education institutions have begun experimenting with holding data as well as preprints. There are strong drivers to do so: being able to showcase all outputs in one place is important both for competition and, increasingly, for obtaining funding. Progress is slow, however, as universities struggle with a range of obstacles such as how to modify existing infrastructure, how to retrain library staff in data archiving, and how to cope with specialised data from all disciplines represented on campus.
The PRIME (Publisher, Repository and Institutional Metadata Exchange) project seeks to provide at least a partial solution to these problems. A joint project between University College London (UCL), the Archaeology Data Service (ADS) and Ubiquity Press, and funded by Jisc, PRIME is piloting a system to exchange metadata between institutional repositories, subject repositories and publishers. The aim is to enable an ideal situation where a research dataset is held in the repository best suited to its curation and long-term archiving, while other repositories for which the dataset is relevant receive metadata records that describe it and point back to its location in the main archive. Data can therefore reside in an appropriate subject repository, while its creator’s institution automatically has a record of it as well, without the responsibility and overhead of caring for it.
In order to incentivise data archiving and to give subject-specific guidance, the project is also integrating a data journal. Focusing on archaeology data for the pilot, PRIME involves the integration of the Journal of Open Archaeology Data (JOAD) with the ADS subject repository and the UCL Discovery institutional repository. The metadata exchanged between the respective systems is a subset of the DataCite metadata profile, and this is to be transferred via two mechanisms. The first utilises the Symplectic Elements system, through which UCL harvests the majority of its staff’s publications. The second will involve a plugin for the EPrints repository system to enable researchers at universities not using Symplectic to pull metadata.
A typical use case is illustrated by the above diagram. A researcher comes to the Journal of Open Archaeology Data and submits a data paper for publication. The journal requests that the data be openly archived in a suitable repository as a precondition for publication. The author then chooses the ADS, and deposits their data there. A metadata record about this data is then made available to UCL Discovery, which creates a public record associated with its staff member, and with a DOI that points back to the ADS, where the data is actually located. The data paper also points directly to the data in the ADS.
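The use case above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PRIME implementation: the article does not say which subset of the DataCite profile is exchanged, so the fields below simply follow DataCite's mandatory properties (identifier, creators, title, publisher, publication year). The class and function names, the researcher name and the DOI are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class DatasetRecord:
    """Metadata describing a dataset held in a subject repository.

    Assumption: fields mirror DataCite's mandatory properties; the actual
    subset used by PRIME is not specified in the article.
    """
    doi: str
    creators: Tuple[str, ...]
    title: str
    publisher: str  # the repository that actually holds the data
    publication_year: int

    def landing_url(self) -> str:
        # A DOI resolves through the doi.org proxy to the holding repository.
        return f"https://doi.org/{self.doi}"


def to_institutional_record(record: DatasetRecord, staff_member: str) -> Dict[str, object]:
    """Build the metadata-only record an institutional repository would keep.

    The institution stores a description and a pointer back to the data,
    not the data itself. (Hypothetical helper, for illustration only.)
    """
    if staff_member not in record.creators:
        raise ValueError(f"{staff_member} is not a creator of {record.doi}")
    return {
        "staff_member": staff_member,
        "title": record.title,
        "creators": list(record.creators),
        "held_by": record.publisher,
        "year": record.publication_year,
        "doi": record.doi,
        "data_location": record.landing_url(),
        "holds_data": False,  # the files stay in the subject repository
    }


# Placeholder values: the DOI and the researcher are invented for this sketch.
example = DatasetRecord(
    doi="10.1234/example-dataset",
    creators=("A. Researcher",),
    title="Example excavation archive",
    publisher="Archaeology Data Service",
    publication_year=2013,
)

if __name__ == "__main__":
    print(to_institutional_record(example, "A. Researcher"))
```

The design point is that the institutional record carries only descriptive fields and a resolvable pointer, so the institution gains visibility of the output without taking on curation of the files.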
The PRIME pilot is still in its early stages, but progress reports will be available on the project website at http://www.ucl.ac.uk/prime.
Brian Hole, Ubiquity Press