Dealing with Data - OpenAIRE's First Workshop
On June 11, an OpenAIRE workshop was organized as a pre-Nordbib meeting at the Royal Library in Copenhagen. The workshop was attended by over 80 people and addressed Research Data policy in the context of linking publications to research data, one of the key activity areas in the OpenAIREplus project. In order to support the linking of research output, it is recognized that policies and guidelines have to be in place to support research organisations to manage their research data. It was thus timely that the first public workshop organised by OpenAIRE addressed some of these issues. The workshop was aimed at OpenAIRE participants, library managers, researchers, research funders, and research administrators. The final interactive session enabled participants to talk to the experts, and get to grips with some data management issues.
All talks were recorded, and the film clips and slides are available online.
After a short introduction by Najla Rettberg from OpenAIRE, the audience listened to presentations by Oya Rieger, Cornell University, on the Research Data Landscape, and Arjan Hogenaar, DANS, on Enhanced Publications. After a break, Data-Literature integration in the Life Sciences was the focus of a talk by Jo McEntyre, EMBL-EBI. Two further presentations gave policy perspectives: Mark Thorley from NERC presented a funder's view, and Mary McDerby described her experiences in establishing a data management policy at Manchester University.
To close this intensive morning session, the audience split into five break-out groups. Three of them each tackled an OpenAIREplus-related issue: Funders and Data Policy; Institutional Policy; and Researchers and Publishers. Participants were also able to attend a more technical group, or a general group for questions about the OpenAIRE project.
The first group looked at institutional policies. Many who attended the group confirmed that most institutions don't have any data management policy at all.
Many institutions, however, might be at the stage where they would like to know the exact steps for developing a data management policy. Some of these were clarified in the session: for instance, be aware of the diverse departments within the organization (for example, the administrative CRIS) and of who might need to be involved in writing a policy, so that policies don't conflict with each other. Enlisting an academic champion to help reach out across different disciplines is another approach.
Small institutions, and even large ones, don't have the budget to invest in a data infrastructure. However, it was pointed out that an institution can develop a plan that guides researchers within an infrastructure and helps with procedures and advice. This could be as simple as clarifying with researchers what they have to deposit and what they have to keep. One approach put forward was to start with a pilot, which could grow into a policy.
From OpenAIRE's point of view, how can we help with data management planning? Well, we can engage in discussions and point to guidelines for developing policies.
The second group, on funding policy, tackled the question of how funders can mandate good data management before, during and after projects.
Among the issues discussed were multi-funded projects, the data they produce, and the management issues that follow. Additionally, how can data that was created before or after the funded project be made open and re-used? The same question applies to managing data after the end of project funding. Where does the responsibility lie? This should of course be tackled case by case, and in some cases it is up to the institution involved and its institutional repository. It would be good to require open access to data funded 'whole or in part'. Institutions should recognize the value of locally produced data, audit what is currently available, and assign responsibilities for data management and curation. Institutional repositories have a role to play in managing and curating datasets related to experiments and publications, but these datasets should be well documented and reusable. The role of data centres differs, as they usually pool datasets from different sources into new databases and provide value-added services on top of institutional repositories.
The issue of 'identifiers' for researchers (such as ORCID), who in turn acknowledge funding sources, was also raised. Introducing such identifiers is good practice, as researchers can incorporate project information and publications into their data creation.
Interestingly, it was noted that it is cheaper to keep data than to go back and recreate it.
The third group looked at researchers: whether they want to share data, and where the barriers lie. Acknowledgment of data sharing is essential. The group discussed peer review of data and suggested that the reviewers best placed for this are the users of the actual datasets. A more open method of review was also discussed.
In terms of sharing data, nervousness about 'incorrect' data being exposed was the main barrier. Ways to overcome this included ensuring quality assurance of data from the start and having the publisher enforce standards. Promoting trusted databases to scientists, along with the value of curated and citable data, would also help to lower this barrier.
The fourth group looked more closely at technical issues and discussed the use of trust levels for entities and relations in the OpenAIREplus data model and services. It was thought important that OpenAIRE define the requirements and levels of trust for the long-term preservation of the datasets, objects and components of Enhanced Publications, and of the data archives that hold them. It was suggested that services be developed on the basis of those trust levels; at the user interface, for example, a service could verify permanence at intervals that depend on the trust level.
The fifth group, for OpenAIRE participants, discussed a range of issues. The main observation was that some countries simply aren't developed enough to tackle data issues.
One participant noted that in their institution it would be hard to introduce a data policy, simply because of issues of ownership and internal sensitivities. A university has so many different departments, so where should the policy sit? In this context the role of the library was discussed, and whether it has the right expertise to manage data. There is a certain inertia about taking this on. Above all, researchers need a good, 'usable' system to share data.
With regard to technical issues, it was noted that, since resources are diverse, the model needs to be flexible enough to harvest minimal metadata. The question was also raised whether metadata, too, is subject to legal restrictions.
As for sharing data, if scientists know from the start that they have to share it, this might be more conducive to good management.
And…the final comment from an attendee was that ‘researchers would rather share their toothbrushes than their data’. We have a lot of work to do!
We would like to thank Nordbib for facilitating and supporting our workshop.