Published Apr 9, 2018

OpenAIRE Interoperability workshop: Lessons learned

'Guimaraes' (image courtesy of Ata Türkfidani)

120 delegates gathered in Minho on 7-8 February to attend a workshop on interoperability. The purpose of the workshop was to showcase the new set of OpenAIRE guidelines and to discuss and get input into OpenAIRE’s approach to harvesting metadata.

This was in addition to the wider aim of discussing the challenging issue of interoperability within the open access world and between repositories. We therefore invited representatives of a range of communities (CRIS, DataCite, publishers, data repositories, EUDAT, research infrastructures) to present on how they see themselves fitting into the knowledge landscape and, from an OpenAIRE point of view, how we can potentially work together with them.

The breakout groups were a chance to see how the guidelines relate to other initiatives and to get input on issues such as grant ID encoding and licensing considerations; some decisions were also made, such as whose data to harvest and how.

The two-day event also started with an open science session, with delegates including the Open Knowledge Foundation and Geoffrey Boulton, who gave us a perspective on this challenging debate wider than that of the research institution.

All talks were recorded; the recordings and the presentations can be found here: http://www.openaire.eu/pt/component/content/article/9-news-events/444-openaire-interoperability-workshop-presentations-a-recordings-online


Lessons learned

Definition of Interoperability

“Interoperability is the ability of systems to communicate with each other and transfer information back and forth in a usable format” (COAR, 2012)

Why did we hold such a workshop?

OpenAIRE is starting to move from a publication infrastructure to a more comprehensive infrastructure that covers all types of scientific output. It therefore makes sense to hold a workshop like this and to discuss with all the relevant sources of this output (data repositories, EUDAT, CRIS, publishers) how we can work together and share each other’s information. Meetings such as these drive this forward and help infrastructures such as OpenAIRE expand to benefit from this wealth of valuable information.

But why is interoperability such a challenge?

Because we have to agree on common protocols, accession numbers, metadata standards etc. so that each of these different sets of information can be exchanged. Even repository-to-repository communication is complicated; now factor in other types of information, such as linking together datasets and PDFs.
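To make this concrete, here is a minimal sketch of the kind of record exchange involved: parsing a Dublin Core record of the sort repositories expose for harvesting over OAI-PMH, including the info:eu-repo convention for encoding grant IDs that the breakout groups discussed. The sample record (title, handle, grant number) is invented purely for illustration.

```python
# Sketch only: parse a minimal, invented OAI-PMH Dublin Core record and
# pull out the fields an aggregator like OpenAIRE cares about.
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical record; real records carry many more fields.
sample = f"""
<oai_dc:dc xmlns:oai_dc="{OAI_DC}" xmlns:dc="{DC}">
  <dc:title>An Example Open Access Article</dc:title>
  <dc:identifier>http://hdl.handle.net/1822/00000</dc:identifier>
  <dc:relation>info:eu-repo/grantAgreement/EC/FP7/123456</dc:relation>
</oai_dc:dc>
"""

def parse_record(xml_text):
    """Extract title, identifier and grant relations from a DC record."""
    root = ET.fromstring(xml_text)
    def field(name):
        return [e.text for e in root.findall(f"{{{DC}}}{name}")]
    return {
        "title": field("title")[0],
        "identifier": field("identifier")[0],
        # The info:eu-repo namespace is how grant IDs are encoded in the
        # OpenAIRE guidelines for literature repositories.
        "grants": [r for r in field("relation") if "grantAgreement" in r],
    }

record = parse_record(sample)
```

The point of agreeing on a common encoding such as info:eu-repo is exactly that a single parser like this can work across every compliant repository.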

And because many people outside our domain don’t appreciate its full weight, we have to communicate just how important interoperability is. It won’t mean much to many people outside our community, but it is crucial to understanding how we can gather together many different research outputs. This ties into the wider issue of harmonizing policies for open access at both national and EU level. There are many complexities, even at national level, in establishing an overview of who is responsible for open access and open science.

Summary
  • We received positive feedback on OpenAIRE’s guidelines from the breakout sessions, and we can move to finalize them by the end of February.
  • The DataCite metadata kernel is the right approach, but we must be clear about the ‘relationships’ between a publication and a dataset. Does a relationship refer to the author of the dataset, a contributor, etc.?
  • We have some concrete actions to take forward in terms of harvesting data. OpenAIRE will make a start by harvesting BADC’s NERC-funded subset.
  • Revised OpenAIRE Literature-repository guidelines: current draft to be finalized in March.
  • Ensure that all guidelines highlight the broader scope of OpenAIRE beyond EC resources; this will help toward global applicability.
  • A roadmap for the CERIF-XML profile was put in place. This will be expanded and presented on 13-14 May in Bonn.
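The ‘relationships’ point in the summary can be illustrated with a small sketch. In the DataCite kernel, a dataset points at a related publication through a relatedIdentifier element whose relationType attribute states how the two are related; the record fragment below is hypothetical, not a real DataCite record, and the DOIs are invented.

```python
# Sketch only: extract publication-dataset relationships from an invented
# DataCite-style record. relationType (e.g. IsSupplementTo) is what makes
# the link between a dataset and a publication unambiguous.
import xml.etree.ElementTree as ET

NS = "http://datacite.org/schema/kernel-4"

fragment = f"""
<resource xmlns="{NS}">
  <identifier identifierType="DOI">10.1234/example-dataset</identifier>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI"
                       relationType="IsSupplementTo">10.5678/example-article</relatedIdentifier>
  </relatedIdentifiers>
</resource>
"""

def related_publications(xml_text):
    """Return (relationType, identifier) pairs for all related identifiers."""
    root = ET.fromstring(xml_text)
    return [(el.get("relationType"), el.text)
            for el in root.iter(f"{{{NS}}}relatedIdentifier")]

links = related_publications(fragment)
```

Typed relations like IsSupplementTo answer the question raised above: without them, a harvester cannot tell a dataset’s supplementary publication from, say, a citation of it.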
'Post-prandial fun' (image courtesy of Ata Türkfidani)

OpenAIRE Interoperability workshop (see further down for observations on Day 1, Open Science)

Day 2 – 8 February, 2013

Let’s remind ourselves why we are all here. A key milestone in OpenAIRE is the guidelines: how repositories (data and publications) and CRIS systems can interoperate and link to OpenAIRE. So it made sense to get representatives of these groups to talk about their overall plans for interoperability and also to comment on our guidelines.

The OpenAIRE guidelines were presented on day 1. These take a three-tier approach: three sets of guidelines, for literature archives, data repositories and CRIS systems. Of particular interest at the workshop were the data repository guidelines, which will guide OpenAIRE’s data providers in exposing their minimum metadata set (via DataCite’s metadata kernel) to be harvested and linked to open access publications. Currently undergoing review, these will soon be published and distributed.

CRIS: Ed Simons of euroCRIS reminded us that a researcher will work on many projects, in many different time frames, roles and capacities, including leadership roles. The wonder of CERIF is that it can specify all these aspects. This has repercussions for collecting publications and adding value by tracking and defining their relationships. This in turn can attract talented young scientists, who can use it as a window to showcase themselves. A formal structure within an informal one: that makes common sense.

>> Interoperability with OpenAIRE: creation of a dedicated CERIF-XML OpenAIRE profile. A joint working group will take this profile forward.

Research Infrastructure: ENGAGE was next up. This ambitious research infrastructure concentrates on public sector information (think demographic information, statistics, public safety etc.). The project looks at metadata specification and interoperability, and its range of partners is wide, both academic and commercial. Academics will certainly use these datasets, as will the wider public, again proving that openness brings economic gain; consequently, government departments can start to publish in an appropriate way. The curation of government datasets will be done by crowdsourcing. Nikos reinforced the idea of improving datasets: creating new, more curated versions that link back to the old ones. By using CERIF, one can have a URI for each entity. ENGAGE also takes a three-tier approach to metadata, with discoverability at the top.

>> Interoperability with OpenAIRE: none of this data should end up in a silo. Much of it is open access and should have interoperable standards to make it open.

Publisher: Then on to Brian Hole and PRIME, which takes a holistic look at all these repositories, in this case archaeology ones. How can we get them to talk to each other and make them more discoverable? The project works closely with the research community to help them discover their data, and Brian gave us some useful case studies. The overall plan is that the researcher only has to submit the data/metadata once.

>> Interoperability with OpenAIRE: Ubiquity press works with over 30 data repositories, including Dryad. Many datasets are related to open access publications.

The second session looked more closely at research data repositories

DataCite: after an overview of what DataCite does, Herbert from DataCite raised the issue of the granularity of datasets. This will be a challenge for all infrastructures. For discoverability, DataCite’s metadata schema makes perfect sense.

>> Interoperability with OpenAIRE: It goes without saying: we mint our own DOIs and are reliant on this service.

EUDAT: EUDAT’s joint metadata domain will have consequences for interoperability, but each of the participating research infrastructures, such as CLARIN, focuses on different subject needs. To answer this challenge, the initiative is developing automatic adaptors for specific metadata schemas. EUDAT thus works on community-oriented services, such as data upload and preservation, as well as the ‘oil’ in the works: PIDs and AAI. It also makes no assumptions about the level of granularity.

>> Interoperability with OpenAIRE: working together on high-level discoverability and common standards for dataset metadata.
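The ‘automatic adaptors’ idea above can be sketched as a simple mapping from community-specific field names onto a shared vocabulary. Everything in this sketch (the community names, field names and mappings) is invented for illustration, not taken from EUDAT’s actual services.

```python
# Toy sketch of a metadata-schema adaptor: each community describes its
# records in its own terms, and a per-community mapping translates them
# into one shared vocabulary for cross-community discovery.

# Hypothetical mappings from local field names to a shared vocabulary.
ADAPTORS = {
    "clarin-like": {"resourceName": "title", "pid": "identifier", "lang": "language"},
    "generic": {"name": "title", "id": "identifier"},
}

def adapt(record, community):
    """Translate a community-specific record into the shared vocabulary."""
    mapping = ADAPTORS[community]
    # Fields with no mapping are simply dropped from the common view.
    return {mapping[k]: v for k, v in record.items() if k in mapping}

common = adapt({"resourceName": "Speech corpus", "pid": "hdl:1234/56", "lang": "nl"},
               "clarin-like")
```

The design choice this illustrates is that no community has to change its own schema; only the adaptor layer knows about both vocabularies, which is what makes high-level discoverability possible without flattening subject-specific metadata.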

Data Repository: Sarah Callaghan of BADC also spoke about the PREPARDE project. Journal workflows are varied, but the journal also needs to let the repository know about the citation, minimizing the effort needed to submit a data paper.

>> Interoperability with OpenAIRE: possible use case here whereby OpenAIRE could collect usage citations from data journals and feed them back to the corresponding archives.

Data Repository: Katarina Boland from GESIS outlined the efforts of this large social science repository to link publications to data. 

>> Interoperability with OpenAIRE: learning lessons from how they link publications to data. We could take a look at their algorithms and consider exporting the links.

The afternoon breakout groups then examined the guidelines in light of these initiatives, gathering input on issues such as grant ID encoding and licensing considerations, and reaching some decisions, such as whose data to harvest and how. For more details on the breakout sessions, feel free to get in touch.

With thanks to our Minho colleagues! (Image courtesy of Ata Türkfidani)


Day 1 – 7 February, 2013  
Some observations on the open access and open science seminar

Bernard Rentier inspired us all by stressing the success of the ‘enforced’ mandate approach. Remember: no head of an enterprise ignores what the business makes. In the case of universities this is knowledge, which needs to be harnessed and made open. OK, you can’t force or impose deposit on academic staff, but with a top-down approach the carrots can play a key role. This means: user guides, contextual help, pre-import tools, legal help. I asked Bernard about their user guides: these are an iterative process, carefully designed and blind-tested on staff for usability.
Geoffrey Boulton stirred the waters with his questions, stressing that the university has no right to impose this top-down approach: the ‘unit of production’ per se is the scientific community to which researchers belong, and all this output should intrinsically belong to and tie in with the international community. The retort from Prof Rentier: at Liège the role of the IR is to bring the product to the surface so that at least everyone can find it. It must be placed in the IR, otherwise it will get lost.

Alma Swan followed, reminding us that while the EC funds only 10% of research in Europe, it has a duty to set the standard and produce a consistent OA picture. Coordination throughout Europe is key, which means policy alignment at interdisciplinary level too. And the regrettable thing about the Finch approach is that above all it ignores all other EC policies, and hence any hope of harmonisation. She reminded us that the big deal will still lurk in an OA/gold environment, giving us some scary figures for bulk-converting articles to gold OA: a true incentive for libraries to accept bundled deals. But Alma reassured us about the future academic: people at the beginning of their careers have a different set of expectations about the open web, and grass-roots activism such as R2RC will change just how open the knowledge society should be. In a good way!

A session on open science was kicked off by Geoffrey Boulton of the Royal Society. Reinforcing his earlier comment on the importance of bottom-up vs top-down, he said that what is crucial for the top-down side is that it shouldn’t impose inflexible ways of working; bottom-up will drive innovation. And of course we know open data is crucial, and open access publishing is just one element of it. However, we are walking into a crisis: too much science is now published in a way that ought to be unacceptable! Remember: data only has value when it is communicated and, above all, intelligible. The EC should grab the idea of intelligent openness!

Jenny Molloy of the OKFN urged us to remember that cultural boundaries are moving forward with open data. Communities are building support for open data, and the power of information should lie in public hands, not with a small elite. Oh, and by the way, the PDF is pretty useless in terms of use and reuse.

Interesting post-talk chat: the decentralized nature of libraries means they are missing the connection to research. Research libraries need to change fundamentally. The function of the library has remained the same, but how it discharges that function is not optimal. It needs to tackle the complexities of the data-rich world and the real needs of scientists.

Next session: OpenAIRE. After an overview and a vision of future services, we heard about the wider role of research e-infrastructures. No single one will fit all; they have to work together to deliver a common set of uniform services. OpenAIRE’s contribution to this landscape: communication and services to science.