Open Call Winner Phase 1 - Data Futures Limited
OpenAIRE Person Name Resolution Service: Biodiversity
The biodiversity community is advanced, compared with other sectors, in transforming existing scholarly literature into citable and reusable data based on text contained therein, such as taxonomic treatments or observation records. These data currently contain person names in various capacities, such as author, collector, or identifier. However, names are not unique and often inconsistently formatted with unstructured roles, though taken together they allow creation of a profile and canonical name for a group of person instances.
The BD-CPE resolution service allows us to return the respective canonical name, together with confidence metrics and profile information, and also to get all the variations of the name to expand searches. Person instances in the data can then be annotated with the respective canonical name, which allows much better linking with other biodiversity resources.
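As a rough illustration only, the kind of lookup described above can be sketched in a few lines of Python. This is not the BD-CPE implementation: the function names (`name_key`, `build_index`, `resolve`), the matching rule (surname plus first initial), the choice of the most frequent spelling as the canonical name, and the use of that spelling's share of the group as a confidence figure are all assumptions made for the example.

```python
from collections import Counter, defaultdict
import unicodedata


def name_key(raw):
    """Reduce a name string to a crude (surname, first initial) key.

    Hypothetical matching rule for illustration; a real service would
    also need roles, dates, co-occurrence and other profile evidence.
    """
    # Strip diacritics so that "Müller" and "Muller" share a key.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(c for c in text if not unicodedata.combining(c))
    if "," in text:
        # "Surname, Given" form.
        surname, given = (part.strip() for part in text.split(",", 1))
    else:
        # "Given Surname" form; pad dots so "C.Darwin" splits correctly.
        parts = text.replace(".", ". ").split()
        if not parts:
            raise ValueError("empty name")
        surname, given = parts[-1], " ".join(parts[:-1])
    return surname.lower(), given[:1].lower()


def build_index(instances):
    """Group raw person instances by key, counting each spelling."""
    groups = defaultdict(Counter)
    for raw in instances:
        groups[name_key(raw)][raw] += 1
    return groups


def resolve(raw, groups):
    """Return canonical name, a toy confidence value and all variants."""
    variants = groups.get(name_key(raw))
    if not variants:
        return None
    # Assumption: the most frequent spelling is canonical; ties go to
    # the longer, then alphabetically later, spelling.
    canonical = max(variants, key=lambda v: (variants[v], len(v), v))
    confidence = variants[canonical] / sum(variants.values())
    return {
        "canonical": canonical,
        "confidence": confidence,
        "variants": sorted(variants),
    }
```

Calling `build_index` on a list of extracted name strings and then `resolve` on a new string returns the group's canonical spelling together with every variant seen, which is the list a search would be expanded with.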
Tender priority topics addressed: The present proposal addresses OpenAIRE Challenge Topic #3: "Expand the OpenAIRE Service portfolio focusing;
A) on the enhancement of current OpenAIRE services or
B) on the creation of new OpenAIRE services".
The consortium builds on a major existing development by Plazi: automatically extracting key data from existing publications and making it open and FAIR via Zenodo, for the purpose of understanding species loss and contributing to the creation of a cyber-catalogue of the world's species. Publications currently managed by publishers are very often not open, and this represents a crisis for efforts to aggregate recorded species in the face of mass extinction and increasing challenge in the field (see https://www.cbd.int/gti/problem.shtml).
The development of the Plazi workflow, including extraction tools for embedded scientific information, led to large-scale creation of accessible taxonomic treatments. However, these currently privilege taxonomic data, and there is currently no functionality to normalize persons' names upon entry into Zenodo; instead, persons potentially appear in multiple metadata elements with unstructured roles.
The specific contribution of this proposal to the OpenAIRE Challenge Topic is that we will definitively address this problem within the biodiversity community by a) developing software to create a new OpenAIRE canonical person entity resolution service and b) enhancing all existing taxon treatments in BLR on Zenodo to include canonical person entity ascriptions. This will greatly enrich these deposits, enabling rapid disambiguation and grouping of collection and identification activities, and it will significantly advance integration with other key biodiversity data resources such as GBIF, linking much more biodiversity data automatically and eliminating manual searching.
We will subsequently integrate canonical person entity functionality for biodiversity with Zenodo itself. This expanded service for biodiversity will precipitate much-needed broader community action in respect of developing persistent identifier strategies for historic scientific contributors, enabling automated discovery and efficient reuse of existing publications in fields such as bioinformatics and information technology.
Phase 1 budget: 7,800 €
The Data Futures project is a consortium of institutions and publishers including Aix-Marseille, Basel, Heidelberg, Lyon and Princeton universities and Merve Verlag GmbH. It is based in the Institute for Modern and Contemporary Culture at the University of Westminster, London.
Established at the beginning of 2012 to focus on factors affecting long-term accessibility of research data, and in particular on growing recognition of the need to capture broader research investment than is possible using 'core' metadata standards, the project comprises a multi-disciplinary team of software engineers and subject-based theorists.
The Data Futures freizo migration platform enables digital collections to be made portable and re-delivered using contemporary technologies instead of accreting maintenance liabilities and ultimately risking loss as funding priorities change and institutions mutate. Its services to third parties include assessment of long-term vulnerabilities of research data, transformation of projects to portable formats and change management to address the wider horizon of institutional logic and other dispositives affecting sustainability.