Curating the OpenAIRE Graph: disambiguating Greek organisations' metadata with OpenOrgs
Correcting data on Greek organisations inferred in the OpenAIRE Graph and other services
In depth description
Details
In February 2022, the National and Kapodistrian University of Athens (NKUA) was developing a MONITOR institutional dashboard to track Open Science using data from the OpenAIRE Graph. NKUA appeared many times in OpenOrgs as separate records, which delayed the deduplication of the organisation and its proper linking with other data about it in the institutional dashboard. The OpenOrgs National Admin observed that the list of suggested Greek organisations awaiting disambiguation was never-ending and contacted the service manager. Using NKUA as an example, they investigated the issue and found more than 35 individual records in English and Greek: National and Kapodistrian University of Athens | Εθνικό και Καποδιστριακό Πανεπιστήμιο Αθηνών.
The algorithm needed to be refined in two ways: to better recognise names written in the Greek alphabet, and to stop new records from being created when an organisation's name was broken at the "&" character that sometimes appears within it.
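The kind of refinement described above can be illustrated with a small name-normalisation step. The following is a minimal Python sketch under stated assumptions, not the OpenOrgs deduplication algorithm: `normalize_name`, `group_candidates` and the `CONJUNCTIONS` set are hypothetical names introduced for illustration. It folds case and strips Greek accents (the tonos) so that spelling variants of the same Greek name compare equal, and it keeps "&" as a token equivalent to "and"/"και" rather than a point where the name is cut.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical: words treated as equivalent to "&" (an assumption for this sketch).
CONJUNCTIONS = {"and", "και", "&"}


def normalize_name(name: str) -> str:
    """Build a comparison key for an organisation name (illustrative only).

    - strips diacritics such as the Greek tonos, so that accented and
      unaccented spellings compare equal after case folding
    - keeps "&" as its own token instead of splitting the name at it
    """
    # Decompose accented characters (e.g. "ή" -> "η" + combining accent)
    # and drop the combining marks.
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() also maps the Greek final sigma "ς" and capital "Σ" to "σ".
    folded = stripped.casefold()
    # Surround "&" with spaces so it becomes a standalone token.
    folded = folded.replace("&", " & ")
    # \w is Unicode-aware for str patterns in Python 3, so Greek letters are kept.
    tokens = re.findall(r"\w+|&", folded)
    # Map every conjunction variant to one canonical token.
    return " ".join("&" if t in CONJUNCTIONS else t for t in tokens)


def group_candidates(names):
    """Group raw names that share the same normalised key (hypothetical helper)."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_name(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    records = [
        "National and Kapodistrian University of Athens",
        "NATIONAL & KAPODISTRIAN UNIVERSITY OF ATHENS",
        "Εθνικό και Καποδιστριακό Πανεπιστήμιο Αθηνών",
        "ΕΘΝΙΚΟ & ΚΑΠΟΔΙΣΤΡΙΑΚΟ ΠΑΝΕΠΙΣΤΗΜΙΟ ΑΘΗΝΩΝ",
    ]
    for key, members in group_candidates(records).items():
        print(key, "->", len(members))
```

Running it groups the four variants into two candidate clusters, one English and one Greek. Linking the English name to the Greek one still needs a separate step, such as the curation by National Admins described here, which this sketch does not attempt.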
The image below captures a snapshot of the curation process, showing the duplicate records available for approval or deletion.
This is the complete, curated and merged NKUA record in OpenOrgs:
The provenance of every curation action is recorded, and the resulting single organisation record can be seen in OpenAIRE EXPLORE:
Disambiguation is a continuous activity. Every time OpenAIRE harvests a new source, new records for existing organisations may appear, creating duplicates for National Admins to curate.