Linknovate knowledge base enrichment and curation via OpenAIRE Graph
Leveraging the OpenAIRE Graph to eliminate duplicated profiles and enhance organization information within the Linknovate platform
In-depth description
Details
In this project, the Linknovate team analyzed the OpenAIRE Graph for additional data on organizations involved in the research lifecycle, such as universities, research organizations, and funders. The primary focus was the problem of duplicate organization entries within the Linknovate platform.
The first major task was cleaning up and consolidating organization profiles. By matching Linknovate profiles against the official and alternative names of organizations in the OpenAIRE dataset, the team identified and merged duplicate profiles. It also stored these alternative names as aliases, so that future records would be associated with the correct organization.
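The name-based consolidation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Linknovate's actual pipeline: the record shapes (`id`, `name`, `legal_name`, `alt_names`) and the helpers `normalize` and `merge_duplicates` are hypothetical, and the matching rule (exact match after lowercasing and stripping accents and punctuation) is an assumed, deliberately simple one. Profiles that resolve to the same OpenAIRE organization are grouped as duplicates, and that organization's names are collected as aliases.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(name: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def merge_duplicates(profiles: list, openaire_orgs: list) -> dict:
    """Group Linknovate profiles that resolve to the same OpenAIRE organization.

    Hypothetical shapes:
      profiles:      [{"id": str, "name": str}]
      openaire_orgs: [{"id": str, "legal_name": str, "alt_names": [str]}]
    Returns {openaire_id: {"profiles": [profile ids], "aliases": {names}}}.
    """
    by_id = {org["id"]: org for org in openaire_orgs}

    # Index every official and alternative name; drop names shared by
    # several organizations so they cannot trigger a wrong merge.
    name_index, ambiguous = {}, set()
    for org in openaire_orgs:
        keys = {normalize(n) for n in [org["legal_name"], *org["alt_names"]]} - {""}
        for key in keys:
            if key in name_index and name_index[key] != org["id"]:
                ambiguous.add(key)
            name_index[key] = org["id"]
    for key in ambiguous:
        del name_index[key]

    groups = defaultdict(lambda: {"profiles": [], "aliases": set()})
    for profile in profiles:
        org_id = name_index.get(normalize(profile["name"]))
        if org_id is None:
            continue  # no unambiguous match: the profile is left as-is
        org = by_id[org_id]
        groups[org_id]["profiles"].append(profile["id"])
        # Keep the OpenAIRE names as aliases for matching future records.
        groups[org_id]["aliases"].update([org["legal_name"], *org["alt_names"]])
    return dict(groups)


# Usage with made-up records.
openaire = [{"id": "oa::1", "legal_name": "Universidad de Ejemplo",
             "alt_names": ["University of Example", "UEx"]}]
profiles = [{"id": "lk1", "name": "Universidad de Ejemplo"},
            {"id": "lk2", "name": "University of Example"},
            {"id": "lk3", "name": "Other Institute"}]
for org_id, group in merge_duplicates(profiles, openaire).items():
    print(org_id, group["profiles"])  # lk1 and lk2 collapse into one group
```

Names shared by more than one OpenAIRE organization (short acronyms are the typical case) are removed from the index rather than resolved arbitrarily. A production system would likely add fuzzy matching and manual review on top of exact normalized matches.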
Next, Linknovate worked on enriching organization profiles in three key aspects:
- Website Information: Some profiles in the Linknovate platform lacked valid website addresses. The team retrieved the missing websites from the OpenAIRE dataset, which made these profiles more complete and helped refine organization descriptions, which several features of the platform rely on.
- Location Information: Using the geographic data in OpenAIRE, the team complemented the organization profiles with country-level information. This was particularly valuable for accurate location-based searches, because city-level details can be inaccurate, especially for large organizations.
- Organization IDs: Because organization IDs are important for future work and data integration, the team compiled various standard IDs (e.g., ROR, GRID) from the OpenAIRE dataset for the 88,954 organizations matched across the two datasets.
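The three enrichment steps can likewise be sketched for a single matched pair of records. This is again an illustration under assumptions: the field names (`website`, `country`, `ids`, `website_url`, `country_code`, `pids`) and the helpers `is_valid_url` and `enrich_profile` are hypothetical, only loosely modeled on organization records in the OpenAIRE Graph, and the sketch fills gaps rather than overwriting values Linknovate already holds.

```python
from typing import Optional
from urllib.parse import urlparse


def is_valid_url(url: Optional[str]) -> bool:
    """Treat only absolute http(s) URLs with a host as valid websites."""
    if not url:
        return False
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def enrich_profile(profile: dict, org: dict) -> dict:
    """Return an enriched copy of a Linknovate profile matched to an OpenAIRE org.

    Hypothetical shapes:
      profile: {"name": str, "website": str|None, "country": str|None, "ids": dict}
      org:     {"website_url": str|None, "country_code": str|None,
                "pids": [{"scheme": str, "value": str}]}
    """
    enriched = dict(profile)
    enriched["ids"] = dict(profile.get("ids") or {})

    # Website: fill only when the current value is missing or invalid.
    if not is_valid_url(profile.get("website")) and is_valid_url(org.get("website_url")):
        enriched["website"] = org["website_url"]

    # Location: add a country when none is recorded; city data is not touched.
    if not profile.get("country") and org.get("country_code"):
        enriched["country"] = org["country_code"]

    # Identifiers: keep standard PIDs (e.g. ROR, GRID) keyed by lowercase scheme,
    # without overwriting an ID the profile already has for that scheme.
    for pid in org.get("pids", []):
        enriched["ids"].setdefault(pid["scheme"].lower(), pid["value"])

    return enriched


# Usage with placeholder values (not real identifiers).
profile = {"name": "Universidad de Ejemplo", "website": "not a url",
           "country": None, "ids": {}}
org = {"website_url": "https://www.example.edu", "country_code": "ES",
       "pids": [{"scheme": "ROR", "value": "ror-placeholder"},
                {"scheme": "GRID", "value": "grid-placeholder"}]}
print(enrich_profile(profile, org))
```

Keeping identifiers keyed by scheme (for example `ror` or `grid`) makes them straightforward to use as join keys in later data integration.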
In essence, the project aimed to improve the overall quality, completeness, and accuracy of organization profiles within the Linknovate platform by leveraging the rich data available in the OpenAIRE dataset. This effort not only addressed deduplication challenges but also improved the platform's ability to recommend similar organizations more precisely.
Finally, Linknovate came away with a clearer understanding of the OpenAIRE Graph and its potential future uses. Although certain limitations prevented further progress for now, we hope the Graph will eventually offer an opportunity to expand Linknovate's publications dataset.