OpenAIRE Tech Talk Roundup: Breaking Down the Bits and Bytes. And beyond.

Technical coordination workshop 2024: Our team

The OpenAIRE technical team held its annual meeting in Pisa on 12-14 April 2024. As CEO of OpenAIRE, having driven this team together with Paolo Manghi for more than 10 years, I was thrilled to see so many brilliant young minds in one place, all with a passion for improving OpenAIRE services and making them more useful to research communities. These meetings are always a reminder of the incredible journey that technology has taken us on so far, and of the even more exciting path that lies ahead. They are truly inspiring.

Inside the meeting - Innovation, Collaboration, and Cuisine in Tuscany: For two and a half days, our team delved deep into the specifics of our services and projects, all while pushing the boundaries of conventional thinking. Our workshop agenda was packed with challenging topics and sparked innovative ideas, despite the technical complexity of some discussions. The primary focus was on Europe and the European Open Science Cloud (EOSC), but we also engaged in lively debates about global developments and the increasingly significant role of the OpenAIRE Graph in the evolving global open infrastructure ecosystem.

True to our collaborative spirit, our often heated discussions ended on a positive note, with a bit of help from some delightful Tuscan wine and cuisine. Our thanks go to CNR-ISTI for hosting our energetic group. By the end of the meeting, we had laid the groundwork for our immediate actions and crafted a vision for future initiatives.

OpenAIRE Graph in the European Open Science landscape: Naturally, the team focused on defining the roadmap for integrating research communities' needs into the OpenAIRE Graph. We held detailed discussions on data and indicators: assessing their quality through comparisons and benchmarking, highlighting the effect of recent enhancements such as the improved coverage of affiliation links, and planning future improvements. The session culminated in a set of tailored recommendations, leading to updates and refinements in the Graph roadmap that accommodate the needs and feedback of the projects described below.

As our work is very much aligned with Horizon Europe projects, we specifically addressed the status and data requirements of the PathOS project (computing indicators for the impact of Open Science), the National Open Access Monitor for Ireland (how to improve author disambiguation and affiliations), the GraspOS project (providing a dataspace and an EOSC catalogue for Responsible Research Assessment), the CRAFT-OA project (building dashboards for diamond OA publishers and ensuring all indicators are accounted for), and the UKRN Open Research Indicators programme, which requires additional mining capacity.

Evolving Our Graph APIs: Enhancing Access and Usability for Data Consumers: Navigating the complexities of Graph APIs has long been a challenge for our data consumers, especially those requiring bulk access for their analytical needs. We've heard your feedback loud and clear and we've already started transitioning to our new Graph APIs to improve usability. Our discussions focused on new functionalities specifically requested by our users and on how we can engage more effectively and respond more swiftly to user needs. One of the critical aspects we're enhancing is how we expose citation data and tools, such as BIPFinder, through the Graph API. These tools are vital for researchers and analysts, and making them more accessible through the Graph API has been a top priority.

Looking ahead, we are also set to explore potential changes in our backend technologies. We're conducting tests to see if OpenSearch could be a better fit than Elastic for powering our ScholExplorer functionalities. This is part of our broader effort to ensure that our systems are not only robust but also state-of-the-art, capable of delivering the best possible service to our community.

Updated OpenAIRE Graph Acquisition policies: As new standards emerge and technology advances, we must update our data acquisition policies, which govern how we collect and store data (or metadata). As the OpenAIRE Graph is now used for discovery, bibliometrics, and monitoring, and many repositories and data sources are looking to transition to the new era (OpenAIRE Guidelines 4.0), the team has revisited these policies to ensure high-quality data for the graph while also lowering maintenance costs. And, with the introduction of research data, research software, and other products, the discussion centred on revising the meta-classification of research products to address the specific class of products for dissemination and educational resources.

Technology updates: OpenAIRE runs a complex infrastructure that is costly in terms of computing and storage. The upcoming deployment of Spark 3.4 on the main cluster marks a step towards better performance and efficiency. Discussions centred on strategies to reduce memory usage, thereby boosting computational capabilities and promoting environmental sustainability. The planned modernisation involves replacing Oozie and Cloudera Manager with Kubernetes and Airflow. There is also a shift from the Hadoop Distributed File System (HDFS) to Amazon Simple Storage Service (S3) for data management, with plans in place to ensure feasibility. Efforts are also underway to reduce the workload on Impala (where we run complex queries for our MONITOR dashboards) by handling resource-intensive queries with an updated version of Spark and by developing a new incremental citation matching strategy that avoids redundant data processing. An Infrastructure Monitoring System has also been identified as necessary for the fine-grained management and optimisation of Spark jobs.
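To give a feel for what "incremental" could mean here, the sketch below shows the general idea in plain Python. It is not the OpenAIRE implementation: real citation matching works on noisy reference strings and runs on Spark, whereas this example assumes exact matching on normalised DOIs as a stand-in. Every name in it (Record, MatchState, match_increment) is hypothetical. The point is only that each run touches the new batch, plus earlier references that were still waiting for a target, instead of reprocessing the whole collection.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical research product: an id, its DOI, and the DOIs it cites."""
    record_id: str
    doi: str
    cited_dois: list[str] = field(default_factory=list)


@dataclass
class MatchState:
    """State kept between runs so that earlier work is not repeated."""
    doi_index: dict[str, str] = field(default_factory=dict)     # DOI -> record id
    links: set[tuple[str, str]] = field(default_factory=set)    # (citing id, cited id)
    pending: dict[str, set[str]] = field(default_factory=dict)  # unmatched DOI -> citing ids


def normalise(doi: str) -> str:
    """Assumed normalisation: trim whitespace and lowercase."""
    return doi.strip().lower()


def match_increment(state: MatchState, new_records: list[Record]) -> set[tuple[str, str]]:
    """Match only what the new batch can affect and return the new links."""
    new_links: set[tuple[str, str]] = set()

    # 1. Index the new records and resolve references that were waiting for them.
    for rec in new_records:
        doi = normalise(rec.doi)
        state.doi_index[doi] = rec.record_id
        for citing_id in state.pending.pop(doi, set()):
            new_links.add((citing_id, rec.record_id))

    # 2. Match the references of the new records only.
    for rec in new_records:
        for cited in map(normalise, rec.cited_dois):
            target = state.doi_index.get(cited)
            if target is not None:
                new_links.add((rec.record_id, target))
            else:
                # Remember the unmatched reference for a later run.
                state.pending.setdefault(cited, set()).add(rec.record_id)

    state.links |= new_links
    return new_links


# Two successive runs: the second batch resolves a reference left over from the first.
state = MatchState()
first_run = match_increment(state, [Record("r1", "10.1/a", ["10.1/b"])])
second_run = match_increment(state, [Record("r2", "10.1/B")])
```

In this toy setting the first run finds no links and parks the reference to 10.1/b; the second run links r1 to r2 without looking at r1 again. The trade-off is that the state (index and pending references) has to be stored and kept consistent between runs.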
