Aggregation and content provision workflows
OpenAIRE materializes an open, participatory research graph (the OpenAIRE Research Graph) in which products of the research life-cycle (e.g. scientific literature, research data, projects, software) are semantically linked to each other and carry information about their access rights (i.e. whether they are Open Access, Restricted, Embargoed, or Closed), about the sources from which they have been collected, and about where they are hosted. The OpenAIRE Research Graph is materialised via a set of autonomic, orchestrated workflows operating in a regime of continuous data aggregation and integration. [1]
Learn more at https://graph.openaire.eu.
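To make the graph description above concrete, here is a minimal sketch of nodes that carry an access right plus provenance ("collected from" and "hosted by"), connected by typed relations. It is an illustration only: every class, field and relation name (`ResearchProduct`, `collected_from`, `isSupplementedBy`, ...) is hypothetical and does not reflect the actual OpenAIRE data model.

```python
from dataclasses import dataclass, field
from enum import Enum


class AccessRight(Enum):
    # The four access-right values named in the text
    OPEN_ACCESS = "Open Access"
    RESTRICTED = "Restricted"
    EMBARGOED = "Embargoed"
    CLOSED = "Closed"


@dataclass
class ResearchProduct:
    # Hypothetical, simplified node: real graph records carry many more fields
    id: str
    product_type: str  # e.g. "publication", "dataset", "software"
    title: str
    access_right: AccessRight
    collected_from: list[str] = field(default_factory=list)  # sources the record was harvested from
    hosted_by: list[str] = field(default_factory=list)       # sources where the object is hosted


@dataclass
class Relation:
    # Hypothetical typed, directed edge between two nodes
    source: str
    target: str
    rel_type: str


class ResearchGraph:
    def __init__(self) -> None:
        self.nodes: dict[str, ResearchProduct] = {}
        self.relations: list[Relation] = []

    def add(self, product: ResearchProduct) -> None:
        self.nodes[product.id] = product

    def link(self, source: str, target: str, rel_type: str) -> None:
        # Only link nodes that are already in the graph
        if source not in self.nodes or target not in self.nodes:
            raise KeyError(f"unknown endpoint in {source} -> {target}")
        self.relations.append(Relation(source, target, rel_type))

    def neighbours(self, node_id: str) -> list[ResearchProduct]:
        # Targets of all outgoing relations of a node
        return [self.nodes[r.target] for r in self.relations if r.source == node_id]

    def open_access(self) -> list[ResearchProduct]:
        return [p for p in self.nodes.values() if p.access_right is AccessRight.OPEN_ACCESS]
```

The design keeps provenance on the node itself. That way, questions such as "which sources contributed this record?" can be answered without traversing the graph.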
What does OpenAIRE collect?
The OpenAIRE technical infrastructure collects information about objects of the research life-cycle that are compliant with the OpenAIRE acquisition policy [5] from different types of data sources [2]:
- Scientific literature metadata and full-texts from institutional and thematic repositories, CRIS (Current Research Information Systems), Open Access journals and publishers;
- Dataset metadata from data repositories and data journals;
- Scientific literature, data and software metadata from Zenodo;
- Metadata about data sources, organizations, projects, and funding programs from entity registries, i.e. authoritative sources such as CORDA and other funder databases for projects, OpenDOAR for publication repositories, re3data for data repositories, DOAJ for Open Access journals;
- Metadata of open source research software from software repositories and Software Heritage;
- Metadata about other types of research products, such as workflows, protocols, methods, and research packages.
What kind of data sources are in OpenAIRE?
Objects and relationships in the OpenAIRE Research Graph are extracted from information packages, i.e. metadata records, collected from data sources of the following kinds:
- Institutional or thematic repositories: Information systems where scientists upload the bibliographic metadata and full-texts of their articles, because of obligations from their organization or community practices (e.g. arXiv, Europe PMC);
- Open Access publishers and journals: Information systems of Open Access publishers or their journals, which offer bibliographic metadata and PDFs of their published articles;
- Data archives: Information systems where scientists deposit descriptive metadata and files about their research data (also known as scientific data, datasets, etc.);
- Hybrid repositories/archives: Information systems where scientists deposit metadata and files of any kind of scientific product, including scientific literature, research data and research software (e.g. Zenodo);
- Aggregator services: Information systems that collect descriptive metadata about publications or datasets from multiple sources in order to enable cross-data source discovery of given research products. Examples are DataCite, BASE, DOAJ;
- Entity Registries: Information systems created with the intent of maintaining authoritative registries of given entities in the scholarly communication, such as OpenDOAR for the institutional repositories, re3data for the data repositories, CORDA and other funder databases for projects and funding information;
- CRIS: Information systems adopted by research and academic organizations to keep track of their research administration records and related results; examples of CRIS content are articles or datasets funded by projects, their principal investigators, facilities acquired thanks to funding, etc.;
- Research Graphs: services that maintain an information space of (possibly interlinked) scholarly communication objects. Examples are Crossref, ScholeXplorer and OpenAIRE itself.
How does OpenAIRE collect metadata records?
OpenAIRE collects metadata records describing objects of the research life-cycle from content providers that are compliant with the OpenAIRE guidelines, and from entity registries (i.e. data sources offering authoritative lists of entities, like OpenDOAR, re3data, DOAJ, and funder databases).
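Compliance with the guidelines is what lets the aggregator read links such as project references in a predictable form. As an illustration, the OpenAIRE literature guidelines express funding in `dc:relation` as `info:eu-repo/grantAgreement/<Funder>/<Program>/<ProjectID>[/...]`. The sketch below assumes exactly that layout. The helper names `parse_grant_agreement` and `project_refs` are hypothetical, and this is not OpenAIRE's validator.

```python
from typing import NamedTuple, Optional

PREFIX = "info:eu-repo/grantAgreement/"


class ProjectRef(NamedTuple):
    funder: str
    program: str
    project_id: str


def parse_grant_agreement(value: str) -> Optional[ProjectRef]:
    """Parse info:eu-repo/grantAgreement/<Funder>/<Program>/<ProjectID>[/...].

    Returns None when the value does not follow that convention.
    """
    value = value.strip()
    if not value.startswith(PREFIX):
        return None
    parts = value[len(PREFIX):].split("/")
    # Funder and project identifier are mandatory; trailing segments are ignored
    if len(parts) < 3 or not parts[0] or not parts[2]:
        return None
    return ProjectRef(funder=parts[0], program=parts[1], project_id=parts[2])


def project_refs(relations: list[str]) -> list[ProjectRef]:
    # Keep only the dc:relation values that parse as project references
    return [ref for v in relations if (ref := parse_grant_agreement(v)) is not None]
```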
In most cases the OpenAIRE aggregator collects metadata records via OAI-PMH, but it also supports other standard exchange protocols such as FTP(S), SFTP, and RESTful APIs.
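An OAI-PMH harvest pages through `ListRecords` responses, following the `resumptionToken` until an empty token marks the end of the list. The sketch below illustrates only that protocol loop; OpenAIRE's production collectors are part of the D-NET toolkit [1], not this code. The verb and argument names come from the OAI-PMH 2.0 specification. The base URL and the `oai_dc` default prefix are assumptions for illustration.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Iterator, Optional, Union

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, token: Optional[str] = None, prefix: str = "oai_dc") -> str:
    # resumptionToken is an exclusive argument: it is sent without metadataPrefix
    params = {"verb": "ListRecords"}
    if token:
        params["resumptionToken"] = token
    else:
        params["metadataPrefix"] = prefix
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_payload: Union[bytes, str]) -> tuple[list[ET.Element], Optional[str]]:
    # Return the <record> elements of one page and the token for the next page (None at the end)
    root = ET.fromstring(xml_payload)
    records = root.findall(".//oai:ListRecords/oai:record", OAI_NS)
    token_el = root.find(".//oai:ListRecords/oai:resumptionToken", OAI_NS)
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return records, token


def harvest(base_url: str) -> Iterator[ET.Element]:
    # Request pages until the repository returns an empty or missing resumptionToken
    token = None
    while True:
        with urllib.request.urlopen(list_records_url(base_url, token)) as resp:
            records, token = parse_page(resp.read())
        yield from records
        if not token:
            break
```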
After collection, metadata are transformed according to the OpenAIRE internal metadata model, which is used to generate the final OpenAIRE Research Graph that you can access from the OpenAIRE portal and the APIs.
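As an illustration of the transformation step, the sketch below maps one harvested `oai_dc` record onto a small internal structure. It normalises the `info:eu-repo/semantics/...` access-right terms used by the OpenAIRE guidelines onto the four access-right labels of the graph. The output keys (`originalId`, `collectedFrom`, `accessRight`, ...) are hypothetical and are not the actual OpenAIRE internal model.

```python
import xml.etree.ElementTree as ET
from typing import Optional

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# info:eu-repo access-right terms mapped to the four labels used by the graph
ACCESS_RIGHTS = {
    "info:eu-repo/semantics/openAccess": "Open Access",
    "info:eu-repo/semantics/restrictedAccess": "Restricted",
    "info:eu-repo/semantics/embargoedAccess": "Embargoed",
    "info:eu-repo/semantics/closedAccess": "Closed",
}


def to_internal(record: ET.Element, datasource: str) -> dict:
    header_id = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
    dc = record.find("oai:metadata/oai_dc:dc", NS)

    def values(tag: str) -> list[str]:
        # All non-empty values of a Dublin Core element, e.g. every dc:creator
        if dc is None:
            return []
        return [e.text.strip() for e in dc.findall(f"dc:{tag}", NS) if e.text]

    titles = values("title")
    rights = [ACCESS_RIGHTS[r] for r in values("rights") if r in ACCESS_RIGHTS]
    access: Optional[str] = rights[0] if rights else None  # left unset when no known term is present
    return {
        "originalId": header_id,
        "collectedFrom": datasource,
        "title": titles[0] if titles else None,
        "authors": values("creator"),
        "identifiers": values("identifier"),
        "accessRight": access,
    }
```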
For additional details about the aggregation workflows, please refer to [7].
What does OpenAIRE do to enrich the collected metadata records?
Once the Research graph is populated, OpenAIRE performs de-duplication of organizations and publications [8] and runs inference algorithms [3] to enrich the graph with additional information extracted from the publications' full-texts, namely:
- Subjects
- Links to datasets
- Links to projects
- Links to research communities and infrastructures
- Links to publications (i.e. similar publications)
- Links to software
- Links to biological entities (e.g. PDB)
- Citations
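The two enrichment steps above can be illustrated with deliberately naive stand-ins. These are not the algorithms described in [8] and [3]. De-duplication is sketched as follows: records that share a blocking key (a normalised title plus year, or a lower-cased DOI) are merged with union-find. Link inference is sketched as a regular expression that looks for six-digit grant numbers after "grant agreement" and keeps only those found in a known-project list. The record fields (`id`, `title`, `year`, `doi`) and the grant-number pattern are assumptions.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_title(title: str) -> str:
    # Strip accents, case and punctuation so trivially different titles compare equal
    text = unicodedata.normalize("NFKD", title)
    text = "".join(c for c in text if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def group_duplicates(records: list[dict]) -> list[list[str]]:
    parent = {r["id"]: r["id"] for r in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    # In this sketch, records sharing any blocking key are merged outright (no similarity scoring)
    seen: dict[tuple, str] = {}
    for r in records:
        keys = [("title", normalize_title(r["title"]), r.get("year"))]
        if r.get("doi"):
            keys.append(("doi", r["doi"].lower()))
        for key in keys:
            if key in seen:
                union(r["id"], seen[key])
            else:
                seen[key] = r["id"]

    groups: dict[str, list[str]] = defaultdict(list)
    for r in records:
        groups[find(r["id"])].append(r["id"])
    return sorted(groups.values())


# Hypothetical pattern: "grant agreement [No.|number] <6 digits>"
GRANT_PATTERN = re.compile(r"grant agreement\s+(?:no\.?|number)?\s*(\d{6})", re.IGNORECASE)


def mine_project_links(full_text: str, known_projects: set[str]) -> set[str]:
    # Keep only matches that correspond to projects already in the graph
    return {m.group(1) for m in GRANT_PATTERN.finditer(full_text) if m.group(1) in known_projects}
```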
How is the enriched OpenAIRE Research Graph published?
Once materialised and enriched, the graph is made available to all OpenAIRE portals (EXPLORE, MONITOR, and CONNECT gateways) and to the APIs for programmatic access.
Every 6 months, a full dump in JSON is also published on Zenodo.
Details about the APIs and the dumps can be found at https://develop.openaire.eu. [9][10]
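For offline analysis, a downloaded dump can be streamed record by record rather than loaded into memory. The sketch below assumes the dump has been unpacked into a directory of `*.json.gz` part files, each holding one JSON record per line. It also assumes a `type` field for counting. Both the file layout and the field name should be checked against the dump documentation at https://develop.openaire.eu.

```python
import gzip
import json
from collections import Counter
from pathlib import Path
from typing import Iterator


def iter_dump(directory: str) -> Iterator[dict]:
    # Assumption: every part file is gzip-compressed JSON Lines (one record per line)
    for part in sorted(Path(directory).glob("*.json.gz")):
        with gzip.open(part, "rt", encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:
                    yield json.loads(line)


def count_by(directory: str, field: str) -> Counter:
    # Tally records by the value of one top-level field; missing values count as "unknown"
    return Counter(rec.get(field, "unknown") for rec in iter_dump(directory))
```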
Before publishing the updated graph, the OpenAIRE team performs a set of semi-automatic quality-control checks [11].
These quality checks are needed to evaluate whether the switch to public can be performed, or whether some regressions in the overall data quality need to be addressed first.
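A regression check of this kind can be as simple as comparing per-indicator counts between the previous and the candidate graph version, flagging any indicator that dropped by more than a tolerance. The sketch below assumes quality is tracked as plain counts (e.g. publications per funder). The indicator names and the 5% default threshold are hypothetical. This is not the DataQ system described in [11].

```python
def find_regressions(previous: dict[str, int], current: dict[str, int],
                     max_drop: float = 0.05) -> dict[str, float]:
    """Return indicators whose value dropped by more than max_drop (a fraction).

    Indicators missing from the current version count as a 100% drop.
    """
    regressions = {}
    for name, old in previous.items():
        if old <= 0:
            continue  # nothing to regress from
        new = current.get(name, 0)
        drop = (old - new) / old
        if drop > max_drop:
            regressions[name] = round(drop, 4)
    return regressions
```

An empty result would mean the switch to public can proceed. Any flagged indicator points to the content that needs investigating first, for instance a data source whose records shrank after a re-collection.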
How often is the OpenAIRE Research Graph published?
The OpenAIRE Research graph is published at least once every 2 months.
From one version to another, the technical team performs quality checks on the data and on the produced statistics for MONITOR and the Open Science Observatory.
Whenever minor issues occur, the graph is published anyway and details about the issues are:
- tracked via the private ticketing system of the OpenAIRE technical team;
- notified to the affected data source, if the issue depends on the originally collected content;
- briefly described in the table below, which keeps track of the index and statistics updates.
References
[1] Manghi P. et al. (2014) "The D-NET software toolkit: A framework for the realization, maintenance, and operation of aggregative infrastructures", Program, Vol. 48 Issue: 4, pp.322-354, https://doi.org/10.1108/PROG-08-2013-0045
[2] Check the data provider page (https://explore.openaire.eu/search/find/dataproviders) for the complete list of sources
[3] Bolikowski L. (2015) Text mining services in OpenAIRE: https://blogs.openaire.eu/?p=88
[4] OpenAIRE claiming functionality: https://explore.openaire.eu/participate/claim
[5] The OpenAIRE content acquisition policy
[6] Check which funders are affiliated with OpenAIRE: https://www.openaire.eu/search/find#projects
[7] Atzori, Claudio, Bardi, Alessia, Manghi, Paolo, & Mannocci, Andrea. (2017). The OpenAIRE workflows for data management. Zenodo. http://doi.org/10.5281/zenodo.996006
[8] Manghi P. (2015) On de-duplication in the OpenAIRE infrastructure: https://blogs.openaire.eu/?p=116
[9] OpenAIRE API documentation: http://develop.openaire.eu
[10] OpenAIRE Linked Open Data: http://lod.openaire.eu/documentation
[11] Mannocci, A., & Manghi, P. (2016, September). DataQ: A Data Flow Quality Monitoring System for Aggregative Data Infrastructures. In International Conference on Theory and Practice of Digital Libraries (pp. 357-369). Springer International Publishing. https://doi.org/10.1007/978-3-319-43997-6_28
Index and stats update
“This webpage is deprecated and will not be updated in the future. If you are interested in learning more about the OpenAIRE Graph production workflow, you can visit our documentation website here. In addition, you can find more information about the various releases of the Graph here.”
Available on the portal | Start date | Notes |
2023-01-30 | 2023-01-16 | Content updated on the explore and monitor portals. News: |
2022-12-28 | 2022-12-19 | Content updated on the explore and monitor portals. News: |
2022-11-29 | 2022-11-18 | Content updated on the explore portal. News: |
2022-11-07 | 2022-10-17 | Content updated on the portals. |
2022-09-26 | 2022-09-09 | Content updated on the portals. |
2022-08-08 | 2022-08-13 | Content updated on the portals. News: |
2022-06-20 | 2022-07-28 | Content updated on the portals. News: |
2022-06-21 | 2022-06-10 | Content updated on the portals. News: |
2022-05-25 | 2022-05-10 | Content updated on the portals. Updated statistics for MONITOR are being checked and are not yet published. |
2022-05-01 | 2022-04-06 | Content updated on the portals. Publishing delayed due to content quality checks. News: |
2022-03-18 | 2022-02-25 | Content updated on the portals. Publishing delayed due to content quality checks. News: |
2022-02-09 | 2022-01-16 | Content updated on the portals. News: |
2021-12-14 | 2021-11-25 | Content updated on the portals. |
2021-11-03 | 2021-10-12 | Content updated on the portals. |
2021-09-27 | 2021-09-20 | Content updated on the portals. |
2021-08-14 | 2021-08-09 | Content updated on the portals. |
2021-07-28 | 2021-07-14 | Content updated on the portals. - Introduced curated organizations via OpenOrgs - Integrated FCT projects via the new official database PTCRIS; full-text mining identified more than 50K research products (vs previous 40K) funded by FCT; the alignment to official grant codes caused a temporary decrease of the links between products and old FCT projects collected from repositories |
2021-07-09 | 2021-06-22 | Content updated on the portals. We experienced a delay due to an unexpected temporary unavailability of the index server. |
2021-05-28 | 2021-05-23 | Content updated on the portals. Statistics on MONITOR updated on May 29th. Events generated for the Broker: check them out on PROVIDE (https://provide.openaire.eu) and subscribe if you have not done so yet! |
2021-05-10 | 2021-05-01 | Content updated on the portals. - FCT publications fully recovered. |
2021-04-20 | 2021-04-12 | |
2021-03-29 | 2021-03-20 | |
2021-03-03 | 2021-02-22 | Content updated on the EXPLORE and the MONITOR portals. - Re-collecting content from JAIRO in refresh mode decreased the related records by 1M. - FCT publications partially recovered. |
2021-02-01 | 2021-02-08 | Content updated on the EXPLORE and the MONITOR portals. - MZOS and NWO publications back on track. - FCT publications decreased because Repositorium of UMinho changed its compliance level from OpenAIRE Literature 3.0 to OpenAIRE Institutional and Thematic Repositories 4.0 (Literature 4.0). |
2021-01-11 | 2021-01-19 | Content updated on the EXPLORE and the MONITOR portals. - MZOS publications decreased; we are re-processing "Full-text Institutional Repository of the Ruđer Bošković Institute" to solve this in the next run. - NWO publications decreased after an aggregation in REFRESH mode from NARCIS. |
2020-12-17 | 2020-12-27 | Content updated on the EXPLORE and the MONITOR portals. |
2020-12-04 | 2020-11-25 | Content updated on the EXPLORE portal. |
2020-10-30 | 2020-10-21 | Content updated on the EXPLORE portal. |
2020-10-01 | 2020-09-24 | |
2020-08-26 | 2020-08-13 | Content updated on the EXPLORE portal. Records from NARCIS are temporarily out of the OpenAIRE Research Graph due to technical aggregation issues and will be re-introduced in the next content update. |
2020-08-06 | 2020-07-08 | Happy Summer! EXPLORE is showing the whole OpenAIRE Research Graph with more than 130 million de-duplicated metadata records, including: - Records from the network of OpenAIRE compliant repositories - DOIBoost (Crossref, Unpaywall, Microsoft Academic Graph, ORCID) - Links from ScholeXplorer |
2020-06-15 | Statistics on monitor and explore portals updated | |
2020-06-06 | 2020-06-04 | Content on the explore portal updated. |
2020-05-19 | 2020-05-11 | Content on the explore portal updated. |
2020-04-28 | 2020-04-24 | Content on the explore portal updated. Statistics have not been updated. |
2020-04-08 | 2020-03-30 | Content on the explore portal updated. Statistics updated. |
2020-03-10 | 2020-03-05 | Content on the explore portal updated. |
2020-02-26 | 2020-02-11 | Content on the explore portal updated. |
2020-01-30 | 2020-01-18 | Content on the explore portal updated. Statistics have not been updated. |
2020-01-13 | Statistics updated | |
2019-12-29 | 2019-12-20 | Content on the explore portal updated. Statistics have not been updated. |
N/A | 2019-12-02 | Update skipped to give higher priority to the update of the pre-production release of the OpenAIRE Research Graph available on BETA Explore. Read our blog post to learn more. |
2019-11-14 | 2019-11-13 | Content on the explore portal updated. Statistics and content available via OAI-PMH have not been updated. |
N/A | 2019-10-28 | Update skipped to give higher priority to the pre-production release of the OpenAIRE Research Graph available on BETA Explore. Read our blog post to learn more. |
2019-10-17 | Updated content available via OAI-PMH | |
2019-10-15 | 2019-09-30 | Content on the explore portal updated. |
2019-09-09 | 2019-09-06 | Content available via the explore portal and OAI-PMH updated. Statistics have not been updated |
Statistics updated in August 2019 | ||
2019-07-30 | 2019-07-24 | Content available via the explore portal and OAI-PMH updated. Statistics have not been updated |
2019-07-15 | 2019-07-10 | Content available via the explore portal and OAI-PMH updated. |
2019-06-23 | 2019-06-19 | Content available via the explore portal and OAI-PMH updated. |
N/A | 2019-06-03 | Records from Arxiv.org re-harvested. De-duplication and inference algorithms are running (info added on 2019-06-06). Content could not be published because of some quality issues. |
N/A | 2019-05-23 | Content not published due to a loss of metadata records from Arxiv.org. |
2019-05-16 | N/A | Updated statistics on monitor.openaire.eu |
2019-05-14 | 2019-05-06 | Content available via the explore portal and OAI-PMH updated. Main change: upgraded Solr server in use since May 22nd 2019. |
2019-04-08 | 2019-04-03 | Content available via the explore portal and OAI-PMH updated. |
2019-03-28 | 2019-03-11 | Content available via the explore portal and OAI-PMH updated. |
2019-02-28 | 2019-02-20 | Statistics and content available via the explore portal and OAI-PMH updated. |
2019-02-11 | 2019-01-28 | Statistics and content available via the explore portal and OAI-PMH updated. |
2019-01-04 | 2018-12-27 | Statistics and content available via the explore portal and OAI-PMH updated. |
2018-12-13 | 2018-12-04 | Statistics and content available via the explore portal and OAI-PMH updated. All types of research products have been de-duplicated. |
2018-11-19 | 2018-11-12 | Statistics and content available via the explore portal and OAI-PMH updated. Content from Portuguese repositories re-aggregated. Inference and de-duplication algorithms have been re-run to solve the issues with lost links. As a consequence of the new algorithm run and of the increase in available full-texts, we note a general increase in links to projects of all funders. |
N/A | 2018-10-30 | Content generated cannot be published as we noticed |
2018-10-16 | 2018-10-10 | Statistics and content available via the explore portal and OAI-PMH updated. Main change: updated mapping for new research object types |
2018-10-10 | 2018-10-01 | Statistics and content available via the explore portal and OAI-PMH updated. |
2018-09-10 | 2018-08-28 | Statistics and content available via the explore portal and OAI-PMH updated. The harmonisation of SNSF publication metadata is still ongoing. |
2018-08-01 | 2018-07-27 | Statistics and content available via the explore portal and OAI-PMH updated. We noticed a decrease in SNSF publications due to a change in the resource types in the records collected from the SNSF P3 publication database. |
2018-07-10 | 2018-06-27 | Statistics and content available via the explore portal and OAI-PMH updated. Main change: an updated version of the OpenAIRE mining algorithms processed more than 400K additional full-texts from Springer Open Access. |
2018-06-08 | 2018-06-05 | Statistics and content available via the explore portal and OAI-PMH updated. |
2018-05-28 | 2018-05-22 | Statistics and content available via the explore portal and OAI-PMH updated. |
2018-05-15 | 2018-05-08 | Statistics and content available via the explore portal and OAI-PMH updated. Main change: more than 200K additional full-texts processed by OpenAIRE mining algorithms. |
2018-04-16 | 2018-04-10 | Statistics and content available via the explore portal and OAI-PMH updated. |
2018-03-30 | 2018-03-26 | Statistics and content available via the explore portal and OAI-PMH updated. New research community: Research Data Alliance |
2018-03-20 | 2018-03-13 | Statistics and content available via the explore portal and OAI-PMH updated. Content update delayed because of technical issues |
2018-02-20 | 2018-02-16 | Statistics and content available via the explore portal and OAI-PMH updated. |
2018-02-09 | 2018-02-05 | Statistics and content available via the explore portal and OAI-PMH updated. |
2018-01-30 | 2018-01-17 | Statistics and content available via the explore portal and OAI-PMH updated. Main changes: |
2017-12-28 | 2017-12-22 | Statistics and content available via the explore portal and OAI-PMH updated. |
2017-12-15 | 2017-12-11 | Statistics and content available via the explore portal and OAI-PMH updated. FCT naming not fixed yet. |
2017-11-27 | 2017-11-19 | Statistics and content available via the explore portal and OAI-PMH updated. Portuguese funder FCT appears twice, once with a wrong name. |
2017-11-13 | 2017-11-03 | Statistics and content available via the explore portal and OAI-PMH updated. Main change: added projects of funders RCUK and Turkey. |