



OpenAIRE XML schema change announcement

The OpenAIRE Technical Team is pleased to announce a new release of the OpenAIRE XML schema that fixes some inconsistencies and introduces new types of research results in addition to "publication" and "dataset".
The new schema is version 1.0 and it is available at https://www.openaire.eu/schema/1.0/oaf-1.0.xsd. Documentation is accessible at https://www.openaire.eu/schema/1.0/doc/oaf-1.0.html.

Major changes:

  1. Removal of "person" entities: 
    • Removed person schema (oaf-person-<version>.xsd)
    • Removed relationships to persons (oaf-common-1.0.xsd)
    • Added repeatable element creator in result (oaf-result-1.0.xsd). Example: 
      <creator rank="1" name="Rita" surname="Levi-Montalcini">Levi-Montalcini, Rita</creator>
      <creator rank="2" name="Ada" surname="Lovelace">Lovelace, Ada</creator>
    • Added optional contact information elements: contactfullname, contactfax, contactphone, contactemail (oaf-project-1.0.xsd). Example:
      <contactfullname>Doe, John</contactfullname>
      <contactemail>[e-mail address obscured in the source]</contactemail>
  2. Avoid confusion between access right and license:
    • Element bestlicense renamed to bestaccessright (oaf-result-1.0.xsd)
    • Result instances now carry both accessright and license (oaf-result-1.0.xsd)
  3. More provenance information:
    • Result instances come with the following additional metadata fields: collectedfrom, dateofacceptance (oaf-result-1.0.xsd)
    • Added the following attributes to the journal element: (oaf-result-1.0.xsd)
      • iss: issue
      • vol: volume
      • sp: start page
      • ep: end page
    • Added the following elements to related results: collectedfrom, url, pid (oaf-common-1.0.xsd)
  4. New fields for software: (oaf-result-1.0.xsd)
    • documentationUrl
    • coderepositoryUrl
    • programmingLanguage
    • type
Minor changes:
  • Added field contributor (oaf-result-1.0.xsd)
  • Element 'class' of the project funding tree is now optional (oaf-project-1.0.xsd)
  • Funding hierarchy (i.e. 'funding_level_X' elements) is now optional: fundingtree may only refer to the funder. (oaf-project-1.0.xsd)
  • Added the element openairecompatibility to related datasources (oaf-result-1.0.xsd)
  • Added provenanceaction attribute to all elements of type 'optionalClassedSchemedElement' (e.g. subject) (oaf-common-1.0.xsd)
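
As an illustration of adapting a parser to the new repeatable creator element, here is a minimal Python sketch. Only the creator elements and their attributes come from the example above; the enclosing result wrapper, the function name read_creators and the absence of XML namespaces are assumptions made to keep the sketch self-contained, so real records may need namespace-aware lookups.

```python
import xml.etree.ElementTree as ET

# Hypothetical record fragment: the <creator> elements mirror the example in
# the announcement; the <result> wrapper is an assumption for illustration.
SAMPLE = """
<result>
  <creator rank="2" name="Ada" surname="Lovelace">Lovelace, Ada</creator>
  <creator rank="1" name="Rita" surname="Levi-Montalcini">Levi-Montalcini, Rita</creator>
</result>
"""


def read_creators(xml_text):
    """Return creators as dicts, ordered by their rank attribute."""
    root = ET.fromstring(xml_text)
    creators = []
    for el in root.iter("creator"):
        creators.append({
            "rank": int(el.get("rank", "0")),
            "name": el.get("name"),
            "surname": el.get("surname"),
            "fullname": (el.text or "").strip(),
        })
    # The rank attribute, not document order, decides the author order.
    return sorted(creators, key=lambda c: c["rank"])


creators = read_creators(SAMPLE)
print([c["fullname"] for c in creators])
```

Sorting on rank rather than relying on document order keeps the author list stable even if a record serializes the elements in a different sequence.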

XML records produced by OpenAIRE from the beginning of 2018 will be compliant with the new schema. If you want to see some records in the new format and start adapting your parsers to the new schema, feel free to download the samples from http://svn-public.driver.research-infrastructures.eu/driver/dnet40/modules/dnet-openaireplus-schema/trunk/schema/1.0/samples/ .

Please note that samples do not include the OAI header.
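
A parser that has to read records produced both before and after the change can look for the renamed element under both names. The sketch below assumes a hypothetical, namespace-free result fragment and a hypothetical classid attribute; only the element names bestaccessright and bestlicense are taken from the announcement.

```python
import xml.etree.ElementTree as ET


def find_best_access_right(result_el):
    """Return the bestaccessright element, falling back to the old bestlicense name."""
    for tag in ("bestaccessright", "bestlicense"):
        el = result_el.find(tag)
        # Compare with None explicitly: an element without children is falsy.
        if el is not None:
            return el
    return None


# Hypothetical fragments: the attribute name and value are assumptions.
old_record = ET.fromstring('<result><bestlicense classid="OPEN"/></result>')
new_record = ET.fromstring('<result><bestaccessright classid="OPEN"/></result>')

print(find_best_access_right(old_record).tag)
print(find_best_access_right(new_record).tag)
```

Checking the new name first means that records in the 1.0 format never fall through to the legacy lookup.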

For any questions, please contact us at the OpenAIRE helpdesk: https://www.openaire.eu/support/helpdesk.

The OpenAIRE Technical Team

About OpenAIRE-Advance

OpenAIRE-Advance continues the mission of OpenAIRE to support the Open Access/Open Data mandates in Europe. By sustaining the current successful infrastructure, comprised of a human network and robust technical services, it consolidates its achievements while working to shift the momentum among its communities to Open Science, aiming to be a trusted e-Infrastructure within the realms of the European Open Science Cloud.

In this next phase, OpenAIRE-Advance strives to empower its National Open Access Desks (NOADs) so they become a pivotal part within their own national data infrastructures, positioning OA and open science onto national agendas. The capacity building activities bring together experts on topical task groups in thematic areas (open policies, RDM, legal issues, TDM), promoting a train-the-trainer approach, strengthening and expanding the pan-European Helpdesk with support and training toolkits, training resources and workshops. It examines key elements of scholarly communication, i.e., co-operative OA publishing and next generation repositories, to develop essential building blocks of the scholarly commons.

On the technical level OpenAIRE-Advance focuses on the operation and maintenance of the OpenAIRE technical TRL8/9 services, and radically improves the OpenAIRE services on offer by: a) optimizing their performance and scalability, b) refining their functionality based on end-user feedback, c) repackaging them into products, taking a professional marketing approach with well-defined KPIs, d) consolidating the range of services/products into a common e-Infra catalogue to enable a wider uptake.

OpenAIRE-Advance steps up its outreach activities with concrete pilots with three major RIs, citizen science initiatives, and innovators via a rigorous Open Innovation programme. Finally, via its partnership with COAR, OpenAIRE-Advance consolidates OpenAIRE’s global role extending its collaborations with Latin America, US, Japan, Canada, and Africa.

Aggregation and content provision workflows

Index and stats update:

Next update scheduled to start on: 2018-06-25

Available on the portal | Start date | Notes
2018-07-10 | 2018-06-27 | Updated version of OpenAIRE mining algorithms processed more than 400K additional full-texts from Springer Open Access.
2018-06-08 | 2018-06-05 |
2018-05-28 | 2018-05-22 |
2018-05-15 | 2018-05-08 | More than 200K additional full-texts processed by OpenAIRE mining algorithms.
2018-04-16 | 2018-04-10 |
2018-03-30 | 2018-03-26 | New research community: Research Data Alliance
2018-03-20 | 2018-03-13 | Content update delayed because of technical issues
2018-02-20 | 2018-02-16 |
2018-02-09 | 2018-02-05 |
2018-01-30 | 2018-01-17 | Updated version of mining algorithm; updated data model (see details at https://www.openaire.eu/openaire-xml-schema-change-announcement)
2017-12-28 | 2017-12-22 |
2017-12-15 | 2017-12-11 | FCT naming not fixed yet.
2017-11-27 | 2017-11-19 | Portuguese funder FCT appears twice, once with a wrong name.
2017-11-13 | 2017-11-03 | Added projects of funders RCUK and Turkey.

OpenAIRE makes openly accessible a rich Information Space Graph (ISG) where products of the research life-cycle (e.g. scientific literature, research data, projects, software) are semantically linked to each other. The ISG is constructed via a set of autonomic, orchestrated workflows operating in a regimen of continuous data integration. [1]


What does OpenAIRE collect?

The OpenAIRE technical infrastructure collects information about objects of the research life-cycle that comply with the OpenAIRE acquisition policy [5] from different types of data sources [2]:
  1. Scientific literature metadata and full-texts from institutional and thematic repositories, Open Access journals and publishers;
  2. Dataset metadata from data repositories and data journals;
  3. Scientific literature, data and software metadata from Zenodo;
  4. Metadata about data sources, organizations, projects, and funding programs from entity registries, i.e. authoritative sources such as CORDA and other funder databases for projects, OpenDOAR for publication repositories, re3data for data repositories, DOAJ for Open Access journals;
  5. Coming soon: metadata of open source research software from software repositories (currently available only on https://beta.openaire.eu)
  6. Coming soon: metadata about other types of research products (e.g. workflow, protocols, methods, research packages, etc.)
  7. Coming soon: metadata about scientific literature, datasets, persons, organisations, projects, funding, equipment and services, collected through CRIS (Current Research Information Systems)
Relationships between objects are collected from the data sources, but also automatically detected by inference algorithms [3] and added by authenticated users, who can insert links between publications, datasets and projects via the “claiming” procedure available from the OpenAIRE web portal [4].

What kind of data sources are in OpenAIRE?

Objects and relationships in the OpenAIRE ISG are extracted from information packages, i.e. metadata records, collected from data sources of the following kinds:
  • Institutional or thematic repositories: Information systems where scientists upload the bibliographic metadata and full-texts of their articles, due to obligations from their organization or due to community practices (e.g. arXiv, Europe PMC);
  • Open Access publishers and journals: Information systems of Open Access publishers or their journals, which offer bibliographic metadata and PDFs of their published articles;
  • Data archives: Information systems where scientists deposit descriptive metadata and files about their research data (also known as scientific data, datasets, etc.);
  • Hybrid repositories/archives: Information systems where scientists deposit metadata and files of scientific literature, research data and research software (e.g. Zenodo);
  • Aggregator services: Information systems that, like OpenAIRE, collect descriptive metadata about publications or datasets from multiple sources in order to enable cross-data source discovery of given research products. Examples are DataCite, BASE, DOAJ;
  • Entity Registries: Information systems created with the intent of maintaining authoritative registries of given entities in scholarly communication, such as OpenDOAR for institutional repositories, re3data for data repositories, CORDA and other funder databases for projects and funding information;
  • CRIS (coming soon): Information systems adopted by research and academic organizations to keep track of their research administration records and related results; examples of CRIS content are articles or datasets funded by projects, their principal investigators, facilities acquired thanks to funding, etc.

How does OpenAIRE collect metadata records?

As of October 2017, OpenAIRE aggregates more than 25 million metadata records from more than 2,700 data sources.

OpenAIRE features three workflows for metadata aggregation:
  1. for the aggregation from data sources whose content is known to comply with the OpenAIRE content acquisition policy,
  2. for the aggregation of content that is not known to be eligible according to the policy,
  3. for the aggregation of information packages from entity registries.

Workflow for OpenAIRE compliant data sources

This workflow is for data sources that comply with the OpenAIRE guidelines and thus it is executed for the majority of data sources.

The workflow consists of two phases: collection and transformation.

The collection phase collects information packages in the form of XML metadata records from an OAI-PMH endpoint of the data source (as the OpenAIRE guidelines mandate) and stores them in a metadata store.

The transformation phase transforms the collected records according to the OpenAIRE internal data model and stores them in another metadata store, ready to be read for populating the OpenAIRE ISG.
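
The collection phase can be pictured with a short Python sketch that pages through an OAI-PMH endpoint using the protocol's ListRecords verb and resumption tokens. This is an illustration under stated assumptions, not the OpenAIRE implementation: the function names, the injected fetch callable (which returns the response body for a URL), the default oai_dc metadata prefix and the plain list used as a "metadata store" are all hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # In OAI-PMH the resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (record elements, next resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    records = root.findall(".//oai:record", OAI_NS)
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return records, token


def collect(fetch, base_url, store):
    """Collection phase: page through the endpoint, appending raw records to store."""
    url = list_records_url(base_url)
    while url:
        records, token = parse_page(fetch(url))
        store.extend(ET.tostring(r, encoding="unicode") for r in records)
        # An empty or missing token marks the last page.
        url = list_records_url(base_url, resumption_token=token) if token else None
    return store
```

Passing fetch in as a parameter keeps the sketch testable without network access; in practice it could wrap any HTTP client. The transformation phase would then read the stored records and map them to the internal data model.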

Workflow for data sources with unknown compliance

This workflow applies to data sources that are registered in OpenAIRE but are not known to be OpenAIRE compliant. This is the typical case for aggregators of data repositories (e.g. DataCite).

According to the content acquisition policy, OpenAIRE can include a dataset in the ISG only if it has a link to an object (project or publication) already in the ISG.

Therefore, OpenAIRE collects all metadata records and transforms them according to the internal OpenAIRE data model. Inference algorithms process the records and mark those that satisfy the content acquisition policy, so that they are eligible to enter the ISG.
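
The eligibility rule can be sketched as a simple filter. The sketch assumes that the links of each dataset are already known and stored as a list of identifiers; in OpenAIRE they are detected by inference algorithms, which are not reproduced here. The record layout, the identifier format and the name mark_eligible are hypothetical.

```python
def mark_eligible(datasets, isg_ids):
    """Flag each dataset record that links to an object already in the ISG."""
    marked = []
    for record in datasets:
        linked = set(record.get("links", []))
        # Eligible as soon as one linked project or publication is in the ISG.
        marked.append({**record, "eligible": bool(linked & isg_ids)})
    return marked


# Hypothetical identifiers of objects already in the ISG.
isg_ids = {"project:123", "publication:abc"}
datasets = [
    {"id": "dataset:1", "links": ["project:123"]},
    {"id": "dataset:2", "links": ["publication:zzz"]},
    {"id": "dataset:3", "links": []},
]

eligible = [d["id"] for d in mark_eligible(datasets, isg_ids) if d["eligible"]]
print(eligible)
```

Only dataset:1 is marked, because it is the only record linked to an object that the ISG already contains.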

Workflow for entity registries

This workflow applies to data sources offering authoritative lists of entities.

The workflow consists of two phases: collection and transformation.

The collection phase collects information packages in the form of files in some machine readable format (e.g. XML, JSON, CSV) via one of the supported exchange protocols (OAI-PMH, SFTP, FTP(S), HTTP, REST).

The transformation phase transforms the packages according to the OpenAIRE internal data model and stores them into a metadata store ready to be read for populating the OpenAIRE ISG.
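
Because registries deliver packages in different formats, the transformation phase needs one parsing path per format before it can map entries to a common shape. The following Python sketch shows the idea for the three formats named above; the field names id and name, the flat XML layout and both function names are assumptions for illustration and do not reflect the OpenAIRE internal data model.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def parse_package(payload, fmt):
    """Parse one information package into a list of flat dicts."""
    if fmt == "json":
        data = json.loads(payload)
        return data if isinstance(data, list) else [data]
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(payload)))
    if fmt == "xml":
        # Assumes one child element per entry, with one sub-element per field.
        root = ET.fromstring(payload)
        return [{child.tag: child.text for child in entry} for entry in root]
    raise ValueError(f"unsupported format: {fmt}")


def transform(entry):
    """Map a parsed entry to a hypothetical internal shape (id and name only)."""
    return {"id": entry["id"].strip(), "name": entry["name"].strip()}


csv_payload = "id,name\nr1,Repo One\n"
json_payload = '[{"id": "r1", "name": "Repo One"}]'
xml_payload = "<entries><entry><id>r1</id><name>Repo One</name></entry></entries>"

for payload, fmt in [(csv_payload, "csv"), (json_payload, "json"), (xml_payload, "xml")]:
    print(fmt, [transform(e) for e in parse_package(payload, fmt)])
```

All three inputs yield the same normalized entry, which is the property the transformation phase relies on.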

For additional details about the aggregation workflows, please refer to [7].

What does OpenAIRE do to enrich the collected metadata records?

Once the ISG is populated, OpenAIRE performs de-duplication of organizations and publications [8] and runs inference algorithms [3] to enrich the graph with additional information extracted from the publications' full-texts, namely:
  • subjects
  • links to datasets
  • links to projects
  • links to research communities
  • links to publications
  • links to software
  • links to biological entities (e.g. PDB)
  • Citations
All other information (e.g. access rights, titles, authors, URLs to web resources) is collected from data sources. Whenever the de-duplication algorithm finds duplicates of the same publication, all information from all of the duplicates is kept. OpenAIRE keeps track of the provenance of information (i.e. whether it has been inferred by a mining algorithm, claimed by authenticated portal users, or was present in the metadata record collected from a data source).
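
The "keep everything, remember where it came from" behaviour can be sketched as follows. This is not the OpenAIRE de-duplication algorithm (which also has to decide which records are duplicates); it only illustrates merging records already known to be duplicates. The record layout and the provenance labels collected, inferred and claimed are hypothetical names for the three provenance kinds described above.

```python
from collections import defaultdict


def merge_duplicates(records):
    """Merge duplicate records, keeping every distinct value with its provenance."""
    merged = defaultdict(list)
    for record in records:
        for field, value in record["fields"].items():
            entry = {"value": value, "provenance": record["provenance"]}
            # Skip exact repeats, but keep the same value under a new provenance.
            if entry not in merged[field]:
                merged[field].append(entry)
    return dict(merged)


duplicates = [
    {"provenance": "collected", "fields": {"title": "A study", "url": "http://example.org/1"}},
    {"provenance": "collected", "fields": {"title": "A Study.", "url": "http://example.org/1"}},
    {"provenance": "inferred", "fields": {"project": "project:123"}},
    {"provenance": "claimed", "fields": {"project": "project:123"}},
]

merged = merge_duplicates(duplicates)
for field, entries in merged.items():
    print(field, entries)
```

Both title variants survive the merge, and the project link is kept twice because it was asserted by two different provenance sources.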

How is the enriched OpenAIRE graph published?

The deduplicated and enriched ISG is materialized by the data publishing workflow into four ISG projections:
  1. a full-text index to support search and browse queries from the OpenAIRE portal and to expose subsets of the ISG on the OpenAIRE search API [9],
  2. an E-R database and a dedicated key-value cache for statistics,
  3. a NoSQL document storage in order to support OAI-PMH bulk export of subsets of the ISG in XML format [9],
  4. a triple store in order to expose the ISG as LOD via a SPARQL endpoint (currently in beta) [10]
Every time the data publishing workflow executes, four new ISG projections are generated and placed in a “pre-public” status before becoming accessible to the general public.
The switch from pre-public to public, whereby the currently accessible ISG projections and statistics are dismissed and the new versions take their place, is still manual for safety reasons.
Pre-public ISG projections are subject to a set of semi-automatic quality-control checks [11].
These quality checks are needed to evaluate whether the switch to public can be performed or whether regressions in the overall data quality need to be addressed first.
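
One kind of check that could flag a regression before the switch is a comparison of record counts between the public and the pre-public projections. The sketch below is a hypothetical example of such a check, not the DataQ system cited in [11]; the function name, the entity counts and the 5% threshold are assumptions, and the final decision to switch would still be taken manually.

```python
def find_regressions(public_counts, prepublic_counts, max_drop=0.05):
    """Return entity types whose record count dropped by more than max_drop."""
    regressions = {}
    for entity, old in public_counts.items():
        new = prepublic_counts.get(entity, 0)
        # Relative drop with respect to the currently public projection.
        if old > 0 and (old - new) / old > max_drop:
            regressions[entity] = (old, new)
    return regressions


# Hypothetical counts for the current public and the new pre-public projections.
public = {"publication": 1000, "dataset": 200, "project": 50}
prepublic = {"publication": 1020, "dataset": 150, "project": 50}

issues = find_regressions(public, prepublic)
print(issues)
```

Here only the dataset count is reported, since it fell from 200 to 150, a 25% drop that exceeds the assumed threshold.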

How often is the OpenAIRE graph published?

The ISG is published about once every two weeks unless critical quality issues arise in the quality check phase.

Whenever minor issues occur, the ISG is published anyway and the issues are:
  • tracked via the private ticketing system of the OpenAIRE technical team;
  • notified to the affected data source, if the issue depends on the originally collected content;
  • briefly described in the table above, which keeps track of the index and statistics updates.


[1] Manghi P. et al. (2014) "The D-NET software toolkit: A framework for the realization, maintenance, and operation of aggregative infrastructures", Program, Vol. 48 Issue: 4, pp.322-354, https://doi.org/10.1108/PROG-08-2013-0045

[2] Check the data provider page (https://www.openaire.eu/search/data-providers) for the complete list of sources

[3] Bolikowski L. (2015) Text mining services in OpenAIRE: https://blogs.openaire.eu/?p=88

[4] OpenAIRE claiming functionality: https://www.openaire.eu/participate/claim

[5] The OpenAIRE acquisition policy: https://www.openaire.eu/content-acquisition-policy

[6] Check which funders are affiliated with OpenAIRE: https://www.openaire.eu/search/find#projects

[7] Atzori, Claudio, Bardi, Alessia, Manghi, Paolo, & Mannocci, Andrea. (2017). The OpenAIRE workflows for data management. Zenodo. http://doi.org/10.5281/zenodo.996006

[8] Manghi P. (2015) On de-duplication in the OpenAIRE infrastructure: https://blogs.openaire.eu/?p=116

[9] OpenAIRE API documentation: http://api.openaire.eu

[10] OpenAIRE Linked Open Data: http://lod.openaire.eu/documentation

[11] Mannocci, A., & Manghi, P. (2016, September). DataQ: A Data Flow Quality Monitoring System for Aggregative Data Infrastructures. In International Conference on Theory and Practice of Digital Libraries (pp. 357-369). Springer International Publishing. https://doi.org/10.1007/978-3-319-43997-6_28

OpenAIRE2020 Independent Ethics Advisor Position

We are looking to recruit an independent Ethics Expert who will serve in an advisory capacity to the OpenAIRE2020 project and its management board (the Project Steering Committee - PSC) in the evaluation of ethical issues relating to the activities of OpenAIRE. These issues will include but may not be confined to the following:

The tasks

  • Review and recommend regulatory frameworks/policies for OpenAIRE’s open data production, dissemination and use. Ethics issues at the level of OpenAIRE are likely different from those at the national level in that OpenAIRE is dealing with different national ethical and privacy regimes. This becomes more important as OpenAIRE creates added value products from data originating in different sources.
  • Advise the consortium on ethical concerns in the sharing and re-use of the aggregated and enriched data, particularly with regard to privacy, confidentiality, intellectual property and security.
  • Advise on the ethical implications of OpenAIRE’s large-scale data integration for researchers, social groups and communities.
  • Advise on whether and how OpenAIRE could and/or should be made accountable for technical/legal decisions about its operation, and what implications such accountability may have for training and other daily practices.
  • Review the disclosures for potential conflicts of interest of OpenAIRE’s tender calls (selection process) and oversee the processes in the ongoing OA FP7-post grant pilot.
  • Review and recommend revisions of OpenAIRE policies on code of conduct and disclosure of potential conflicts of interest by its officers and members at large.
  • Evaluate relationships of OpenAIRE in furthering its goals in regard to education, research and advocacy.
  • Respond to ethics issues raised by OpenAIRE’s members (National Open Access Desks) and the OpenAIRE executive board. In this regard the Ethics expert will offer arbitration and endeavor to provide a rapid response to ethics issues requiring prompt action by the Project Steering Committee.
  • Recommend on how to set up an Ethics Committee (number of members, the selection process, procedures, term, budget, operation work plan) to start its operation in the spring of 2018, and especially how this committee will be visibly integrated in the pending OpenAIRE legal entity governance structure.

The Expert will

  • Work closely with the OpenAIRE core management/executive team to understand the overall structure, operations and outreach.
  • Collaborate with and steer the current OpenAIRE2020 task force, which acts as the project's interim internal Ethics committee.
  • Make two written reports to the Project Steering Committee with guidance and recommendations (one in October 2017, one in April 2018). These will be accompanied by two reports to be delivered to the European Commission for review.
  • Attend two physical meetings of the OpenAIRE2020 Project Steering Committee. Interim meetings may be convened by conference call as appropriate.
  • Work from home, as this is considered a self-employed position (contract).


The appointment will last for 11 months. It will start on June 30, 2017 and will end on May 31, 2018.


The expert will be selected by the OpenAIRE2020 executive team with the advice and consent of the Project Steering Committee and the project interim ethics committee.

Reports to

The ethics advisor reports to the Project Steering Committee. Written reports are submitted to the PSC and to the EC through the coordinator.


OpenAIRE2020 will pay the expert the sum of 5,000 Euros. The selected expert will be responsible for covering his/her own travel costs to the PSC meetings.

Preferred qualifications and experience

  • Professional experience working with data, big data, data trends, ethics of data, social analytics, or communicating data and data implications to various publics.
  • Work experience involving data, data literacy and the role of data in decision making.
  • Knowledge of the data/digital/e-Infrastructure domain and its needs, open data and open science policies, and the Research Code of Conduct.
  • Knowledge of international ethics desirable, but not essential.
  • Experience working in a collaborative environment.
  • Recent experience in writing or reading research proposals desirable (in English).

For more information, or to apply for the position, please contact [e-mail address obscured in the source].