
Content Acquisition Policy


The OpenAIRE service infrastructure harvests metadata about scholarly communication products (literature, datasets, software, and other research products), and links between such products, from a range of institutional or subject repositories, national and institutional research information portals, aggregators, e-journals, data repositories, and software repositories. In addition, it infers links between literature and such products via advanced text and data mining (TDM) techniques. The resulting information graph (i.e. interlinked sets of objects) is intended to support the monitoring of open science and of open science publishing workflows (e.g. science reproducibility and transparent assessment).

    • Coverage: OpenAIRE will actively pursue harvesting content from both European and non-European repositories.
    • Reproducibility: The OpenAIRE graph aims to interlink research outcomes with each other (citations, mentions, references) and with the research entities (people, organisations, services, facilities) that generate or use them. Publications have a special role, as they are linked with all products used in or resulting from the related research activity. OpenAIRE publishes the graph in Zenodo every 6 months.
    • Monitoring: The OpenAIRE graph links research products with the funders and projects resulting from their grants. It also includes affiliation links to allow monitoring at institutional level.
    • Research communities: The OpenAIRE graph links research products with the communities for which they are relevant, in order to provide a (multi-)community view of the scholarly output.
    • Quality: Data sources and repositories are quality-controlled: their metadata respects the OpenAIRE guidelines, and their import into OpenAIRE is curated by OpenAIRE data curators with the support of the OpenAIRE Metadata Validator. Click here to see all data sources that OpenAIRE currently harvests from.
    • Terms of Use: Data sources and repositories accept a ToU with OpenAIRE (and vice versa) to re-use data under specific consent, warranties, and license. Click here for the ToU.

Aggregation policies by type of product

Literature, Datasets, Software, other research products
OpenAIRE accepts the metadata records of all scientific products whose structure respects the model and semantics expressed by the OpenAIRE guidelines. This means that both Open Access and non-Open Access material will be included, and links to other products will be resolved where this is possible (i.e. the provided PIDs have a resolver).

Accession numbers
Datasets with accession numbers (database entries) are not included as OpenAIRE datasets but, when a relationship to a product exists, are included as properties of the related product. More specifically, they are included as values of the property externalReference of the product metadata; externalReference includes a URL to the splash page, the target web site name, the ID, and an ID type (e.g. PDB).
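The shape of an externalReference value described above can be sketched as a small record type. This is a minimal illustration only: the Python field names (url, site_name, ref_id, id_type), the Product class, and the sample values are assumptions made for the example and are not taken from the OpenAIRE data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ExternalReference:
    """One accession-number entry attached to a product (field names are assumed)."""
    url: str        # URL of the splash page of the database entry
    site_name: str  # name of the target web site
    ref_id: str     # the accession number itself
    id_type: str    # type of the identifier, e.g. "PDB"


@dataclass
class Product:
    """Hypothetical stand-in for a product metadata record."""
    title: str
    external_references: List[ExternalReference] = field(default_factory=list)


# Hypothetical values, for illustration only.
paper = Product(title="Example article")
paper.external_references.append(
    ExternalReference(
        url="https://example.org/entry/1ABC",
        site_name="Example Protein Database",
        ref_id="1ABC",
        id_type="PDB",
    )
)
```

The database entry is stored as a property of the related product rather than as a dataset of its own, which mirrors the policy stated above.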

Full-text of scientific literature
OpenAIRE collects Open Access literature product files whenever these are accessible from the URL provided in the metadata record. The literature full-text is used for text-mining purposes. End-users wishing to access, download, and read the actual files will not be able to do so from OpenAIRE, but will be forwarded to the original source of deposition. For further information on the use of full-texts, please view OpenAIRE’s ToU.

Aggregation policy by category of repository

OpenAIRE services collect metadata about four typologies of products: literature, datasets, software, and “other research products” (ORPs). Metadata can be collected from four main categories of repositories: literature repositories (including institutional/thematic repositories, publishers, and catalogues), data repositories, software repositories, and ORP repositories. As things stand, there is no one-to-one relationship between a type of repository and the products it contains, e.g. literature repositories may also contain datasets, software, and ORPs. Accordingly, the aggregation process needs to classify the products collected from a repository in order to assign them to the correct entity class in OpenAIRE. The distribution rules are illustrated in the table below, which follows the vocabularies in the OpenAIRE guidelines and Version 4.0 of DataCite. Please note that such mappings may be modified over time to reflect the general preferences and requirements of the OpenAIRE user community.

Target types: Literature type, Dataset type, Software type, Other research product type (any product that is not of type literature, dataset, or software)

Guidelines for literature repositories (v4.0)
Includes: publishers, journals, institutional repositories, aggregators, catalogues
    • Literature type: Resource type different from the ones associated to Dataset, Software, and Other products
    • Dataset type: Resource type indicating datasets, image, video, audio
    • Software type: Resource type indicating software
    • Other research product type: Resource type indicating other research products (e.g. “Service”, “Interactive Resource”, “Other” etc.)

Guidelines for data repositories
Includes: data repositories, aggregators
    • Literature type: Resource type indicating papers (based on repository specific vocabularies)
    • Dataset type: Resource type different from the ones associated to Literature, Software and Other products
    • Software type: Resource type indicating software
    • Other research product type: Resource type indicating other research products (e.g. “Service”, “Interactive Resource”, “Other” etc.)

Guidelines for software repositories
Includes: software repositories
    • Literature type: None
    • Dataset type: None
    • Software type: All records
    • Other research product type: None

Guidelines for other research products repositories
Includes: other product repositories
    • Literature type: None
    • Dataset type: None
    • Software type: None
    • Other research product type: All records
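The distribution rules in the table above can be read as a small decision procedure. The following is a rough sketch under stated assumptions: the function name classify, the category labels ("literature", "data", "software", "orp"), and the resource-type sets are illustrative placeholders, not the actual vocabularies of the OpenAIRE guidelines or DataCite.

```python
# Illustrative resource-type sets (assumptions); the real vocabularies are
# defined by the OpenAIRE guidelines and are not reproduced here.
DATASET_TYPES = {"dataset", "image", "video", "audio"}
SOFTWARE_TYPES = {"software"}
ORP_TYPES = {"service", "interactive resource", "other"}
PAPER_TYPES = {"article", "preprint"}  # stand-in for "resource type indicating papers"


def classify(repository_category: str, resource_type: str) -> str:
    """Return the target entity for a record: literature, dataset, software, or other."""
    rt = resource_type.strip().lower()

    # Software and ORP repositories: all records go to a single entity.
    if repository_category == "software":
        return "software"
    if repository_category == "orp":
        return "other"
    if repository_category not in ("literature", "data"):
        raise ValueError(f"unknown repository category: {repository_category!r}")

    # Literature and data repositories share the software and ORP rules.
    if rt in SOFTWARE_TYPES:
        return "software"
    if rt in ORP_TYPES:
        return "other"

    if repository_category == "literature":
        # Anything not recognised as dataset, software, or ORP is literature.
        return "dataset" if rt in DATASET_TYPES else "literature"

    # Data repositories: anything not recognised as a paper is a dataset.
    return "literature" if rt in PAPER_TYPES else "dataset"
```

For example, classify("literature", "Image") yields "dataset", while classify("data", "Article") yields "literature". The default entity depends on the repository category, which is why the same unrecognised resource type can end up as literature in one case and as a dataset in the other.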


Research products and their associated types in OpenAIRE

The mappings between specific types of products and the target OpenAIRE entities (literature, dataset, software, and ORP, i.e. other research product) are specified by dedicated vocabularies that are continuously updated. Independently of the category of input repository, the aggregation process identifies for each input record a term in a common vocabulary. Each term of this vocabulary is then associated with one of the four OpenAIRE entities: literature, dataset, software, and ORP. The tables below provide some examples of specific terms and their corresponding target OpenAIRE entity.

Remark: Such associations may be modified over time to reflect the general preferences and requirements of the OpenAIRE user community. To see the current versions of the mappings, please refer to the links available before each table.

Research product “Literature”

Up-to-date mapping available at

Encoding  Term
0001 Article
0002 Book
0004 Conference Object
0005 Contribution for newspaper or weekly magazine 
0006 Doctoral Thesis
0007 Master Thesis
0008 Bachelor Thesis
0009 External Research Report
0011 Internal report
0012 Newsletter
0013 Part of book or chapter of book
0014 Research
0015 Review
0016 Preprint
0017 Report
0019 Patent
0031 Data Paper
0032 Software Paper
0034 Project deliverable
0035 Project milestone
0036 Project proposal
0038 Other literature type


Research product “Dataset”

Up-to-date mapping available at

Encoding  Term
0021 Dataset
0024 Film
0025 Image
0030 Sound
0033 Audiovisual
0037 Clinical Trial
0039 Other dataset type


Research product “Software”

Up-to-date mapping available at

Encoding  Term
0029 Software
0040 Other software type


Research product “Other research product”

Up-to-date mapping available at

Encoding  Term
0010 Lecture
0018 Annotation
0023 Event
0026 Interactive Resource
0027 Model
0028 Physical Object
0020 Other ORP type
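
The example tables above amount to a lookup from a term encoding to one of the four OpenAIRE entities. A minimal sketch of such a lookup follows. It covers only the encodings listed on this page, and the names ENTITY_BY_CODE and entity_for are assumptions made for illustration; since the vocabularies are continuously updated, the up-to-date mappings remain the authoritative source.

```python
from typing import Dict, Optional

# Encodings copied from the example tables above, grouped by target entity.
_CODES_BY_ENTITY = {
    "literature": (
        "0001 0002 0004 0005 0006 0007 0008 0009 0011 0012 0013 "
        "0014 0015 0016 0017 0019 0031 0032 0034 0035 0036 0038"
    ),
    "dataset": "0021 0024 0025 0030 0033 0037 0039",
    "software": "0029 0040",
    "other": "0010 0018 0023 0026 0027 0028 0020",
}

# Invert the grouping into a flat code -> entity dictionary.
ENTITY_BY_CODE: Dict[str, str] = {
    code: entity
    for entity, codes in _CODES_BY_ENTITY.items()
    for code in codes.split()
}


def entity_for(code: str) -> Optional[str]:
    """Return the target entity for a term encoding, or None if it is not listed here."""
    return ENTITY_BY_CODE.get(code)
```

Returning None for an unlisted encoding, rather than guessing a default entity, keeps the sketch honest about the fact that it only knows the examples shown on this page.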