OpenAIRE's Content Acquisition Policy

  • Aggregation policies by type of product

    Literature, Datasets, Software, other research products
    OpenAIRE accepts the metadata records of all scientific products whose structure respects the model and semantics expressed by the OpenAIRE guidelines. This means that both Open Access and non-Open Access material will be included, and links to other products will be resolved where possible (i.e. when the provided PIDs have a resolver).

    Accession numbers
    Datasets with accession numbers (database entries) are not included as OpenAIRE datasets but, when a relationship to a product exists, are included as properties of the related product. More specifically, they are included as values of the property externalReference of the product metadata; externalReference includes a URL to the splash page, the target web site name, the ID, and an ID type (e.g. PDB).
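
    As an illustration only, the four pieces of information carried by externalReference could be modelled as follows. This is a minimal sketch: the class names, field names, and the example URL are assumptions made for this example and are not taken from the OpenAIRE data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ExternalReference:
    """Hypothetical shape of one externalReference value (field names are assumptions)."""
    url: str        # URL of the splash page of the database entry
    site_name: str  # name of the target web site
    ref_id: str     # the accession number itself
    id_type: str    # type of the identifier, e.g. "PDB"


@dataclass
class Product:
    """Minimal stand-in for a product metadata record."""
    title: str
    external_references: List[ExternalReference] = field(default_factory=list)


def attach_accession(product: Product, ref: ExternalReference) -> None:
    """Record a database entry as a property of the related product, not as a dataset."""
    if ref not in product.external_references:
        product.external_references.append(ref)


paper = Product(title="Example article")
ref = ExternalReference(
    url="https://example.org/entry/1ABC",  # placeholder URL, not a real splash page
    site_name="Example Protein Database",
    ref_id="1ABC",
    id_type="PDB",
)
attach_accession(paper, ref)
attach_accession(paper, ref)  # adding the same reference twice has no effect
print(len(paper.external_references))
```

    The point of the sketch is that the accession number never becomes a dataset entity of its own; it only exists as a value attached to the product it relates to.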

    Full-text of scientific literature
    OpenAIRE collects Open Access literature product files whenever these are accessible from the URL provided in the metadata record. The literature full-text is used for text-mining purposes. End-users who wish to access, download, and read the actual files cannot do so from OpenAIRE, but are forwarded to the original source of deposition. For further information on the use of full-texts, please see OpenAIRE’s ToU.

  • Rationale

    The OpenAIRE service infrastructure harvests metadata about scholarly communication products (literature, datasets, software, and other research products) and links between such products from a range of institutional or subject repositories, national and institutional research information portals, aggregators, e-journals, data repositories, and software repositories. In addition, it infers links between literature and such products via advanced text and data mining techniques (TDM). The resulting information graph (i.e. interlinked sets of objects) is intended to favour monitoring of open science and open science publishing workflows (e.g. science reproducibility and transparent assessment).

    • Coverage: OpenAIRE will actively pursue harvesting content from European as well as non-European repositories.
    • Reproducibility: The OpenAIRE graph aims at linking scientific literature, namely the narration of scientific motivation and process, with all products used in or resulting from the related research activity.
    • Monitoring: The OpenAIRE graph links research products with the funders and the projects resulting from their grants.
    • Research communities: The OpenAIRE graph links research products with the communities for which they are relevant, in order to provide a (multi-)community view of the scholarly output.
    • Quality: Data sources and repositories are quality-controlled: their metadata respects the OpenAIRE guidelines and their import into OpenAIRE is curated by OpenAIRE data curators.
    • Terms of Use for content providers: Data source managers read and accept the OpenAIRE ToU (and vice versa) in order for OpenAIRE to re-use their content under specific consent, warranties, and license.
  • Research products and their associated types in OpenAIRE

    The mappings between specific types of products and the target OpenAIRE entities (literature, dataset, software, and ORP, i.e. other research product) are specified by dedicated vocabularies that are continuously updated. Independently of the category of the input repository, the aggregation process identifies for each input record a term in a common vocabulary. Each term of this vocabulary is then associated with one of the four OpenAIRE entities: literature, dataset, software, and ORP. The tables below provide some examples of specific terms and their corresponding target OpenAIRE entity.

    Remark: Such associations may be modified over time to reflect the general preferences and requirements of the OpenAIRE user community. To see the current versions of the mappings, please refer to the links available before each table.

    Research product “Literature”

    Up-to-date mapping available at

    Encoding   Term
    0001 Article
    0002 Book
    0004 Conference Object
    0005 Contribution for newspaper or weekly magazine
    0006 Doctoral Thesis
    0007 Master Thesis
    0008 Bachelor Thesis
    0009 External Research Report
    0011 Internal report
    0012 Newsletter
    0013 Part of book or chapter of book
    0014 Research
    0015 Review
    0016 Preprint
    0017 Report
    0019 Patent
    0031 Data Paper
    0032 Software Paper
    0034 Project deliverable
    0035 Project milestone
    0036 Project proposal
    0038 Other literature type


    Research product “Dataset”

    Up-to-date mapping available at

    Encoding   Term
    0021 Dataset
    0024 Film
    0025 Image
    0030 Sound
    0033 Audiovisual
    0037 Clinical Trial
    0039 Other dataset type


    Research product “Software”

    Up-to-date mapping available at

    Encoding   Term
    0029 Software
    0040 Other software type


    Research product “Other research product”

    Up-to-date mapping available at

    Encoding   Term
    0010 Lecture
    0018 Annotation
    0023 Event
    0026 Interactive Resource
    0027 Model
    0028 Physical Object
    0020 Other ORP type
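
    The four tables above can be read as a single lookup from encoding to target entity. The sketch below shows that reading. The codes are copied from the example tables on this page, so they may lag behind the up-to-date vocabularies, and the names `ENTITY_BY_CODE` and `entity_for` are hypothetical helpers introduced for this example.

```python
from typing import Dict, List, Optional

# Example encodings copied from the tables above; the live vocabularies may differ.
_CODES_BY_ENTITY: Dict[str, List[str]] = {
    "literature": [
        "0001", "0002", "0004", "0005", "0006", "0007", "0008", "0009",
        "0011", "0012", "0013", "0014", "0015", "0016", "0017", "0019",
        "0031", "0032", "0034", "0035", "0036", "0038",
    ],
    "dataset": ["0021", "0024", "0025", "0030", "0033", "0037", "0039"],
    "software": ["0029", "0040"],
    "other research product": [
        "0010", "0018", "0023", "0026", "0027", "0028", "0020",
    ],
}

# Invert the grouping so that each encoding points at exactly one entity.
ENTITY_BY_CODE: Dict[str, str] = {
    code: entity
    for entity, codes in _CODES_BY_ENTITY.items()
    for code in codes
}


def entity_for(code: str) -> Optional[str]:
    """Return the target entity for an encoding, or None if the code is not listed here."""
    return ENTITY_BY_CODE.get(code)


print(entity_for("0016"))  # Preprint
print(entity_for("0037"))  # Clinical Trial
print(entity_for("9999"))  # not in the example tables
```

    Returning None for an unknown code is a choice made for this sketch; how the aggregation process treats terms outside the vocabulary is not described here.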


  • Aggregation policy by category of repository

    OpenAIRE services collect metadata about four typologies of products: literature, datasets, software, and “other research products” (ORPs). Metadata can be collected from four main categories of repositories: literature repositories (including institutional/thematic repositories, publishers, and catalogues), data repositories, software repositories, and ORP repositories. As things stand, there is no one-to-one relationship between a type of repository and the products it contains, e.g. literature repositories may also contain datasets, software, and ORPs. Accordingly, the aggregation process needs to classify the products collected from a repository in order to assign them to the correct entity class in OpenAIRE. The distribution rules are illustrated in Table 1, which follows the vocabularies in the OpenAIRE guidelines and Version 4.0 of DataCite. Please note that such mappings may be modified over time to reflect the general preferences and requirements of the OpenAIRE user community.


    Table 1. Distribution rules by category of repository. The "Other research product type" covers any product that is not of type literature, dataset, or software.

    Guidelines for literature repositories (includes: publishers, journals, institutional repositories, aggregators, catalogues)
      Literature type: resource type different from the ones associated to Dataset, Software, and Other products
      Dataset type: resource type indicating datasets, image, video, audio
      Software type: resource type indicating software
      Other research product type: resource type indicating other research products (e.g. “Service”, “Interactive Resource”, “Other” etc.)

    Guidelines for data repositories (includes: data repositories, aggregators)
      Literature type: resource type indicating papers (based on repository specific vocabularies)
      Dataset type: resource type different from the ones associated to Literature, Software and Other products
      Software type: resource type indicating software
      Other research product type: resource type indicating other research products (e.g. “Service”, “Interactive Resource”, “Other” etc.)

    Guidelines for software repositories (includes: software repositories)
      Literature type: none
      Dataset type: none
      Software type: all records
      Other research product type: none

    Guidelines for other research products repositories (includes: other product repositories)
      Literature type: none
      Dataset type: none
      Software type: none
      Other research product type: all records
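
    The distribution rules in Table 1 can be summarised as a small decision function. The following is a sketch under stated assumptions, not the actual aggregation code: the enum and function names are hypothetical, and the `indicated` argument stands for whatever entity a record's resource type points to after vocabulary lookup (None when the resource type gives no specific indication).

```python
from enum import Enum
from typing import Optional


class Entity(Enum):
    LITERATURE = "literature"
    DATASET = "dataset"
    SOFTWARE = "software"
    ORP = "other research product"


class RepoCategory(Enum):
    LITERATURE = "literature repository"
    DATA = "data repository"
    SOFTWARE = "software repository"
    ORP = "other research products repository"


def classify(repo: RepoCategory, indicated: Optional[Entity]) -> Entity:
    """Assign a record to an entity class following the rules of Table 1.

    `indicated` is the entity suggested by the record's resource type,
    or None if the resource type matches no other entity.
    """
    # Software and ORP repositories: all records go to a single entity.
    if repo is RepoCategory.SOFTWARE:
        return Entity.SOFTWARE
    if repo is RepoCategory.ORP:
        return Entity.ORP
    # Literature and data repositories: the resource type decides, and the
    # repository's own entity is the fallback when nothing else is indicated.
    default = Entity.LITERATURE if repo is RepoCategory.LITERATURE else Entity.DATASET
    return indicated if indicated is not None else default


print(classify(RepoCategory.LITERATURE, Entity.DATASET).value)
print(classify(RepoCategory.DATA, None).value)
print(classify(RepoCategory.SOFTWARE, Entity.DATASET).value)
```

    The sketch makes the asymmetry of the table explicit: only literature and data repositories are inspected record by record, while software and ORP repositories are mapped wholesale.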



OpenAIRE has received funding from the European Union's Horizon 2020 Research and Innovation programme under Grant Agreements No. 777541 and 101017452.

Unless otherwise indicated, all materials created by OpenAIRE are licenced under the CC Attribution 4.0 International License.