The OpenAIRE service infrastructure harvests metadata about scholarly communication products (literature, datasets, software, and other research products), and about the links between such products, from a range of institutional or subject repositories, national and institutional research information portals, aggregators, e-journals, data repositories, and software repositories. In addition, it infers links between literature and such products via advanced text and data mining (TDM) techniques. The resulting information graph (i.e. interlinked sets of objects) is intended to support the monitoring of open science and of open science publishing workflows (e.g. science reproducibility and transparent assessment).
Literature, Datasets, Software, other research products
OpenAIRE accepts the metadata records of all scientific products whose structure respects the model and semantics expressed by the OpenAIRE guidelines. This means that both Open Access and non-Open Access material is included, and links to other products are resolved where possible (i.e. where the provided PIDs have a resolver).
Datasets with accession numbers (database entries) are not included as OpenAIRE datasets but, when a relationship to a product exists, are included as properties of the related product. More specifically, they are included as values of the externalReference property of the product metadata; externalReference includes a URL to the splash page, the name of the target web site, the ID, and an ID type (e.g. PDB).
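As an illustration only, the four pieces of information carried by externalReference could be modelled as in the sketch below. The class name, the field names (`url`, `site_name`, `ref_id`, `id_type`), the `product` dictionary, and all sample values are assumptions made for this example; they are not the actual OpenAIRE schema.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ExternalReference:
    """Hypothetical container for one externalReference value."""
    url: str        # URL of the splash page of the database entry
    site_name: str  # name of the target web site
    ref_id: str     # the ID (accession number) of the entry
    id_type: str    # type of the ID, e.g. "PDB"


# Placeholder values, for illustration only.
ref = ExternalReference(
    url="https://example.org/entry/XXXX",
    site_name="Example database",
    ref_id="XXXX",
    id_type="PDB",
)

# The database entry is attached to the related product as a property,
# not stored as a dataset entity of its own.
product = {"title": "Example article", "externalReference": [asdict(ref)]}
```

The point of the sketch is that the accession number lives inside the metadata of the related product, so it only exists in the graph when such a relationship exists.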
Full-text of scientific literature
OpenAIRE collects the files of Open Access literature products whenever these are accessible from the URL provided in the metadata record. The literature full-text is used for text-mining purposes. End users who wish to access, download, or read the actual files cannot do so from OpenAIRE, but are forwarded to the source where the file was originally deposited. For further information on the use of full-texts, please see OpenAIRE’s Terms of Use (ToU).
OpenAIRE services collect metadata about four typologies of products: literature, datasets, software, and “other research products” (ORPs). Metadata can be collected from four main categories of repositories: literature repositories (including institutional/thematic repositories, publishers, and catalogues), data repositories, software repositories, and ORP repositories. As things stand, there is no one-to-one relationship between a type of repository and the products it contains; for example, literature repositories may also contain datasets, software, and ORPs. Accordingly, the aggregation process needs to classify the products collected from a repository in order to assign them to the correct entity class in OpenAIRE. The distribution rules are illustrated in the table below, which follows the vocabularies in the OpenAIRE guidelines and Version 4.0 of DataCite. Please note that such mappings may be modified over time to reflect the general preferences and requirements of the OpenAIRE user community.
|Repository guidelines||Literature type||Dataset type||Software type||Other research product type|
|Guidelines for literature repositories (v4.0). Includes: publishers, journals, institutional repositories, aggregators, catalogues||Resource type different from the ones associated to Dataset, Software, and Other products||Resource type indicating datasets, image, video, audio||Resource type indicating software||Resource type indicating other research products (e.g. “Service”, “Interactive Resource”, “Other” etc.)|
|Guidelines for data repositories. Includes: data repositories, aggregators||Resource type indicating papers (based on repository specific vocabularies)||Resource type different from the ones associated to Literature, Software and Other products||Resource type indicating software||Resource type indicating other research products (e.g. “Service”, “Interactive Resource”, “Other” etc.)|
|Guidelines for software repositories|
|Guidelines for other research products repositories. Includes: other product repositories|
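The distribution rules for literature and data repositories can be read as a small decision procedure. The following is a minimal sketch under stated assumptions: the function name `classify`, the repository labels `"literature"` and `"data"`, and the contents of the type sets are hypothetical simplifications (in particular, `PAPER_TYPES` stands in for the repository-specific vocabularies mentioned in the table). It is not the actual OpenAIRE aggregation code.

```python
from enum import Enum


class Entity(Enum):
    LITERATURE = "literature"
    DATASET = "dataset"
    SOFTWARE = "software"
    ORP = "other research product"


# Illustrative resource-type groups; the real vocabularies are larger.
DATASET_TYPES = {"dataset", "image", "video", "audio"}
SOFTWARE_TYPES = {"software"}
ORP_TYPES = {"service", "interactive resource", "other"}
# Assumption: placeholder for repository-specific "paper" vocabularies.
PAPER_TYPES = {"article", "paper"}


def classify(repository_kind: str, resource_type: str) -> Entity:
    """Assign a record to an entity class from its resource type."""
    rt = resource_type.strip().lower()
    # Software and ORP types are treated the same way in both rows.
    if rt in SOFTWARE_TYPES:
        return Entity.SOFTWARE
    if rt in ORP_TYPES:
        return Entity.ORP
    if repository_kind == "literature":
        # Anything not recognised as dataset-like defaults to literature.
        return Entity.DATASET if rt in DATASET_TYPES else Entity.LITERATURE
    if repository_kind == "data":
        # Anything not recognised as a paper defaults to dataset.
        return Entity.LITERATURE if rt in PAPER_TYPES else Entity.DATASET
    # Rules for other repository categories are not covered by this sketch.
    raise ValueError(f"no rule sketched for repository kind {repository_kind!r}")
```

Note the asymmetry in the defaults: an unrecognised type becomes literature when it comes from a literature repository and a dataset when it comes from a data repository, which is how the "different from the ones associated to" cells read.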
The mappings between specific types of products and the target OpenAIRE entities (literature, dataset, software, and ORP, i.e. other research product) are specified by dedicated vocabularies that are continuously updated. Regardless of the category of the input repository, the aggregation process identifies, for each input record, a term in a common vocabulary. Each term of this vocabulary is in turn associated with one of the four OpenAIRE entities: literature, dataset, software, or ORP. The tables below provide some examples of specific terms and their corresponding target OpenAIRE entity.
Remark: Such associations may be modified over time to reflect the general preferences and requirements of the OpenAIRE user community. To see the current versions of the mappings, please refer to the links available before each table.
Up-to-date mapping available at http://api.openaire.eu/vocabularies/dnet:result_typologies/publication
|0005||Contribution for newspaper or weekly magazine|
|0009||External Research Report|
|0013||Part of book or chapter of book|
|0038||Other literature type|
Up-to-date mapping available at http://api.openaire.eu/vocabularies/dnet:result_typologies/dataset
|0039||Other dataset type|
Up-to-date mapping available at http://api.openaire.eu/vocabularies/dnet:result_typologies/software
|0040||Other software type|
Up-to-date mapping available at http://api.openaire.eu/vocabularies/dnet:result_typologies/other
|0020||Other ORP type|
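The two-step mapping described above (input record to common-vocabulary term, then term to OpenAIRE entity) can be sketched with the example terms from these tables. The dictionary `VOCABULARY` and the helpers `entity_for_code` and `entity_for_label` are hypothetical names introduced for this illustration; the entries are only the examples listed here, not the full vocabularies published at the URLs above.

```python
# Term code -> (label, target OpenAIRE entity), copied from the example tables.
VOCABULARY = {
    "0005": ("Contribution for newspaper or weekly magazine", "literature"),
    "0009": ("External Research Report", "literature"),
    "0013": ("Part of book or chapter of book", "literature"),
    "0038": ("Other literature type", "literature"),
    "0039": ("Other dataset type", "dataset"),
    "0040": ("Other software type", "software"),
    "0020": ("Other ORP type", "ORP"),
}


def entity_for_code(code: str) -> str:
    """Return the entity associated with a vocabulary term code."""
    if code not in VOCABULARY:
        raise KeyError(f"term {code!r} is not in the example vocabulary")
    return VOCABULARY[code][1]


def entity_for_label(label: str) -> str:
    """Find the term whose label matches (case-insensitively), then its entity."""
    wanted = label.strip().lower()
    for code, (term_label, _entity) in VOCABULARY.items():
        if term_label.lower() == wanted:
            return entity_for_code(code)
    raise KeyError(f"label {label!r} is not in the example vocabulary")
```

Because the associations may change over time, a lookup of this kind would need to be refreshed from the published mappings rather than hard-coded as it is in this sketch.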