On Deduplication in the OpenAIRE infrastructure

The purpose of this article is to sketch the technical issues behind duplicate identification in the context of the OpenAIRE infrastructure. Duplicate identification, which is followed by the merging of duplicates, is the most important phase of the deduplication process. The major challenge in duplicate identification is the trade-off between efficacy, i.e. the ability to identify all possible groups of duplicates, and efficiency, i.e. the time needed to process them. Our intention is to explain the reasons of possible...

Text Mining Services in OpenAIRE

Recently in Athens there was an impressive kick-off of the OpenAIRE2020 project, during which we presented OpenAIRE’s plans in the area of text and data mining of scholarly publications. Publications contain all kinds of rich information that, although understandable to a human reader, is not machine-readable and thus cannot be used directly for indexing and recommendation purposes. Authors’ affiliations, document classifications, references to biological and chemical databases, acknowledgement...

OpenAIRE-Advance receives funding from the European Union's Horizon 2020 Research and Innovation programme under Grant Agreement No. 777541.

Unless otherwise indicated, all materials created by OpenAIRE are licensed under the CC Attribution 4.0 International License.