On Deduplication in the OpenAIRE infrastructure

The purpose of this article is to sketch the technical issues behind duplicate identification in the context of the OpenAIRE infrastructure. Duplicate identification, which is followed by the merging of duplicates, is the most important phase of the deduplication process. The major challenge in duplicate identification is the trade-off between efficacy, i.e. the ability to identify all possible groups of duplicates, and efficiency, i.e. the processing time. Our intention is to explain the reasons for possible i...

