The purpose of this article is to sketch the technical issues behind duplicate identification in the context of the OpenAIRE infrastructure. Duplicate identification, followed by the merging of duplicates, is the most important phase of the deduplication process. The major challenge in duplicate identification is the trade-off between efficacy, i.e. the ability to identify all possible groups of duplicates, and efficiency, i.e. the time needed to process them. Our intention is to explain the reasons for possible i...