Guides for researchers

How to identify and assess Research Data Management (RDM) costs

  • The cost of data management

    Under Horizon 2020 (H2020), an Open Research Data pilot was introduced to encourage good data management and to make research data open. When your Horizon 2020 project is part of the pilot, and your data meet certain conditions, you must deposit your data in a research data repository where they will be findable and accessible for others.

    Data management and sharing activities need to be costed into research, in terms of the time and resources needed. By planning early, costs can be significantly reduced. Costs associated with open access to research data can be claimed as eligible costs of any Horizon 2020 grant during the duration of the project, under the conditions defined in the H2020 Grant Agreement. They must already be budgeted and accepted in the grant proposal, and they are eligible only during the duration of the project.

  • How to calculate costs?

    We cannot predict your costs for you: the costs of data management and storage vary with your project and with the volume, domain, level of documentation and preservation of your data.

    But we can help you get started on costing your curation activities. This guide contains a tool that lists and explains possible data management expenses and estimates their cost. The estimated amounts only indicate the order of magnitude.
  • How to use this costing tool?

    Step 1: Check the data management activities in the table and tick those that may apply to your proposed research.

    Step 2: For each selected activity, estimate the additional time and/or other resources needed and cost this, e.g. people’s time or physical resources needed such as hardware or software. Find out which resources, e.g. for data storage and backup, are available to you from your institution. Consider whether you need a dedicated data manager.

    Step 3: Add these data management costs to your research application. Coordinate resourcing and costing with your institution, research office and institutional IT services.

    Step 4: Plan the data management activities in advance to avoid them competing with the need to focus on research excellence.

    Remember that when your research project nears the end you do not want these additional data management activities to compete with delivery of your planned outputs, writing of publications and the timely delivery of your project. At this later stage the costs of preparing data for sharing may be significantly higher.
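    The steps above can be sketched as a small calculation: list the activities you ticked, estimate hours and other resources for each, and sum the result. This is only an illustrative sketch. The hourly rates are the indicative salary levels given in the footnote of the costing table, and the activities, hours and the `Activity` helper are hypothetical placeholders to be replaced with your own estimates.

```python
from dataclasses import dataclass

# Indicative hourly rates (EUR) per salary level, taken from the footnote
# of the costing table; replace them with your local salary scales.
HOURLY_RATE_EUR = {1: 17.0, 2: 60.0, 3: 160.0}


@dataclass
class Activity:
    name: str
    hours: float               # estimated additional staff time
    salary_level: int          # 1, 2 or 3, as in the footnote
    other_costs: float = 0.0   # hardware, software, licences, services (EUR)

    def cost(self) -> float:
        """Staff time at the level's hourly rate plus any other costs."""
        return self.hours * HOURLY_RATE_EUR[self.salary_level] + self.other_costs


def total_cost(activities: list[Activity]) -> float:
    """Sum the cost of all selected activities (Step 3)."""
    return sum(activity.cost() for activity in activities)


# Hypothetical selection of activities (Step 1) with estimated hours (Step 2).
selected = [
    Activity("Make a Data Management Plan", hours=8, salary_level=2),
    Activity("Formatting and organising", hours=20, salary_level=1),
    Activity("Documentation", hours=16, salary_level=2),
]

for activity in selected:
    print(f"{activity.name}: EUR {activity.cost():.2f}")
print(f"Total: EUR {total_cost(selected):.2f}")
```

    The total is what you would add to your research application, after checking which resources your institution already provides.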

  • Estimating costs RDM tool

     
    DMP PHASE | ACTIVITY | COMMENTS AND SUGGESTIONS | COSTS
    Preparing

    Make a Data Management Plan

    • Make a DMP before you start creating data; make decisions about managing your data. You can find the template for H2020 DMPs here.
    • Check if there is a department within your organization to support data management planning.
    2 hrs to 2 days, depending on the complexity of your project
    1. Data Collection
    Acquiring External datasets

    Do you plan to use existing data, and is the data available from a commercial partner?

    • Your library may be able to help you acquire a license to a crucial database
    • In research data repositories, data can be available at no or low costs
    Example: A faculty licence for a database for macro-economic analysis: €18,000 per year
    1. Data Collection

    Formatting and organising

    • Are your data files, spreadsheets, measurements, interview transcripts, records etc. all in a uniform format or style?
    • Are files, records and items in the collection clearly named with unique file names and well organised?
    • If planned beforehand by developing templates and data entry forms for individual data files (transcripts, spreadsheets, databases) and by constructing clear file structures – low or no additional cost
    • If needed afterwards – higher cost
    Per project, organising style, format and names can be done by a student assistant at level 1* salary or a data manager at level 2* salary
    1. Data Collection

    Transcription

    • Will you transcribe qualitative data (e.g. recorded interviews or focus group sessions) as part of your research; or will you need to do this specifically so data can be more easily shared and reused?
    • Is full or partial transcription needed?
    • Is translation needed?
    • Will you need to develop a standard transcription template or transcription guidelines, to ensure consistent transcription?
    • If part of research practice – very low or no additional cost
    • If not planned as part of research practice – potentially high additional cost
    • Is additional hardware/software needed?
    • Consider cost of (time needed for) developing procedures, templates and guidance for transcribers
    Example: Time needed for transcription: four to eight hours per hour of recording
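    As a worked example of the transcription estimate above, the sketch below turns hours of recording into a cost range. It assumes, purely for illustration, that transcription is done at the indicative level 1 rate of 17 euro per hour from the salary footnote; the function name and the 10 hours of recordings are hypothetical.

```python
def transcription_cost_range(recording_hours, hourly_rate, hours_per_recorded_hour=(4, 8)):
    """Return (low, high) cost for transcribing the given hours of recording.

    hours_per_recorded_hour is the guide's rule of thumb: four to eight
    hours of work per hour of recording.
    """
    low_factor, high_factor = hours_per_recorded_hour
    low = recording_hours * low_factor * hourly_rate
    high = recording_hours * high_factor * hourly_rate
    return low, high


# Hypothetical project: 10 hours of interviews, transcribed at an assumed
# level 1 rate of 17 euro per hour.
low, high = transcription_cost_range(recording_hours=10, hourly_rate=17.0)
print(f"Transcription: EUR {low:.0f} to EUR {high:.0f}")
```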
    1. Data Collection

    Consent for data sharing

    • Do you need to ask participants for their consent for data to be shared?
    • Consent is essential for research in the domain of health/life sciences, and also for qualitative interviews
    • When consent for data sharing is considered as part of standard consent procedures early in research – very low or no additional cost
    • When participants need to be re-contacted or re-visited to obtain retroactive consent – could be high cost
    • Does this require extra preparation of information sheets and consent forms; extra time for consent discussions; or training of interviewers?
    Student assistant at level 1* salary or data manager at level 2* salary
    1. Data Collection

    Data transfer

    Are special measures needed to transfer data from mobile devices, from fieldwork sites or from home equipment to a central work server?

    Is software or hardware needed for data transfer, for encryption of confidential data before transfer, or for synchronisation of data files across sites? Free encryption or data transfer software (e.g. SurfFileSender) is available in most cases.
    2. Data Documentation

    Data description and Metadata

    • Are data in a spreadsheet, database or data warehouse clearly marked with variable names, variable labels and value labels, code descriptions, missing value descriptions, etc.?
    • Are validated questionnaires and standard coding used?
    • Are labels consistent?
    • Are files, records and items in the collection clearly described with well-defined metadata or a metadata standard, so that users can interpret the relations between them and quickly select and understand the content?
    • Do textual data like interview transcripts need description of context, e.g. included as a heading page?
    • If data description is carried out as part of data creation, data input or data transcription – low or no additional cost
    • If needed to be added or harmonized afterwards – higher cost
    • Codebooks for datasets can often be easily exported from software packages

    Examples: 4 hrs per single experiment (120 measurements) filling in 60 required metadata fields, with assistance of a data manager at level 2* salary

    Two to three weeks are costed into an average two-year research grant application to prepare and collate materials for deposit

    More information: http://www.data-archive.ac.uk/help/user-faq

    2. Data Documentation

    Documentation

    Do you have documentation for the data that describes the context and methodology of how data were gathered, created, processed and quality controlled?

    • Often essential contextual and methods documentation will be written up in publications and reports
    • If all data creation steps are well documented and documentation is kept well organised during research – low or no additional cost
    • If documentation to be written or compiled specifically afterwards – higher cost
    Researcher at level 2* salary.
    3. Data Storage & Back-up

    Data backup

    • Does the institution provide regular backup or not?
    • Consider how frequently backups should be done, how many backups should be stored.
    • Institutional backup – included in standard indirect costs/overheads
    • Additional backup needed – cost according to number of copies to be kept, frequency of backup and storage media needed

    Examples:
    University drive: €0.80 per GB per year
    Cloud: €0.30 per GB per year
    2 x hard drive: €0.14 per GB (single purchase)
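    To compare the backup options in the examples above for a given data volume and project length, a short calculation can help. This sketch uses the example prices quoted in this guide, which may differ from current local prices. It assumes the hard-drive price of €0.14 per GB already covers both drives, which the guide does not state explicitly; the 500 GB and 3 years are hypothetical.

```python
# Example prices quoted in this guide (EUR); check current local prices.
PRICE_PER_GB_YEAR = {"university_drive": 0.80, "cloud": 0.30}
# Single purchase; assumed here to cover both hard drives.
HARD_DRIVE_PRICE_PER_GB = 0.14


def backup_cost_options(size_gb: float, years: float) -> dict[str, float]:
    """Estimate the total backup cost per option over the whole project."""
    costs = {
        name: price * size_gb * years
        for name, price in PRICE_PER_GB_YEAR.items()
    }
    # Hard drives are bought once, so the cost does not depend on the years.
    costs["two_hard_drives"] = HARD_DRIVE_PRICE_PER_GB * size_gb
    return costs


# Hypothetical project: 500 GB kept for 3 years.
options = backup_cost_options(size_gb=500, years=3)
for name, cost in sorted(options.items(), key=lambda item: item[1]):
    print(f"{name}: EUR {cost:.2f}")
```

    Price is only one factor: institutional backup is usually covered by overheads, and hard drives need someone to run and check the backups.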
    3. Data Storage & Back-up

    Data storage

    • How much data storage space is needed for the entire duration of the project?
    • Do you need to set up a data model and accompanying database for the data?
    • If storage is provided by the institution – cost is included in standard indirect costs or overheads
    • If additional storage is needed – cost of server/disk space, as well as the cost of setting up and maintenance
    • Do you need a data warehouse or a database architect?
    Example: Cloud database as a service: €160 per month (storage 5 GB, transfer 30 GB)
    Database architect at level 2* salary
    4. Data Access & Security

    Data Access

    Do external people require access to research data?

    Does remote access via VPN or secure FTP need to be arranged for external people? In most cases researchers can make use of existing, free services.
    4. Data Access & Security

    Data security

    • Is there an institutional server available where you can store your data safely?
    • Protect data from unauthorised access or use or from disclosure
    • For confidential or privacy sensitive data, determining conditions for controlling access to shared data may require extra time and discussion
    • Can security be arranged by institutional IT services or is extra software/hardware needed?
    • Data files may need encrypting before storage or transfers

    Example: TTP (trusted third party), dependent on pseudonymisation type, ca. €1,000 to €30,000

    Existing encryption services could be used at no cost

    5. Data Preservation & Archiving

    File format

    Do data need to be converted to a standard or open format with long-term validity for long-term preservation?

    • Is additional software or hardware needed for conversion?
    • For audio-visual data, converting to open digital formats can be time-consuming or require special equipment and/or software
    • For databases, conversions may require checking for truncation, loss of metadata or annotation, loss of relationships, etc.
    Researcher at level 2* salary
    6. Data Sharing & Reuse

    Anonymisation

    • Do you need to remove identifying information or conceal the identity of participants (e.g. using pseudonyms) before data can be shared?
    • Anonymisation needs to be consistent throughout a data collection.
    • If anonymisation is planned before data collection or transcription/digitisation – cost can be lowered
    • For audio-visual data – anonymising/editing voices or faces can be very costly and could reduce the usefulness of data
    • For quantitative data (e.g. survey data) – low cost if identifiers are a priori excluded from data files, are easy to remove, or identifiable variables are coded to avoid disclosure; cost may be higher if variables need recoding afterwards to avoid disclosure
    • For qualitative textual data (e.g. interview transcripts) – costs can be reduced if anonymisation is carried out during transcription (or at least highlighted/coded during transcription)
    • Cost depends on how sensitive or complex data are and how much identifying information is recorded in the data – if only removal of names is required, cost is low; pseudonymisation will require more time
    • For files received from participants, check file properties and edit to remove disclosive information such as editor/author name

    Free software is available. AMNESIA is a data anonymisation tool that removes identifying information from data.

    Example: Transcribing and simultaneously anonymising audio (speech): up to one hour per 5-minute fragment (depending on the level of precision of the transcription)

    Student assistant at level 1* salary
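    The anonymisation example above gives an upper bound that scales with the length of the audio. The sketch below works it through, assuming the indicative level 1 rate of 17 euro per hour from the salary footnote; the function name and the 90 minutes of audio are hypothetical.

```python
import math


def max_anonymised_transcription_hours(audio_minutes, fragment_minutes=5,
                                       max_hours_per_fragment=1.0):
    """Upper bound on working hours: up to one hour per 5-minute fragment."""
    # A partly filled last fragment still counts as a whole fragment.
    fragments = math.ceil(audio_minutes / fragment_minutes)
    return fragments * max_hours_per_fragment


# Hypothetical case: 90 minutes of audio at an assumed 17 euro per hour.
hours = max_anonymised_transcription_hours(audio_minutes=90)
max_cost = hours * 17.0
print(f"Up to {hours:.0f} hours, up to EUR {max_cost:.0f}")
```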

    6. Data Sharing & Reuse

    Copyright

    • Do other parties hold copyright in the data?
    • Do you need to seek copyright clearance before sharing data?
    • Is time required to seek copyright clearance?
    • Is legal advice required?
    Legal advice at level 3* salary
    6. Data Sharing & Reuse

    Data sharing

    • Will your data be deposited with a data centre or research data repository?
    • Which requirements exist to prepare data to particular standards e.g. regarding documentation or format?
    • Do structured metadata need to be created when data are shared via a data centre or archive, e.g. completing a deposit form for the UK Data Archive?
    • What data will be retained and what not?
    • How long is the data required to be available?
    • A research data repository, data centre or journal can help you make your data open and provide you with the possibility to share your data for reuse. Find out the cost per year of data deposit and/or longer-term storage, and the cost in time and effort needed to prepare the data for sharing and preservation
    • Data centres will have their own metadata forms. Consider using these beforehand

    Examples: Completing a data repository upload form (e.g. Zenodo, a free-of-charge repository) may take 15 min to 4 hrs

    Dryad: €110 once (max 20 GB)
    DataverseNL: €3.60 per GB per year
    Cloud database as a service: €160 per month (storage 5 GB, transfer 30 GB)
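    A one-off deposit fee and a per-GB yearly fee compare differently depending on dataset size and retention period. The sketch below uses the example prices quoted above, which may be out of date, so check the repositories' current fees. The function names, the 10 GB dataset and the 10-year retention period are hypothetical, and returning `None` above 20 GB simply means the quoted flat fee does not cover that size.

```python
from typing import Optional

# Example prices quoted in this guide (EUR); verify with each repository.
DRYAD_FLAT_FEE_EUR = 110.0
DRYAD_MAX_GB = 20.0
DATAVERSENL_EUR_PER_GB_YEAR = 3.60


def dryad_cost(size_gb: float) -> Optional[float]:
    """One-off fee, or None if the dataset exceeds the quoted size limit."""
    return DRYAD_FLAT_FEE_EUR if size_gb <= DRYAD_MAX_GB else None


def dataversenl_cost(size_gb: float, years: float) -> float:
    """Recurring fee that grows with both size and retention period."""
    return DATAVERSENL_EUR_PER_GB_YEAR * size_gb * years


# Hypothetical dataset: 10 GB to be kept available for 10 years.
size_gb, years = 10, 10
print(f"Dryad: {dryad_cost(size_gb)}")
print(f"DataverseNL: {dataversenl_cost(size_gb, years):.2f}")
```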

    6. Data Sharing & Reuse

    Data cleaning

    • Do quantitative data need to be cleaned, checked or verified before sharing, e.g. check validity of codes used, check for anomalous values?
    • Will data match documentation, e.g. same number of variables, cases, records, files?
    • Does textual information in data need to be spell-checked?
    • Do you need to combine your data with other datasets for your research?
    • Data cleaning takes time
    • If carried out as part of data entry and preparation before data analysis – low additional cost
    • If needed afterwards – higher cost

    Example: Data cleaning service: €270 to well over €1800

    More information: http://datascopic.net/cost-of-data-cleansing/data-cleansing/

    Researcher/data manager at level 2* salary

    6. Data Sharing & Reuse

    Digitisation

    Do analogue or paper-based research data (maps, newspaper clippings, photographs, images, text) need to be digitised to increase their potential for sharing?

    • Is additional equipment or software needed for scanning or conversion?
    • If simply image scanning of text – relatively low cost
    • If Optical Character Recognition required, with manual checking for accuracy (revising entire scanned text) – may be high cost
    • If manual data entry or typing needed, e.g. to digitise tabular data – may be high cost
    Example: Digitisation €0.50 per page (few pages) OR €320-390 per 1000 pages (OCR included)
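    The two digitisation prices in the example above can be compared for a given number of pages. This sketch assumes that the bulk price scales linearly with the page count, which the guide does not state; the function name and the 2,500 pages are hypothetical.

```python
def digitisation_estimates(pages):
    """Return (per_page_cost, (bulk_low, bulk_high)) in EUR.

    Per-page price: 0.50 per page (quoted for a few pages).
    Bulk price: 320 to 390 per 1000 pages, OCR included; assumed here
    to scale linearly with the number of pages.
    """
    per_page_cost = 0.50 * pages
    bulk_low = 320 * pages / 1000
    bulk_high = 390 * pages / 1000
    return per_page_cost, (bulk_low, bulk_high)


# Hypothetical archive of 2,500 pages.
per_page_cost, (bulk_low, bulk_high) = digitisation_estimates(2500)
print(f"Per-page price: EUR {per_page_cost:.0f}")
print(f"Bulk price: EUR {bulk_low:.0f} to EUR {bulk_high:.0f}")
```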
    Overall

    Roles and responsibilities

    Do you need to allocate roles and responsibilities for various data management activities?

    If multiple partner institutions, researchers or funders are involved in research – consider cost of data management planning meetings or discussions

    Travel costs, lunch, time
    Overall

    Operationalising data management 

    What measures are needed to implement and operationalise data management throughout the research lifecycle?

    • Do you need extra time and resources to implement data management throughout your research, e.g. regular team meetings, setting up a collaborative research environment?
    • If staff training is required - higher cost
    • Do you need a dedicated data manager?
    Data manager at level 2* salary

    * Local salary scales differ per country. For example:
    - Level 1 (e.g. student assistant): ~17 euro per hour
    - Level 2 (researcher, data manager): ~60 euro per hour
    - Level 3 (external expert): ~160 euro per hour

    This guide was based on the work of the UK Data Service and the Landelijk Coördinatiepunt Research Data Management
    • UK Data Service (2013). Data management costing tool. UK Data Archive, University of Essex.
    • Alisa Westerhof (UU), Tessa Pronk (UU), Annemiek van der Kuil (3TU & TUD), Annemie Mordant (UM) (2015). Data Management Bij wetenschappelijk onderzoek méér dan alleen storage. Landelijk Coördinatiepunt Research Data Management, The Netherlands.
    This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported Licence.
