Preparing |
Make a Data Management Plan |
- Make a DMP before you start creating data; make decisions about managing your data. You can find the template for H2020 DMPs here.
- Check if there is a department within your organization to support data management planning.
|
2 hrs to 2 days, depending on the complexity of your project |
1. Data Collection |
Acquiring External datasets
Do you plan to use existing data, and are the data available from a commercial partner?
|
- Your library may be able to help you acquire a license to a crucial database
- In research data repositories, data can be available at no or low cost
|
Example: A faculty licence on a database for macro-economic analysis: €18.000/y |
1. Data Collection |
Formatting and organising
- Are your data files, spreadsheets, measurements, interview transcripts, records etc. all in a uniform format or style?
- Are files, records and items in the collection clearly named with unique file names and well organised?
|
- If planned beforehand by developing templates and data entry forms for individual data files (transcripts, spreadsheets, databases) and by constructing clear file structures – low or no additional cost
- If needed afterwards – higher cost
|
Organising style, format and file names per project can be done by a student assistant at level 1* salary or a data manager at level 2* salary |
1. Data Collection |
Transcription
- Will you transcribe qualitative data (e.g. recorded interviews or focus group sessions) as part of your research; or will you need to do this specifically so data can be more easily shared and reused?
- Is full or partial transcription needed?
- Is translation needed?
- Will you need to develop a standard transcription template or transcription guidelines, to ensure consistent transcription?
|
- If part of research practice – very low or no additional cost
- If not planned as part of research practice – potentially high additional cost
- Is additional hardware/software needed?
- Consider cost of (time needed for) developing procedures, templates and guidance for transcribers
|
Example: Time needed for transcription – four to eight hours per hour of recording |
1. Data Collection |
Consent for data sharing
- Do you need to ask participants for their consent for data to be shared?
- Consent is essential for research in the health and life sciences, and also for qualitative interviews
|
- When consent for data sharing is considered as part of standard consent procedures early in research – very low or no additional cost
- When participants need to be re-contacted or re-visited to obtain active consent – could be high cost
- Does this require extra preparation of information sheets and consent forms; extra time for consent discussions; or training of interviewers?
|
Student assistant at level 1* salary or data manager at level 2* salary |
1. Data Collection |
Data transfer
Are special measures needed to transfer data from mobile devices, from fieldwork sites or from home equipment to a central work server?
|
Is software or hardware needed for data transfer, for encryption of confidential data before transfer, or for synchronisation of data files across sites? |
Free encryption or data transfer software (e.g. SurfFileSender) is available in most cases |
2. Data Documentation |
Data description and Metadata
- Are data in a spreadsheet, database or data warehouse clearly marked with variable names, variable labels and value labels, code descriptions, missing value descriptions, etc.?
- Are validated questionnaires and standard coding used?
- Are labels consistent?
- Are files, records and items in the collection clearly described with well-defined metadata or a metadata standard, so that users can interpret the relations between them and quickly select and understand the content?
- Do textual data like interview transcripts need description of context, e.g. included as a heading page?
|
- If data description is carried out as part of data creation, data input or data transcription – low or no additional cost
- If needed to be added or harmonized afterwards – higher cost
- Codebooks for datasets can often be easily exported from software packages
|
Examples: 4 hrs per single experiment (120 measurements) filling in 60 required metadata fields, with assistance of a data manager at level 2* salary
Two to three weeks are costed into an average two-year research grant application to prepare and collate materials for deposit
More information: http://www.data-archive.ac.uk/help/user-faq
|
2. Data Documentation |
Documentation
Do you have documentation for the data that describes the context and methodology of how data were gathered, created, processed and quality controlled?
|
- Often essential contextual and methods documentation will be written up in publications and reports
- If all data creation steps are well documented and documentation is kept well organised during research – low or no additional cost
- If documentation has to be written or compiled specifically afterwards – higher cost
|
Researcher at level 2* salary. |
3. Data Storage & Back-up |
Data backup
- Does the institution provide regular backup or not?
- Consider how frequently backups should be done, how many backups should be stored.
|
- Institutional backup – included in standard indirect costs/overheads
- Additional backup needed – cost according to number of copies to be kept, frequency of backup and storage media needed
|
Examples: University drive: €0.80 per GB/y; Cloud: €0.30 per GB/y; 2 x hard drive: €0.14 per GB (single purchase) |
3. Data Storage & Back-up |
Data storage
- How much data storage space is needed for the entire duration of the project?
- Do you need to set up a data model and accompanying database for the data?
|
- If storage is provided by the institution – cost is included in standard indirect costs or overheads
- If additional storage is needed – cost of server/disk space, as well as the cost of set-up and maintenance
- Do you need a data warehouse or a database architect?
|
Example: Cloud database as a service: €160/month (storage 5 GB, transfer 30 GB)
Database architect at level 2* salary
|
4. Data Access & Security |
Data Access
Do external people require access to research data?
|
Does remote access via VPN or secure FTP need to be arranged for external people? |
Researchers can mostly make use of existing, free services
4. Data Access & Security |
Data security
- Is there an institutional server available where you can store your data safely?
- Protect data from unauthorised access or use or from disclosure
|
- For confidential or privacy sensitive data, determining conditions for controlling access to shared data may require extra time and discussion
- Can security be arranged by institutional IT services or is extra software/hardware needed?
- Data files may need encrypting before storage or transfers
|
Example: TTP (trusted third party), dependent on pseudonymisation type, ca. €1.000–€30.000
Existing encryption services could be used at no cost
|
5. Data Preservation & Archiving |
File format
Do data need to be converted to a standard or open format with long-term validity for long-term preservation?
|
- Is additional software or hardware needed for conversion?
- For audio-visual data, converting to open digital formats can be time-consuming or require special equipment and/or software
- For databases, conversions may require checking for truncation, loss of metadata or annotation, loss of relationships, etc.
|
Researcher at level 2* salary |
6. Data Sharing & Reuse |
Anonymisation
- Do you need to remove identifying information or conceal the identity of participants (e.g. using pseudonyms) before data can be shared?
- Anonymisation needs to be consistent throughout a data collection.
|
- If anonymisation is planned before data collection or transcription/digitisation – cost can be lowered
- For audio-visual data – anonymising/editing voices or faces can be very costly and could reduce the usefulness of data
- For quantitative data (e.g. survey data) – low cost if identifiers are a priori excluded from data files, are easy to remove, or identifiable variables are coded to avoid disclosure; cost may be higher if variables need recoding afterwards to avoid disclosure
- For qualitative textual data (e.g. interview transcripts) – costs can be reduced if anonymisation is carried out during transcription (or at least highlighted/coded during transcription)
- Cost depends on how sensitive or complex data are and how much identifying information is recorded in the data– if only removal of names is required, cost is low; pseudonymisation will require more time
- For files received from participants, check file properties and edit to remove disclosive information such as editor/author name
|
Free software is available. AMNESIA is a data anonymisation tool that allows you to remove identifying information from data.
Example: Transcribing and simultaneously anonymising audio (speech): up to one hour per 5-minute fragment (depending on the level of precision of the transcription)
Student assistant at level 1* salary
|
6. Data Sharing & Reuse |
Copyright
- Do other parties hold copyright in the data?
- Do you need to seek copyright clearance before sharing data?
|
- Is time required to seek copyright clearance?
- Is legal advice required?
|
Legal advice at level 3* salary |
6. Data Sharing & Reuse |
Data sharing
- Will your data be deposited with a data centre or research data repository?
- Which requirements exist to prepare data to particular standards e.g. regarding documentation or format?
- Do structured metadata need to be created when data are shared via a data centre or archive, e.g. completing a deposit form for the UK Data Archive?
- What data will be retained and what not?
|
- How long are the data required to be available?
- A research data repository, data centre or journal can help you make your data open and give you the possibility to share your data for reuse. Find out the cost per year of data deposit and/or longer-term storage, and the cost in time and effort needed to prepare the data for sharing and preservation
- Data centres will have their own metadata forms. Consider using these beforehand
|
Examples: Completing a data repository upload form (e.g. Zenodo, a free-of-charge repository) may take 15 min to 4 hrs
Dryad: €110 once (max 20 GB)
DataverseNL: €3.60 per GB/year
Cloud database as a service: €160/month (storage 5 GB, transfer 30 GB)
|
6. Data Sharing & Reuse |
Data cleaning
- Do quantitative data need to be cleaned, checked or verified before sharing, e.g. check validity of codes used, check for anomalous values?
- Will data match documentation, e.g. same number of variables, cases, records, files?
- Does textual information in data need to be spell-checked?
- Do you need to combine your data with other datasets for your research?
|
- Data cleaning takes time
- If carried out as part of data entry and preparation before data analysis – low additional cost
- If needed afterwards – higher cost
|
Example: Data cleaning service: €270 to well over €1800
More information: http://datascopic.net/cost-of-data-cleansing/data-cleansing/
Researcher/data manager at level 2* salary
|
6. Data Sharing & Reuse |
Digitisation
Do analogue or paper-based research data (maps, newspaper clippings, photographs, images, text) need to be digitised to increase their potential for sharing?
|
- Is additional equipment or software needed for scanning or conversion?
- If simply image scanning of text – relatively low cost
- If Optical Character Recognition is required, with manual checking for accuracy (revising entire scanned text) – may be high cost
- If manual data entry or typing needed, e.g. to digitise tabular data – may be high cost
|
Example: Digitisation €0.50 per page (few pages) OR €320-390 per 1000 pages (OCR included) |
Overall |
Roles and responsibilities
Do you need to allocate roles and responsibilities for various data management activities?
|
If multiple partner institutions, researchers or funders are involved in research – consider cost of data management planning meetings or discussions |
Travel costs, lunch, time |
Overall |
Operationalising data management
What measures are needed to implement and operationalise data management throughout the research lifecycle?
|
- Do you need extra time and resources to implement data management throughout your research, e.g. regular team meetings, setting up a collaborative research environment?
- If staff training is required – higher cost
- Do you need a dedicated data manager?
|
Data manager at level 2* salary |
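The example figures in the table above can be combined into a rough budget estimate. The following is a minimal Python sketch, not a definitive costing method: it uses only the storage rates and the transcription time range quoted above. The function names and the `hourly_rate` parameter are illustrative assumptions, because the table gives salary levels (1*, 2*, 3*) rather than hourly amounts.

```python
# Rates quoted in the table above (euros).
UNIVERSITY_DRIVE_PER_GB_YEAR = 0.80
CLOUD_PER_GB_YEAR = 0.30
HARD_DRIVES_PER_GB_ONCE = 0.14  # 2 x hard drive, single purchase

# Transcription takes four to eight hours per hour of recording.
TRANSCRIPTION_HOURS_PER_RECORDED_HOUR = (4, 8)


def storage_costs(size_gb: float, years: float) -> dict[str, float]:
    """Return the total storage cost in euros for each quoted option."""
    return {
        "university_drive": round(size_gb * years * UNIVERSITY_DRIVE_PER_GB_YEAR, 2),
        "cloud": round(size_gb * years * CLOUD_PER_GB_YEAR, 2),
        # Hard drives are a single purchase, so the duration does not matter.
        "hard_drives": round(size_gb * HARD_DRIVES_PER_GB_ONCE, 2),
    }


def transcription_cost(recorded_hours: float, hourly_rate: float) -> tuple[float, float]:
    """Return the (low, high) labour cost in euros for transcription.

    hourly_rate is a hypothetical figure supplied by the caller; the table
    only refers to salary levels.
    """
    low, high = TRANSCRIPTION_HOURS_PER_RECORDED_HOUR
    return (recorded_hours * low * hourly_rate, recorded_hours * high * hourly_rate)


if __name__ == "__main__":
    print(storage_costs(size_gb=500, years=4))
    print(transcription_cost(recorded_hours=20, hourly_rate=25.0))
```

For a four-year project with 500 GB of data, this compares the three storage options side by side; the transcription estimate returns a range rather than a single figure, mirroring the four-to-eight-hour range in the table.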