Research Data Management in Finland: an Interview
The National Research Data project (TTA) is under way, driven by a coordination group and several work groups. Its aim is to develop structures and solutions that serve the research community in research information management.
The targets of TTA are:
- Developing the research data process: building a culture
- Enhancing interoperability in the research sector
- Facilitating availability and reuse of research data
- Advancing research data preservation
- Facilitating data storage, interoperability (metadata and interfaces) and cataloging
A crucial issue here is making research data available and keeping it safe. This involves planning what data to preserve and how, and what kind of metadata should be attached to the data to facilitate information retrieval. The cooperative information infrastructure requires architectural planning, as does long-term preservation. CSC’s role is to participate actively in the work groups and to act as a service provider. The services include a secure data storage service (IDA), a metadata catalog (KATA) and, in the long term, a data preservation service (PAS).
TTA is a matter of establishing a national ecosystem, a model of networked cooperation. There are so many actors and structural components that we can no longer talk about just one service.
The information ecosystem is being built to facilitate the use, re-use and storage of data. Various services are needed for the analysis, distribution, sharing and secure storage of data. Building such an ensemble serves the interests of researchers, the scientific community and the general public.
The expected benefits are:
- All national research data from one point: easy to find, easy to use
- Common practices for data management
- Interoperability: metadata, interfaces
- Versatile service collection
- Preservation of relevant information
Interoperability and metadata
The Information Infrastructure Work Group in the TTA project is defining the information architecture and considering matters such as cooperation structures and interoperability. Cooperation structures are relevant when defining and designing for the needs of higher education. The stakeholders include universities, polytechnics, research projects and global actors in the field of science, as well as the public sector in Finland. The Work Group’s report will be ready early next year.
The group faces a considerable task in finding solutions to the challenges of integration. Growing volumes of data strain traditional data transmission methods and cause management problems. The future information infrastructure should support data re-use by making data accessible, comprehensible, machine-readable and easily reusable. Another work group is addressing the unresolved questions of metadata. Answers are expected on the sufficiency and quality of descriptive information and on the extent to which different practices and standards can be coordinated.
The Metadata Work Group is also preparing a metadata catalog service that will start next year. The group investigates the semantic interoperability of data and defines structures for the data. It recommends and implements models for managing data. Compatible metadata reinforces the interoperability of data content, so the group also produces guidelines on what descriptive information should accompany the data and what interfaces should exist. The metadata catalog (KATA) will improve data access and findability and, for example, promote interface solutions. KATA can therefore help users understand what data sets are about.
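To illustrate how descriptive metadata makes data findable, here is a minimal sketch of a catalog that indexes dataset descriptions and answers keyword queries. It is not KATA's design or interface, which this interview does not describe. The record fields (`identifier`, `title`, `description`, `keywords`, `storage_location`) and the class names are hypothetical, loosely modeled on common descriptive elements such as those in Dublin Core.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical descriptive metadata for one dataset held in a storage service."""
    identifier: str
    title: str
    description: str
    keywords: list[str] = field(default_factory=list)
    storage_location: str = ""  # hypothetical pointer to where the data itself is stored


def _normalize(word: str) -> str:
    # Strip surrounding punctuation and lower-case so "Soil," matches "soil"
    return word.strip(".,;:!?()").lower()


class MetadataCatalog:
    """A minimal in-memory catalog with an inverted index from words to dataset identifiers."""

    def __init__(self) -> None:
        self._records: dict[str, DatasetRecord] = {}
        self._index: dict[str, set[str]] = {}

    def add(self, record: DatasetRecord) -> None:
        if record.identifier in self._records:
            raise ValueError(f"duplicate identifier: {record.identifier}")
        self._records[record.identifier] = record
        # Index every word of the title, description and keywords
        text = " ".join([record.title, record.description, *record.keywords])
        for term in {_normalize(w) for w in text.split()} - {""}:
            self._index.setdefault(term, set()).add(record.identifier)

    def search(self, query: str) -> list[DatasetRecord]:
        """Return records whose metadata contains every word of the query."""
        words = [_normalize(w) for w in query.split() if _normalize(w)]
        if not words:
            return []
        matches = set.intersection(*(self._index.get(w, set()) for w in words))
        return [self._records[i] for i in sorted(matches)]


if __name__ == "__main__":
    catalog = MetadataCatalog()
    catalog.add(DatasetRecord("ds-001", "Boreal forest soil samples",
                              "Soil carbon measurements, 2010-2012.", ["soil", "carbon"]))
    catalog.add(DatasetRecord("ds-002", "Lake ice phenology",
                              "Ice-on and ice-off dates for Finnish lakes.", ["climate"]))
    print([r.identifier for r in catalog.search("soil carbon")])  # ['ds-001']
```

The point of the sketch is that search runs over the descriptive information rather than the data itself, which is why the quality and consistency of metadata matter for findability.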
Already this year, while the project has been up and running, we have seen a big change in attitudes towards open access to information. Researchers have begun to see the benefits of openness. As tools for data mining and analysis constantly improve, raw data is becoming ever more readily available. It is easier to show the data on which conclusions have been based. In this way the quality of research can be improved.
IDA safeguards the integrity of data
The IDA storage service is one of the first milestones of the TTA project. It has been running since September and is a secure, user-friendly data and metadata storage service, used through a web browser. The IDA service provider is CSC.
The IDA service has been piloted since the start of the year, and now colleges and universities are starting to use it. Researchers log into the service via the Haka Circle of Trust Network, using their organisations’ own user authentication. At present they are joining the service as members of various projects.
After the higher education institutions, it will be the turn of research projects funded by the Academy of Finland. IDA currently has 1 petabyte of storage capacity for universities and a similar amount for the Academy. As data volumes grow, the service will be scaled up accordingly so that it can store very large volumes of data.
IDA’s storage mechanism is now ready. The system will be developed further, and user experience is expected to guide that development. The KATA data catalog, which will be ready at the start of 2013, is the next service to be linked to IDA and made visible to users. At the moment, the metadata in IDA is linked to the data, but in future KATA will make it easier to carry out searches and to use the data in more diverse ways.
The fundamental aim of the TTA project is to promote the long-term preservation of data. Long-term in this context means a time period that is longer than the lifetime of the people, applications or platforms that originally created the information – generally around 50 years.
IDA has been created to safeguard the integrity of data, and it contains components for copying and checking integrity, anti-virus software and a regular data refresh function. It is possible to use IDA as one component en route to long-term preservation.
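Integrity checking of stored files is commonly done with checksums (fixity values). The sketch below is not IDA's actual implementation, which the interview does not describe; it only shows the general idea under that assumption: compute a SHA-256 digest for each file when it is stored, keep the digests as metadata, and recompute them later to detect damaged or missing files. The function names `compute_checksum`, `build_manifest` and `verify_files` and the in-memory manifest format are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def compute_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks by default."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths: list[Path]) -> dict[str, str]:
    """Record a checksum for each file when it is stored (hypothetical manifest format)."""
    return {str(p): compute_checksum(p) for p in paths}


def verify_files(manifest: dict[str, str]) -> list[str]:
    """Recompute checksums and return the files that are missing or no longer match."""
    damaged = []
    for name, expected in manifest.items():
        path = Path(name)
        if not path.exists() or compute_checksum(path) != expected:
            damaged.append(name)
    return damaged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "measurements.csv"
        sample.write_text("id,value\n1,42\n")
        manifest = build_manifest([sample])
        print(verify_files(manifest))  # [] because the file is unchanged
        sample.write_text("id,value\n1,43\n")  # simulate silent corruption
        print(verify_files(manifest))  # lists the modified file
```

Running such a verification on a schedule, and restoring any damaged file from a copy whose checksum still matches, is one plausible way a check like this could work together with the copying component mentioned above.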
According to a recent decision, designing a long-term preservation solution will be part of the TTA project, so planning will begin for both the service architecture and the technology involved. The service will be carried out within the National Digital Library (KDK) project.
Data for Research (TTA)
The TTA project is intended to promote the standardisation of descriptions of data, their preservation and their use. In future, interoperability will be established at the national, European and international levels.
The TTA project will support:
- a process map of data for research
- the information infrastructure service architecture
- a metadata model for the management of research data
- a shared metadata catalog/search service for research data (KATA)
- a storage service for data for research (IDA)
- preparations for a joint long-term preservation solution (TTA-PAS)
This article was made possible by the kind contribution of Pirjo-Leena Forsström of CSC.
National Research Data Project website http://www.csc.fi/tta
Juha Haataja, Counsellor of Education, Ministry of Education and Culture
Pirjo-Leena Forsström, Secretary General of the National Research Data Project, CSC