Interview with the Finnish Social Science Data Archive (FSD)
Apr 9, 2018

Just what is involved in the day-to-day running of a data centre? The Finnish Social Science Data Archive is a major resource centre for social science research and teaching. In this interview with two members of staff, we find out more about how they support their users, tackle licensing and interoperate with other infrastructures.
What is the Finnish Social Science Data Archive?
The Finnish Social Science Data Archive (FSD) commenced in 1999 as a national resource center for social science research and teaching. FSD archives, promotes and disseminates digital research data for research, teaching and learning purposes. The archive is funded by the Ministry of Education and Culture and is a separate unit of the University of Tampere.
What are the main objectives of the Finnish Social Science Data Archive?
The main objective is to promote open access to research data, and the transparency, accumulation and efficient reuse of scientific research. In addition to survey data, FSD also archives qualitative data.
Who are your main users?
The biggest customer group (60%) consists of students and methods teachers. Researchers are the other main customer group. The data archive now houses 1300 datasets and delivers more than 500 datasets to users annually. The use of our services is increasing, and we predict that this year the number of delivered datasets will exceed 600.
Can you pinpoint some of the main challenges with data in Social Sciences?
The main challenge in the social sciences is the same as in other branches of science: persuading researchers to share their data by depositing it in the data archive so that it can be delivered for further use. Researchers tend to have concerns about inadvertent misuse of data, or they may have used unsuitable informed consent procedures, or there may be other confidentiality problems with the data. Reluctance to open up research data is rooted in the research culture, which is quite hard to change.
Another issue is the amount of hard work that goes into collecting the data and transforming the raw data into analyzable data. Because of this, researchers have trouble accepting that their carefully planned and gathered research data would not be exclusively theirs. Researchers also tend to fear that archiving data (especially gathering all the metadata) requires a lot of extra work.
How does FSD approach these challenges?
We try to influence the research culture. For example, we give presentations about sharing data, participate in national and international projects and efforts that emphasize and enable the sharing and preservation of research data, and provide information on our website about the advantages of sharing data and about data management in general. We also try to make archiving data as easy as possible for the researcher: we help with the metadata and the anonymization of data, and we give advice on ethical issues.
How do you contact researchers and encourage them to deposit and share data?
We monitor scientific journals and research announcements and add to our database basic information on each dataset we consider “potential for archiving”. Then we email the principal researchers, trying to get them to deposit their data. Only about one in ten of those contacted archive their data.
Recently we have also started to send congratulatory letters to recipients of research funding, reminding them about the importance of appropriate data management planning and about our services. We are optimistic about this approach, since the most essential decisions are made, or left unmade, at the conception stage of the data.
Anyone working in a scientific field is glad to see scientific progress, but it seems that open access to research data as a general principle does not work if there is no personal recognition connected to it. Thus we think that the most fruitful way to promote open access to research data is to recognize data archiving as a scientific merit comparable to publication.
How do you provide user support and training?
We give presentations and courses on data management, data protection, research ethics and relevant Finnish legislation. Our website also offers a lot of information on data management. Most of it is in Finnish, but the basic content is also available in English: Data Management Planning (http://www.fsd.uta.fi/en/informing_guidelines/index.html), Informing Research Participants (http://www.fsd.uta.fi/en/informing_guidelines/index.html) and Anonymisation and Identifiers (http://www.fsd.uta.fi/en/anonymisation/index.html).
Why is it important to promote Open Access to data?
The objective and critical nature of science requires that research results derived from empirical data can be checked if necessary. Replicating the research in exactly the same way and in similar circumstances, however, is usually not possible. Even so, the openness of research data improves the quality of scientific research. The simple knowledge that the results may later be double-checked against the empirical data forces researchers to be systematic and observant in their analysis of the data.
Doing research is not a private business; it is, and has to be, collective and shared. You can never know how your research data might be used in the future. It would be a waste of time, money and intellectual resources to create a dataset only for it to be used once. Data archiving thus works against single-use conspicuous consumption of research resources.
Science is global - but how does FSD connect to a global research infrastructure?
FSD is part of the global data archiving community. Our aim is to ensure that Finnish datasets can be found and used regardless of the location of the re-user. For instance, we use an international metadata format called DDI, and our metadata and survey datasets are translated into English. Annually about 10-20% of FSD’s customers are from abroad.
FSD is a member of CESSDA (Council of European Social Science Data Archives), which has a common European data portal (http://www.cessda.org) that provides a seamless interface to datasets from social science data archives across Europe, including FSD. CESSDA is currently shifting into a new organization, CESSDA-ERIC (European Research Infrastructure Consortium), of which we will be a part.
We are constantly engaged in active co-operation with other data archives, especially in Europe but also worldwide. At the moment we are participating in two EU-funded projects: Data without Boundaries (DwB, http://www.dwbproject.org/) and Digital Services Infrastructure for Social Sciences and Humanities (DASISH, http://dasish.eu/).
How are the data licensed?
FSD disseminates data for research and teaching, free of charge. The archived datasets can be distributed to FSD’s registered users who agree to comply with the terms and conditions set out for the use of the data. Most depositors allow their data to be used for both research and education, but some researchers limit re-use to research purposes. The smallest group consists of depositors who want to consider each request before granting permission for data delivery. (For more information, please see http://www.fsd.uta.fi/en/organisation/principles.html)
All metadata are openly available on our website in DDI-Codebook XML format.
How do you handle the diversity of data types, formats, technical standards and so on?
Digital data needs constant caretaking, and all digital preservation systems must anticipate failures and obsolescence. Digital preservation of research data is a process that requires carefully thought-out policies and procedures and the use of the most suitable technology. Our experts constantly follow developments and progress in the data archiving world, and we also participate in developing data processing and preservation practices at both national and international levels.
Our most important tool is our Archive Formation Plan that includes and describes our selection criteria, preservation policy, preservation plan, file plan, and records management. The Archive Formation Plan is updated annually.
We use international and national standards and best practices whenever possible. For example, metadata is stored in DDI Codebook XML format, and we have compared our operations with the OAIS standard and TRE Checklist in order to examine the trustworthiness of our operations and the functioning of our processes, and to improve the prerequisites for national and international cooperation. We also follow the national best practices set out by the National Digital Library, a project which aims to ensure that electronic materials of Finnish culture and science are managed to a high standard, are easily accessible and are securely preserved well into the future.
What is the relation between FSD and journal articles?
For each archived dataset, we collect information about articles and other publications where the dataset has been used. We do not have any direct working relationship with journals.
How do you measure the success of a data archive?
Our goal is to ensure the safety and security of digital research data in the long term, and only the future can tell if the decisions we make today will work successfully! Meanwhile, there are a number of indicators we follow on a regular basis, the most important ones being the usage of archived data (both the number of datasets delivered and the number of unique users), the number of datasets deposited, the number of publications based on archived data, and the number of webpage hits. The number of positions of trust our personnel hold in national and international organizations tells us about the expertise we have. Naturally we also follow our financial figures, for example the annual share of project funding.
In addition, we use instruments such as the Standard for Trusted Digital Repositories (ISO 16363) and the Data Seal of Approval (DSA) to assess the trustworthiness of our digital preservation, compare our figures with those of other CESSDA archives, and conduct user surveys to collect information on researchers' views.
Finally, could you say a bit about how you see the future of data-driven science?
In a way, the social sciences have been data driven for decades. Social scientists began to use more and more computerized data in the late 1950s and early 1960s. This created a need to store and manage the data and led to the establishment of social science data archives, the very first institutes to preserve digital material. From the beginning, the ability to share and re-use data and the ability to verify results were important motives. Of course, the volume of data today is massive compared to the 1950s, and as the volume increases, managing, storing and preserving the data, as well as finding suitable data, becomes more difficult. The scholarly communication process and the academic reward system are also facing changes. The data archives tackle all these problems, and many more, so that researchers don't have to.
Questions by Mikael Karstensen Elbæk 19.10.2012
Answers by Arja Kuula and Mari Kleemola 23.10.2012 (arja.kuula@uta.fi, mari.kleemola@uta.fi)