Interview with Prof. Lyubomir Penev, Managing Director and Founder, Pensoft Publishers
Lyubomir Penev is a professor of ecology at the Bulgarian Academy of Sciences and the founder and CEO of Pensoft Publishers.

Q: What is a “data paper”?

A data paper is a scholarly journal article that describes datasets or groups of datasets through so-called “extended metadata descriptions”. Data papers are not supposed to present scientific results derived from the data, though they may contain some analyses and simple statistics (e.g., distribution of records by taxa or regions). Data papers ensure a permanent scientific record and a proper citation mechanism to credit the efforts of the people collecting and managing data, as well as their supporting institutions.

Q: Currently, Pensoft offers the opportunity to publish Data Papers describing species occurrence data and taxonomic checklists, Barcode-of-Life genome data and biodiversity-related software tools, such as interactive keys and others. How different is the submission of a data paper from traditional paper submission?

The submission process for data paper manuscripts does not differ much from that for regular papers. The process of generating such manuscripts differs, though. For example, data paper manuscripts describing species occurrence data published through GBIF’s Integrated Publishing Toolkit (IPT) can be generated automatically from the extended metadata descriptions and then submitted to the publisher as an RTF or PDF file. In other cases, data paper manuscripts can be written in a word processor (MS Word, OpenOffice, etc.), just like any other manuscript, but they should follow specific formats, recently described in a paper by Chavan and Penev (2011) and also in Pensoft’s Data Publishing Guidelines. Examples of data papers published in ZooKeys and PhytoKeys include papers describing literature survey data on the birds of India, bottom trawl survey data around Taiwan, the Belgian Florabank1 database of vascular plants, and MOSCHweb, a new online identification key platform.

Within the FP7 project ViBRANT, Pensoft will soon launch a collaborative article authoring tool called the Pensoft Writing Tool (PWT), which provides templates for different types of data papers. Within the PWT environment, data papers and other manuscripts will be validated and automatically submitted to data journals. The PWT will also provide the technical infrastructure for the forthcoming Biodiversity Data Journal, a truly next-generation data-publishing platform, expected to be open for submissions by the end of the year.

Q: How do you handle the diversity of data types, formats, technical standards and so on?

This is probably the main challenge in data publishing. Fortunately, we already have a few established standards and related infrastructures for some key types of data, e.g., taxon occurrences and checklists (Darwin Core Archive format available through the GBIF IPT), genomic data (GenBank, Barcode-of-Life), or even phylogenetic data (TreeBASE). However, there is still a long way to go towards establishing community standards for the great variety of other types of data.

Q: How is your peer review process organized?

The review process is essentially the same as for other types of manuscripts submitted to our journals. The links to datasets should be made available in the data paper manuscripts and the reviewers are expected to evaluate not only how the data are described in the manuscript but also how they are technically presented.

Q: How do you cooperate with data repositories in data hosting and in developing data publishing workflows?

We have integrated our publishing workflow with the GBIF Integrated Publishing Toolkit (IPT) and the Dryad Data Repository. We also cooperate with other repositories, for example the Consortium for the Barcode of Life (CBOL), GenBank, PANGAEA and others.

Q: How are the data licensed?

We recommend using the least restrictive licenses in data publishing, such as the Open Data Commons Attribution License, the Creative Commons CC-Zero Waiver, or the Open Data Commons Public Domain Dedication and Licence. Applying so-called non-commercial (NC) licenses often severely restricts the re-use of data, as shown by a recent study published in one of our journals. It is really important to publish data under an open license; otherwise, the possibility of re-using the data is seriously compromised.

Q: How do you think a data paper journal should measure its success?

The answer to this question largely depends on what you mean by “success”. Certainly, if a journal publishes an increasing number of data papers that are being cited, and the data described therein are being re-used, this could be termed a “success”. I would like to stress, however, that the benefits of such success will surely go not only to the journal, but also to the authors, data managers, data hosting institutions and society in general.

