
Software and Open Science: knowing the best practices


Open Science webinars in Greek - v1: On Thursday 06 June, the Greek and Cypriot OpenAIRE NOADs ended their first round of collaborative webinars around Open Science with a presentation on good practices for developing software in the area of open research.

In focus: Open software has gained importance in science. Demand for open solutions is growing as researchers realize the benefits of wider contribution to and collaboration on software development, especially in fast-evolving disciplines such as the Life Sciences. Researchers need access to code, software, tools and materials in order to contribute their knowledge and ideas to the academic and scientific community. The presentation highlighted these areas and showcased their adoption in ELIXIR activities. ELIXIR is an intergovernmental organization that brings together life science resources from across Europe, including databases and software tools.

Open Software use and value added: Guest speaker Dr Fotis E. Psomopoulos, Principal Investigator C at the Institute of Applied Biosciences (INEB) of the Center for Research and Technology Hellas (CERTH), described the effort of ELIXIR's "software development best practices group" to formalize the process of software development in accordance with the principles of Open Science. Their work is driven by the admittedly limited number of software engineers working in the field and by the advantages tied to the use of the right software (UK Research Software Survey, 2014). Their aim is therefore to improve the quality and sustainability of software in the Life Sciences. One of the first outcomes of the working group is a set of top 10 metrics for good practices in life science software, which were subsequently condensed into four best practices that the speaker explained thoroughly in the rest of his presentation.

As a first good practice, the "software best practices working group" recommends making the source code publicly accessible from day one. Fotis presented the main factors that software developers should keep in mind in order to make their software available. He then moved to the second good practice: adopting a license and complying with the licenses of third-party dependencies. Many different licenses are available, and the choice depends on the intended use and on whether other parties are involved in the development. Third in line is the definition of clear and transparent contribution, governance and communication processes. It is important that the scientific community is not excluded from these processes: third parties should be able to communicate with the contributors and to know how their own involvement or contribution will be acknowledged and attributed. The fourth and last best practice is to ensure easy discovery of research software by providing software metadata via a popular community registry. This practice is often overlooked, but it is a key aspect of research software development, as it can be a vital step towards both re-use of the software and better monitoring of its use by the community.
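Three of the four practices tend to leave a visible trace in a repository, so a rough self-check can be sketched in Python. This is only an illustrative sketch, not part of the working group's materials: the file names below (LICENSE, CONTRIBUTING.md, codemeta.json and so on) are common conventions assumed here, and the first practice (public source code from day one) cannot be verified from local files at all.

```python
from pathlib import Path

# Hypothetical mapping from a practice to file names that conventionally
# signal it. These names are assumptions, not requirements of the 4OSS
# recommendations.
CONVENTIONAL_FILES = {
    "license adopted": ["LICENSE", "LICENSE.txt", "LICENSE.md", "COPYING"],
    "contribution process documented": ["CONTRIBUTING.md", "CONTRIBUTING"],
    "software metadata provided": ["codemeta.json", "CITATION.cff"],
}


def check_repository(root):
    """Report, per practice, whether any conventional file exists in root."""
    root = Path(root)
    return {
        practice: any((root / name).is_file() for name in names)
        for practice, names in CONVENTIONAL_FILES.items()
    }


def print_report(root):
    """Print a simple yes/no line for each checked practice."""
    for practice, found in check_repository(root).items():
        print(f"{practice}: {'yes' if found else 'no'}")
```

Calling `print_report(".")` from the top of a project prints one line per practice. A "yes" only means that a conventionally named file exists; whether the license is appropriate or the contribution process is clear still needs a human reader.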

Fotis finally pointed out that the four good practices are easy for researchers to implement and similarly easy for the scientific community to evaluate. To encourage researchers and developers to adopt the 4OSS recommendations and build FAIR (Findable, Accessible, Interoperable and Reusable) software, the best practices group, in partnership with the ELIXIR Training platform, The Carpentries, and other communities, created a collection of training materials (Kuzak et al. 2019), available to the wider Open Science community under a permissive license (4 Simple recommendations for Open Source Software).

Next Steps: The next step is to adopt, promote, and recognize these information standards and best practices. The group will address this by (i) developing comprehensive guidelines for software curation, (ii) training researchers and developers towards the adoption of software best practices and (iii) improving the usability of the ELIXIR Tools Platform products as a proof of concept. Additionally, direct outcomes of this group will be a Software Management Plan template, connected to a concise description of the guidelines for open research software, and a white paper on the software development management plan for ELIXIR, which can subsequently be used to produce training materials.

Discussion: The webinar was followed by questions and answers around the following topics:

1. Standards in the field of Bioinformatics:

There are general standards for coding as well as standards for software development. The latter are sometimes supported by specific tools; however, these tools are usually quite strict and act more as a guideline than as a benchmarking tool. Some of these practices are directly applicable to research software development in Bioinformatics. Examples of these standards are variable definition and naming, code structure, test coverage, continuous integration and containerization for dependency resolution. All of these concern how the software is written, not why the software or tool is developed. Such examples by programming language can be found at:

2. Open source code licenses:

There are many types of licenses that software developers can use. The most commonly used are the MIT license and the GNU General Public License (GPL), while CC BY and CC BY-SA (Creative Commons Attribution licenses) are used for research outputs that are not directly software. When choosing among them, consider the key differences between them; for example, in the case of the GNU GPL and CC BY, the GPL balances use and distribution, whereas CC BY balances copyright and public domain aspects. Examples of tools for license selection are Choosealicense and License Compatibility Matrix.
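Complying with third-party licenses starts with knowing what those licenses are. For a Python project, a first pass could be sketched with the standard library's `importlib.metadata`, as below. The function names are hypothetical, the package names in the usage line are placeholders, and the result only reflects what each package declares in its own metadata, which may be missing or imprecise; it is not a compatibility verdict.

```python
from importlib import metadata


def declared_license(dist_name):
    """Return the license string an installed distribution declares, if any."""
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return "not installed"
    # Newer packages may declare "License-Expression"; older ones use "License".
    return meta.get("License-Expression") or meta.get("License") or "not declared"


def license_report(dist_names):
    """Map each distribution name to its declared license."""
    return {name: declared_license(name) for name in dist_names}


# Placeholder names: replace them with the project's real dependencies.
for name, declared in license_report(["pip", "hypothetical-missing-package"]).items():
    print(f"{name}: {declared}")
```

The declared strings can then be compared by hand against a resource such as the License Compatibility Matrix mentioned above.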

3. Software metadata schemas:

There are many schemas for software metadata: the Dublin Core Metadata Initiative, CodeMeta, and Schema.org, to name but a few. In ELIXIR, the primary metadata schemas are Bioschemas and the EDAM Ontology. The latter is a comprehensive ontology of well-established, familiar concepts that are prevalent within bioinformatics and computational biology, including types of data and data identifiers, data formats, operations and topics. Metadata describing your software is commonly added via existing platforms that use those standards internally. Bio.tools is the ELIXIR portal to bioinformatics resources; it aims to help bioinformaticians and scientists find, understand, compare and select resources, as well as use and connect them in workflows. Bio.tools uses the EDAM ontology to assist in building software metadata, and the ontology is available for reuse under CC BY-SA. Zenodo also provides metadata records for software and code documentation stored in GitHub or other code repositories.
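To give a feel for what such a record looks like, here is a minimal sketch that builds a CodeMeta-style JSON-LD description in Python. The property names (`name`, `codeRepository`, `license`, `programmingLanguage`, `author`) are CodeMeta/Schema.org terms; the helper `build_codemeta` and every value passed to it (tool name, URL, author) are made up for illustration, and registries such as bio.tools define their own, richer record formats.

```python
import json


def build_codemeta(name, description, repository_url, license_url, language, authors):
    """Build a small CodeMeta 2.0 record as a plain dictionary."""
    return {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": name,
        "description": description,
        "codeRepository": repository_url,
        "license": license_url,
        "programmingLanguage": language,
        "author": [
            {"@type": "Person", "givenName": given, "familyName": family}
            for given, family in authors
        ],
    }


# All values below are hypothetical placeholders.
record = build_codemeta(
    name="example-aligner",
    description="Illustrative sequence alignment tool.",
    repository_url="https://example.org/lab/example-aligner",
    license_url="https://spdx.org/licenses/MIT",
    language="Python",
    authors=[("Jane", "Doe")],
)

codemeta_json = json.dumps(record, indent=2)
print(codemeta_json)
```

Saving the printed text as `codemeta.json` at the top of a repository is a common convention for making the metadata travel with the code.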

4. Good practices and the EU:

The ultimate goal of the ELIXIR Software Best Practices working group is for publishers and funding institutions (such as the EC) to adopt the paradigm of the four good practices in software development. Even if this happens, it will take time until the practices are formally enforced as a requirement at both the European and national level for public and private scientific initiatives to follow.


On behalf of the Greek and Cypriot NOADs 

You may find the slides and recordings of the webinar here.