An increasing number of studies employ workflows involving methods and tools from different domains of the life sciences. A high-level view of such workflows and corresponding publications is shown in Figure 1. Bringing such workflows to the EOSC environment, where they can be collaboratively developed, used, re-used and combined, presents numerous challenges. The objective of WP2 in EOSC-Life is to facilitate the porting of scientific workflows by the RIs to the EOSC environment and to help make them FAIR.
The plan to achieve this objective, described in more detail in the “Current WP2 roadmap” section, has many facets: creating suitable workflow metadata standards; providing a workflow registry to support findability; supporting the adoption and use of workflow testing for easier maintenance; and helping with the adoption of cloud technologies so that these workflows can be executed on EOSC-compatible cloud infrastructures.
Figure 1. Examples of workflows from publications in which the workflow is composed of tools from domains covered by different LS RIs.
For our purpose, the following definitions apply:
In WP2, workflows are specified in terms of the flow of data between a set of tools. We note that tools are not limited to the implementation of an atomic task but can also implement a workflow.
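The definition above can be illustrated with a minimal Python sketch. It is not a WP2 data model: the class names `Tool` and `Workflow`, the method `execution_order` and the example step names are all hypothetical. The sketch assumes that each data item is produced by at most one step, and it shows two points from the definition: the data flow between tools determines the order of execution, and a workflow can itself be used as a tool inside a larger workflow.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Tool:
    """An atomic task with named inputs and outputs (hypothetical model)."""
    name: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


@dataclass
class Workflow(Tool):
    """A workflow is itself a tool, so it can appear as a step elsewhere."""
    steps: list[Tool] = field(default_factory=list)

    def execution_order(self) -> list[str]:
        # Map each data item to the step that produces it.
        producer = {out: s.name for s in self.steps for out in s.outputs}
        # A step depends on every step that produces one of its inputs;
        # inputs with no producer come from outside the workflow.
        graph = {
            s.name: {producer[i] for i in s.inputs if i in producer}
            for s in self.steps
        }
        return list(TopologicalSorter(graph).static_order())


# Steps are deliberately listed out of order; the data flow decides the order.
align = Tool("align", inputs=["reads"], outputs=["alignments"])
call = Tool("call_variants", inputs=["alignments"], outputs=["variants"])
variant_calling = Workflow(
    "variant_calling", inputs=["reads"], outputs=["variants"], steps=[call, align]
)

# The nested workflow is used as a single step of a larger workflow.
annotate = Tool("annotate", inputs=["variants"], outputs=["report"])
pipeline = Workflow(
    "pipeline", inputs=["reads"], outputs=["report"],
    steps=[annotate, variant_calling],
)

print(variant_calling.execution_order())  # ['align', 'call_variants']
print(pipeline.execution_order())         # ['variant_calling', 'annotate']
```

Making `Workflow` a subclass of `Tool` is one simple way to express that tools are not limited to atomic tasks; real workflow languages express the same idea with their own constructs.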
For an introduction to the value of workflow management systems, see https://www.nature.com/articles/d41586-019-02619-z.
WP2 focuses on the implementation, sharing and maintenance of workflows, which covers, for example, tool packaging, containerisation and workflow management systems. Challenges in provisioning and integrating cloud infrastructures, on the other hand, are outside its remit. Cloud deployment is done in cooperation with EOSC-Life WP7.
To maximise the use of WP2 resources and promote interoperability, WP2 will focus on a limited number of components and build upon resources already available.
To promote findability and reusability, WP2 will unify tool and workflow descriptions using structured data, provide a workflow registry that leverages current resources, and create a specific service to support workflow maintenance through automated analysis and testing.
Reviews of online materials and publications related to the activities of the LS RIs, as well as informal discussions with individual researchers within some of the RIs (including during the project kick-off meeting), identified a range of tools and workflow systems in common use. This was complemented by a survey of the EOSC-Life science demonstrators. Based on this, WP2 has developed an initial technical roadmap that highlights technologies and standards that can be readily supported within the project. The technologies and standards include the Linux operating system, the Conda package manager, Singularity (and/or Docker) for containerisation, CWL for describing data analysis workflows, Nextflow for running workflows on the command line and the Galaxy platform as a web-based UI for building and running data analysis workflows. In addition, there is growing interest in the use of RStudio and Jupyter notebooks. To build on existing efforts and expertise, WP2 will aim to use these tools or ensure compatibility with them.
The Common Workflow Language (CWL) is selected as the standard for describing tools and workflows that can be executed by multiple workflow engines such as Nextflow and Snakemake. ELIXIR has invested in the support of CWL. CWL is also used by the EU’s BioExcel2 Centre of Excellence for Biomolecular modelling, and by the IBISBA ESFRI for Industrial Biotechnology. The CWL project participates in the GA4GH Task Execution API (a minimal common API for submitting a single job to a remote execution endpoint) and the GA4GH Workflow Execution API (a minimal common API for submitting workflow requests to workflow execution systems in a standardised way).
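To give a sense of what submitting a workflow request in a standardised way involves, the following Python sketch assembles the form fields of a run request for a CWL workflow. The field names (`workflow_url`, `workflow_type`, `workflow_type_version`, `workflow_params`) follow our reading of the GA4GH Workflow Execution Service specification, version 1.0, and should be checked against the specification before use. The helper `build_wes_run_request`, the example URL and the parameter values are hypothetical, and no request is actually sent.

```python
import json


def build_wes_run_request(workflow_url, params, cwl_version="v1.0"):
    """Assemble form fields for a WES run request (hypothetical helper).

    Field names are assumed from the GA4GH WES 1.0 specification;
    the workflow parameters are passed as a JSON-encoded string.
    """
    return {
        "workflow_url": workflow_url,
        "workflow_type": "CWL",
        "workflow_type_version": cwl_version,
        "workflow_params": json.dumps(params),
    }


# Hypothetical workflow location and input parameters.
request = build_wes_run_request(
    "https://example.org/workflows/align.cwl",
    {"reads": {"class": "File", "path": "sample.fastq"}},
)

for key, value in request.items():
    print(f"{key}: {value}")
```

In a deployment, these fields would typically be sent as a multipart form in a POST request to the `runs` endpoint of a WES server. Because the request refers to the workflow by URL and type, the same request shape can in principle be sent to any compliant execution system.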
EOSC-Life aims to provide an environment that supports a wide range of workflow management systems and makes them available to its RI developers and users.
Some workflow systems have been identified as meriting dedicated attention.
In May 2021 LifeMonitor had its first public release and has since continuously progressed in functionality. It now offers a web application with a dashboard for visual workflow monitoring and management, which allows users to keep track of the test status of multiple workflows at a glance. The dashboard provides visual feedback on the outcome and duration of the most recent test executions as well as links to the relevant pages in the CI servers where further details can be found.
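The kind of at-a-glance summary described above can be sketched in Python as follows. This is an illustration only and does not reflect the LifeMonitor implementation or its API: the `Execution` record, the `summarise` function, the status labels and the example data are all hypothetical, and test executions are assumed to be listed oldest first.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Execution:
    """One test execution of a workflow on a CI server (hypothetical record)."""
    workflow: str
    passed: bool
    duration_s: float
    ci_url: str


def summarise(executions, recent=3):
    """Summarise the most recent executions of each workflow.

    Assumes `executions` is ordered oldest first.
    """
    by_workflow = defaultdict(list)
    for execution in executions:
        by_workflow[execution.workflow].append(execution)

    summary = {}
    for name, runs in by_workflow.items():
        latest = runs[-recent:]
        n_passed = sum(e.passed for e in latest)
        if n_passed == len(latest):
            status = "passing"
        elif n_passed == 0:
            status = "failing"
        else:
            status = "mixed"
        summary[name] = {
            "status": status,
            "last_duration_s": latest[-1].duration_s,
            "details": latest[-1].ci_url,  # link to the CI page of the last run
        }
    return summary


# Hypothetical execution history for two workflows.
history = [
    Execution("wf-a", True, 120.0, "https://ci.example.org/wf-a/1"),
    Execution("wf-b", False, 45.0, "https://ci.example.org/wf-b/1"),
    Execution("wf-a", True, 118.5, "https://ci.example.org/wf-a/2"),
    Execution("wf-b", True, 50.0, "https://ci.example.org/wf-b/2"),
]

for name, info in summarise(history).items():
    print(name, info["status"], info["last_duration_s"], info["details"])
```

Restricting the summary to a small window of recent executions keeps the view focused on the current health of a workflow rather than on its full history.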
To exchange information on workflow tests, LifeMonitor uses Workflow Testing RO-Crate, an extension of Workflow RO-Crate, thus ensuring interoperability with WorkflowHub. WorkflowHub is also available as an identity provider for authentication, allowing WorkflowHub users to access the service without the need to create a new account. Authentication via GitHub is also supported and authentication via the Life Science Login service will be supported in the near future.
Currently, the LifeMonitor service is used to monitor workflows in the highly curated IWC collection and a set of workflows created by BBMRI-ERIC for processing digital pathology images in the ColoRectal Cancer Cohort.