Scientific studies are using an increasing number of workflows that require the use of methods and tools from different life science domains. Bringing such a workflow to the EOSC environment allows it to be collaboratively developed, used, re-used, and combined. WP2 in EOSC-Life addressed these challenges by making these tools and workflows Findable, Accessible, Interoperable and Reusable (FAIR) in the EOSC, so that life science research infrastructure (LS RI) users and the broad scientific community could take full advantage of the EOSC by accessing a common set of resources and environment.
To achieve this objective, EOSC-Life needed to create suitable workflow metadata standards, establish a workflow registry to support findability, support the adoption and use of workflow testing to make maintenance easier, and encourage the adoption of cloud technologies. These efforts enabled the workflows to be implemented on EOSC-compatible cloud infrastructures.
It is noteworthy that the WP2 roadmap is finding adoption beyond the life sciences, for example,
through EOSC-Life collaboration with the EOSC-Nordic project on the Climate Science Workbench used to
build climate models.
This EOSC-Life WP resulted in three Deliverables and numerous publications.
Several other EOSC-Life WPs contributed to achieving the objectives related to FAIR tools and workflows:
Early on in the EOSC-Life project, reviews of online materials and publications related to LS-RI activities, as well as informal discussions with individual researchers in some of the RIs at the project kick-off meeting, enabled a range of tools and workflow systems in common use to be identified. This list was subsequently complemented by a survey of the EOSC-Life science Demonstrators.
Based on this information, WP2 developed a technical roadmap that highlights technologies and standards that could be readily supported in the project. These included the Linux operating system, the Conda package manager, Singularity (and/or Docker) for containerisation, CWL for describing data analysis workflows, Nextflow for running workflows on the command line, and the Galaxy platform as a web-based UI for building and running data analysis workflows. The interest in using RStudio and Jupyter notebooks also grew steadily during the project.
The Common Workflow Language (CWL) was selected in EOSC-Life as the standard for describing tools and workflows that can be executed by multiple workflow engines such as Nextflow and Snakemake. ELIXIR has invested in the support of CWL. CWL is also used by the EU’s BioExcel Centre of Excellence for Computational Biomolecular Research, and by the IBISBA ESFRI for Industrial Biotechnology.
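To give a sense of what a CWL tool description looks like, the following is a minimal sketch rather than anything taken from the EOSC-Life deliverables. It assumes a trivial `echo` command, and the names `echo_tool`, `message`, `printed` and `output.txt` are illustrative. CWL documents are usually written in YAML; since JSON is also accepted, the description is built here as a Python dictionary and serialised with the standard library.

```python
import json

# Minimal CWL CommandLineTool description, written as a Python dict.
# All names below (message, printed, output.txt) are illustrative only.
echo_tool = {
    "cwlVersion": "v1.2",
    "class": "CommandLineTool",
    "baseCommand": "echo",
    "inputs": {
        # One string input, passed as the first positional argument.
        "message": {
            "type": "string",
            "inputBinding": {"position": 1},
        }
    },
    "outputs": {
        # Capture whatever the command writes to standard output.
        "printed": {"type": "stdout"}
    },
    "stdout": "output.txt",
}

# Serialise to JSON so the description can be saved to a file
# and handed to a CWL-aware workflow engine.
cwl_document = json.dumps(echo_tool, indent=2)
print(cwl_document)
```

The point of such a description is that it states inputs, outputs and the command separately from any particular engine, which is what allows the same tool to be reused across execution environments.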
To make computational workflows FAIR, EOSC-Life created the WorkflowHub, a registry for describing, sharing and publishing computational workflows. By mid-project, 250 workflows had already been registered in the WorkflowHub, covering multiple LS RIs involved in the project across more than 100 research groups and projects.
WorkflowHub makes it easier for users to find and re-use workflows by leveraging a number of
standards. WP2 contributed to several of these, such as Workflow RO-Crate, a standard for packaging
executable workflows together with auxiliary files and enriched metadata, and Workflow Testing
RO-Crate, an extension of Workflow RO-Crate that adds a formalism to specify metadata related
to workflow testing.
WP2 also contributed tools that can be used to work with these standards, such as
ro-crate-py, a Python library and CLI to create and consume RO-Crates, and repo2rocrate, a
tool to generate Workflow (Testing) RO-Crates out of workflow repositories following community
standards.
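The packaging idea behind Workflow RO-Crate can be sketched with the standard library alone. The example below is a simplified illustration, not the output of ro-crate-py or repo2rocrate, and it has not been validated against the profile: it omits properties a complete crate would be expected to carry (such as a license and a publication date). The function name `build_workflow_crate_metadata`, the file name `workflow.cwl` and the `#cwl` identifier are assumptions made for this example.

```python
import json


def build_workflow_crate_metadata(workflow_file, name, language_id, language_name):
    """Return a minimal RO-Crate metadata document (as a dict) for one workflow."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                # Metadata descriptor: identifies this file as RO-Crate metadata.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                # Root dataset: the crate itself, pointing at its main workflow.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "hasPart": [{"@id": workflow_file}],
                "mainEntity": {"@id": workflow_file},
            },
            {
                # The workflow file, typed as a computational workflow.
                "@id": workflow_file,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": name,
                "programmingLanguage": {"@id": language_id},
            },
            {
                # The language the workflow is written in.
                "@id": language_id,
                "@type": "ComputerLanguage",
                "name": language_name,
            },
        ],
    }


def main_workflow(metadata):
    """Look up the entity that the root dataset declares as its main workflow."""
    entities = {entity["@id"]: entity for entity in metadata["@graph"]}
    return entities[entities["./"]["mainEntity"]["@id"]]


metadata = build_workflow_crate_metadata(
    "workflow.cwl", "Example workflow", "#cwl", "Common Workflow Language"
)
print(json.dumps(metadata, indent=2))
print(main_workflow(metadata)["@id"])
```

Because the metadata is plain JSON-LD, a registry or a testing service can locate the main workflow with a simple lookup like `main_workflow`, without knowing anything about the workflow language itself.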
Notably, the WorkflowHub has been included as part of the European COVID-19 Data Portal and has been adopted by Australian BioCommons and by RIs outside the life sciences.
To facilitate workflow maintenance, and in particular to help with the automation and monitoring of workflow tests, EOSC-Life WP2 developed the LifeMonitor workflow testing service.
In addition to tools that facilitate the automatic execution of tests on continuous integration
services (e.g. GitHub Actions), LifeMonitor integrates with WorkflowHub, includes a web
application reporting on the status of all tested workflows, and can notify people who express interest when workflows show problems.
Interoperability between WorkflowHub and LifeMonitor is ensured by the adoption of Workflow
(Testing) RO-Crate as a common data exchange format.
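The monitoring and notification behaviour described above can be illustrated with a small sketch. This is not the LifeMonitor API or its implementation: the class `WorkflowStatus`, the function `notifications`, the workflow names and the e-mail addresses are all hypothetical, and test results are reduced to simple pass/fail flags.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowStatus:
    """Hypothetical record of one monitored workflow (not a LifeMonitor type)."""
    name: str
    test_results: list  # one boolean per test run, True meaning "passed"
    subscribers: list = field(default_factory=list)

    @property
    def healthy(self):
        # A workflow counts as healthy only if every recorded test run passed.
        return all(self.test_results)


def notifications(statuses):
    """Return (subscriber, workflow name) pairs for workflows with failing tests."""
    return [
        (subscriber, workflow.name)
        for workflow in statuses
        if not workflow.healthy
        for subscriber in workflow.subscribers
    ]


statuses = [
    WorkflowStatus("variant-calling", [True, True], ["alice@example.org"]),
    WorkflowStatus("rna-seq", [True, False], ["bob@example.org", "carol@example.org"]),
]

pending = notifications(statuses)
print(pending)
```

Only subscribers of the workflow with a failing run end up in `pending`, which mirrors the idea of notifying interested users when a workflow shows problems.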
The WorkflowHub and EOSC-Life Workflow Collaboratory developed a new metadata framework for
workflows.
What will happen to these services and technologies after the EOSC-Life project ends?
Further European-wide developments related to the Galaxy platform will continue under the new
EOSC project EuroScienceGateway, which started in September 2022.
EuroScienceGateway will leverage the Galaxy platform, the Pulsar Network (a widely distributed job execution system used to scale computing power across heterogeneous resources) and FAIR workflow services pioneered in the EOSC-Life cluster (WorkflowHub, Workflow RO-Crate and metadata standards like schema.org).