Metagenomics studies routinely perform deep sequencing of the entire microbial DNA in a sample using various sequencing technologies (short and long reads). As a result, data analysis may need to scale to terabytes of data. A lack of standardization hinders the interoperability of tools within workflows, which would otherwise facilitate data analysis. Developing the best analysis workflow is currently the biggest challenge for life scientists using metagenomics.
This demonstrator project aims to democratise the analysis of microbiomes by providing metagenomics researchers with a toolkit for producing and re-using a range of analysis pipelines efficiently deployed on cloud computing infrastructures.
This Demonstrator proposes to standardize and facilitate the analysis of complex biomes by using standardized interfaces and containers, and by integrating containerized workflow components (depending on the needs) within workflow descriptions (e.g. CWL, the Common Workflow Language). More specifically, the project combines EMGB (Elastic MetaGenome Browser) pipelines with CAMI’s benchmarking datasets and containers to produce biome- and sequencing-technology-specific workflows, based on the best combination of tools. Combining pre-existing workflow components into a final workflow in this way allows the rapid development of new pipelines and the extension or modification of existing ones, depending on the needs and the microbiome complexity.
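The idea of assembling a technology-specific workflow from pre-existing containerized components can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the component registry, the `build_workflow` function, and the container image names are all hypothetical placeholders, and the result is a CWL-like skeleton rather than a complete, validated CWL document.

```python
import json

# Hypothetical registry: (step role, sequencing technology) -> container image.
# The image names are placeholders, not real published containers.
COMPONENTS = {
    ("assembly", "short-read"): "example.org/assembler-short:1.0",
    ("assembly", "long-read"): "example.org/assembler-long:1.0",
    ("binning", "short-read"): "example.org/binner:1.0",
    ("binning", "long-read"): "example.org/binner:1.0",
}


def build_workflow(technology, roles=("assembly", "binning")):
    """Chain the selected components into a CWL-like workflow skeleton."""
    steps = {}
    previous_output = "reads"  # the workflow-level input feeds the first step
    for role in roles:
        image = COMPONENTS[(role, technology)]
        steps[role] = {
            "run": {
                "class": "CommandLineTool",
                "hints": {"DockerRequirement": {"dockerPull": image}},
            },
            "in": {"input": previous_output},
            "out": ["output"],
        }
        # The next step consumes this step's output.
        previous_output = f"{role}/output"
    return {
        "cwlVersion": "v1.2",
        "class": "Workflow",
        "inputs": {"reads": "File"},
        "outputs": {"result": {"type": "File", "outputSource": previous_output}},
        "steps": steps,
    }


if __name__ == "__main__":
    print(json.dumps(build_workflow("long-read"), indent=2))
```

Swapping a component, for example after a benchmark favours a different assembler for a given biome, then only requires changing one registry entry; the chaining logic stays the same.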
The CWL-based EMGB metagenome pipeline developed at Bielefeld University has been successfully deployed on two different de.NBI Cloud sites (Bielefeld and Gießen).
In a parallel activity, the EMBL-EBI team has been developing a new version of MGnify, a data hub for the assembly, analysis and archiving of microbiome data. This new version of MGnify (v5.0) aims to increase the reproducibility of the pipelines and the interoperability and provenance of metagenomics resources, and was successfully deployed on Oracle Cloud, Embassy Cloud and GCP.