
    Demonstrator 3: Rapid, scalable and reproducible deployment of (meta-)genomics assembly and analysis pipelines tailored to the biome of interest

    Summary

    Metagenomics studies routinely perform deep sequencing of the total microbial DNA in a sample using various sequencing technologies (short and long reads). As a result, data analysis may need to scale to terabytes of data. A lack of standardization hinders the interoperability of tools within workflows that would facilitate data analysis. Developing the best analysis workflow is currently the biggest challenge for life scientists using metagenomics.

    This demonstrator project aims to democratise the analysis of microbiomes by providing metagenomics researchers with a toolkit for producing and re-using a range of analysis pipelines efficiently deployed on cloud computing infrastructures.

    This Demonstrator proposes to standardize and facilitate the analysis of complex biomes by using standardized interfaces and containers, and by integrating containerized workflow components (depending on the needs) within workflow descriptions (e.g. CWL). More specifically, the project combines EMGB (Elastic MetaGenome Browser) pipelines with CAMI’s benchmarking datasets and containers to produce workflows specific to a biome and sequencing technology, based on the best combination of tools. Combining pre-existing workflow components into a final workflow allows the rapid development of new pipelines and the extension or modification of existing ones, depending on the needs and the microbiome complexity.
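
    The idea of picking the best combination of tools per biome and sequencing technology can be illustrated with a minimal Python sketch. It is not the project's implementation: the score table, the tool names (assembler_a, binner_b, etc.) and the function select_workflow are hypothetical placeholders, and the numbers are invented for illustration rather than taken from CAMI results.

```python
from typing import Dict, Tuple

# Hypothetical benchmark scores (higher is better), keyed by
# (biome, sequencing technology) -> pipeline step -> tool -> score.
# Tool names and values are placeholders, not CAMI results.
Scores = Dict[Tuple[str, str], Dict[str, Dict[str, float]]]

BENCHMARKS: Scores = {
    ("gut", "short-read"): {
        "assembly": {"assembler_a": 0.82, "assembler_b": 0.91},
        "binning": {"binner_a": 0.77, "binner_b": 0.70},
    },
    ("soil", "long-read"): {
        "assembly": {"assembler_a": 0.64, "assembler_c": 0.88},
        "binning": {"binner_a": 0.59, "binner_b": 0.73},
    },
}


def select_workflow(
    biome: str, technology: str, benchmarks: Scores = BENCHMARKS
) -> Dict[str, str]:
    """Return the highest-scoring tool for each pipeline step.

    Raises KeyError if no benchmark exists for the requested
    biome/technology pair.
    """
    steps = benchmarks[(biome, technology)]
    return {step: max(tools, key=tools.get) for step, tools in steps.items()}


if __name__ == "__main__":
    # Each selected tool would correspond to one containerized
    # component referenced from a workflow description.
    print(select_workflow("gut", "short-read"))
```

    In this sketch each step is chosen independently; a real selection would likely also need to check that the chosen components are compatible with each other.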

    The CWL-based EMGB metagenome pipeline developed at Bielefeld University has been successfully deployed on two different de.NBI Cloud sites (Bielefeld and Gießen).

    In a parallel activity, the EMBL-EBI team has been working on a new version of MGnify, a hub for the assembly, analysis and archiving of microbiome data. This new version (v5.0) aims to increase the reproducibility of the pipelines and the interoperability and provenance of metagenomics resources, and was successfully deployed on Oracle Cloud, Embassy Cloud and GCP.

    http://toil.ucsc-cgl.org/

    https://emg-docs.readthedocs.io/en/latest/analysis.html

    https://www.ebi.ac.uk/metagenomics/

    Participants

    EMBL-EBI

    Uni Bielefeld/DeNBI

     

    RIs involved

    ELIXIR

     

    Publications

    • Almeida, A., Nayfach, S., Boland, M. et al. A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat Biotechnol 39, 105–114 (2021). https://doi.org/10.1038/s41587-020-0603-3
    • Mitchell, A. L., Almeida, A., Beracochea, M. et al. MGnify: the microbiome analysis resource in 2020. Nucleic Acids Res 48, D570–D578 (2020). https://doi.org/10.1093/nar/gkz1035