
     WP1 RI nominated resources 3: PombeMine: A FAIR workflow-enabled PomBase in the cloud


    We have successfully built a new PombeMine database containing many of the major datasets
    from PomBase. This includes:
    ● Sequence and features
    ● Gene ontology, annotation extensions
    ● Phenotypes, alleles, conditions, penetrance
    ● Protein families and domains
    ● Orthologs
    ● Disease associations
    ● Genetic and physical interactions

    A set of predefined “template” searches with editable default parameters has been created to give researchers easy access to these data; templates can be filtered by category, such as disease, phenotype, and interaction.

    Several new features have been added to the new InterMine user interface to improve the user experience, and a number of bugs in the new BlueGenes interface have been fixed.

    The new instance was built and deployed on the cloud environment provided by CSC and is available here. We used the Rahti service to build and deploy the new instance; the service allows us to run InterMine software, packaged in Docker containers, on a shared computing platform. We also used the Allas object storage service to store the dataset provided by PomBase.

    The DataDownloader script has been updated to automate the download of data from the PomBase source.


    InterMine Team (Department of Genetics, University of Cambridge)

    PomBase (Department of Biochemistry, University of Cambridge)


    RIs involved

    InterMine is a Recommended Interoperability Resource of ELIXIR. This project has allowed us to test our new InterMine cloud Software-as-a-Service (SaaS) model.



    For PomBase users, PombeMine will provide novel data query options and make some
    commonly used analysis tools more immediately accessible.
    PomBase curators have already used PombeMine queries in a number of QC workflows to
    obtain query outputs that are not possible in the PomBase Advanced query interface:
    1. Identification of malformed allele names (for example, missing or non-standard names).
    2. Alignment of the manually curated disease gene list with OMIM. This work also identified a large number of causal disease -> gene connections missing from Monarch and 11 diseases missing from the MONDO ontology.

    These 11 terms are under review for addition to MONDO (see here).

    The API will provide computational access to fission yeast data. It has been used, for example, in the workflow to populate Genestorian with fission yeast allele data (see here).
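As an illustration of what computational access could look like, the sketch below builds an InterMine PathQuery and the matching web-service request URL without contacting any server. The base URL is a placeholder (the address of the PombeMine instance is not given in this text), the field names `Gene.primaryIdentifier` and `Gene.symbol` are assumed to be present in PombeMine's data model, and the helper function names are hypothetical.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

# Placeholder address: replace with the real PombeMine base URL.
BASE_URL = "https://pombemine.example.org/pombemine"


def build_path_query(views, constraints, model="genomic"):
    """Return PathQuery XML that outputs `views`, filtered by `constraints`.

    `constraints` is a list of (path, operator, value) tuples.
    """
    query = ET.Element("query", {"model": model, "view": " ".join(views)})
    for path, op, value in constraints:
        ET.SubElement(query, "constraint", {"path": path, "op": op, "value": value})
    return ET.tostring(query, encoding="unicode")


def build_results_url(base_url, query_xml, fmt="json"):
    """Return the URL of the query-results web service for `query_xml`."""
    params = urlencode({"query": query_xml, "format": fmt})
    return f"{base_url}/service/query/results?{params}"


# Example: look up a gene by symbol (assumed field names, see above).
query_xml = build_path_query(
    views=["Gene.primaryIdentifier", "Gene.symbol"],
    constraints=[("Gene.symbol", "=", "cdc2")],
)
url = build_results_url(BASE_URL, query_xml)
print(url)
```

Fetching the resulting URL with any HTTP client should return the matching rows in the requested format, provided the instance exposes the standard InterMine web services.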

    Rapid access to analysis tools for lists. The ability to send any user-defined list of interest directly from PomBase to PombeMine, and then to immediately view gene ontology and phenotype enrichments for that list, will remove substantial barriers to performing enrichment analyses. Enrichment analyses can be time-consuming to perform, and external resources frequently use out-of-date GO data and ontologies (we plan to update PombeMine monthly).
    The ability to create sub-lists using a selection of filters based on orthogonal data types (disease, phenotypes, alleles) and to see the resulting enrichments on the fly provides novel and useful functionality that is not usually available in other popular enrichment tools. Similarly, the ability to obtain genetic and physical interaction network views for any list, whether imported or generated by queries, will remove technical barriers and time constraints for bench biologists.
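To make the idea of a list enrichment concrete, the sketch below runs a hypergeometric over-representation test with a Bonferroni correction, a common choice in enrichment tools. It is not taken from PombeMine's own implementation, and the gene identifiers and term names are toy data invented for the example.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of at least k annotated genes in a list of n,
    drawn from N background genes of which K carry the annotation."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


def enrichment(gene_list, background, annotations):
    """Return (term, overlap, corrected p-value) tuples, most significant first.

    `annotations` maps each term to the set of genes annotated with it.
    """
    background = set(background)
    genes = set(gene_list) & background
    # Only test terms that annotate at least one gene in the list.
    tested = {}
    for term, annotated in annotations.items():
        annotated = set(annotated) & background
        if annotated & genes:
            tested[term] = annotated
    results = []
    for term, annotated in tested.items():
        k = len(annotated & genes)
        p = hypergeom_pvalue(k, len(genes), len(annotated), len(background))
        # Bonferroni correction: multiply by the number of terms tested.
        results.append((term, k, min(1.0, p * len(tested))))
    return sorted(results, key=lambda r: r[2])


# Toy data: 20 background genes and two made-up annotation terms.
background = [f"g{i}" for i in range(20)]
annotations = {
    "A": {f"g{i}" for i in range(0, 5)},
    "B": {f"g{i}" for i in range(5, 15)},
}
results = enrichment(["g0", "g1", "g2", "g3", "g10"], background, annotations)
for term, overlap, p_adj in results:
    print(term, overlap, p_adj)
```

Filtering a list into a sub-list and calling `enrichment` again on the result mirrors the on-the-fly behaviour described above, with the background set held fixed.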