This EOSC-Life demonstrator project aims to develop an online marine genomic resource that supports community-driven annotation of marine eukaryotes and provides a focus for post-assembly genomic workflows and data access for specific groups of closely related marine organisms. The more closely two taxa are related evolutionarily, the more genomic structure and synteny they are expected to share, so it is beneficial to compare and contrast genome annotations between closely related organisms. This is especially important for communities that work on specific genomes and provide manual annotations via community platforms (e.g. Orcae). The demonstrator project addresses the lack of tools for comparing and transferring annotations and features between the sequenced genomes of closely related species.
A software tool for comparing genome annotations stored in General Feature Format (GFF) has been designed, implemented, and integrated into the Galaxy platform. In addition, a Snakemake workflow has been designed and implemented for cloud deployment, and the tool source code and Docker module will soon be made available from a GitHub repository. The tool is currently being used to develop a public-facing portal for the community annotation of species belonging to the pelagic herring family, Clupeidae. The genomes of eight clupeids will be available for comparison initially, but the tool could also be used to compare any two genome annotation libraries. It is also intended that the tool will be able to automate the updating of annotations on the community annotation platform, Orcae, through the platform API. FAIR data and privacy issues surrounding data usage have been addressed.
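To give a rough sense of what comparing two GFF annotation sets can involve, the sketch below parses two GFF3 texts and reports features that are unique to one set or whose coordinates differ. It is not the project's tool: the function names `parse_gff` and `compare_annotations` are hypothetical, and matching features by their `ID` attribute is a simplifying assumption that only suits annotation sets sharing identifiers (for example, two versions of the same annotation).

```python
def parse_gff(text):
    """Return {feature ID: (seqid, type, start, end, strand)} from GFF3 text.

    Hypothetical helper: features without an ID attribute are skipped.
    """
    features = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
        # Column 9 holds semicolon-separated key=value pairs
        attributes = dict(
            pair.split("=", 1) for pair in attrs.split(";") if "=" in pair
        )
        fid = attributes.get("ID")
        if fid is not None:
            features[fid] = (seqid, ftype, int(start), int(end), strand)
    return features


def compare_annotations(gff_a, gff_b):
    """Compare two annotation sets by feature ID (an assumed matching rule)."""
    a, b = parse_gff(gff_a), parse_gff(gff_b)
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }


if __name__ == "__main__":
    # Made-up example records, for illustration only
    old = "##gff-version 3\nchr1\tdemo\tgene\t100\t500\t.\t+\t.\tID=gene1\n"
    new = "##gff-version 3\nchr1\tdemo\tgene\t100\t550\t.\t+\t.\tID=gene1\n"
    print(compare_annotations(old, new))
```

Matching on `ID` keeps the sketch short. Comparing annotations across different species would need a mapping between features that is not shown here.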
It is especially important to emphasize FAIR data. This is something I and colleagues are familiar with, since we deal mainly with DNA sequence data that are uploaded to international, freely accessible databases (INSDC). However, it is equally important to better understand the principles of Open Science and the crucial part that standards, as well as vocabularies and rich metadata, play in making data machine readable and interoperable with other data sources. This helps to make the data maximally useful to the research community.
– Cymon J. Cox