The goal of this science demonstrator is to extend the reporting capabilities developed during the EOSC Pilot, which allowed sharing the image processing workflow applied during the acquisition phase of any structural study performed at the electron microscope (cryoEM images). Our previous project devised a mechanism to store and inspect the image processing steps that a particular set of cryoEM movies had undergone at the acquisition site. This mechanism was implemented specifically for the Scipion workflow engine, which is used in many cryoEM facilities. In this new project, we will extend this reporting mechanism to all image processing steps and will apply FAIR principles to cryoEM data and their associated processing.

EMDB is a public database that preserves the final product of the acquisition and data analysis. EMPIAR is another public database that preserves some of the original data associated with the EMDB entries. Our goal here is to provide, within Instruct, an open science technology that allows preserving the original data and their full processing for all structural studies performed by electron microscopy. This technology has to be complemented with an open science policy and hardware support, which are the object of other European projects.

The following workflow is a possible vision of how data acquisition and its FAIRification would work:

  1. The user acquires image data at an electron microscopy center associated with Instruct. During the acquisition, the raw data and the preprocessing workflow description (JSON) are sent to the EMPIAR database. Optionally, the raw data and the Scipion project can also be sent to the user’s home institution; otherwise, the user will have to take them home on a portable storage device.
  2. EMPIAR will return a DOI (Digital Object Identifier) that uniquely identifies this acquisition. This DOI is returned to the cryoEM center and attached to the Scipion project associated with the acquisition.
  3. The user continues the image processing at their home institution. If they keep using Scipion, the image processing workflow stored in EMPIAR is updated regularly.
  4. When the user finishes the data analysis, the final result can be submitted to EMDB. If they are still using Scipion, a harvesting tool will assist with this submission. During the submission, EMDB will ask for the acquisition DOI, and the EMDB entry will be updated with the EMPIAR information.
  5. After an embargo period, and in accordance with an Instruct data management plan (whose definition is outside the scope of this project), all acquisitions will be made public, whether or not they have been published as described in Step 4.

Steps 4 and 5 will require FAIRification guidelines to guarantee that raw data and workflows are findable by search engines.
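
The five steps above can be read as the lifecycle of a single acquisition record. The following Python sketch is only an illustration of that lifecycle, not an actual EMPIAR, EMDB or Scipion interface: the class `Acquisition`, the functions `deposit`, `update_workflow`, `link_emdb` and `is_public`, and the placeholder identifiers used with them are all hypothetical names introduced here, and the embargo length is an assumed parameter, since the data management plan is not defined in this project.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Acquisition:
    """Hypothetical record of one acquisition and its processing history."""
    raw_data_path: str
    workflow_versions: List[str] = field(default_factory=list)  # JSON snapshots
    doi: Optional[str] = None            # assigned by the archive (step 2)
    emdb_entry: Optional[str] = None     # linked at final submission (step 4)
    deposited_on: Optional[date] = None


def deposit(acq: Acquisition, workflow_json: str, doi: str, today: date) -> None:
    """Steps 1-2: store the preprocessing workflow and record the returned DOI."""
    acq.workflow_versions.append(workflow_json)
    acq.doi = doi
    acq.deposited_on = today


def update_workflow(acq: Acquisition, workflow_json: str) -> None:
    """Step 3: append a newer workflow snapshot made at the home institution."""
    if acq.doi is None:
        raise ValueError("acquisition has no DOI yet; deposit it first")
    acq.workflow_versions.append(workflow_json)


def link_emdb(acq: Acquisition, emdb_entry: str) -> None:
    """Step 4: tie the final map submission to the acquisition DOI."""
    if acq.doi is None:
        raise ValueError("an acquisition DOI is required for the submission")
    acq.emdb_entry = emdb_entry


def is_public(acq: Acquisition, today: date, embargo: timedelta) -> bool:
    """Step 5: public once the embargo has elapsed, published or not."""
    return acq.deposited_on is not None and today >= acq.deposited_on + embargo
```

In this sketch, `is_public` deliberately ignores `emdb_entry`, which mirrors Step 5: release depends only on the embargo, not on whether a final result was submitted.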

This workflow guarantees the preservation, findability and reusability of the raw data of Instruct projects. It also makes the process leading to the final structure open, which facilitates interpretation of that result by exposing its limitations and the choices made in the intermediate steps.

The CryoEM Demonstrator for EOSC-Life