SP3 follows a modular design. Each module has a single, independent responsibility and implements one aspect of the overall functionality.
The modules work together in three layers: UI, APIs and Cloud. All modules are written in Python 3.6+ and carry the Cat* prefix, with the aim of keeping the system agile and flexible.
CatWeb is a web application that calls Cat web API modules for different functionalities. CatGrid and CatCloud provide a unified layer for cloud agnostic compute node scaling and job scheduling.
The CatWeb UI lets users interact with SP3: fetching data, analysing samples with different pipelines, monitoring progress and downloading results.
Cat Web APIs
CatFetch is a web API that downloads data from external data sources such as ENA.
CatDownload is a web API that provides the URLs of files to be downloaded.
CatTag is a web API that provides tagging features for runs and samples.
CatStats collects the state of CatGrid and draws graphs based on that state. It serves a single SVG graph through a web API.
Catreport is a service that takes reporting requests, generates the reports and serves them on the web.
For example, a clockwork pipeline run has a report generated by Catreport from its Kraken2, Mykrobe, Samtools QC and drug resistance prediction output.
Catree API is a service that takes requests for building phylogenetic trees using iqtree.
Currently, requests can be made for the neighbours of TB samples.
Catpile is a service that allows users to view SP3 metadata on the dataset detail page as well as the report of the sample.
When a fetch occurs, the API is called to load the SP3 data. When a new run is submitted, the API links the fetch with the run.
Catdap is a service that manages SP3 user authentication and authorisation.
It allows users to register an account with SP3, and users can only access the data of their own organisation.
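The organisation-scoped access rule can be illustrated with a minimal sketch. It is not Catdap's actual code: the `User` and `Dataset` records, their field names and the `visible_datasets` helper are all assumptions made for illustration.

```python
from typing import Iterable, List, NamedTuple


class User(NamedTuple):
    # Hypothetical user record; field names are assumptions.
    name: str
    organisation: str


class Dataset(NamedTuple):
    # Hypothetical dataset record tagged with its owning organisation.
    dataset_id: str
    organisation: str


def visible_datasets(user: User, datasets: Iterable[Dataset]) -> List[Dataset]:
    """Return only the datasets that belong to the user's organisation."""
    return [d for d in datasets if d.organisation == user.organisation]
```

A real service would enforce this check on the server for every request rather than filtering a list in memory.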
CatPersistence provides a web view of data in SP3 persistence storage.
SP3 persistence storage holds data from different sites for long-term storage. Currently, we store run information, pipeline output files and reports.
Which output data is stored is configurable. For example, we store consensus fasta files from the clockwork TB pipeline, which enables users to query TB neighbourhoods and build phylogenetic trees.
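One way to picture configurable storage is a mapping from pipeline name to file-name patterns. The sketch below assumes such a mapping; the `PERSIST_PATTERNS` structure, the `clockwork` key and the `files_to_persist` helper are hypothetical and do not reflect the real SP3 configuration format.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List

# Hypothetical configuration: pipeline name -> glob patterns of outputs to keep.
PERSIST_PATTERNS: Dict[str, List[str]] = {
    "clockwork": ["*.fasta"],
}


def files_to_persist(
    pipeline: str,
    output_files: Iterable[str],
    config: Dict[str, List[str]] = PERSIST_PATTERNS,
) -> List[str]:
    """Return the output files whose names match a pattern configured for the pipeline."""
    patterns = config.get(pipeline, [])
    return [
        path
        for path in output_files
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]
```

With this configuration, `files_to_persist("clockwork", ["s1/consensus.fasta", "s1/report.json"])` keeps only the fasta file, and a pipeline with no entry persists nothing.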
Cloud Agnostic Operation
CatCloud is a Python application that scales cloud compute nodes. It manages VMs on different cloud infrastructures in a configurable fashion.
CatCloud can work with CatGrid or a Slurm cluster.
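Cloud-agnostic scaling is commonly built around a small provider interface with one implementation per cloud. The following sketch shows that general pattern only; `CloudProvider`, `FakeProvider` and `scale_to` are hypothetical names and are not taken from CatCloud.

```python
from abc import ABC, abstractmethod
from typing import List


class CloudProvider(ABC):
    """Hypothetical provider interface; one subclass per cloud infrastructure."""

    @abstractmethod
    def create_vm(self) -> str:
        """Start one VM and return its name."""

    @abstractmethod
    def delete_vm(self, name: str) -> None:
        """Shut down the named VM."""


class FakeProvider(CloudProvider):
    """In-memory provider used only to exercise the sketch."""

    def __init__(self) -> None:
        self.vms: List[str] = []
        self._counter = 0

    def create_vm(self) -> str:
        self._counter += 1
        name = "vm-{}".format(self._counter)
        self.vms.append(name)
        return name

    def delete_vm(self, name: str) -> None:
        self.vms.remove(name)


def scale_to(provider: CloudProvider, current: List[str], target: int) -> List[str]:
    """Create or delete VMs until `target` nodes remain; return the new node list."""
    nodes = list(current)
    while len(nodes) < target:
        nodes.append(provider.create_vm())
    while len(nodes) > target:
        provider.delete_vm(nodes.pop())
    return nodes
```

Because `scale_to` only depends on the abstract interface, the same scaling logic could drive any cloud for which a provider subclass exists.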
CatGrid is a configurationless, agentless grid scheduler with a web API. It manages nodes and jobs, accepting web API requests for adding nodes, removing nodes and scheduling jobs.
CatGrid implements an interface similar to that of a Slurm cluster, with commands such as sbatch, squeue and scancel.
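To make the node and job operations concrete, here is a toy in-memory scheduler whose method names mirror the Slurm commands mentioned above. It is a sketch under simplifying assumptions (one job per node, first-come-first-served) and is not CatGrid's implementation or its web API; the `ToyGrid` class and its methods are hypothetical.

```python
import itertools
from typing import Dict, List, Optional, Tuple


class ToyGrid:
    """In-memory stand-in for a grid scheduler; not CatGrid's real code."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        # node name -> id of the job it runs, or None when idle
        self.nodes: Dict[str, Optional[int]] = {}
        # job id -> {"command": ..., "state": "PENDING" or "RUNNING"}
        self.jobs: Dict[int, Dict[str, str]] = {}

    def add_node(self, name: str) -> None:
        self.nodes[name] = None
        self._schedule()

    def remove_node(self, name: str) -> None:
        job_id = self.nodes.pop(name)
        if job_id is not None:
            # The job loses its node and goes back to the queue.
            self.jobs[job_id]["state"] = "PENDING"
        self._schedule()

    def sbatch(self, command: str) -> int:
        """Submit a job and return its id."""
        job_id = next(self._ids)
        self.jobs[job_id] = {"command": command, "state": "PENDING"}
        self._schedule()
        return job_id

    def squeue(self) -> List[Tuple[int, str]]:
        """List (job id, state) pairs in submission order."""
        return [(job_id, job["state"]) for job_id, job in sorted(self.jobs.items())]

    def scancel(self, job_id: int) -> None:
        """Remove a job and free the node it was running on, if any."""
        self.jobs.pop(job_id)
        for name, running in self.nodes.items():
            if running == job_id:
                self.nodes[name] = None
        self._schedule()

    def _schedule(self) -> None:
        # Assign the oldest pending jobs to idle nodes.
        pending = [j for j, job in sorted(self.jobs.items()) if job["state"] == "PENDING"]
        for name, running in self.nodes.items():
            if running is None and pending:
                job_id = pending.pop(0)
                self.nodes[name] = job_id
                self.jobs[job_id]["state"] = "RUNNING"
```

Submitting two jobs before any node exists leaves both pending; adding a node then starts the oldest job, and cancelling it lets the next one run.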