Modules

SP3 follows a modular design. Each module has a single, independent responsibility and implements one aspect of the overall functionality.

The modules work together in three layers: UI, APIs and Cloud. All modules are written in Python 3.6+ and their names are prefixed with Cat, reflecting the aim of being agile and flexible.

_images/modules.png

CatWeb is a web application that calls the Cat web API modules for its various functions. CatGrid and CatCloud provide a unified layer for cloud-agnostic compute node scaling and job scheduling.

The CatWeb UI lets users interact with SP3: fetching data, analysing samples with different pipelines, monitoring progress and downloading results.

Cat Web APIs

CatFetch

CatFetch is a web API that downloads data from different data sources, such as ENA.

CatDownload

CatDownload is a web API that provides the URLs of files to be downloaded.

CatTag

CatTag is a web API that provides tagging features for runs and samples.
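To make the idea of tagging runs and samples concrete, here is a minimal in-memory sketch. It is not CatTag's code or API: the `TagStore` class, its method names and the identifiers used are all hypothetical, and a real service would sit behind HTTP endpoints and a persistent store.

```python
from collections import defaultdict


class TagStore:
    """Hypothetical in-memory stand-in for a tagging service."""

    KINDS = ("run", "sample")

    def __init__(self):
        # Maps (kind, identifier) -> set of tags.
        self._tags = defaultdict(set)

    def add(self, kind, identifier, tag):
        """Attach a tag to a run or a sample."""
        if kind not in self.KINDS:
            raise ValueError(f"unknown kind: {kind!r}")
        self._tags[(kind, identifier)].add(tag)

    def tags_for(self, kind, identifier):
        """Return the sorted tags of one run or sample."""
        return sorted(self._tags.get((kind, identifier), set()))

    def find(self, kind, tag):
        """Return the sorted identifiers of a given kind carrying a tag."""
        return sorted(
            ident for (k, ident), tags in self._tags.items()
            if k == kind and tag in tags
        )
```

Keying on the pair (kind, identifier) keeps run tags and sample tags separate even when a run and a sample happen to share an identifier.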

CatStat

CatStat collects the state of CatGrid and draws graphs from it. It serves a single SVG graph through a web API.
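As an illustration of turning scheduler state into a single SVG graph, the sketch below renders job counts as a horizontal bar chart. It does not reflect how CatStat draws its graphs: the function name, the shape of the `counts` mapping and the layout constants are all assumptions.

```python
from xml.sax.saxutils import escape


def render_state_svg(counts, bar_height=20, scale=10):
    """Render job counts per state as an SVG bar chart (returned as a string).

    `counts` is a hypothetical mapping such as {"queued": 3, "running": 5}.
    """
    parts = []
    for i, (state, n) in enumerate(sorted(counts.items())):
        y = i * (bar_height + 5)
        # One bar per state; its width is proportional to the job count.
        parts.append(
            f'<rect x="80" y="{y}" width="{n * scale}" '
            f'height="{bar_height}" fill="steelblue"/>'
        )
        # Label the bar; escape the state name so the XML stays valid.
        parts.append(
            f'<text x="0" y="{y + bar_height - 5}">{escape(state)}: {n}</text>'
        )
    height = max(len(counts) * (bar_height + 5), bar_height)
    body = "".join(parts)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="400" '
        f'height="{height}">{body}</svg>'
    )
```

A web handler could return this string with the content type `image/svg+xml`.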

CatReport

CatReport is a service that takes report requests, generates the reports and serves them on the web.

For example, for the clockwork pipeline CatReport generates a report covering its Kraken2, Mykrobe, Samtools QC and drug resistance prediction output.

CaTree

The CaTree API is a service that takes requests for building phylogenetic trees using iqtree.

Currently, requests can be made for the neighbours of TB samples.

CatPile

CatPile is a service that lets users view SP3 metadata on the dataset detail page, as well as the report of the sample.

When a fetch occurs, the API is called to load the SP3 data. When a new run is submitted, the API links the fetch to the run.

CatDap API

CatDap is a service that manages SP3 user authentication and authorisation.

It lets users register an account with SP3, and users can only access the data of their own organisation.
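The organisation-scoped access rule can be sketched as a simple filter. This is an illustration only, not CatDap's implementation: the `User` and `Dataset` types and their fields are hypothetical, and real authorisation would also involve authenticated sessions.

```python
from typing import List, NamedTuple


class User(NamedTuple):
    name: str
    organisation: str


class Dataset(NamedTuple):
    dataset_id: str
    organisation: str


def visible_datasets(user: User, datasets: List[Dataset]) -> List[Dataset]:
    """Return only the datasets owned by the user's own organisation."""
    return [d for d in datasets if d.organisation == user.organisation]
```

For instance, a user registered under organisation "org-a" would see a dataset owned by "org-a" but not one owned by "org-b".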

CatPersistence

CatPersistence provides a web view of data in SP3 persistence storage.

SP3 persistence storage keeps data from different sites for the long term. Currently, we store run information, pipeline output files and reports.

Which output data is stored is configurable. For example, we store the consensus FASTA files from the clockwork TB pipeline, which enables users to query TB neighbourhoods and build phylogenetic trees.
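One way to picture such a configuration is a mapping from pipeline name to file patterns. The sketch below is an assumption for illustration, not the actual SP3 configuration format: the rule table, the pattern and the file paths are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: pipeline name -> glob patterns of outputs to keep.
PERSIST_RULES = {
    "clockwork": ["*/consensus.fasta"],
}


def files_to_persist(pipeline, output_files, rules=PERSIST_RULES):
    """Return the output files that match any pattern for the pipeline."""
    patterns = rules.get(pipeline, [])
    return [
        path for path in output_files
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]
```

A pipeline without an entry in the rule table persists nothing, so storing a new kind of output only requires adding a pattern.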

Cloud Agnostic Operation

CatCloud

CatCloud is a Python application that scales cloud compute nodes. It manages VMs on different cloud infrastructures in a configurable fashion.

CatCloud can work with CatGrid or a Slurm cluster.
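A scaling decision of this kind can be sketched as a pure function of the current load and a few configured limits. This is not CatCloud's algorithm: the function name, its parameters and the "jobs per node" policy are assumptions.

```python
import math


def scaling_delta(active_jobs, current_nodes, jobs_per_node, min_nodes, max_nodes):
    """Return how many nodes to add (positive) or remove (negative).

    `active_jobs` counts queued plus running jobs. The desired node count
    is the number needed to hold them all, clamped to the configured range.
    """
    needed = math.ceil(active_jobs / jobs_per_node)
    desired = max(min_nodes, min(needed, max_nodes))
    return desired - current_nodes
```

Keeping the decision separate from the cloud-specific calls that start or stop VMs means the same policy could be reused across infrastructures.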

CatGrid API

CatGrid is a configurationless, agentless grid scheduler with a web API. It manages nodes and jobs, accepting web API requests to add and remove nodes and to schedule jobs.

CatGrid implements an interface similar to that of a Slurm cluster, with commands such as sbatch, squeue and scancel.
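The three Slurm-style operations can be sketched with a toy in-memory scheduler. This is neither CatGrid's code nor Slurm's behaviour: the `MiniGrid` class is hypothetical, jobs are plain command strings, and nothing is dispatched to nodes.

```python
import itertools


class MiniGrid:
    """Toy job table exposing sbatch, squeue and scancel style methods."""

    def __init__(self):
        self._ids = itertools.count(1)  # job ids: 1, 2, 3, ...
        self._jobs = {}                 # job id -> {"command", "state"}

    def sbatch(self, command):
        """Submit a job and return its id."""
        job_id = next(self._ids)
        self._jobs[job_id] = {"command": command, "state": "PENDING"}
        return job_id

    def squeue(self):
        """List jobs as (id, command, state) tuples, ordered by id."""
        return [
            (job_id, job["command"], job["state"])
            for job_id, job in sorted(self._jobs.items())
        ]

    def scancel(self, job_id):
        """Remove a job; return True if it existed, False otherwise."""
        return self._jobs.pop(job_id, None) is not None
```

A web API in front of such an object would map one request type to each method, which is one plausible way for a client written against Slurm-like commands to work with either backend.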