Poster Number
8
Poster Title
WESkit: A GA4GH WES-Compliant Workflow Execution Service for HPC and Cloud
Authors
Valentin Schneider-Lunitz1, Philip Kensche2, Landfried Kraatz1, Philipp Strubel1, Ivo Buchhalter2, Sven Twardziok1
1) Berlin Institute of Health at Charité
2) German Cancer Research Center (DKFZ)
Abstract
The rapid advancement of biological and medical research generates vast amounts of data, necessitating robust research data management practices that ensure reproducibility and reusability. Data processing often relies on workflow systems such as Snakemake and Nextflow. However, automating and managing large numbers of workflow executions in a reproducible way remains challenging. To address this bottleneck, the Global Alliance for Genomics and Health (GA4GH) defined the Workflow Execution Service (WES) interface for standardized workflow execution.
WESkit implements the GA4GH WES interface and handles the execution, monitoring, and documentation of workflow runs for research data processing.
WESkit is engineered to manage both Snakemake and Nextflow workflows in the cloud or on remote compute infrastructure. Deploying WESkit on a Kubernetes cluster with Helm charts offers flexibility and scalability in setting up the software infrastructure and supports the concurrent submission of numerous jobs. Job submission to a remote HPC cluster allows user-friendly and secure workflow execution where the data reside, in compliance with local data regulation policies. For this purpose, WESkit supports the SLURM and IBM LSF scheduling systems.
Submitted jobs are stored in MongoDB to persist their execution states, thereby supporting long-term documentation for users.
By implementing the GA4GH WES interface, WESkit provides a unified way to execute workflows, thereby helping to address the challenges associated with the automation and management of complex bioinformatics workflows.
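To illustrate what a standardized submission can look like, the following minimal sketch assembles the form fields that the GA4GH WES specification defines for starting a run (`workflow_url`, `workflow_type`, `workflow_type_version`, `workflow_params`), together with the URL for querying a run's status. It sends nothing over the network. The host name, the workflow path, the type string "SMK", the version number, and the helper names are assumptions made for illustration and are not taken from WESkit itself; the values a given deployment accepts should be checked against its service-info endpoint.

```python
import json
from urllib.parse import urljoin

# Path prefix defined by the GA4GH WES 1.x specification.
WES_RUNS_PATH = "ga4gh/wes/v1/runs"


def build_run_request(base_url, workflow_url, workflow_type,
                      workflow_type_version, workflow_params):
    """Return (endpoint, form_fields) for a WES run submission (POST)."""
    endpoint = urljoin(base_url.rstrip("/") + "/", WES_RUNS_PATH)
    form_fields = {
        "workflow_url": workflow_url,
        "workflow_type": workflow_type,
        "workflow_type_version": workflow_type_version,
        # The specification expects workflow_params as a JSON-encoded string.
        "workflow_params": json.dumps(workflow_params, sort_keys=True),
    }
    return endpoint, form_fields


def status_url(base_url, run_id):
    """Return the URL for polling the state of a submitted run (GET)."""
    return urljoin(base_url.rstrip("/") + "/",
                   f"{WES_RUNS_PATH}/{run_id}/status")


# Hypothetical server and workflow; all values below are placeholders.
endpoint, form_fields = build_run_request(
    base_url="https://weskit.example.org",
    workflow_url="workflows/Snakefile",
    workflow_type="SMK",              # assumed type string
    workflow_type_version="7.30.2",   # assumed engine version
    workflow_params={"text": "hello"},
)
```

The returned fields would typically be sent as a multipart form with any HTTP client, along with whatever authentication the deployment requires.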
Digital Poster