BACTpipe introduction and overview

BACTpipe uses paired-end reads from whole-genome shotgun sequencing to assemble and annotate single bacterial genomes.

BACTpipe’s analysis flow starts with pre-processing of paired-end reads in FASTQ format using fastp, followed by taxonomic classification and Gram stain identification with Kraken2. This step also flags whether the sample is potentially contaminated, i.e. contains more than one species. The sample is then assembled de novo using Shovill, and the headers in the draft genome FASTA file are renamed so that each is unique and genome-specific.
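To make the contamination check described above more concrete, here is a minimal sketch in Python of how "more than one species" could be detected from a Kraken2 report. It assumes the standard tab-separated Kraken2 report layout (percentage, clade reads, direct reads, rank code, NCBI taxonomy ID, indented name). The function names, the 5% cutoff, and the example report values are illustrative assumptions, not BACTpipe's actual code or thresholds.

```python
from typing import List, Tuple


def species_hits(report_text: str, min_percent: float = 5.0) -> List[Tuple[str, float]]:
    """Return (species name, percent of reads) for species-level rows
    at or above min_percent. The 5% default is an assumed cutoff."""
    hits = []
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        percent = float(fields[0])   # percent of reads in this clade
        rank = fields[3]             # rank code, "S" means species
        name = fields[5].strip()     # names are indented in the report
        if rank == "S" and percent >= min_percent:
            hits.append((name, percent))
    return hits


def looks_contaminated(report_text: str, min_percent: float = 5.0) -> bool:
    """Flag a sample when more than one species passes the cutoff."""
    return len(species_hits(report_text, min_percent)) > 1


# Made-up report rows, for illustration only.
example_report = "\n".join([
    "  2.00\t200\t200\tU\t0\tunclassified",
    " 98.00\t9800\t0\tR\t1\troot",
    " 90.00\t9000\t10\tG\t561\t        Escherichia",
    " 89.00\t8900\t8900\tS\t562\t          Escherichia coli",
    "  7.00\t700\t700\tS\t1280\t          Staphylococcus aureus",
])

print(species_hits(example_report))
print(looks_contaminated(example_report))
```

With the made-up rows above, two species pass the assumed 5% cutoff, so the sample would be flagged. Raising the cutoff to 10% leaves only one species and the flag is cleared, which shows how sensitive such a check is to the chosen threshold.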

Next, the genome is annotated using Prokka, with genus, species, and Gram stain information added when the Kraken2 step could identify them uniquely. Finally, basic statistics about the assembly and annotation are collected into an HTML report using MultiQC.
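The conditional use of taxonomic information in the annotation step described above can be sketched as follows. This Python example only assembles a Prokka command line using Prokka's --outdir, --prefix, --genus, --species, and --gram options; the function name, its parameters, and the file names are hypothetical, and BACTpipe's own Nextflow process may build the command differently.

```python
from typing import List, Optional


def prokka_command(
    fasta: str,
    outdir: str,
    prefix: str,
    genus: Optional[str] = None,
    species: Optional[str] = None,
    gram: Optional[str] = None,
) -> List[str]:
    """Build a Prokka argument list, adding taxonomy options only
    when the classification step produced a unique answer."""
    cmd = ["prokka", "--outdir", outdir, "--prefix", prefix]
    if genus:
        cmd += ["--genus", genus]
    if species:
        cmd += ["--species", species]
    if gram in ("pos", "neg"):  # values accepted by Prokka's --gram
        cmd += ["--gram", gram]
    cmd.append(fasta)  # the contigs file goes last
    return cmd


# Hypothetical sample with a unique classification.
print(" ".join(prokka_command(
    "sample1.fasta", "sample1_prokka", "sample1",
    genus="Escherichia", species="coli", gram="neg",
)))

# Hypothetical sample where classification was ambiguous.
print(" ".join(prokka_command("sample2.fasta", "sample2_prokka", "sample2")))
```

Returning a list of arguments rather than a single string makes it straightforward to pass the command to subprocess.run without shell quoting issues.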

BACTpipe is implemented in Nextflow. An overview of the workflow is shown below, with the different output files at the bottom.

Flowchart showing BACTpipe workflow.