Output Files

The BACTpipe pipeline writes its output files to specific folders within the output directory (BACTpipe_results by default).

Output folders

  • fastp
  • shovill (optional)
  • kraken2 (optional)
  • prokka
  • multiqc
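As a quick sanity check after a run, the folder list above can be compared against what is actually on disk. The following is a minimal sketch, not part of BACTpipe itself: the function name check_output_folders is hypothetical, and it assumes the folders sit directly under the output directory.

```python
from pathlib import Path

# Folder names taken from the list above; shovill and kraken2 are marked optional there.
REQUIRED_FOLDERS = ("fastp", "prokka", "multiqc")
OPTIONAL_FOLDERS = ("shovill", "kraken2")


def check_output_folders(output_dir="BACTpipe_results"):
    """Return (missing_required, missing_optional) folder names for output_dir.

    Assumption: each folder is a direct child of output_dir.
    """
    root = Path(output_dir)
    missing_required = [name for name in REQUIRED_FOLDERS if not (root / name).is_dir()]
    missing_optional = [name for name in OPTIONAL_FOLDERS if not (root / name).is_dir()]
    return missing_required, missing_optional
```

Calling check_output_folders() with no arguments inspects the default BACTpipe_results directory. A missing optional folder is not necessarily an error, since those steps may not have been run.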

Description of file outputs

The following table provides a description of the most relevant files in each folder mentioned above.

Output Folder | File output | File Description
fastp | *.fastp.fq.gz (optional) | Trimmed reads for each sample (for R1 and R2 respectively)
fastp | *fastp.json | Quality trimming statistics
kraken2 | *.kreport | Kraken2 report
kraken2 | *.classification.txt | Genus, species, and Gram stain classification
shovill | *_shovill (optional) | Shovill output directory containing *.fa, *.fasta, *.fastg, *.gfa, *.hist, *.log, *.changes, and *.tab files
shovill | *.assembly_stats.txt | Summary of assembly statistics
prokka | *.gff | Annotation in GFF3 format, containing both sequences and annotations
prokka | *.gbk | Standard Genbank file
prokka | *.fna | Nucleotide FASTA file of the input contig sequences
prokka | *.faa | Protein FASTA file of the translated CDS sequences
prokka | *.ffn | Nucleotide FASTA file of all the prediction transcripts (CDS, rRNA, tRNA, tmRNA, misc_RNA)
prokka | *.txt | Statistics relating to the annotated features found
multiqc | multiqc-report.html | Summary report of statistics generated by prokka and fastp tools
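To locate the files described above for downstream use, the file patterns can be turned into a small inventory script. This is a rough sketch under stated assumptions: the function name list_outputs is hypothetical, the patterns are copied as written in this section, and the search is recursive because the exact nesting of files inside each folder is not specified here.

```python
from pathlib import Path

# Glob patterns copied from the file descriptions above, keyed by output folder.
PATTERNS = {
    "fastp": ["*.fastp.fq.gz", "*fastp.json"],
    "kraken2": ["*.kreport", "*.classification.txt"],
    "shovill": ["*.assembly_stats.txt"],
    "prokka": ["*.gff", "*.gbk", "*.fna", "*.faa", "*.ffn", "*.txt"],
    "multiqc": ["multiqc-report.html"],
}


def list_outputs(output_dir="BACTpipe_results"):
    """Map each output folder to the sorted paths (relative to output_dir) of matching files."""
    root = Path(output_dir)
    found = {}
    for folder, patterns in PATTERNS.items():
        folder_path = root / folder
        if not folder_path.is_dir():
            # Optional steps (or an unfinished run) leave no folder behind.
            found[folder] = []
            continue
        matches = set()
        for pattern in patterns:
            # rglob searches recursively, in case files sit in per-sample subfolders (an assumption).
            matches.update(p.relative_to(root).as_posix() for p in folder_path.rglob(pattern))
        found[folder] = sorted(matches)
    return found
```

For example, list_outputs()["prokka"] would give the annotation files found under BACTpipe_results/prokka, and an empty list for a folder indicates that the step was skipped or produced no matching files.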

In addition to the files listed in the table above, Nextflow produces two report files in the main run folder once the pipeline has finished: BACTpipe_report.html and BACTpipe_timeline.html. The reports show a summary of overall execution time, resource usage, and all executed tasks with their respective run time metrics. Lastly, Nextflow produces a work directory containing all intermediate files and logs produced during the run. This folder can be removed once the pipeline has completed.
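The cleanup step described above can be scripted. The sketch below is illustrative only: the function name remove_work_dir is hypothetical, it assumes the work directory uses Nextflow's default name (work) and sits in the main run folder, and it treats the presence of both report files as a sign that the run has finished, which is a heuristic rather than a guarantee. Note that deleting the work directory also removes the cached results that Nextflow's -resume option relies on.

```python
import shutil
from pathlib import Path

# Report file names as given in the text above.
REPORT_FILES = ("BACTpipe_report.html", "BACTpipe_timeline.html")


def remove_work_dir(run_dir=".", work_dir_name="work", dry_run=True):
    """Delete the Nextflow work directory if both report files exist in run_dir.

    Returns True if the directory was removed (or would be, in a dry run),
    and False if a report file or the work directory is missing.
    Assumption: work_dir_name is the Nextflow default, "work".
    """
    run_path = Path(run_dir)
    work_path = run_path / work_dir_name
    reports_present = all((run_path / name).is_file() for name in REPORT_FILES)
    if not reports_present or not work_path.is_dir():
        return False
    if not dry_run:
        # Irreversible: intermediate files and task logs are gone after this call.
        shutil.rmtree(work_path)
    return True
```

The dry_run default is deliberate: remove_work_dir() only reports whether cleanup would happen, and remove_work_dir(dry_run=False) performs the deletion.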