Running BACTpipe¶
After installing all the required dependencies and downloading the required mash sketches of refseq genomes, it is very easy to run BACTpipe. There are several ways to run BACTpipe, but we’ll start with the easiest:
$ nextflow run ctmrbio/BACTpipe --reads 'path/to/reads/*_R{1,2}.fastq.gz'
This will instruct Nextflow to go to the ctmrbio
Github organization to
download and run the BACTpipe
workflow. The argument --reads
is used to
tell the workflow which input files you want to run BACTpipe on. Note that the
path to the reads must be enclosed in single quotes ('
) to prevent the
shell from automatically expanding asterisks (*
) and curly braces ({}
).
In the above example, the part of the filename matched by the asterisk will be
used as the sample name in BACTpipe, and {1,2}
refers to the pair of FASTQ
files for paired-end data. Input data should be in FASTQ format, and can be
either plain FASTQ, or compressed with gzip or bzip2 (with .gz
or .bz2
file suffixes).
Note
When you run BACTpipe for the first time using a command like the one
shown above, Nextflow downloads the current version of the Github repo
to your computer. If BACTpipe is updated after your first run, the
subsequent runs will still use the old version that you have downloaded.
To get the newest version, tell Nextflow to update your local copy:
nextflow pull ctmrbio/BACTpipe
.
When BACTpipe is run like this, it by default assumes you want to run everything locally, on the current machine. Note that BACTpipe is capable of running on a wide range of computers, ranging from laptops to powerful multicore machines to high-performance computing (HPC) clusters.
Changing settings on the command line¶
When running BACTpipe, you may want to modify parameters to customize it for your purpose. It is possible to modify several settings for how BACTpipe operates using configuration parameters. All changes can be added as command-line arguments when running BACTpipe, e.g.:
$ nextflow run ctmrbio/BACTpipe --shovill_kmers 21,33,55,77 --reads 'path/to/reads/*_{1,2}.fastq.gz'
The --shovill_kmers
flag will modify the kmer lengths that shovill will
use in its SPAdes assembly. The following parameters can be easily configured
from the command line:
Parameter name Default setting Description
reads [empty] Input fastq files, required!
output_dir BACTpipe_results Name of outuput directory
keep_trimmed_fastq [FALSE] Output trimmed fastq files from fastp into output_dir
keep_shovill_output [FALSE] Output shovill output directory into output_dir
kraken2_db [empty] Path to Kraken2 database to use for taxonomic classification
kraken2_confidence 0.5 Kraken2 confidence parameter, refer to `kraken2`_ documentation for details
kraken2_min_proportion 1.00 Minimum proportion of reads on sample level to classify sample as containing species
shovill_depth 100 See the `shovill`_ documentation for details
shovill_kmers 31,33,55,77,99,127
shovill_minlen 500
prokka_evalue 1e-09 See the `prokka`_ documentation for details
prokka_kingdom Bacteria
prokka_reference [not used]
prokka_signal_peptides false
To modify any parameter, just add --<parameter_name> <new_setting>
on the
command line when running BACTpipe, e.g. --shovill_depth 75
to set
Shovill’s depth parameter to 75 instead of 100. Refer to params.config
in
the conf
directory of the BACTpipe repository for a complete up-to-date
listing of all available parameters.
Change many settings at once¶
If you want to change many different settings at the same time when running
BACTpipe, it can quickly result in very long command lines. A way to make it
easier to change several parameters at once is to create a custom configuration
file in YAML or JSON format that you give to BACTpipe using -params-file
.
The parameter settings you define in your custom configuration file will override the default settings. Custom configuration files can be written in either YAML or JSON format. The simplest format for the custom parameters file is probably YAML, and is the recommended choice. Here is an example YAML configuration file that modifies some shovill parameters and leaves all other settings to their default values:
shovill_depth: "100"
shovill_kmers: "31,33,55,77,99,111,127"
shovill_minlen: "400"
If you save the above into a plain text file called custom_bactpipe_config.yaml
you can provide it when running BACTpipe using the -params-file
command
line argument:
$ nextflow run ctmrbio/BACTpipe -params-file path/to/your/custom/params.yaml --reads 'path/to/reads/*_{1,2}.fastq.gz'
There is also another way to modify parameters that uses Nextflow’s own
configuration format. This can be useful if you want to modify a lot of
settings at once, since it is possible to download a copy of the default
configuration settings file directly from Github, params.config, and make
any changes you want directly in your custom version of params.config
. The
file contains some comments explaining how the different variables work, to
help out when modifying the settings. To run BACTpipe with a custom
configuration in the Nextflow format, you use -c
on the command line:
$ nextflow run ctmrbio/BACTpipe -c path/to/custom_params.config --reads 'path/to/reads/*_{1,2}.fastq.gz'
Note:¶
There are two different type of commandline arguments when running workflows
using Nextflow: 1) arguments using double dashes (i.e. --reads
) and 2)
arguments using a single dash (i.e. -params-file
). Arguments using double
dashes are sent to BACTpipe for evaluation, and are typically configuration
variables that are defined inside BACTpipe. Arguments using a single dash are
not visible to BACTpipe but are instead used by Nextflow itself, and typically
alters how Nextflow executes BACTpipe.
Profiles¶
A convenient way to modify the way BACTpipe is run in your environment is to load a profile. BACTpipe comes with a few pre-installed profiles:
standard
– For local use on e.g. a laptop or Linux server. This is the default profile used if no profile is explicitly specified.rackham
– For use on the UPPMAX’s Rackham HPC system.ctmr_nas
– For local execution on CTMR’s old analysis server.ctmr_gandalf
– For use on CTMR’s Gandalf Slurm HPC system.docker
– For use with docker containers.
To run BACTpipe with a specific profile, use the -profile <profilename>
argument (note the single dash before profile
) when running, e.g.:
$ nextflow run ctmrbio/BACTpipe -profile rackham --project SNIC001 --reads '/proj/projectname/reads/*_{1,2}.fastq.gz'
This will run BACTpipe using the rackham
profile with the project set to
SNIC001
, which automatically configures settings so BACTpipe can find all
the required software and databases in the UPPMAX HPC cluster environment.
Running BACTpipe without a -profile
argument will default to running the
standard
profile directly on the node you are logged in to (avoid doing
that on shared HPC systems).
Custom profile¶
It is possible to create a custom profile to use instead of the preconfigured ones. This is useful if you want to run BACTpipe on another cluster system than UPPMAX’s Rackham, or if the data you are analyzing requires you to change the pre-defined expected CPU, memory, and time requirements for processes on the cluster. The best way to start is probably to download one of the pre-existing profiles from conf directory of the BACTpipe repository.
If you are working on a Slurm-managed system, starting with either the
rackham.config
or the ctmr_gandalf
profile would be a good choice, as
both of those are Slurm-managed HPC systems. Download the configuration file
from the conf directory of the BACTpipe repository and modify settings to
your preference. Then, to run BACTpipe using your custom configuration file,
you need to tell Nextflow to read parameters from your file instead of the
default parameters:
$ nextflow run ctmrbio/BACTpipe -c path/to/your/custom/profile.config --reads 'path/to/reads/*_{1,2}.fastq.gz'
The custom profile is not limited to configuring CPU, memory and time limits for the different processes. It is also possible to set parameter values inside the custom profile, i.e. to change paths to reference databases or adjust runtime parameters for the different processes. It is also possible to just use a configuration file that changes settings without modifying how the workflow is run, see Change many settings at once.