Binning¶
The binning module groups assembled contigs into bins that represent genomes. Separate configurations are offered for short-read, ONT, and multi-sample binning.
For multi-sample binning, a group of samples must first be configured. If samples fail QC or assembly so that only one sample remains in a group, the remaining sample is switched from multi-sample binning to per-sample binning.
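The fallback rule described above can be illustrated with a small sketch. It is not the Toolkit's implementation. It assumes a hypothetical tab-separated groups file with a `SAMPLE` and a `GROUP` column that lists only the samples that passed QC and assembly, and it reports which binning mode each group would end up with.

```shell
# Hypothetical groups file: test3 is the only remaining sample of group B.
groups_tsv="$(mktemp)"
printf 'SAMPLE\tGROUP\ntest1\tA\ntest2\tA\ntest3\tB\n' > "$groups_tsv"

# Count the samples per group (skipping the header) and derive the binning mode:
# groups with more than one sample stay multi-sample, all others fall back.
binning_modes="$(awk -F '\t' '
  NR > 1 { count[$2]++ }
  END {
    for (g in count)
      printf "%s\t%d\t%s\n", g, count[g], (count[g] > 1 ? "multi-sample" : "per-sample")
  }' "$groups_tsv" | sort)"

printf '%s\n' "$binning_modes"
rm -f "$groups_tsv"
```

In this mock input, group A keeps multi-sample binning while group B is reported as per-sample.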
Short Read¶
For short-read data the Toolkit supports MetaBAT2 and SemiBin2.
Input¶
NXF_VER=25.10.4 nextflow run metagenomics/metagenomics-tk \
-profile standard \
-params-file https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/default/modules/binning/binning.yml \
-ansi-log false \
-entry wShortReadBinning \
-resume \
--steps.binning.input.paired https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/binning/samples.tsv \
--steps.binning.input.single https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/binning/samplesUnpaired.tsv \
--steps.binning.input.contigs https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/binning/assembly.tsv \
--logDir logs_binning
SAMPLE READS
test1 https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/small/interleaved.fq.gz
SAMPLE READS
test1 https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/small/unpaired.fq.gz
SAMPLE CONTIGS
test1 https://openstack.cebitec.uni-bielefeld.de:8080/meta_test/small/test1_contigs.fa.gz
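The three sample sheets shown above could be recreated locally along the following lines. This is a minimal sketch that assumes the sheets are tab-separated with one header line, as the `.tsv` extension suggests. The temporary directory and the variable names are only illustrative.

```shell
# Assumption: sheets are tab-separated with a single header line.
sheet_dir="$(mktemp -d)"
base_url="https://openstack.cebitec.uni-bielefeld.de:8080"

# Paired (interleaved) reads, unpaired reads and contigs for sample test1.
printf 'SAMPLE\tREADS\ntest1\t%s\n' "$base_url/swift/v1/meta_test/small/interleaved.fq.gz" > "$sheet_dir/samples.tsv"
printf 'SAMPLE\tREADS\ntest1\t%s\n' "$base_url/swift/v1/meta_test/small/unpaired.fq.gz" > "$sheet_dir/samplesUnpaired.tsv"
printf 'SAMPLE\tCONTIGS\ntest1\t%s\n' "$base_url/meta_test/small/test1_contigs.fa.gz" > "$sheet_dir/assembly.tsv"

# Report every line that does not have exactly two tab-separated columns.
malformed_lines="$(awk -F '\t' 'NF != 2 { print FILENAME ": line " FNR }' "$sheet_dir"/*.tsv)"
sheet_count="$(ls "$sheet_dir" | wc -l | tr -d ' ')"

printf 'Wrote %s sheets to %s\n' "$sheet_count" "$sheet_dir"
printf 'Malformed lines: %s\n' "${malformed_lines:-none}"
```

A column check of this kind catches sheets in which tabs were replaced by spaces during copy and paste.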
Output¶
Contig Coverage¶
CoverM is executed on all alignment files to provide basic contig coverage information. The output consists of two files: one contains the output of CoverM's default methods, the other contains the MetaBAT-specific metrics offered by CoverM.
contigCoverage/*_default_coverm_coverage.tsv-
The following methods are passed to CoverM to produce this file: mean trimmed_mean variance length count reads_per_base rpkm tpm.
contigCoverage/*_metabat_coverm_coverage.tsv-
The only method passed to CoverM to produce this file is metabat.
Alignment¶
contigMapping/*.bam-
Alignment file containing the alignment of reads mapped back to the assembly.
contigMapping/*_unmapped.fq.gz-
Reads that could not be mapped back.
Genome Coverage¶
CoverM is executed on all alignment files to provide basic coverage information of all generated genomes.
Each file is based on one of the CoverM methods mean trimmed_mean variance length count reads_per_base rpkm tpm.
genomeCoverage/*_count.tsv
genomeCoverage/*_mean.tsv
genomeCoverage/*_relative_abundance.tsv
genomeCoverage/*_rpkm.tsv
genomeCoverage/*_tpm.tsv
genomeCoverage/*_trimmed_mean.tsv
Read Mapping Quality¶
readMappingQuality/*_flagstat.tsv-
Samtools flagstat output.
readMappingQuality/*_flagstat_failed.tsv-
Samtools flagstat output of failed reads in horizontal format.
readMappingQuality/*_flagstat_passed.tsv-
Samtools flagstat output of passed reads in horizontal format.
Binning Tool Output¶
The output has the same directory structure for every binning tool.
Example for SemiBin2:
semibin2/*_bin.*.fa-
Generated bins.
semibin2/*_bin_contig_mapping.tsv-
Bin to contig mapping.
semibin2/*_bins_stats.tsv-
Basic sequence stats produced by seqkit.
semibin2/*_contigs_depth.tsv-
Output of jgi_summarize_bam_contig_depths.
semibin2/*_notBinned.fa-
Contigs that could not be binned.
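As a quick check of a binning run, the bin FASTA files can be listed using the `semibin2/*_bin.*.fa` pattern from above and their contigs counted. The sketch below creates a mock output directory so that it runs standalone. The sample name `test1`, the contig names and the sequences are hypothetical.

```shell
# Mock output directory with two bins and one file of unbinned contigs.
outdir="$(mktemp -d)"
mkdir -p "$outdir/semibin2"
printf '>contig_1\nACGT\n>contig_2\nGGCC\n' > "$outdir/semibin2/test1_bin.1.fa"
printf '>contig_3\nTTAA\n' > "$outdir/semibin2/test1_bin.2.fa"
printf '>contig_4\nCCGG\n' > "$outdir/semibin2/test1_notBinned.fa"

# Count FASTA records per bin; the glob matches bins only,
# so the *_notBinned.fa file is not included.
contigs_per_bin="$(
  for bin in "$outdir"/semibin2/*_bin.*.fa; do
    printf '%s\t%s\n' "$(basename "$bin")" "$(grep -c '^>' "$bin")"
  done
)"

printf '%s\n' "$contigs_per_bin"
rm -rf "$outdir"
```

For a real run, `outdir` would point at the output directory of the binning module for one sample instead of the mock directory.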
ONT¶
For Nanopore data the Toolkit supports MetaBAT2 and MetaCoAG.
Input¶
NXF_VER=25.10.4 nextflow run metagenomics/metagenomics-tk \
-profile standard \
-params-file https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/default/modules/binning/binningONT.yml \
-ansi-log false \
-entry wOntBinning \
-resume \
--steps.binningONT.input.reads https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/binningONT/samplesONT.tsv \
--steps.binningONT.input.contigs https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/binningONT/assemblyONT.tsv \
--steps.binningONT.input.quality https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/binningONT/quality.tsv \
--logDir logs_binningONT
SAMPLE READS
nano https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/SRR16328449_qc.fq.gz
SAMPLE CONTIGS
nano https://openstack.cebitec.uni-bielefeld.de:8080/meta_test/small/nano_contigs.fa.gz
SAMPLE QUALITY
nano 15.5
Output¶
The output is the same as for short-read data.
Multi-sample Binning Short Read¶
For multi-sample short-read data the Toolkit supports SemiBin2.
Input¶
NXF_VER=25.10.4 nextflow run metagenomics/metagenomics-tk \
-profile standard \
-params-file https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/default/modules/binning/multiBinning.yml \
-ansi-log false \
-entry wMultiBinningShortRead \
-resume \
--steps.multiBinning.input.paired https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/multiBinning/samples.tsv \
--steps.multiBinning.input.single https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/multiBinning/samplesUnpaired.tsv \
--steps.multiBinning.input.contigs https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/multiBinning/assembly.tsv \
--steps.multiBinning.input.groups https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/multiBinning/groups.tsv \
--logDir logs_multiBinning
SAMPLE READS
test1 https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/small/interleaved.fq.gz
test2 https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/small/interleaved.fq.gz
SAMPLE READS
test1 https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/small/unpaired.fq.gz
test2 https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/small/unpaired.fq.gz
SAMPLE CONTIGS
test1 https://openstack.cebitec.uni-bielefeld.de:8080/meta_test/small/contigs1.fa.gz
test2 https://openstack.cebitec.uni-bielefeld.de:8080/meta_test/small/contigs2.fa.gz
SAMPLE GROUP
test1 A
test2 A
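Because multi-sample binning depends on the groups file, it can be worth checking that every sample in the read sheet is assigned to a group before starting a run. The following sketch is one way to do that with awk. It assumes tab-separated sheets with a header line, and the sample `test3` and the read file names are hypothetical.

```shell
# Mock sheets: test3 is listed in the read sheet but missing from the groups file.
sheet_dir="$(mktemp -d)"
printf 'SAMPLE\tREADS\ntest1\treads1.fq.gz\ntest2\treads2.fq.gz\ntest3\treads3.fq.gz\n' > "$sheet_dir/samples.tsv"
printf 'SAMPLE\tGROUP\ntest1\tA\ntest2\tA\n' > "$sheet_dir/groups.tsv"

# First pass (groups.tsv): remember the group of each sample.
# Second pass (samples.tsv): print samples that have no group.
ungrouped_samples="$(awk -F '\t' '
  FNR == 1 { next }                    # skip the header line of each file
  NR == FNR { group[$1] = $2; next }   # only true while reading the first file
  !($1 in group) { print $1 }
' "$sheet_dir/groups.tsv" "$sheet_dir/samples.tsv")"

printf 'Samples without a group: %s\n' "${ungrouped_samples:-none}"
rm -rf "$sheet_dir"
```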
Output¶
In addition to the files produced for short-read data, the following files are created:
Concatenated Assembly Alignment¶
concatenatedAssemblyMapping/*.bam-
This alignment file contains the mapping of reads from each sample back to the concatenated assembly, which consists of all the assemblies specified in the groups file.
concatenatedAssemblyMapping/*_unmapped.fq.gz-
Reads that could not be mapped back.
Multi-sample Binning ONT¶
Input¶
NXF_VER=25.10.4 nextflow run metagenomics/metagenomics-tk \
-profile standard \
-params-file https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/default/modules/binning/multiBinningONT.yml \
-ansi-log false \
-entry wMultiBinningLongRead \
-resume \
--steps.multiBinningONT.input.reads https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/multiBinningONT/samplesONT.tsv \
--steps.multiBinningONT.input.contigs https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/multiBinningONT/assemblyONT.tsv \
--steps.multiBinningONT.input.quality https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/multiBinningONT/quality.tsv \
--steps.multiBinningONT.input.groups https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/multiBinningONT/groups.tsv \
--logDir logs_multiBinningONT
SAMPLE READS
nano1 https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/SRR16328449_qc.fq.gz
nano2 https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/SRR16328449_qc.fq.gz
SAMPLE CONTIGS
nano1 https://openstack.cebitec.uni-bielefeld.de:8080/meta_test/small/nano_contigs.fa.gz
nano2 https://openstack.cebitec.uni-bielefeld.de:8080/meta_test/small/nano_contigs.fa.gz
SAMPLE GROUP
nano1 A
nano2 A
SAMPLE QUALITY
nano1 15.5
nano2 15.5
Output¶
The output is the same as for multi-sample short-read data.