Binning

The binning module groups assembled contigs into bins that represent genomes. Separate binning configurations are offered for short-read, ONT, and multi-sample binning.

For multi-sample binning, a group of samples must first be configured. If samples fail QC or assembly so that only one sample remains in a group, that sample is switched from multi-sample binning to per-sample binning.
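The fallback rule can be illustrated with a small shell sketch. It is not the Toolkit's implementation: the helper `binning_mode` and the list of failed samples are hypothetical, and the groups file is assumed to be tab-separated with a `SAMPLE`/`GROUP` header.

```shell
#!/bin/sh
# Illustration only: this is NOT how the Toolkit implements the fallback.
# binning_mode GROUPS_TSV FAILED_LIST
#   GROUPS_TSV:  tab-separated file with a "SAMPLE<TAB>GROUP" header line
#   FAILED_LIST: one sample name per line (samples that failed QC or assembly)
# Prints "<sample><TAB><group><TAB><mode>" for every surviving sample.
binning_mode() {
  awk -F '\t' '
    FILENAME == ARGV[1] { failed[$1] = 1; next }   # first file: failed samples
    FNR == 1            { next }                   # skip the groups header
    !($1 in failed)     { group[$1] = $2; size[$2]++; order[++n] = $1 }
    END {
      for (i = 1; i <= n; i++) {
        s = order[i]; g = group[s]
        # A group that shrank to a single sample falls back to per-sample binning.
        mode = (size[g] > 1) ? "multi-sample" : "per-sample"
        printf "%s\t%s\t%s\n", s, g, mode
      }
    }
  ' "$2" "$1"
}
```

For example, if group A holds test1 and test2, group B holds test3 and test4, and test4 fails, then test1 and test2 stay in multi-sample mode while test3 is reported as per-sample.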

Short Read

For short-read data, the Toolkit supports MetaBAT2 and SemiBin2.

Input

NXF_VER=25.10.4 nextflow run metagenomics/metagenomics-tk \
      -profile standard \
      -params-file https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/default/modules/binning/binning.yml \
      -ansi-log false \
      -entry wShortReadBinning \
      -resume \
      --steps.binning.input.paired https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/binning/samples.tsv \
      --steps.binning.input.single https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/binning/samplesUnpaired.tsv \
      --steps.binning.input.contigs https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/binning/assembly.tsv \
      --logDir logs_binning

samples.tsv:
SAMPLE  READS
test1   https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/small/interleaved.fq.gz

samplesUnpaired.tsv:
SAMPLE  READS
test1   https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/small/unpaired.fq.gz

assembly.tsv:
SAMPLE  CONTIGS
test1   https://openstack.cebitec.uni-bielefeld.de:8080/meta_test/small/test1_contigs.fa.gz
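To run the module on other data, input sheets with the same layout are needed. The following is a minimal sketch for generating them, assuming the sheets are tab-separated and use the header names shown in the test data. The helper `make_sheet` is hypothetical and not part of the Toolkit.

```shell
#!/bin/sh
# Hypothetical helper, not part of the Toolkit.
# make_sheet COLUMN OUT_FILE SAMPLE=VALUE...
#   COLUMN:   name of the second column, e.g. READS or CONTIGS
#   OUT_FILE: path of the tab-separated sheet to write
#   SAMPLE=VALUE: one pair per sample; VALUE is the URL or path for that sample
make_sheet() {
  column=$1
  out=$2
  shift 2
  # Header line: SAMPLE<TAB><COLUMN>
  printf 'SAMPLE\t%s\n' "$column" > "$out"
  for pair in "$@"; do
    # Split at the first "=": sample name on the left, value on the right.
    printf '%s\t%s\n' "${pair%%=*}" "${pair#*=}" >> "$out"
  done
}
```

For example, `make_sheet CONTIGS assembly.tsv sampleA=/data/sampleA_contigs.fa.gz` writes a two-line sheet (the path is a placeholder). Whether local paths are accepted in place of the URLs used in the test data is an assumption that should be verified against a small run.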

Output

Contig Coverage

CoverM is executed on all alignment files to provide basic contig coverage information. The output consists of two files: one contains the output of CoverM's default methods, the other contains the MetaBAT-specific metrics offered by CoverM.

contigCoverage/*_default_coverm_coverage.tsv

This file is produced with the following CoverM methods: mean, trimmed_mean, variance, length, count, reads_per_base, rpkm, tpm.

contigCoverage/*_metabat_coverm_coverage.tsv

This file is produced with the single CoverM method metabat.

Alignment

contigMapping/*.bam

Alignment file containing the alignment of reads mapped back to the assembly.

contigMapping/*_unmapped.fq.gz

Reads that could not be mapped back.

Genome Coverage

CoverM is executed on all alignment files to provide basic coverage information for all generated genomes. Each file is based on one CoverM method (mean, trimmed_mean, variance, length, count, reads_per_base, rpkm, tpm).

genomeCoverage/*_count.tsv

genomeCoverage/*_mean.tsv

genomeCoverage/*_relative_abundance.tsv

genomeCoverage/*_rpkm.tsv

genomeCoverage/*_tpm.tsv

genomeCoverage/*_trimmed_mean.tsv

Read Mapping Quality

readMappingQuality/*_flagstat.tsv

Samtools flagstat output.

readMappingQuality/*_flagstat_failed.tsv

Samtools flagstat output of failed reads in horizontal format.

readMappingQuality/*_flagstat_passed.tsv

Samtools flagstat output of passed reads in horizontal format.

Binning Tool Output

The output has the same directory structure for every binning tool.

Example for SemiBin2:

semibin2/*_bin.*.fa

Generated bins.

semibin2/*_bin_contig_mapping.tsv

Bin to contig mapping.

semibin2/*_bins_stats.tsv

Basic sequence stats produced by seqkit.

semibin2/*_contigs_depth.tsv

Contig depth file produced by jgi_summarize_bam_contig_depths.

semibin2/*_notBinned.fa

Contigs that could not be binned.
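As a quick sanity check of the generated bins, the number of contigs and the total length per bin can be counted directly from the FASTA files. This is a minimal sketch that relies only on the FASTA format; `fasta_summary` is a hypothetical helper, and the `*_bins_stats.tsv` file produced by seqkit remains the reference summary.

```shell
#!/bin/sh
# Hypothetical helper, not part of the Toolkit.
# fasta_summary FILE...
# Prints "<file><TAB><number of sequences><TAB><total bases>" per FASTA file.
fasta_summary() {
  for fa in "$@"; do
    awk -v name="$fa" '
      /^>/ { n++; next }              # a header line starts a new sequence
           { bases += length($0) }    # a sequence line adds its length
      END  { printf "%s\t%d\t%d\n", name, n, bases }
    ' "$fa"
  done
}
```

A call such as `fasta_summary semibin2/*_bin.*.fa` prints one line per bin. Comparing the totals with the `*_notBinned.fa` file gives a rough impression of how much of the assembly was binned.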

ONT

For Nanopore data the Toolkit supports MetaBAT2 and MetaCoAG.

Input

NXF_VER=25.10.4 nextflow run metagenomics/metagenomics-tk \
      -profile standard \
      -params-file https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/default/modules/binning/binningONT.yml \
      -ansi-log false \
      -entry wOntBinning \
      -resume \
      --steps.binningONT.input.reads https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/binningONT/samplesONT.tsv \
      --steps.binningONT.input.contigs https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/binningONT/assemblyONT.tsv \
      --steps.binningONT.input.quality https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/binningONT/quality.tsv \
      --logDir logs_binningONT

samplesONT.tsv:
SAMPLE  READS
nano    https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/SRR16328449_qc.fq.gz

assemblyONT.tsv:
SAMPLE  CONTIGS
nano    https://openstack.cebitec.uni-bielefeld.de:8080/meta_test/small/nano_contigs.fa.gz

quality.tsv:
SAMPLE  QUALITY
nano    15.5

Output

The output is the same as for short read data.

Multi-sample Binning Short Read

For short-read multi-sample data, the Toolkit supports SemiBin2.

Input

NXF_VER=25.10.4 nextflow run metagenomics/metagenomics-tk \
      -profile standard \
      -params-file https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/default/modules/binning/multiBinning.yml \
      -ansi-log false \
      -entry wMultiBinningShortRead \
      -resume \
      --steps.multiBinning.input.paired https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/multiBinning/samples.tsv \
      --steps.multiBinning.input.single https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/multiBinning/samplesUnpaired.tsv \
      --steps.multiBinning.input.contigs https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/multiBinning/assembly.tsv \
      --steps.multiBinning.input.groups https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/multiBinning/groups.tsv \
      --logDir logs_multiBinning

samples.tsv:
SAMPLE  READS
test1   https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/small/interleaved.fq.gz
test2   https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/small/interleaved.fq.gz

samplesUnpaired.tsv:
SAMPLE  READS
test1   https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/small/unpaired.fq.gz
test2   https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/small/unpaired.fq.gz

assembly.tsv:
SAMPLE  CONTIGS
test1   https://openstack.cebitec.uni-bielefeld.de:8080/meta_test/small/contigs1.fa.gz
test2   https://openstack.cebitec.uni-bielefeld.de:8080/meta_test/small/contigs2.fa.gz

groups.tsv:
SAMPLE  GROUP
test1   A
test2   A
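Because multi-sample binning combines several sheets, it can help to check before a run that every sample named in the groups file also has a row in the reads and contigs sheets. The following is a minimal pre-flight sketch, assuming tab-separated sheets with one header line. `missing_samples` is a hypothetical helper, and the Toolkit may perform its own validation.

```shell
#!/bin/sh
# Hypothetical helper, not part of the Toolkit.
# missing_samples GROUPS_TSV SHEET_TSV
# Prints every sample listed in GROUPS_TSV that has no row in SHEET_TSV.
missing_samples() {
  awk -F '\t' '
    FNR == 1            { next }                 # skip the header of each file
    FILENAME == ARGV[1] { have[$1] = 1; next }   # first file: the sheet to check
    !($1 in have)       { print $1 }             # second file: the groups sheet
  ' "$2" "$1"
}
```

An empty result from `missing_samples groups.tsv assembly.tsv` means that every grouped sample has an assembly entry. The same call can be repeated for the reads sheets.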

Output

In addition to the files produced for short-read data, the following files are created:

Concatenated Assembly Alignment

concatenatedAssemblyMapping/*.bam

This alignment file contains the mapping of reads from each sample back to the concatenated assembly, which consists of all the assemblies specified in the groups file.

concatenatedAssemblyMapping/*_unmapped.fq.gz

Reads that could not be mapped back.

Multi-sample Binning ONT

Input

NXF_VER=25.10.4 nextflow run metagenomics/metagenomics-tk \
      -profile standard \
      -params-file https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/default/modules/binning/multiBinningONT.yml \
      -ansi-log false \
      -entry wMultiBinningLongRead \
      -resume \
      --steps.multiBinningONT.input.reads https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/multiBinningONT/samplesONT.tsv \
      --steps.multiBinningONT.input.contigs https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/multiBinningONT/assemblyONT.tsv \
      --steps.multiBinningONT.input.quality https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/multiBinningONT/quality.tsv \
      --steps.multiBinningONT.input.groups https://raw.githubusercontent.com/metagenomics/metagenomics-tk/refs/heads/master/test_data/multiBinningONT/groups.tsv \
      --logDir logs_multiBinningONT

samplesONT.tsv:
SAMPLE  READS
nano1   https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/SRR16328449_qc.fq.gz
nano2   https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/SRR16328449_qc.fq.gz

assemblyONT.tsv:
SAMPLE  CONTIGS
nano1   https://openstack.cebitec.uni-bielefeld.de:8080/meta_test/small/nano_contigs.fa.gz
nano2   https://openstack.cebitec.uni-bielefeld.de:8080/meta_test/small/nano_contigs.fa.gz

groups.tsv:
SAMPLE  GROUP
nano1   A
nano2   A

quality.tsv:
SAMPLE  QUALITY
nano1   15.5
nano2   15.5

Output

The output is the same as for multi-sample short read data.