Assembly

Input

-entry wShortReadAssembly -params-file example_params/assembly.yml
-entry wOntAssembly -params-file example_params/assemblyONT.yml

Warning

The configuration file shown here is for demonstration and testing purposes only. Parameters intended for production use are listed in the assembly section of the YAML files located in the default folder of the Toolkit's GitHub repository.

tempdir: "tmp"
summary: false
s3SignIn: false
output: "output"
logDir: log
runid: 1
logLevel: 1
scratch: "/vol/scratch"
publishDirMode: "symlink"
steps:
  assembly:
    input:
      paired: test_data/assembly/samples.tsv
      single: test_data/assembly/samplesUnpaired.tsv
    megahit:
      additionalParams: " --min-contig-len 200 "
      fastg: true
      resources:
         RAM: 
            mode: 'DEFAULT'
            predictMinLabel: 'AUTO' 

resources:
  highmemLarge:
    cpus: 28
    memory: 230
  highmemMedium:
    cpus: 14
    memory: 113
  large:
    cpus: 28
    memory: 58
  medium:
    cpus: 14
    memory: 29
  small:
    cpus: 7
    memory: 14
  tiny:
    cpus: 1
    memory: 1
tempdir: "tmp"
summary: false
s3SignIn: false
output: "output"
logDir: log
runid: 1
logLevel: 1
scratch: "/vol/scratch"
publishDirMode: "symlink"
steps:
  assembly:
    input: 
      paired: test_data/assembly/samples.tsv 
    metaspades:
      additionalParams: "  "
      fastg: true
resources:
  highmemLarge:
    cpus: 28
    memory: 230
  highmemMedium:
    cpus: 14
    memory: 113
  large:
    cpus: 28
    memory: 58
  medium:
    cpus: 14
    memory: 29
  small:
    cpus: 7
    memory: 14
  tiny:
    cpus: 1
    memory: 1

Warning

The configuration file shown here is for demonstration and testing purposes only. Parameters intended for production use are listed in the assemblyONT section of the YAML files located in the default folder of the Toolkit's GitHub repository.

tempdir: "tmp"
summary: false
s3SignIn: false
output: "output"
logDir: log
runid: 1
logLevel: 1
scratch: "/vol/scratch"
steps:
  assemblyONT:
    input: test_data/assembly/samplesONT.tsv 
    metaflye:
      additionalParams: " -i 1 "
      quality: " --nano-raw "

resources:
  highmemLarge:
    cpus: 28
    memory: 230
  highmemMedium:
    cpus: 14
    memory: 113
  large:
    cpus: 28
    memory: 58
  medium:
    cpus: 14
    memory: 29
  small:
    cpus: 7
    memory: 14
  tiny:
    cpus: 1
    memory: 1
SAMPLE  READS
test1   https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/small/interleaved.fq.gz
SAMPLE  READS
nano    https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/SRR16328449_qc.fq.gz

Output

The output is a gzipped fasta file containing contigs.

Megahit

Error Handling

If the tool fails with one of the exit codes -9, 137 or 247 (e.g. due to memory restrictions), it is executed again with higher CPU and memory values. For a retry, the CPU and memory values are taken from the flavor with the next higher memory value. The highest possible CPU/memory value is limited by the highest CPU/memory value of all flavors defined in the resources section (see global configuration section).
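The retry rule described above can be illustrated with a minimal Python sketch. It is not the Toolkit's implementation: the function names `should_retry` and `next_flavor` are hypothetical, and the flavor table is simply copied from the example resources section on this page.

```python
# Flavors copied from the example "resources" section on this page.
FLAVORS = {
    "highmemLarge": {"cpus": 28, "memory": 230},
    "highmemMedium": {"cpus": 14, "memory": 113},
    "large": {"cpus": 28, "memory": 58},
    "medium": {"cpus": 14, "memory": 29},
    "small": {"cpus": 7, "memory": 14},
    "tiny": {"cpus": 1, "memory": 1},
}

# Exit codes that trigger a retry according to the text above.
RETRY_EXIT_CODES = {-9, 137, 247}


def should_retry(exit_code):
    """Return True if the exit code is one of the retry codes."""
    return exit_code in RETRY_EXIT_CODES


def next_flavor(current_memory, flavors=FLAVORS):
    """Return the name of the flavor with the next higher memory value.

    Returns None if no flavor offers more memory than current_memory,
    i.e. the upper limit defined by the resources section is reached.
    (Hypothetical helper for illustration only.)
    """
    larger = [name for name, spec in flavors.items()
              if spec["memory"] > current_memory]
    if not larger:
        return None
    return min(larger, key=lambda name: flavors[name]["memory"])


if __name__ == "__main__":
    if should_retry(137):
        print(next_flavor(FLAVORS["medium"]["memory"]))
```

Note that the next flavor is chosen by memory alone, so the CPU count can go down between attempts (for example, from `large` with 28 CPUs to `highmemMedium` with 14 CPUs in this table).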

Peak memory usage prediction

The memory consumption of an assembler varies with the diversity of the dataset. We trained a machine learning model on k-mer frequencies and the Nonpareil diversity index to predict the peak memory consumption of Megahit in our full pipeline mode. The resources requested for the assembler run are thereby fitted to the resources that a specific dataset actually needs. If this mode is enabled, Nonpareil and KMC, which are part of the quality control module, are automatically executed before the assembler run.

Please note that this mode is only tested for Megahit with default parameters and the meta-sensitive mode (--presets meta-sensitive).

  resources:
    RAM: 
      mode: MODE
      predictMinLabel: LABEL

where

* MODE can be either 'PREDICT' for predicting memory usage or 'DEFAULT' for using a default flavor defined in the resources section.

* LABEL is the flavor that will be used if the predicted RAM is below the memory value defined as part of the LABEL flavor. It can also be set to AUTO to always use the predicted flavor.
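The interplay of the predicted RAM and `predictMinLabel` can be sketched in Python as follows. This is only an illustration under several assumptions: `select_flavor` is a hypothetical name, the "predicted flavor" is assumed to be the smallest flavor whose memory covers the prediction, and falling back to the largest flavor when the prediction exceeds every flavor is also an assumption. The flavor table is copied from the example resources section on this page.

```python
# Flavors copied from the example "resources" section on this page.
FLAVORS = {
    "highmemLarge": {"cpus": 28, "memory": 230},
    "highmemMedium": {"cpus": 14, "memory": 113},
    "large": {"cpus": 28, "memory": 58},
    "medium": {"cpus": 14, "memory": 29},
    "small": {"cpus": 7, "memory": 14},
    "tiny": {"cpus": 1, "memory": 1},
}


def select_flavor(predicted_ram, min_label="AUTO", flavors=FLAVORS):
    """Pick a flavor name for a predicted peak memory value.

    predicted_ram uses the same unit as the "memory" values of the flavors.
    (Hypothetical helper; not the Toolkit's actual code.)
    """
    # LABEL acts as a lower bound: it is used whenever the prediction
    # is below the memory value of the LABEL flavor.
    if min_label != "AUTO" and predicted_ram < flavors[min_label]["memory"]:
        return min_label

    # Assumption: the "predicted flavor" is the smallest flavor that fits.
    fitting = [name for name, spec in flavors.items()
               if spec["memory"] >= predicted_ram]
    if not fitting:
        # Assumption: fall back to the largest flavor if nothing fits.
        return max(flavors, key=lambda name: flavors[name]["memory"])
    return min(fitting, key=lambda name: flavors[name]["memory"])


if __name__ == "__main__":
    print(select_flavor(20))           # AUTO: smallest fitting flavor
    print(select_flavor(20, "large"))  # prediction below LABEL: use LABEL
```

With `predictMinLabel: 'AUTO'` the prediction alone decides the flavor, while a concrete label such as `large` prevents the assembler from being scheduled on anything smaller.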