Run Fragment Recruitment

The fragment recruitment module can be used to find genomes in a set of read datasets.

If the fragment recruitment module is part of the full pipeline or per-sample pipeline configuration, reads that could not be mapped back to a contig are screened against a user-provided list of MAGs. Detected genomes are included in all remaining parts of the pipeline. Look out for their specific headers to distinguish results based on actually assembled genomes from results based on the reference genomes.

Note: This module currently only supports Illumina data.

Input

Command
Configuration file for fragment recruitment via mash screen and BWA
Configuration file for fragment recruitment via mash screen and Bowtie
Input TSV file for genomes
Input TSV file for paired end reads
Input TSV file for single end reads

-entry wFragmentRecruitment -params-file example_params/fragmentRecruitment.yml

Warning

The configuration file shown here is for demonstration and testing purposes only. Parameters that should be used in production can be viewed in the fragment recruitment section of one of the YAML files located in the default folder of the Toolkit's GitHub repository.

tempdir: "tmp"
s3SignIn: false
output: "output"
logDir: log
runid: 1
logLevel: 1
scratch: "/vol/scratch"
publishDirMode: "symlink"
steps:
  fragmentRecruitment:
    mashScreen:
      samples:
        paired: test_data/fragmentRecruitment/paired.tsv
        single: test_data/fragmentRecruitment/single.tsv
      genomes: test_data/fragmentRecruitment/mags.tsv
      unzip:
        timeLimit: "AUTO"
      additionalParams:
        mashSketch: " "
        mashScreen: " "
        bwa2: " "
        minimap: " "
        coverm: "  --min-covered-fraction 0  "
        covermONT: "  --min-covered-fraction 0 "
        samtoolsViewBwa2: " -F 3584 " 
        samtoolsViewMinimap: " " 
      mashDistCutoff: 0.70
      coveredBasesCutoff: 0.2
      mashHashCutoff: 2
    genomeCoverage:
      additionalParams: ""
    contigsCoverage:
      additionalParams: ""
resources:
  highmemLarge:
    cpus: 28
    memory: 230
  highmemMedium:
    cpus: 14
    memory: 113
  large:
    cpus: 28
    memory: 58
  medium:
    cpus: 14
    memory: 29
  small:
    cpus: 7
    memory: 14
  tiny:
    cpus: 1
    memory: 1

Warning

The configuration file shown here is for demonstration and testing purposes only. Parameters that should be used in production can be viewed in the fragment recruitment section of one of the YAML files located in the default folder of the Toolkit's GitHub repository.

tempdir: "tmp"
s3SignIn: false
output: "output"
logDir: log
runid: 1
logLevel: 1
scratch: "/vol/scratch"
steps:
  fragmentRecruitment:
    mashScreen:
      samples:
        paired: test_data/fragmentRecruitment/paired.tsv
        single: test_data/fragmentRecruitment/single.tsv
      genomes: test_data/fragmentRecruitment/mags.tsv
      unzip:
        timeLimit: "AUTO"
      additionalParams:
        mashSketch: " "
        mashScreen: " "
        bowtie: " "
        minimap: " "
        coverm: "  --min-covered-fraction 0  "
        covermONT: "  --min-covered-fraction 0  "
        samtoolsViewBowtie: " -F 3584 " 
        samtoolsViewMinimap: " " 
      mashDistCutoff: 0.70
      coveredBasesCutoff: 0.2
      mashHashCutoff: 2
    genomeCoverage:
      additionalParams: ""
    contigsCoverage:
      additionalParams: ""

resources:
  highmemLarge:
    cpus: 28
    memory: 230
  highmemMedium:
    cpus: 14
    memory: 113
  large:
    cpus: 28
    memory: 58
  medium:
    cpus: 14
    memory: 29
  small:
    cpus: 7
    memory: 14
  tiny:
    cpus: 1
    memory: 1

PATH
https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/bins/bin.1.fa
https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/bins/bin.2.fa
https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/bins/bin.8.fasta
https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/bins/bin.9.fasta
https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/bins/test234.fa
https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/bins/bin.32.fa

SAMPLE  READS
test1   https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/small/interleaved.fq.gz
test2   https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/small/interleaved.fq.gz

SAMPLE  READS
test1   https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/small/read1_1.fq.gz

NOTE! The file names of all provided genomes must be unique.
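Because the uniqueness requirement applies to file names rather than full paths, two genomes stored in different folders can still collide. The following Python sketch checks a genomes table for such collisions before the module is started. The function name `duplicate_genome_names` is hypothetical, and the snippet assumes that the table consists of a `PATH` header followed by one path or URL per line, as in the example above.

```python
from collections import Counter
from pathlib import PurePosixPath
from urllib.parse import urlparse


def duplicate_genome_names(tsv_text):
    """Return the sorted file names that occur more than once in a genomes table.

    ASSUMPTION: the first non-empty line is the PATH header and every
    following non-empty line holds exactly one path or URL.
    """
    lines = [line.strip() for line in tsv_text.splitlines() if line.strip()]
    paths = lines[1:]  # skip the PATH header
    # Reduce each URL or path to its final component, e.g. "bin.1.fa".
    names = [PurePosixPath(urlparse(path).path).name for path in paths]
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


# Usage with invented example paths: bin.1.fa appears in two folders.
example = (
    "PATH\n"
    "https://example.org/a/bin.1.fa\n"
    "https://example.org/b/bin.1.fa\n"
    "https://example.org/a/bin.2.fa\n"
)
print(duplicate_genome_names(example))
```

An empty result means that all file names in the table are unique.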

The following parameters can be configured:

mashDistCutoff: All hits below this threshold are discarded.
mashHashCutoff: All hits that have a lower count of matched minimum hashes are discarded.
coveredBasesCutoff: Number of bases that must be covered by at least one read. The number of reads by which the bases must be covered can be configured via the coverm setting (coverm: " --min-covered-fraction 0 ").
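To illustrate how the two mash cutoffs act on a hit, here is a minimal Python sketch. It is not the Toolkit's implementation. It assumes the tab-separated output of `mash screen`, where the first column is the identity and the second column holds the shared hashes as `matched/total`, and it assumes that mashDistCutoff is compared against the identity column and mashHashCutoff against the matched count. The function name `passes_mash_cutoffs` and the example lines are invented for illustration.

```python
def passes_mash_cutoffs(line, mash_dist_cutoff=0.70, mash_hash_cutoff=2):
    """Decide whether one mash screen output line survives both cutoffs.

    ASSUMPTION: column 1 is the identity and column 2 is "matched/total"
    shared hashes; the comparisons below mirror the parameter descriptions
    and are not taken from the Toolkit's source code.
    """
    fields = line.rstrip("\n").split("\t")
    identity = float(fields[0])
    matched_hashes = int(fields[1].split("/")[0])
    # Hits below the identity threshold are discarded.
    if identity < mash_dist_cutoff:
        return False
    # Hits with fewer matched hashes than the cutoff are discarded.
    if matched_hashes < mash_hash_cutoff:
        return False
    return True


# Invented example lines: identity, shared hashes, multiplicity, p-value, genome.
hits = [
    "0.99\t950/1000\t3\t0\tbin.1.fa",
    "0.65\t5/1000\t1\t0.01\tbin.2.fa",
    "0.72\t1/1000\t1\t0.05\tbin.8.fasta",
]
kept = [hit.split("\t")[4] for hit in hits if passes_mash_cutoffs(hit)]
print(kept)
```

With the default values from the example configuration, only the first hit is kept: the second fails the identity threshold and the third has too few matched hashes.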

Output

The module outputs mash screen and Bowtie alignment statistics. Furthermore, the module provides a coverm output that reports metrics about the detected genomes (e.g. covered bases, length, TPM, ...).