Run Fragment Recruitment
The fragment recruitment module can be used to find genomes in a set of read datasets.
If the fragment recruitment module is part of the full pipeline or per-sample pipeline configuration, reads that could not be mapped back to a contig are screened against a user-provided list of MAGs. Detected genomes are included in all remaining parts of the pipeline. Look out for their specific headers to distinguish results based on real assembled genomes from results based on the reference genomes.
Note: This module currently only supports Illumina data.
Input
-entry wFragmentRecruitment -params-file example_params/fragmentRecruitment.yml
Warning
The configuration file shown here is for demonstration and testing purposes only.
Parameters that should be used in production can be viewed in the fragment recruitment section
of one of the yaml files located in the default
folder of the Toolkit's GitHub repository.
tempdir: "tmp"
summary: false
s3SignIn: false
output: "output"
logDir: log
runid: 1
logLevel: 1
scratch: "/vol/scratch"
publishDirMode: "symlink"
steps:
  fragmentRecruitment:
    mashScreen:
      samples:
        paired: test_data/fragmentRecruitment/paired.tsv
        single: test_data/fragmentRecruitment/single.tsv
      genomes: test_data/fragmentRecruitment/mags.tsv
      unzip:
        timeLimit: "AUTO"
      additionalParams:
        mashSketch: " "
        mashScreen: " "
        bwa2: " "
        minimap: " "
        coverm: " --min-covered-fraction 0 "
        covermONT: " --min-covered-fraction 0 "
        samtoolsViewBwa2: " -F 3584 "
        samtoolsViewMinimap: " "
      mashDistCutoff: 0.70
      coveredBasesCutoff: 0.2
      mashHashCutoff: 2
    genomeCoverage:
      additionalParams: ""
    contigsCoverage:
      additionalParams: ""
resources:
  highmemLarge:
    cpus: 28
    memory: 230
  highmemMedium:
    cpus: 14
    memory: 113
  large:
    cpus: 28
    memory: 58
  medium:
    cpus: 14
    memory: 29
  small:
    cpus: 7
    memory: 14
  tiny:
    cpus: 1
    memory: 1
A variant of the example configuration that uses the bowtie and samtoolsViewBowtie parameters instead of bwa2 and samtoolsViewBwa2. The warning above applies to it as well.
tempdir: "tmp"
summary: false
s3SignIn: false
output: "output"
logDir: log
runid: 1
logLevel: 1
scratch: "/vol/scratch"
steps:
  fragmentRecruitment:
    mashScreen:
      samples:
        paired: test_data/fragmentRecruitment/paired.tsv
        single: test_data/fragmentRecruitment/single.tsv
      genomes: test_data/fragmentRecruitment/mags.tsv
      unzip:
        timeLimit: "AUTO"
      additionalParams:
        mashSketch: " "
        mashScreen: " "
        bowtie: " "
        minimap: " "
        coverm: " --min-covered-fraction 0 "
        covermONT: " --min-covered-fraction 0 "
        samtoolsViewBowtie: " -F 3584 "
        samtoolsViewMinimap: " "
      mashDistCutoff: 0.70
      coveredBasesCutoff: 0.2
      mashHashCutoff: 2
    genomeCoverage:
      additionalParams: ""
    contigsCoverage:
      additionalParams: ""
resources:
  highmemLarge:
    cpus: 28
    memory: 230
  highmemMedium:
    cpus: 14
    memory: 113
  large:
    cpus: 28
    memory: 58
  medium:
    cpus: 14
    memory: 29
  small:
    cpus: 7
    memory: 14
  tiny:
    cpus: 1
    memory: 1
Genomes table (genomes parameter):
PATH
https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/bins/bin.1.fa
https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/bins/bin.2.fa
https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/bins/bin.8.fasta
https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/bins/bin.9.fasta
https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/bins/test234.fa
https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/bins/bin.32.fa
Paired-end samples table (samples.paired parameter):
SAMPLE READS
test1 https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/small/interleaved.fq.gz
test2 https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/small/interleaved.fq.gz
Single-end samples table (samples.single parameter):
SAMPLE READS
test1 https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/small/read1_1.fq.gz
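Before starting a run, it can help to check that a samples table is well formed. The following is a minimal Python sketch, not part of the Toolkit. It assumes the tables are tab-separated with the SAMPLE and READS header shown above, and the helper name `read_samples` is hypothetical.

```python
import csv
import io


def read_samples(tsv_text):
    """Parse a SAMPLE/READS table into a list of (sample, reads) tuples.

    Assumption: columns are tab-separated and the header is SAMPLE, READS.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        sample = (row.get("SAMPLE") or "").strip()
        reads = (row.get("READS") or "").strip()
        if not sample or not reads:
            raise ValueError(f"incomplete row: {row}")
        rows.append((sample, reads))
    return rows


base = "https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/small"
paired_tsv = (
    "SAMPLE\tREADS\n"
    f"test1\t{base}/interleaved.fq.gz\n"
    f"test2\t{base}/interleaved.fq.gz\n"
)

for sample, reads in read_samples(paired_tsv):
    print(sample, reads)
```

A row with an empty SAMPLE or READS field raises a `ValueError`, which surfaces a broken table before the pipeline starts.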
NOTE! The file names of all provided genomes must be unique.
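The uniqueness requirement can be checked up front. This is a minimal sketch that compares only the last path component of each entry in the genomes table. The function name `duplicate_genome_names` is hypothetical and not part of the Toolkit.

```python
from collections import Counter
from posixpath import basename
from urllib.parse import urlparse


def duplicate_genome_names(paths):
    """Return the sorted file names that occur more than once in `paths`.

    Works for URLs and plain paths; only the final path component is compared.
    """
    names = [basename(urlparse(p).path) for p in paths]
    return sorted(name for name, count in Counter(names).items() if count > 1)


base = "https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/bins"
genomes = [
    f"{base}/bin.1.fa",
    f"{base}/bin.2.fa",
    f"{base}/bin.8.fasta",
    f"{base}/bin.9.fasta",
    f"{base}/test234.fa",
    f"{base}/bin.32.fa",
]

duplicates = duplicate_genome_names(genomes)
if duplicates:
    print("duplicate genome file names:", ", ".join(duplicates))
else:
    print("all genome file names are unique")
```

Two genomes stored in different folders but sharing a file name, such as `a/bin.1.fa` and `b/bin.1.fa`, would be reported as duplicates.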
The following parameters can be configured:
- mashDistCutoff: All hits below this threshold are discarded.
- mashHashCutoff: All hits with a lower count of matched minimum hashes are discarded.
- coveredBasesCutoff: Number of bases that must be covered by at least one read. How many reads must cover a base can be configured via the coverm setting (coverm: " --min-covered-fraction 0 ").
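To illustrate how the two mash cutoffs might interact, the sketch below filters rows in the tab-separated `mash screen` output format (identity, shared hashes, median multiplicity, p-value, query ID). It is not the Toolkit's implementation. It assumes that mashDistCutoff is compared against the identity column and mashHashCutoff against the number of shared hashes, and the example rows are made up for illustration.

```python
def filter_mash_hits(lines, mash_dist_cutoff=0.70, mash_hash_cutoff=2):
    """Keep mash screen hits that pass both cutoffs.

    Assumption: a hit is discarded if its identity is below mash_dist_cutoff
    or its shared-hash count is below mash_hash_cutoff.
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip malformed or empty lines
        identity = float(fields[0])
        shared_hashes = int(fields[1].split("/")[0])  # e.g. "950/1000" -> 950
        if identity < mash_dist_cutoff or shared_hashes < mash_hash_cutoff:
            continue
        kept.append((fields[4], identity, shared_hashes))
    return kept


# Made-up rows: identity, shared-hashes, median-multiplicity, p-value, query-ID
example = [
    "0.99\t950/1000\t12\t0\tbin.1.fa",
    "0.65\t10/1000\t1\t1e-05\tbin.2.fa",
    "0.80\t1/1000\t1\t0.01\tbin.8.fasta",
]

for genome, identity, shared in filter_mash_hits(example):
    print(genome, identity, shared)
```

With the default values from the example configuration, only the first row passes: the second fails the identity threshold and the third has too few shared hashes.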
Output
The module outputs mash screen and bowtie alignment statistics. It also provides a coverm output that reports metrics for the detected genomes (e.g. covered bases, length, tpm, ...).
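As a rough illustration of how such metrics could be used downstream, the sketch below selects genomes from a tab-separated metrics table. It rests on several assumptions: the column names `genome`, `covered_bases` and `length` are hypothetical and will differ from the real output header, the values are made up, and the cutoff is treated as a covered fraction only because the example value 0.2 reads like one.

```python
import csv
import io


def genomes_passing_coverage(tsv_text, covered_bases_cutoff=0.2):
    """Return genomes whose covered fraction reaches the cutoff.

    Assumption: covered fraction = covered_bases / length, and the column
    names used here are placeholders for the real output header.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    passing = []
    for row in reader:
        length = int(row["length"])
        covered = int(row["covered_bases"])
        if length > 0 and covered / length >= covered_bases_cutoff:
            passing.append(row["genome"])
    return passing


# Made-up metrics table for illustration only
metrics_tsv = (
    "genome\tcovered_bases\tlength\n"
    "bin.1.fa\t500000\t2000000\n"
    "bin.2.fa\t100000\t2000000\n"
)

print(genomes_passing_coverage(metrics_tsv))
```

Here `bin.1.fa` has a covered fraction of 0.25 and passes, while `bin.2.fa` at 0.05 does not.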