MagAttributes¶
Input¶
-entry wMagAttributes -params-file example_params/magAttributes.yml
Warning
The configuration file shown here is for demonstration and testing purposes only.
Parameters that should be used in production can be viewed in the magAttributes section
of one of the YAML files located in the default folder of the Toolkit's GitHub repository.
tempdir: "tmp"
summary: true
output: "output"
logDir: log
runid: 1
s3SignIn: false
logLevel: 1
scratch: "/vol/scratch"
publishDirMode: "symlink"
steps:
  magAttributes:
    input: "test_data/magAttributes/input.tsv"
    gtdb:
      buffer: 1000
      database:
        download:
          source: https://openstack.cebitec.uni-bielefeld.de:8080/databases/gtdbtk_r214_data.tar.gz
          md5sum: 390e16b3f7b0c4463eb7a3b2149261d9
      additionalParams: " --min_af 0.65 --scratch_dir . "
    checkm2:
      database:
        download:
          source: "https://openstack.cebitec.uni-bielefeld.de:8080/databases/checkm2_v2.tar.gz"
          md5sum: a634cb3d31a1f56f2912b74005f25f09
      additionalParams: " "
    checkm:
      database:
        download:
          source: "https://openstack.cebitec.uni-bielefeld.de:8080/databases/checkm_data_2015_01_16.tar.gz"
          md5sum: 0963b301dfe9345ea4be1246e32f6728
      buffer: 200
      additionalParams:
        tree: " --reduced_tree "
        lineage_set: " "
        qa: " "
    prokka:
      defaultKingdom: false
      additionalParams: " --mincontiglen 200 "
resources:
  highmemLarge:
    cpus: 28
    memory: 230
  highmemMedium:
    cpus: 14
    memory: 113
  large:
    cpus: 28
    memory: 58
  medium:
    cpus: 14
    memory: 29
  small:
    cpus: 7
    memory: 14
  tiny:
    cpus: 1
    memory: 1
DATASET BIN_ID PATH COMPLETENESS CONTAMINATION COVERAGE N50 HETEROGENEITY
test1 bin.1 https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/bins/bin.1.fa 100 0 10 5000 10
test1 bin.2 https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/bins/bin.2.fa 100 0 10 5000 10
test1 bin.8 https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/bins/bin.8.fasta 100 0 10 5000 10
test2 bin.9 https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/bins/bin.9.fasta 100 0 10 5000 10
test2 bin.32 https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/bins/bin.32.fa 100 0 10 5000 10
The input file must include a DATASET identifier column as well as the MAG-specific PATH and BIN_ID columns.
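Before starting a run, it can help to confirm that the input sheet carries the columns named here. The following is a minimal sketch and not part of the Toolkit: the helper name `missing_columns`, the tab delimiter, and the sample path `/path/to/bin.1.fa` are assumptions made for illustration.

```python
import csv
import io

# Columns named in this section; the remaining columns of the example
# sheet (COMPLETENESS, CONTAMINATION, ...) are not checked here.
REQUIRED_COLUMNS = {"DATASET", "BIN_ID", "PATH"}


def missing_columns(tsv_text, required=REQUIRED_COLUMNS):
    """Return the sorted list of required columns absent from the header row."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])  # an empty file yields an empty header
    return sorted(set(required) - set(header))


# Hypothetical two-line sheet, assumed to be tab-separated.
example = (
    "DATASET\tBIN_ID\tPATH\tCOMPLETENESS\n"
    "test1\tbin.1\t/path/to/bin.1.fa\t100\n"
)
print(missing_columns(example))
```

An empty list means every required column is present; any names returned are the ones to add to the header.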
Databases¶
Checkm and GTDB need their databases as input. See the database section for possible download strategies.
The GTDB and Checkm compressed databases must be tar.gz files. If you provide the extracted version of GTDB using the extractedDBPath parameter,
please specify the path to the releaseXXX directory (e.g. "/vol/spool/gtdb/release202").
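If you host a database archive yourself, you may want to compute its checksum before entering it as md5sum in the configuration. The sketch below assumes that the md5sum value refers to the compressed tar.gz file; the function names `md5_of_file` and `matches_md5` are hypothetical and not part of the Toolkit.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected):
    """Compare a file's digest with the value written in the config."""
    return md5_of_file(path) == expected.strip().lower()
```

For a locally stored copy of the GTDB archive, a call such as `matches_md5("gtdbtk_r214_data.tar.gz", "390e16b3f7b0c4463eb7a3b2149261d9")` would return True only if the file matches the checksum listed in the example configuration.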
If you need credentials to access your files via S3, please use the following commands:
For GTDB:
nextflow secrets set S3_gtdb_ACCESS XXXXXXX
nextflow secrets set S3_gtdb_SECRET XXXXXXX
For Checkm:
nextflow secrets set S3_checkm_ACCESS XXXXXXX
nextflow secrets set S3_checkm_SECRET XXXXXXX
Output¶
GTDBTk¶
All GTDB files include the GTDB specific columns in addition to a SAMPLE column (SAMPLE_gtdbtk.bac120.summary.tsv, SAMPLE_gtdbtk.ar122.summary.tsv).
In addition, this module produces a file SAMPLE_gtdbtk_CHUNK.tsv that combines both files and adds a BIN_ID column that adheres to the magAttributes specification.
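To illustrate what such a combined table might look like, here is a rough sketch that merges a bacterial and an archaeal summary and adds SAMPLE and BIN_ID columns. It is not the Toolkit's implementation: the function name `combine_summaries` is hypothetical, the input rows are made up, and taking BIN_ID directly from GTDB-Tk's `user_genome` column is an assumption.

```python
import csv
import io


def combine_summaries(sample, summaries):
    """Merge GTDB-Tk summary tables (given as TSV text) into one list of rows.

    Assumption: BIN_ID is copied from the `user_genome` column; the
    Toolkit may derive it differently.
    """
    rows = []
    for text in summaries:
        for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
            row["SAMPLE"] = sample
            row["BIN_ID"] = row["user_genome"]
            rows.append(row)
    return rows


# Made-up, heavily shortened summary tables for illustration only.
bac = "user_genome\tclassification\nbin.1\td__Bacteria\n"
ar = "user_genome\tclassification\nbin.2\td__Archaea\n"
combined = combine_summaries("test1", [bac, ar])
for row in combined:
    print(row["SAMPLE"], row["BIN_ID"], row["classification"])
```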
Checkm and Checkm2¶
The Checkm and Checkm2 output adheres to the magAttributes specification and adds a BIN_ID and SAMPLE column to the output file.
If Checkm2 and Checkm are both specified in the config file, only the Checkm2 results are used for downstream pipeline steps.
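The precedence rule can be expressed as a small sketch. It only mirrors the behaviour described in this section and is not taken from the Toolkit's source; the function name `select_quality_tool` is hypothetical, and the dictionary stands in for the parsed magAttributes section of the config file.

```python
def select_quality_tool(mag_attributes_config):
    """Return the tool whose results would feed downstream steps, or None."""
    # Checkm2 takes precedence whenever it is configured.
    if "checkm2" in mag_attributes_config:
        return "checkm2"
    if "checkm" in mag_attributes_config:
        return "checkm"
    return None


# Both tools configured, as in the example configuration.
print(select_quality_tool({"checkm": {}, "checkm2": {}, "gtdb": {}}))
```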