MagAttributes

Input

-entry wMagAttributes -params-file example_params/magAttributes.yml 

Warning

The configuration file shown here is for demonstration and testing purposes only. Parameters intended for production use can be found in the magAttributes section of one of the YAML files located in the default folder of the Toolkit's GitHub repository.

tempdir: "tmp"
summary: true
output: "output"
logDir: log
runid: 1
s3SignIn: false
logLevel: 1
scratch: "/vol/scratch"
publishDirMode: "symlink"
steps:
  magAttributes:
    input: "test_data/magAttributes/input.tsv"
    gtdb:
      buffer: 1000
      database:
        download:
          source: https://openstack.cebitec.uni-bielefeld.de:8080/databases/gtdbtk_r214_data.tar.gz
          md5sum: 390e16b3f7b0c4463eb7a3b2149261d9
      additionalParams: " --min_af 0.65 --scratch_dir . "
    checkm2:
      database:
        download:
          source: "https://openstack.cebitec.uni-bielefeld.de:8080/databases/checkm2_v2.tar.gz"
          md5sum: a634cb3d31a1f56f2912b74005f25f09
      additionalParams: "  "
    checkm:
      database:
        download:
          source: "https://openstack.cebitec.uni-bielefeld.de:8080/databases/checkm_data_2015_01_16.tar.gz"
          md5sum: 0963b301dfe9345ea4be1246e32f6728
      buffer: 200
      additionalParams:
        tree: " --reduced_tree "
        lineage_set: " " 
        qa: "  "
    prokka:
      defaultKingdom: false
      additionalParams: " --mincontiglen 200 "

resources:
  highmemLarge:
    cpus: 28
    memory: 230
  highmemMedium:
    cpus: 14
    memory: 113
  large:
    cpus: 28
    memory: 58
  medium:
    cpus: 14
    memory: 29
  small:
    cpus: 7
    memory: 14
  tiny:
    cpus: 1
    memory: 1

DATASET BIN_ID  PATH    COMPLETENESS    CONTAMINATION   COVERAGE    N50 HETEROGENEITY
test1   bin.1   https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/bins/bin.1.fa    100 0   10  5000    10
test1   bin.2   https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/bins/bin.2.fa    100 0   10  5000    10
test1   bin.8   https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/bins/bin.8.fasta 100 0   10  5000    10
test2   bin.9   https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/bins/bin.9.fasta 100 0   10  5000    10
test2   bin.32  https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/bins/bin.32.fa   100 0   10  5000    10
The input file must include at least the DATASET identifier column and the MAG-specific PATH and BIN_ID columns.
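A minimal Python sketch of a pre-flight check for this requirement follows. It assumes the input file is tab-separated, as the .tsv extension suggests. The helper names `missing_columns` and `validate_input` are hypothetical and not part of the Toolkit.

```python
import csv
import io

# Columns this page lists as mandatory for the magAttributes input file.
REQUIRED_COLUMNS = {"DATASET", "BIN_ID", "PATH"}


def missing_columns(tsv_text: str) -> set[str]:
    """Return the required columns that are absent from the TSV header."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])  # an empty file yields an empty header
    return REQUIRED_COLUMNS - {name.strip() for name in header}


def validate_input(tsv_text: str) -> list[dict[str, str]]:
    """Parse the TSV (assumed tab-separated) and raise if a mandatory column is missing or empty."""
    missing = missing_columns(tsv_text)
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(sorted(missing))}")
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    for index, row in enumerate(rows, start=1):
        for column in REQUIRED_COLUMNS:
            # row.get() returns None when a row is shorter than the header
            if not (row.get(column) or "").strip():
                raise ValueError(f"data row {index}: empty value in column {column}")
    return rows
```

Passing the contents of test_data/magAttributes/input.tsv to `validate_input` should return one dictionary per bin, keyed by the header names; optional columns such as COMPLETENESS or N50 are simply carried along.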

Databases

Checkm and GTDB need their databases as input. See the database section for possible download strategies. The compressed GTDB and Checkm databases must be provided as tar.gz files. If you provide the extracted version of the GTDB database via the extractedDBPath parameter, specify the path to the releaseXXX directory (e.g. "/vol/spool/gtdb/release202").
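The example configuration pairs each database download source with an md5sum. The sketch below shows one way to check a locally downloaded archive against these requirements before starting a run: it verifies the tar.gz format and the checksum, and checks that an extracted GTDB path ends in a releaseXXX directory. It assumes the archive is already on disk; `check_database_archive` and `looks_like_gtdb_release_dir` are hypothetical helpers, and the Toolkit itself may validate databases differently.

```python
import hashlib
import re
import tarfile
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file in chunks, without loading it fully into memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_database_archive(path: Path, expected_md5: str) -> None:
    """Raise ValueError unless `path` is a gzip-compressed tar file whose MD5 matches."""
    if not path.name.endswith(".tar.gz"):
        raise ValueError(f"{path.name}: database archive must end with .tar.gz")
    try:
        # mode "r:gz" fails for plain (uncompressed) tar files and non-tar data
        with tarfile.open(path, mode="r:gz"):
            pass
    except (tarfile.TarError, OSError) as err:
        raise ValueError(f"{path.name}: not a readable tar.gz archive") from err
    actual = md5_of(path)
    if actual != expected_md5:
        raise ValueError(f"{path.name}: md5 {actual} does not match {expected_md5}")


def looks_like_gtdb_release_dir(path: Path) -> bool:
    """True if the last path component follows the releaseXXX pattern, e.g. release202."""
    return re.fullmatch(r"release\d+", path.name) is not None
```

Calling `check_database_archive(Path("checkm2_v2.tar.gz"), "a634cb3d31a1f56f2912b74005f25f09")` on a downloaded copy would compare it against the md5sum listed in the example config.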

If you need credentials to access your database files via S3, set them with the following commands:

For GTDB:

nextflow secrets set S3_gtdb_ACCESS XXXXXXX
nextflow secrets set S3_gtdb_SECRET XXXXXXX

For Checkm:

nextflow secrets set S3_checkm_ACCESS XXXXXXX
nextflow secrets set S3_checkm_SECRET XXXXXXX

Output

GTDBTk

All GTDB files include the GTDB-specific columns in addition to a SAMPLE column (SAMPLE_gtdbtk.bac120.summary.tsv, SAMPLE_gtdbtk.ar122.summary.tsv). In addition, this module produces a file SAMPLE_gtdbtk_CHUNK.tsv that combines both files and adds a BIN_ID column that adheres to the magAttributes specification.
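To illustrate the combination step, here is a Python sketch that concatenates the bacterial and archaeal summaries and prepends a BIN_ID column. It assumes BIN_ID is copied from the GTDB-Tk `user_genome` column (the genome file name without its extension, e.g. bin.1); the function name is hypothetical and the Toolkit's actual derivation of BIN_ID may differ.

```python
import csv
import io


def combine_gtdb_summaries(bac_tsv: str, ar_tsv: str) -> str:
    """Concatenate bacterial and archaeal GTDB-Tk summaries and prepend a BIN_ID column.

    Assumption: BIN_ID is taken from the GTDB-Tk `user_genome` column.
    """
    rows: list[dict[str, str]] = []
    columns: list[str] = []
    for text in (bac_tsv, ar_tsv):
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        # Union of header columns, keeping first-seen order; empty input has no header.
        for name in reader.fieldnames or []:
            if name not in columns:
                columns.append(name)
        rows.extend(reader)

    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["BIN_ID", *columns],
        delimiter="\t",
        restval="",  # fill columns missing from one of the two summaries
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow({"BIN_ID": row["user_genome"], **row})
    return out.getvalue()
```

Taking the union of headers keeps the sketch working even if the bacterial and archaeal summaries do not have identical column sets.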

Checkm and Checkm2

The Checkm and Checkm2 outputs adhere to the magAttributes specification and add BIN_ID and SAMPLE columns to the output file. If both Checkm and Checkm2 are specified in the config file, only the Checkm2 results are used for downstream pipeline steps.
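The precedence rule and the added ID columns can be sketched as follows. The function names and the `name_column` parameter are hypothetical; which column holds the bin name depends on the tool's output format, so it is passed in rather than hard-coded.

```python
from typing import Optional

Rows = list[dict[str, str]]


def select_quality_report(checkm: Optional[Rows], checkm2: Optional[Rows]) -> tuple[str, Rows]:
    """Return the tool name and rows used downstream; Checkm2 wins when both were run."""
    if checkm2 is not None:
        return "checkm2", checkm2
    if checkm is not None:
        return "checkm", checkm
    raise ValueError("neither Checkm nor Checkm2 results are available")


def add_ids(rows: Rows, sample: str, name_column: str) -> Rows:
    """Prepend SAMPLE and BIN_ID to each row, copying BIN_ID from the given bin-name column."""
    return [{"SAMPLE": sample, "BIN_ID": row[name_column], **row} for row in rows]
```

Returning the tool name alongside the rows makes it easy to record which quality estimate was actually used for the downstream steps.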