Export
This module exports a set of results produced by Metagenomics-tk. Currently, the export for EMGB needs the results of the following tools:
- Assembly
- Binning
- CheckM (v1 or v2)
- Prokka
- GTDB-Tk
- MMseqs Taxonomy (Database: GTDB)
- MMseqs (Database: UniRef90)
Input
-entry wExportPipeline -params-file example_params/export.yml
Warning
The configuration file shown here is for demonstration and testing purposes only.
Parameters that should be used in production can be viewed in the export section of one of the YAML files located in the default folder of the Toolkit's GitHub repository.
tempdir: "tmp"
summary: false
s3SignIn: false
input: "output"
output: "output"
logDir: log
runid: 1
databases: "/mnt/databases"
logLevel: 1
scratch: "/vol/scratch"
publishDirMode: "symlink"
steps:
  export:
    emgb:
      additionalParams:
        blastDB: "bacmet20_predicted"
        taxonomyDB: "gtdb"
      titles:
        database:
          download:
            source: "https://openstack.cebitec.uni-bielefeld.de:8080/databases/uniref90.titles.tsv.gz"
            md5sum: aaf1dd9021243def8e6c4e438b4b3669
      kegg:
        database:
          download:
            source: s3://databases_internal/annotatedgenes2json_db_kegg-mirror-2022-12.tar.zst
            md5sum: 262dab8ca564fbc1f27500c22b5bc47b
            s5cmd:
              params: '--retry-count 30 --no-verify-ssl --endpoint-url https://openstack.cebitec.uni-bielefeld.de:8080'
resources:
  highmemLarge:
    cpus: 28
    memory: 230
  highmemMedium:
    cpus: 14
    memory: 113
  large:
    cpus: 28
    memory: 58
  medium:
    cpus: 14
    memory: 29
  small:
    cpus: 7
    memory: 14
  tiny:
    cpus: 1
    memory: 1
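Each database download entry in the configuration above carries an md5sum value. If you fetch one of these databases by hand, a check along the following lines may help confirm that the file arrived intact. This is a minimal sketch, not the Toolkit's own verification code: the function names are hypothetical, and it assumes the checksum is computed over the downloaded file as-is.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large database archives fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Path, expected_md5: str) -> bool:
    """Compare a file's MD5 digest with the md5sum value from the config."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `matches_checksum(Path("uniref90.titles.tsv.gz"), "aaf1dd9021243def8e6c4e438b4b3669")` would compare a locally downloaded titles file (the local path is an assumption) with the checksum configured above.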
Additional Parameters
- blastDB: The toolkit runs MMseqs against multiple databases. This parameter specifies which database's BLAST output should be used for the export. (Default: UniRef90)
- taxonomyDB: The taxonomy database against which MMseqs is executed. (Default: GTDB)
Output
The following files are produced as output:
- SAMPLE.bins.json.gz
- SAMPLE.contigs.json.gz
- SAMPLE.genes.json.gz
where SAMPLE is the name of the sample.
You can read more here about how to start EMGB and use these files to import a dataset.
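Before importing a dataset, it can be useful to confirm that all three export files exist for a sample. The sketch below builds the expected file names from the naming scheme listed above and reports any that are missing. The helper names and the directory argument are hypothetical, since the location of the files inside the output folder is not described here, and no assumption is made about the JSON layout beyond the files being gzip-compressed text.

```python
import gzip
from pathlib import Path

# Suffixes of the three export files produced per sample.
EXPORT_SUFFIXES = ("bins", "contigs", "genes")


def expected_export_files(sample: str, directory: Path) -> list[Path]:
    """Return the paths of the three export files expected for one sample."""
    return [directory / f"{sample}.{suffix}.json.gz" for suffix in EXPORT_SUFFIXES]


def missing_export_files(sample: str, directory: Path) -> list[Path]:
    """Return the expected export files that are absent from the directory."""
    return [p for p in expected_export_files(sample, directory) if not p.is_file()]


def first_line(path: Path) -> str:
    """Decompress the file and return its first line of text for a quick look."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return handle.readline()
```

Calling `missing_export_files("SAMPLE", Path("exported"))` returns an empty list when all three files are present in the (assumed) directory `exported`.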