Exploratory Metagenome Browser (EMGB)
The output generated by the Metagenomics-Toolkit can be imported into EMGB, which lets you explore your metagenomic samples in terms of MAGs, genes and their functions.
You can either enable the EMGB export directly in the config file you use to analyse your samples, or perform the export afterwards by running the Toolkit again with the export entry point.
Currently, the EMGB export requires the results of the following tools:
- Assembly
- Binning
- CheckM (v1 or v2)
- Prokka
- GTDB-Tk
- MMseqs Taxonomy (Database: GTDB)
- MMseqs (Database: UniRef90)
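Before starting an export, it can help to confirm that a run produced every prerequisite listed above. The following is a minimal Python sketch, assuming you already know which tool results exist for a sample; the labels in `REQUIRED` and the helper `missing_prerequisites` are hypothetical and not part of the Toolkit.

```python
# Hypothetical labels for the prerequisites listed above; they are not
# Toolkit module or folder names.
REQUIRED = [
    "Assembly",
    "Binning",
    "CheckM",
    "Prokka",
    "GTDB-Tk",
    "MMseqs Taxonomy (GTDB)",
    "MMseqs (UniRef90)",
]


def missing_prerequisites(available):
    """Return the required results that are absent, in the order listed above."""
    present = set(available)
    return [tool for tool in REQUIRED if tool not in present]


finished = ["Assembly", "Binning", "CheckM", "Prokka", "GTDB-Tk"]
print(missing_prerequisites(finished))
# ['MMseqs Taxonomy (GTDB)', 'MMseqs (UniRef90)']
```

A non-empty result means the corresponding tools should be run before the export.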
Input
-entry wExportPipeline -params-file example_params/export.yml
Warning
The configuration file shown here is for demonstration and testing purposes only.
Parameters intended for production use can be found in the export section of one of the YAML files located in the default folder of the Toolkit's GitHub repository.
tempdir: "tmp"
s3SignIn: false
input: "output"
output: "output"
logDir: log
runid: 1
databases: "/mnt/databases"
logLevel: 1
scratch: "/vol/scratch"
publishDirMode: "symlink"
steps:
  export:
    emgb:
      additionalParams:
        blastDB: "bacmet20_predicted"
        taxonomyDB: "gtdb"
      titles:
        database:
          download:
            source: "https://openstack.cebitec.uni-bielefeld.de:8080/databases/uniref90.titles.202506.tsv.gz"
            md5sum: 65437094361e84756e156c54d3a6e56b
      kegg:
        database:
          download:
            source: s3://databases_internal/annotatedgenes2json_db_kegg-mirror-2025-05.tar.zst
            md5sum: 94169a92d453f55553aa7edf4546d00d
            s5cmd:
              params: '--retry-count 30 --no-verify-ssl --endpoint-url https://openstack.cebitec.uni-bielefeld.de:8080'
resources:
  highmemLarge:
    cpus: 28
    memory: 230
  highmemMedium:
    cpus: 14
    memory: 113
  large:
    cpus: 28
    memory: 58
  medium:
    cpus: 14
    memory: 29
  small:
    cpus: 7
    memory: 14
  tiny:
    cpus: 1
    memory: 1
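Each database entry in the configuration pairs a download source with an md5sum, so a downloaded file can be checked for integrity. The following Python sketch shows such a check using only the standard library; it illustrates the idea and is not the Toolkit's own verification code. The helper names `md5_of_file` and `matches_md5sum` are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5sum(path, expected):
    """Check a downloaded file against an md5sum value from the params file."""
    return md5_of_file(path) == expected.strip().lower()
```

For example, assuming the titles file has been downloaded to the current directory under its original name, `matches_md5sum("uniref90.titles.202506.tsv.gz", "65437094361e84756e156c54d3a6e56b")` should return `True`. Reading in chunks keeps memory use flat even for multi-gigabyte database archives.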
Additional Parameters
- blastDB: The Toolkit runs MMseqs against multiple databases. This parameter selects which of these search results is exported to EMGB. (Default: UniRef90)
- taxonomyDB: Selects the taxonomy database whose MMseqs taxonomy results are exported. (Default: GTDB)
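Both parameters act as optional overrides: any value left out of additionalParams falls back to its default. The following Python sketch illustrates that resolution under stated assumptions. The keys follow the configuration above, and "gtdb" matches the example config. The identifier "uniref90" for the UniRef90 default and the helper `resolve_additional_params` are hypothetical.

```python
# Defaults as documented above. "gtdb" matches the example config; the
# identifier "uniref90" for the UniRef90 default is an assumption.
DEFAULTS = {"blastDB": "uniref90", "taxonomyDB": "gtdb"}


def resolve_additional_params(additional_params=None):
    """Hypothetical helper: overlay user-supplied values on the defaults."""
    resolved = dict(DEFAULTS)  # copy so the defaults are never mutated
    resolved.update(additional_params or {})
    return resolved


print(resolve_additional_params({"blastDB": "bacmet20_predicted"}))
# {'blastDB': 'bacmet20_predicted', 'taxonomyDB': 'gtdb'}
```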
Output
The following files are produced as output:
- SAMPLE.bins.json.gz
- SAMPLE.contigs.json.gz
- SAMPLE.genes.json.gz
where SAMPLE is the name of the sample.
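Because the file names follow a fixed pattern, a script can check that an export is complete before you import it into EMGB. The following is a minimal Python sketch, assuming that the three files for a sample sit directly in one directory (the Toolkit's actual output layout may nest them). It only checks that the files exist and decompress, and it makes no assumptions about the JSON structure inside them. All helper names are hypothetical.

```python
import gzip
from pathlib import Path

# The three export files listed above; SAMPLE is replaced by the sample name.
SUFFIXES = ("bins", "contigs", "genes")


def expected_files(sample):
    """Return the EMGB export file names for one sample."""
    return [f"{sample}.{suffix}.json.gz" for suffix in SUFFIXES]


def missing_files(directory, sample):
    """List expected export files that are not present in `directory`."""
    folder = Path(directory)
    return [name for name in expected_files(sample) if not (folder / name).is_file()]


def is_readable_gzip(path):
    """Return True if the whole file decompresses without a gzip error."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read to the end so truncated archives are detected too.
            while handle.read(1 << 20):
                pass
        return True
    except (OSError, EOFError):  # BadGzipFile is a subclass of OSError
        return False
```

For example, `expected_files("sample1")` yields `sample1.bins.json.gz`, `sample1.contigs.json.gz` and `sample1.genes.json.gz`, and `missing_files` reports whichever of these are absent.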
You can read more here about how to start EMGB and use these files to import a dataset.