Exploratory Metagenome Browser (EMGB)

The output generated by the Metagenomics-Toolkit can be imported into EMGB. EMGB allows you to easily explore your metagenomic samples in terms of MAGs, genes and their function.

You can either directly specify the export to EMGB in your config file when you analyse your samples, or you can execute the export afterwards by running the Toolkit again with the export entry point.

Currently, the export for EMGB needs the results of the following tools:

Assembly
Binning
CheckM (v1 or v2)
Prokka output
GTDB-Tk
MMseqs Taxonomy (Database: GTDB)
MMseqs (Database: UniRef90)

Input

Command

-entry wExportPipeline -params-file example_params/export.yml

Configuration File

Warning

The configuration file shown here is for demonstration and testing purposes only. Parameters that should be used in production can be viewed in the export section of one of the YAML files located in the default folder of the Toolkit's GitHub repository.

tempdir: "tmp"
s3SignIn: false
input: "output"
output: "output"
logDir: log
runid: 1
databases: "/mnt/databases"
logLevel: 1
scratch: "/vol/scratch"
publishDirMode: "symlink"
steps:
  export:
    emgb:
      additionalParams:
        blastDB: "bacmet20_predicted"
        taxonomyDB: "gtdb"
      titles:
        database:
          download:
            source: "https://openstack.cebitec.uni-bielefeld.de:8080/databases/uniref90.titles.tsv.gz"
            md5sum: aaf1dd9021243def8e6c4e438b4b3669
      kegg:
        database:
          download:
            source: s3://databases_internal/annotatedgenes2json_db_kegg-mirror-2022-12.tar.zst
            md5sum: 262dab8ca564fbc1f27500c22b5bc47b
            s5cmd:
              params: '--retry-count 30 --no-verify-ssl --endpoint-url https://openstack.cebitec.uni-bielefeld.de:8080'
resources:
  highmemLarge:
    cpus: 28
    memory: 230
  highmemMedium:
    cpus: 14
    memory: 113
  large:
    cpus: 28
    memory: 58
  medium:
    cpus: 14
    memory: 29
  small:
    cpus: 7
    memory: 14
  tiny:
    cpus: 1
    memory: 1
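
Each download entry in the configuration above pairs a source with an md5sum. If you want to check a manually downloaded copy of such a database file against that value, a minimal sketch could look like the following. It is an illustration only and not part of the Toolkit; the helper names `md5_of_file` and `matches_md5` and the local file name in the usage note are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path: Path, expected: str) -> bool:
    """Compare a file's MD5 digest with an md5sum value copied from the config."""
    return md5_of_file(path) == expected.strip().lower()
```

For example, `matches_md5(Path("uniref90.titles.tsv.gz"), "aaf1dd9021243def8e6c4e438b4b3669")` would compare a local copy (the file name is assumed here) with the checksum listed for the titles database.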

Additional Parameters

blastDB: The Toolkit runs MMseqs against multiple databases. This parameter specifies which of the resulting BLAST outputs is used for the export. (Default: UniRef90)
taxonomyDB: The taxonomy database whose MMseqs taxonomy results are used for the export. (Default: GTDB)

Output

The following files are produced as output:

SAMPLE.bins.json.gz
SAMPLE.contigs.json.gz
SAMPLE.genes.json.gz

where SAMPLE is the name of the sample.
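
To take a quick look at these files before importing them, you could open them with a few lines of Python. The sketch below is an assumption-laden illustration: this page does not describe the internal layout of the files, so the code only assumes that each one is gzip-compressed JSON, either a single document or one JSON value per line. The helper names `expected_outputs` and `load_json_gz` are hypothetical.

```python
import gzip
import json
from pathlib import Path

# File name parts taken from the output list above
KINDS = ("bins", "contigs", "genes")


def expected_outputs(sample: str, directory: Path = Path(".")) -> list[Path]:
    """Build the three expected output paths for one sample name."""
    return [directory / f"{sample}.{kind}.json.gz" for kind in KINDS]


def load_json_gz(path: Path):
    """Parse a gzip-compressed JSON file.

    Assumption: the file is either one JSON document or line-delimited JSON.
    The whole-file parse is tried first; on failure each non-empty line is
    parsed separately and a list of the results is returned.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        text = handle.read()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Reading the whole file into memory keeps the sketch short; for large gene files a line-by-line or streaming approach would likely be a better fit.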

For details on how to start EMGB and import a dataset from these files, see the EMGB documentation.