Usage notes

Pre-sketched databases are available for download below. All databases work with sylph version 0.3.x and later.

  • Use the Primary links hosted at http://faust.compbio.cs.cmu.edu if possible. We also provide mirrors on Google Cloud, but these cost us more money.

Example usage:

# download database
wget http://faust.compbio.cs.cmu.edu/sylph-stuff/gtdb-r226-c200-dbv1.syldb

# profile against database
sylph profile gtdb-r226-c200-dbv1.syldb -1 sample_R1.fq -2 sample_R2.fq -t 30 > results.tsv
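
Because the databases are large (up to tens of GB), an interrupted download is expensive to restart. The following is a minimal sketch of a resumable download from the primary host. The helper names `sylph_db_url` and `fetch_sylph_db` are hypothetical, and the sketch assumes that every primary file sits in the same `sylph-stuff` directory as the example URL above, so check the actual link in the table first.

```shell
# Assumption: all primary files share the directory used in the example
# URL above. Verify against the download link in the table.
SYLPH_DB_BASE="http://faust.compbio.cs.cmu.edu/sylph-stuff"

# Print the primary download URL for a database file name.
sylph_db_url() {
    printf '%s/%s\n' "$SYLPH_DB_BASE" "$1"
}

# Download a database from the primary host.
# wget -c resumes a partially downloaded file instead of starting over.
fetch_sylph_db() {
    wget -c "$(sylph_db_url "$1")"
}

# Usage (not run here, since it starts a multi-GB download):
#   fetch_sylph_db gtdb-r226-c1000-dbv1.syldb
```

Re-running `fetch_sylph_db` with the same file name after an interruption continues from the bytes already on disk.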

Note on taxonomy usage:

Most of the databases have associated taxonomies that sylph can use. See here for more information on taxonomy integration.

Databases

| Type | Name | Genomes | c-parameter | Size | Primary Download Link | Mirror | Notes |
|---|---|---|---|---|---|---|---|
| Prokaryotic (GTDB) | GTDB r226 | 143,614 species | -c 200 | 18.4 GB | gtdb-r226-c200-dbv1.syldb | mirror | |
| Prokaryotic (GTDB) | GTDB r226 | 143,614 species | -c 1000 | 3.7 GB | gtdb-r226-c1000-dbv1.syldb | mirror | |
| Prokaryotic (GTDB) | GTDB r220 | 113,104 species | -c 200 | 13.1 GB | gtdb-r220-c200-dbv1.syldb | mirror | |
| Prokaryotic (GTDB) | GTDB r220 | 113,104 species | -c 1000 | 2.6 GB | gtdb-r220-c1000-dbv1.syldb | mirror | |
| Prokaryotic (GTDB) | GTDB r214 | 85,202 species | -c 200 | 10 GB | v0.3-c200-gtdb-r214.syldb | mirror | |
| Prokaryotic (GTDB) | GTDB r214 | 85,202 species | -c 1000 | 2 GB | v0.3-c1000-gtdb-r214.syldb | mirror | |
| Prokaryotic (GlobDB) | GlobDB r226 | 306,260 species | -c 200 | 32 GB | See the GlobDB website | | Third-party database |
| Prokaryotic (GlobDB) | GlobDB r226 | 306,260 species | -c 1000 | 6.5 GB | See the GlobDB website | | Third-party database |
| Prokaryotic (Other) | OceanDNA | 8,466 ocean MAGs | -c 200 | 800 MB | OceanDNA-c200-v0.3.syldb | mirror | |
| Prokaryotic (Other) | SMAG | 21,077 soil MAGs | -c 200 | 2.5 GB | SMAG-c200-v0.3.syldb | mirror | |
| Prokaryotic (Other) | UHGG v2.0.1 (not dereplicated) | 289,232 gut genomes | -c 200 | 26 GB | uhgg_all_c200_v0.3.0.syldb | mirror | Not dereplicated - do not use for profiling |
| Viral | UHGV | 171,338 gut vOTUs | -c 100 | 0.4 GB | uhgv_c100_dbv1.syldb | mirror | |
| Viral | UHGV | 171,338 gut vOTUs | -c 200 | 0.2 GB | uhgv_c200_dbv1.syldb | mirror | |
| Viral | IMG/VR4.1 | 2,917,516 viral genomes | -c 200 | 2 GB | imgvr_c200_v0.3.0.syldb | mirror | |
| Eukaryotic | RefSeq Fungi | 595 genomes | -c 200 | 700 MB | fungi-refseq-2024-07-25-c200-v0.3.syldb | mirror | |
| Eukaryotic | TARA Oceans | 713 eukaryotic MAGs/SAGs | -c 200 | 900 MB | tara-eukmags-c200-v0.3.syldb | mirror | |

Parameter Guide

  • -c 200: More sensitive, larger file size
  • -c 1000: More efficient, smaller file size, less sensitive
  • -c 100: More sensitive, but primarily for smaller genomes
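
The sizes in the table above suggest that file size scales roughly inversely with the -c value: each GTDB and GlobDB database is about five times larger at -c 200 than at -c 1000, and UHGV is twice as large at -c 100 as at -c 200. As a rough sketch of that arithmetic, the hypothetical helper `estimate_size_gb` below extrapolates a size from one -c value to another. The 1/c scaling is an observation from the table, not a guarantee.

```shell
# Rough size estimate for another -c value, assuming size scales as 1/c.
# Arguments: known size in GB, its -c value, target -c value.
estimate_size_gb() {
    awk -v size="$1" -v c_known="$2" -v c_target="$3" \
        'BEGIN { printf "%.1f\n", size * c_known / c_target }'
}

estimate_size_gb 18.4 200 1000   # GTDB r226: the table lists 3.7 GB for -c 1000
estimate_size_gb 0.4 100 200     # UHGV: the table lists 0.2 GB for -c 200
```

This is mainly useful for judging disk and memory needs before choosing between the -c 200 and -c 1000 variants of the same database.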

Note

sylph profile uses -c 200 by default, so -c 100 must be specified explicitly when using a database sketched with -c 100. For example:

sylph profile c100_database c1000_database -c 100 -1 read1.fq -2 read2.fq
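
The file names in the table embed the -c value (for example `uhgv_c100_dbv1.syldb`), so the value to pass can be read off the names rather than remembered. The sketch below uses two hypothetical helpers, `c_from_name` and `min_c`. It assumes the `c<number>` naming pattern holds for the files in use and, following the example above (which passes -c 100 when a c100 and a c1000 database are combined), that the smallest -c among the databases is the one to pass.

```shell
# Print the number following "c" in a database file name,
# e.g. uhgv_c100_dbv1.syldb -> 100. Prints nothing if no match.
c_from_name() {
    basename "$1" | sed -nE 's/.*[-_]c([0-9]+)[-_.].*/\1/p'
}

# Print the smallest -c value among the given database files.
# Assumption: the smallest value is the one to pass to sylph profile.
min_c() {
    min=""
    for db in "$@"; do
        c=$(c_from_name "$db")
        if [ -z "$c" ]; then
            echo "cannot infer -c from file name: $db" >&2
            return 1
        fi
        if [ -z "$min" ] || [ "$c" -lt "$min" ]; then
            min=$c
        fi
    done
    echo "$min"
}

# Usage (not run here, since it needs sylph and the database files):
#   c=$(min_c uhgv_c100_dbv1.syldb gtdb-r226-c1000-dbv1.syldb)
#   sylph profile uhgv_c100_dbv1.syldb gtdb-r226-c1000-dbv1.syldb \
#       -c "$c" -1 read1.fq -2 read2.fq > results.tsv
```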

Database descriptions

GTDB Databases

GTDB (Genome Taxonomy Database) is a high-quality, curated taxonomy and genome database for prokaryotes (archaea and bacteria). We use the dereplicated species-representative genomes (one genome per species).

Available databases:

  • GTDB r226 database (143,614 species representative genomes) - April 16, 2025

  • GTDB r220 database (113,104 species representative genomes) - April 24, 2024

  • GTDB r214 database (85,202 species representative genomes) - April 28, 2023

GlobDB - massive prokaryotic catalogue encompassing many other genome sets

GlobDB is a catalogue of > 300,000 prokaryotic genomes/MAGs. Their database is dereplicated at ~96% ANI. GlobDB encompasses 14 other large databases (including GTDB). The sylph database is hosted on their website.

Other prokaryotic databases

Tip

GlobDB encompasses SMAG and OceanDNA. We highly recommend using GlobDB over these databases if possible.

  1. OceanDNA catalogue of 8,466 ocean prokaryotic MAGs, -c 200 (800 MB)
  2. SMAG catalogue of 21,077 soil MAGs, -c 200 (2.5 GB)
  3. UHGG v2.0.1 catalogue of 289,232 gut genomes, -c 200 (26 GB). Not dereplicated. Do not use for profiling.

Viral databases

  1. Unified Human Gut Virome (UHGV) catalog (171,338 vOTUs from the human gut)
    • Note: UHGV is more refined and has better taxonomic annotations than IMG/VR for the human gut
  2. Pre-sketched IMG/VR4.1 database for high-confidence vOTU representatives (2,917,516 viral genomes).

Eukaryotic databases

  1. 595 representative RefSeq fungi genomes (downloaded 2024-07-25), -c 200 (700 MB)
  2. 713 TARA Oceans eukaryotic MAGs/SAGs from Delmont et al., -c 200 (900 MB)