Usage notes

Pre-sketched databases are available for download below. All databases work with sylph version 0.3.x and later.

  • Use the Primary links hosted at http://faust.compbio.cs.cmu.edu if possible. We also provide mirrors on Google Cloud, but downloads from the mirrors cost us more money.

Example usage:

# download database
wget http://faust.compbio.cs.cmu.edu/sylph-stuff/gtdb-r226-c200-dbv1.syldb

# profile against database
sylph profile gtdb-r226-c200-dbv1.syldb -1 sample_R1.fq -2 sample_R2.fq  -t 30 > results.tsv
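
The two steps above can also be wrapped so that re-running the script does not download a multi-gigabyte file twice. This is only a minimal sketch, assuming wget and sylph are on the PATH. The fetch_db helper and the .partial directory are hypothetical names introduced here and are not part of sylph. The URL and the sylph flags are the ones from the example above.

```shell
#!/bin/sh
# Primary link from the example above.
DB_URL="http://faust.compbio.cs.cmu.edu/sylph-stuff/gtdb-r226-c200-dbv1.syldb"

# Hypothetical helper: download a database unless it is already present,
# then print its file name on stdout.
fetch_db() {
    url=$1
    file=${url##*/}                 # last path component of the URL
    if [ -s "$file" ]; then
        echo "$file already present, skipping download" >&2
    else
        # Download into a scratch directory so an interrupted transfer is
        # never mistaken for a complete database; -c resumes a partial file.
        mkdir -p .partial || return 1
        wget -c -P .partial "$url" || return 1
        mv ".partial/$file" "$file" || return 1
    fi
    printf '%s\n' "$file"
}

# Usage (uncomment to run):
# db=$(fetch_db "$DB_URL") &&
#     sylph profile "$db" -1 sample_R1.fq -2 sample_R2.fq -t 30 > results.tsv
```

The file only appears under its final name after wget exits successfully, so an existing non-empty file can be treated as a finished download.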

Note on taxonomy usage:

Most of the databases have associated taxonomies that sylph can utilize. See here for more information on taxonomy integration.

Databases

Type | Name | Genomes | c-parameter | Size | Primary Download Link | Mirror | Notes
--- | --- | --- | --- | --- | --- | --- | ---
Prokaryotic (GTDB) | GTDB r232 | 199,923 species | -c 200 | 24.1 GB | gtdb-r232-c200-dbv1.syldb | mirror |
Prokaryotic (GTDB) | GTDB r232 | 199,923 species | -c 1000 | 4.9 GB | gtdb-r232-c1000-dbv1.syldb | mirror |
Prokaryotic (GTDB) | GTDB r226 | 143,614 species | -c 200 | 18.4 GB | gtdb-r226-c200-dbv1.syldb | mirror |
Prokaryotic (GTDB) | GTDB r226 | 143,614 species | -c 1000 | 3.7 GB | gtdb-r226-c1000-dbv1.syldb | mirror |
Prokaryotic (GTDB) | GTDB r220 | 113,104 species | -c 200 | 13.1 GB | gtdb-r220-c200-dbv1.syldb | mirror |
Prokaryotic (GTDB) | GTDB r220 | 113,104 species | -c 1000 | 2.6 GB | gtdb-r220-c1000-dbv1.syldb | mirror |
Prokaryotic (GTDB) | GTDB r214 | 85,202 species | -c 200 | 10 GB | v0.3-c200-gtdb-r214.syldb | mirror |
Prokaryotic (GTDB) | GTDB r214 | 85,202 species | -c 1000 | 2 GB | v0.3-c1000-gtdb-r214.syldb | mirror |
Prokaryotic (GlobDB) | GlobDB r226 | 306,260 species | -c 200 | 32 GB | See the GlobDB website | | Third-party database
Prokaryotic (GlobDB) | GlobDB r226 | 306,260 species | -c 1000 | 6.5 GB | See the GlobDB website | | Third-party database
Prokaryotic (Other) | OceanDNA | 8,466 ocean MAGs | -c 200 | 800 MB | OceanDNA-c200-v0.3.syldb | mirror |
Prokaryotic (Other) | SMAG | 21,077 soil MAGs | -c 200 | 2.5 GB | SMAG-c200-v0.3.syldb | mirror |
Prokaryotic (Other) | UHGG v2.0.1 (not dereplicated) | 289,232 gut genomes | -c 200 | 26 GB | uhgg_all_c200_v0.3.0.syldb | mirror | Not dereplicated - do not use for profiling
Viral | UHGV | 171,338 gut vOTUs | -c 100 | 0.4 GB | uhgv_c100_dbv1.syldb | mirror |
Viral | UHGV | 171,338 gut vOTUs | -c 200 | 0.2 GB | uhgv_c200_dbv1.syldb | mirror |
Viral | IMG/VR4.1 | 2,917,516 viral genomes | -c 200 | 2 GB | imgvr_c200_v0.3.0.syldb | mirror |
Eukaryotic | RefSeq Fungi - latest | 661 genomes | -c 200 | 750 MB | fungi-refseq-2025-10-11-c200-v0.3.syldb | mirror |
Eukaryotic | TARA Oceans | 713 eukaryotic MAGs/SAGs | -c 200 | 900 MB | tara-eukmags-c200-v0.3.syldb | mirror |

Parameter Guide

  • -c 200: More sensitive, larger file size
  • -c 1000: More efficient, smaller file size, less sensitive
  • -c 100: Most sensitive, but intended primarily for smaller genomes (offered here only for the viral UHGV database)
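
To put a number on the size trade-off between -c 200 and -c 1000, the GTDB sizes from the table above can be compared directly. This is only a rough illustration: the sizes are the rounded values listed on this page, and size_ratios is a hypothetical name introduced here.

```shell
# Columns: GTDB release, -c 200 size in GB, -c 1000 size in GB
# (values copied from the Databases table on this page).
size_ratios() {
    awk '{ printf "%s\t%.1fx\n", $1, $2 / $3 }' <<'EOF'
r232 24.1 4.9
r226 18.4 3.7
r220 13.1 2.6
r214 10 2
EOF
}

size_ratios
```

For every GTDB release listed, the -c 200 database is roughly five times larger than its -c 1000 counterpart.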

Note

sylph uses -c 200 by default, so -c 100 must be specified explicitly when profiling against a database built with -c 100. For example:

sylph profile c100_database c1000_database -c 100 -1 read1.fq -2 read2.fq
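
When several databases are combined, the -c value can be read off the file names rather than looked up by hand. This is a minimal sketch resting on two assumptions: first, that the file name contains "c" followed by digits, delimited by "-", "_" or ".", as in the file names listed on this page; second, that the smallest value among the databases is the one to pass, which follows the example above, where -c 100 is passed for a -c 100 database combined with a -c 1000 one. The helpers db_c and min_c are hypothetical names and are not part of sylph.

```shell
# Hypothetical helper: print the c-parameter embedded in a database file name,
# e.g. gtdb-r226-c200-dbv1.syldb -> 200. Prints nothing if no value is found.
db_c() {
    printf '%s\n' "${1##*/}" |
        sed -n 's/.*[-_]c\([0-9][0-9]*\)[-_.].*/\1/p'
}

# Hypothetical helper: print the smallest c-parameter among the given databases.
min_c() {
    min=
    for db in "$@"; do
        c=$(db_c "$db")
        if [ -z "$c" ]; then
            echo "cannot infer -c from $db" >&2
            return 1
        fi
        if [ -z "$min" ] || [ "$c" -lt "$min" ]; then
            min=$c
        fi
    done
    printf '%s\n' "$min"
}

# Usage (uncomment to run):
# c=$(min_c uhgv_c100_dbv1.syldb gtdb-r226-c1000-dbv1.syldb) &&
#     sylph profile uhgv_c100_dbv1.syldb gtdb-r226-c1000-dbv1.syldb \
#         -c "$c" -1 read1.fq -2 read2.fq > results.tsv
```

A file name that does not follow the pattern makes min_c fail rather than guess, so the value can then be supplied by hand.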

Database descriptions

GTDB Databases

GTDB is a high-quality, curated taxonomy and genome database for prokaryotes (archaea and bacteria). We use only the dereplicated, species-representative genomes (one genome per species).

Available databases:

  • GTDB r232 database (199,923 species representative genomes) - April 18, 2026

  • GTDB r226 database (143,614 species representative genomes) - April 16, 2025

  • GTDB r220 database (113,104 species representative genomes) - April 24, 2024

  • GTDB r214 database (85,202 species representative genomes) - April 28, 2023

GlobDB - massive prokaryotic catalogue encompassing many other genome sets

GlobDB is a catalogue of more than 300,000 prokaryotic genomes/MAGs, dereplicated at ~96% ANI. GlobDB encompasses 14 other large databases (including GTDB). The sylph database is hosted on the GlobDB website.

Other prokaryotic databases

Tip

GlobDB encompasses SMAG and OceanDNA. We highly recommend using GlobDB instead of these databases if possible.

  1. OceanDNA catalogue of 8,466 ocean prokaryotic MAGs, -c 200 (800 MB)
  2. SMAG catalogue of 21,077 soil MAGs, -c 200 (2.5 GB)
  3. UHGG v2.0.1 catalogue of 289,232 gut genomes, -c 200 (26 GB). Not dereplicated; do not use for profiling.

Viral databases

  1. Unified Human Gut Virome (UHGV) catalog (171,338 vOTUs from the human gut)
    • Note: UHGV is more refined and has better taxonomic annotations than IMG/VR for the human gut.
  2. Pre-sketched IMG/VR4.1 database of high-confidence vOTU representatives (2,917,516 viral genomes)

Eukaryotic databases

  1. 661 representative RefSeq fungi genomes (downloaded 2025-10-11), -c 200
  2. 595 representative RefSeq fungi genomes (downloaded 2024-07-25), -c 200
  3. 713 TARA Oceans eukaryotic MAGs/SAGs from Delmont et al., -c 200