Usage notes
Pre-sketched databases available for download below. All databases work from sylph version 0.3.x onwards.
- Use the Primary links hosted at
http://faust.compbio.cs.cmu.edu
if possible. We provide mirrors on google cloud, but this costs us more money.
Example usage:
# download database
wget http://faust.compbio.cs.cmu.edu/sylph-stuff/gtdb-r226-c200-dbv1.syldb
# profile against database
sylph profile gtdb-r226-c200-dbv1.syldb -1 sample_R1.fq -2 sample_R2.fq -t 30 > results.tsv
Note on taxonomy usage:
Most the databases have associated taxonomies that sylph can utilize. See here for more information on taxonomy integration.
Databases
Type | Name | Genomes | c-parameter | Size | Primary Download Link | Mirror | Notes |
---|---|---|---|---|---|---|---|
Prokaryotic (GTDB) | GTDB r226 | 143,614 species | -c 200 | 18.4 GB | gtdb-r226-c200-dbv1.syldb | mirror | |
GTDB r226 | 143,614 species | -c 1000 | 3.7 GB | gtdb-r226-c1000-dbv1.syldb | mirror | ||
GTDB r220 | 113,104 species | -c 200 | 13.1 GB | gtdb-r220-c200-dbv1.syldb | mirror | ||
GTDB r220 | 113,104 species | -c 1000 | 2.6 GB | gtdb-r220-c1000-dbv1.syldb | mirror | ||
GTDB r214 | 85,202 species | -c 200 | 10 GB | v0.3-c200-gtdb-r214.syldb | mirror | ||
GTDB r214 | 85,202 species | -c 1000 | 2 GB | v0.3-c1000-gtdb-r214.syldb | mirror | ||
Prokaryotic (GlobDB) | GlobDB r226 | 306,260 species | -c 200 | 32 GB | See the GlobDB website | Third-party database | |
GlobDB r226 | 306,260 species | -c 1000 | 6.5 GB | See the GlobDB website | Third-party database | ||
Prokaryotic (Other) | OceanDNA | 8,466 ocean MAGs | -c 200 | 800 MB | OceanDNA-c200-v0.3.syldb | mirror | |
SMAG | 21,077 soil MAGs | 200 | 2.5 GB | SMAG-c200-v0.3.syldb | mirror | ||
UHGG v2.0.1 (not dereplicated) | 289,232 gut genomes | -c 200 | 26 GB | uhgg_all_c200_v0.3.0.syldb | mirror | Not dereplicated - do not use for profiling | |
Viral | UHGV | 171,338 gut vOTUs | -c 100 | 0.4 GB | uhgv_c100_dbv1.syldb | mirror | |
UHGV | 171,338 gut vOTUs | -c 200 | 0.2 GB | uhgv_c200_dbv1.syldb | mirror | ||
IMG/VR4.1 | 2,917,516 viral genomes | -c 200 | 2 GB | imgvr_c200_v0.3.0.syldb | mirror | ||
Eukaryotic | RefSeq Fungi | 595 genomes | -c 200 | 700 MB | fungi-refseq-2024-07-25-c200-v0.3.syldb | mirror | |
TARA Oceans | 713 eukaryotic MAGs/SAGs | -c 200 | 900 MB | tara-eukmags-c200-v0.3.syldb | mirror |
Parameter Guide
- -c 200: More sensitive, larger file size
- -c 1000: More efficient, smaller file size, less sensitive
- -c 100: More sensitive but primarily for smaller genomes.
Note
-c 200
is used by default, so -c 100
must be specified if using a database with -c 100. For example:
sylph profile c100_database c1000_database -c100 -1 read1.fq -2 read2.fq
.
Database descriptions
GTDB Databases
The GTDB database is a high-quality, curated taxonomy and database for prokaryotes (archaea and bacteria). We take the dereplicated, species-representative genomes (one genome per species).
Available databases:
-
GTDB r226 database (143,614 species representative genomes) - April 16, 2025
-
GTDB r220 database (113,104 species representative genomes) - April 24, 2024
-
GTDB r214 database (85,202 species representative genomes) - April 28, 2023
GlobDB - massive prokaryotic catalogue encompassing many other genome sets
GlobDB is a catalogue of > 300,000 prokaryotic genomes/MAGs. Their database is dereplicated at ~96% ANI. GlobDB encompasses 14 other large databases (including GTDB). The sylph database is hosted on their website.
Other prokaryotic databases
Tip
GlobDB encompasses SMAG and a OceanDNA. We highly recommend using GlobDB if possible over these databases.
- OceanDNA catalogue of 8,466 ocean prokaryotic MAGs,
-c 200
(800 MB) - SMAG catalogue of soil 21,077 soil MAGs,
-c 200
(2.5 GB): - UHGG v2.0.1 catalogue of 289,232 gut genomes. Not dereplicated. Do not use for profiling.
-c 200
(26 GB):
Viral databases
- Unified Human Gut Virome (UHGV) catalog - (171,338 vOTUs from human gut)
- Note: is more refined and has better taxonomic annotations than IMG/VR for human gut
- Pre-sketched IMG/VR4.1 database for high-confidence vOTU representatives (2,917,516 viral genomes).
Eukaryotic databases.
- 595 representative RefSeq fungi genomes (downloaded 2024-07-25),
-c 200
(700 MB) - 713 TARA Oceans eukaryotic MAGs/SAGs from Delmont et al.,
-c 200
(900 MB)