Pre-built databases
Pre-sketched databases are available for download below. All databases work with sylph version 0.3.x and later.
- Use the http://faust.compbio.cs.cmu.edu links if possible. We provide mirrors on Google Cloud, but downloads from the mirrors cost us more money.
Example usage:
# download database
wget http://faust.compbio.cs.cmu.edu/sylph-stuff/gtdb-r220-c200-dbv1.syldb
# profile against database
sylph profile gtdb-r220-c200-dbv1.syldb -1 sample_R1.fq -2 sample_R2.fq -t 30 > results.tsv
Note on taxonomy usage:
Most of the databases have associated taxonomies that sylph can use. See here for more information on taxonomy integration.
GTDB Databases
GTDB r220 database (113,104 species representative genomes) - 24th April, 2024
- -c 200, more sensitive database (13.1 GB)
- -c 1000, more efficient, less sensitive database (2.6 GB)
GTDB r214 database (85,202 species representative genomes) - 28th April, 2023
- -c 200, more sensitive database (10 GB)
- -c 1000, more efficient, less sensitive database (2 GB)
Other prokaryotic databases
- OceanDNA catalogue of 8,466 ocean prokaryotic MAGs, -c 200 (800 MB)
- SMAG catalogue of 21,077 soil MAGs, -c 200 (2.5 GB):
  - http://faust.compbio.cs.cmu.edu/sylph-stuff/SMAG-c200-v0.3.syldb (primary + preferred)
  - https://storage.googleapis.com/sylph-stuff/SMAG-c200-v0.3.syldb (mirror)
- UHGG v2.0.1 catalogue of 289,232 gut genomes. Not dereplicated. Do not use for profiling. -c 200 (26 GB)
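Since the primary links are preferred and the Google Cloud mirrors are a fallback, a download script can try the sources in that order. The following is a minimal sketch, not part of sylph: the function name fetch_first_available is hypothetical, and it assumes bash and wget are installed. It tries each URL in turn and stops at the first one that downloads successfully.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of sylph): try each URL in order and
# keep the first download that succeeds.
# Usage: fetch_first_available OUTPUT_FILE URL [URL...]
fetch_first_available() {
    local out="$1"
    shift
    local url
    for url in "$@"; do
        # -q silences progress output, -O writes to the given file name
        if wget -q -O "$out" "$url"; then
            echo "downloaded from: $url"
            return 0
        fi
        # wget -O can leave an empty or partial file behind on failure
        rm -f "$out"
    done
    echo "all sources failed for $out" >&2
    return 1
}

# Example call with the SMAG links listed above (primary first, mirror second):
# fetch_first_available SMAG-c200-v0.3.syldb \
#     http://faust.compbio.cs.cmu.edu/sylph-stuff/SMAG-c200-v0.3.syldb \
#     https://storage.googleapis.com/sylph-stuff/SMAG-c200-v0.3.syldb
```

Listing the primary URL first means the mirror is only contacted when the primary server is unreachable.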
Viral databases
Pre-sketched IMG/VR4.1 database for high-confidence vOTU representatives (2,917,516 viral genomes), -c 200 (2 GB)
Eukaryotic databases
- 595 representative RefSeq fungi genomes (downloaded 2024-07-25), -c 200 (700 MB)
- 713 TARA Oceans eukaryotic MAGs/SAGs from Delmont et al., -c 200 (900 MB)