What sylph can and cannot do

What can sylph do?

Profile metagenomes: sylph can calculate the abundances of genomes in a metagenomic sample by using a reference database. This is the same type of output as Kraken or MetaPhlAn.
Search genomes against metagenomes: sylph can check if a genome is contained in your sample (e.g. is this E. coli genome in my sample?).
ANI querying: sylph can estimate the containment average nucleotide identity (ANI) of a reference genome to the genomes in your sample.
Use custom reference databases: eukaryotes, viruses, and any collection of FASTA files will work.
Use long reads: sylph can use Nanopore or PacBio reads with high precision. A recent study from Oxford Nanopore found sylph to be the most accurate profiling method on their data.
Calculate coverage: sylph can estimate the coverage (not just the abundance) of genomes in your database.
Calculate the percentage of reads detected in your database at species level: sylph can check how much of your metagenome is "captured" by the database.

What can sylph not do?

Map reads. Unlike Kraken, sylph does not classify every read.
Find very low-abundance genomes. Sylph requires at least roughly 0.01-0.05x coverage for bacterial genomes. In practice, each bacterial genome needs at least a few hundred short reads.
Reliably find genomes at genus level or higher (if it is not present at species level). If your sample is not well-characterized by the database, sylph may struggle. Note: this also applies to most profilers.
Compare genomes to genomes, metagenomes to metagenomes, or contigs to genomes.
Work with 16S / ITS data.
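
The coverage floor in the list above can be turned into a rough read count with a back-of-the-envelope calculation. The sketch below assumes a 5 Mbp bacterial genome and 150 bp short reads. Both numbers are illustrative assumptions, not values taken from sylph, and reads_needed is a hypothetical helper rather than part of the tool.

```python
import math


def reads_needed(genome_size_bp: int, read_length_bp: int, coverage: float) -> int:
    """Reads required to reach a target mean coverage of a single genome.

    Uses the standard relation:
        coverage = (number of reads * read length) / genome size
    """
    return math.ceil(coverage * genome_size_bp / read_length_bp)


# Assumed example values: 5 Mbp genome, 150 bp reads.
GENOME_SIZE_BP = 5_000_000
READ_LENGTH_BP = 150

low = reads_needed(GENOME_SIZE_BP, READ_LENGTH_BP, 0.01)
high = reads_needed(GENOME_SIZE_BP, READ_LENGTH_BP, 0.05)
print(f"0.01x coverage: about {low} reads")
print(f"0.05x coverage: about {high} reads")
```

Under these assumptions the floor works out to a few hundred to a couple of thousand reads originating from that genome, which is consistent with the "few hundred short reads" rule of thumb.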

The figure below summarizes sylph's main steps.

(Panel 1) Reads and reference genomes are broken into k-mers using the sylph sketch command. The k-mers are subsampled at a rate of 1/c (default c = 200), so roughly one in every 200 k-mers is kept.
(Panel 1) Using sylph query or sylph profile, the k-mers in each reference genome are checked against the k-mers in the reads.
(Panel 2) Sylph uses statistics to estimate the containment ANI between each reference genome and the metagenomes.
(Panel 3) sylph query: all genomes with high ANI (> 90% by default) from the previous step are reported. No abundances are reported.
(Panel 3) sylph profile: calculates abundances and reports the present genomes at species-level using a k-mer remapping algorithm if ANI > 95%.
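
The sketching and containment steps above can be illustrated with a toy Python example. This is a minimal sketch under stated assumptions, not sylph's implementation: it uses k = 31 and a BLAKE2b hash as illustrative choices, ignores reverse complements, and computes a naive containment ANI as containment^(1/k) without the statistical adjustment sylph applies. All function names are hypothetical.

```python
import hashlib
import random

K = 31   # k-mer length (assumed for illustration)
C = 200  # subsampling rate c, matching the default mentioned above


def kmers(seq: str, k: int = K):
    """Yield every overlapping k-mer of a sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer: str) -> int:
    """Map a k-mer to a 64-bit integer (hash choice is an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq: str, k: int = K, c: int = C) -> set:
    """Keep a k-mer only if its hash falls in the lowest 1/c of hash space."""
    threshold = 2**64 // c
    return {h for h in map(hash_kmer, kmers(seq, k)) if h < threshold}


def naive_containment_ani(genome_sketch: set, reads_sketch: set, k: int = K):
    """Fraction of genome k-mers found in the reads, converted to an ANI.

    Returns None when the genome sketch is empty. No coverage correction
    is applied, so this underestimates ANI for low-coverage genomes.
    """
    if not genome_sketch:
        return None
    containment = len(genome_sketch & reads_sketch) / len(genome_sketch)
    return containment ** (1 / k)


# Toy data: a random 20 kbp "genome" and a "sample" that fully contains it.
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(20_000))
sample = genome + "".join(rng.choice("ACGT") for _ in range(20_000))

ani = naive_containment_ani(sketch(genome), sketch(sample))
print(f"sketch size: {len(sketch(genome))} k-mers, naive containment ANI: {ani}")
```

Because the same hash threshold is applied to genomes and reads, a k-mer kept in one sketch is also kept in the other whenever it occurs there, which is what makes comparing the subsampled sets meaningful.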