Custom taxonomies
Creating custom taxonomies
If you're working with custom sylph databases, you can easily create your own taxonomy metadata file. You can look at our pre-built taxonomy files (https://zenodo.org/records/14320496) for examples.
A taxonomic metadata file is simply a two-column TSV file:
- Column 1: the name of your genome's FASTA file:
my_mag.fa
- Column 2: a semicolon-delimited taxonomy string.
d__Archaea;p__Methanobacteriota_B;c__Thermococci;o__Thermococcales;f__Thermococcaceae;g__Thermococcus_A;s__Thermococcus_A alcaliphilus
Note: do not add the t__STRAIN line.
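The format described above can be sanity-checked with a few lines of Python. This is only a sketch: `check_taxonomy_line` and `RANK_PREFIXES` are hypothetical names, not part of sylph-tax. It assumes GTDB-style rank prefixes like those in the example string, and it reads the note above as meaning that no `t__` entry belongs in the taxonomy string. Other taxonomies may use different prefixes.

```python
# Assumption: GTDB-style rank prefixes, as in the example string above.
# Adjust this tuple if your taxonomy uses other prefixes.
RANK_PREFIXES = ("d__", "p__", "c__", "o__", "f__", "g__", "s__")


def check_taxonomy_line(line):
    """Return a list of problems found in one line of a taxonomy TSV (empty if none)."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 2:
        return [f"expected 2 tab-separated columns, found {len(columns)}"]

    filename, taxonomy = columns
    problems = []
    if not filename:
        problems.append("column 1 (FASTA file name) is empty")

    # Column 2 is a semicolon-delimited list of rank entries.
    for entry in taxonomy.split(";"):
        if entry.startswith("t__"):
            problems.append("strain-level entry (t__) should not be included")
        elif not entry.startswith(RANK_PREFIXES):
            problems.append(f"unrecognized rank entry: {entry!r}")
    return problems
```

An empty list means the line passed these illustrative checks. It does not guarantee that sylph-tax will accept the file.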
Custom taxonomy example usage case
You obtained two new MAGs, genome1.fa and genome2.fa.gz, and ran GTDB-Tk to get their taxonomic annotation. You want to profile against both the new MAGs and the GTDB database.
- Create a file called taxonomy.tsv as follows (columns separated by a tab):
genome1.fa	d__Archaea;(...);s__My new species name
genome2.fa.gz	d__Bacteria;(...);g__My genus name;s__My species name2
- Use taxonomy.tsv as an argument to sylph-tax taxprof.
## profile against gtdb_r220 and your new MAGs
sylph profile gtdb_r220.syldb my_custom_mags.syldb ... -o gtdb+mags_output.tsv
## use your new taxonomy.tsv file and GTDB_r220
sylph-tax taxprof gtdb+mags_output.tsv -t GTDB_r220 taxonomy.tsv
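If you have many MAGs, the taxonomy.tsv file from the first step could be generated rather than typed by hand. The sketch below assumes you have already collected a mapping from FASTA file name to taxonomy string (for example, from your GTDB-Tk results). The helper name `write_taxonomy_tsv` is hypothetical, and the `(...)` in the taxonomy strings is the same placeholder for elided ranks used in the example above, not a literal value to write.

```python
def write_taxonomy_tsv(assignments, path):
    """Write {fasta_file_name: taxonomy_string} as a two-column, tab-separated file."""
    with open(path, "w") as handle:
        for filename, taxonomy in assignments.items():
            # A tab or newline inside a field would break the two-column layout.
            if any(ch in filename + taxonomy for ch in "\t\n"):
                raise ValueError(f"tab or newline in entry for {filename!r}")
            handle.write(f"{filename}\t{taxonomy}\n")


if __name__ == "__main__":
    # "(...)" stands for the ranks elided in the example; fill in your own strings.
    write_taxonomy_tsv(
        {
            "genome1.fa": "d__Archaea;(...);s__My new species name",
            "genome2.fa.gz": "d__Bacteria;(...);g__My genus name;s__My species name2",
        },
        "taxonomy.tsv",
    )
```

Column 1 uses the file names exactly as given (including `.gz`), matching the example above.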
Note
The parsing of the taxonomic metadata file is done in the script https://github.com/bluenote-1577/sylph-tax/blob/main/sylph_tax/sylph_to_taxprof.py. Refer to this reference implementation if needed.
Warning
Before v1.7.0 of sylph-tax, file names of GenBank/RefSeq genomes had to be handled carefully.
- If _genomic or _ASM is in your genome file name, use the part before _genomic or _ASM.
So for GCF_002863645.1_ASM286364v1_genomic.fna.gz, use GCF_002863645.1 in column 1. This is no longer needed as of v1.7.0.
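For taxonomy files that must still work with versions before v1.7.0, the trimming rule above can be sketched in Python. `legacy_genome_key` is a hypothetical helper, not sylph-tax's own implementation. Returning names that contain neither marker unchanged is an assumption.

```python
def legacy_genome_key(filename):
    """Return the part of a file name before '_genomic' or '_ASM', whichever comes first."""
    positions = [filename.find(marker) for marker in ("_genomic", "_ASM")]
    cut_points = [pos for pos in positions if pos != -1]
    # Assumption: names containing neither marker are used as-is.
    return filename[:min(cut_points)] if cut_points else filename
```

With the file name from the example, `legacy_genome_key("GCF_002863645.1_ASM286364v1_genomic.fna.gz")` returns `GCF_002863645.1`.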