Silva reference files

Release 119

The SILVA alignment is 50,000 columns long so that it is compatible with 18S rRNA sequences as well as archaeal 16S rRNA sequences. In our published opinion, this is the best reference alignment available and far superior to the greengenes alignment. In a shift from our previous version of the SILVA references, we are now providing the SEED database, the full-length sequences available from the NR SILVA database, and a SILVA-aligned version of the gold database that is used for reference-based chimera detection. We have prepared a README document that describes the process we used to generate these references.

  • Full-length sequences and taxonomy references (137879 bacteria, 3155 archaea, and 12273 eukarya sequences). This reference can be customized for use in alignments and can also be used for classification. The uncompressed version is ~7.2 GB and the compressed version is 249 MB.
  • Recreated SEED database (12244 bacteria, 207 archaea, and 2558 eukarya sequences). The actual reference alignment that SILVA uses with their SINA aligner is called the SEED alignment. We do not know exactly what it contains. We have tried to duplicate it by identifying the unique sequences in the SSURef database (v119) that have a 100% quality score to the SEED alignment (field 'align_ident_slv' in the arb database) and that extend from the end of the traditional 8f/27f primer to the beginning of the traditional 1492r primer. We are providing a composite dataset of bacterial, archaeal, and eukaryotic sequences. The uncompressed version is ~700 MB and the compressed version is 25 MB.
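The selection described for the recreated SEED database can be sketched in Python. This is only an illustration of the three criteria (100% identity score, coverage of the region between the primers, uniqueness), not the procedure documented in the README. The `Record` class, the function names, and the `start_col` and `end_col` arguments (0-based alignment columns standing in for the end of the 8f/27f primer and the beginning of the 1492r primer) are all assumptions made for this example.

```python
from dataclasses import dataclass

GAP_CHARS = ".-"  # gap symbols assumed for the aligned sequences


@dataclass
class Record:
    name: str
    aligned: str        # aligned sequence, gaps written as '.' or '-'
    align_ident: float  # stand-in for the 'align_ident_slv' field


def first_last_base(aligned):
    """Return the 0-based columns of the first and last non-gap characters."""
    cols = [i for i, ch in enumerate(aligned) if ch not in GAP_CHARS]
    if not cols:
        return None
    return cols[0], cols[-1]


def select_seed_like(records, start_col, end_col):
    """Keep unique records with a 100% score that span start_col..end_col."""
    strip_gaps = str.maketrans("", "", GAP_CHARS)
    seen = set()
    kept = []
    for rec in records:
        if rec.align_ident < 100.0:
            continue  # not a perfect match to the reference alignment
        span = first_last_base(rec.aligned)
        if span is None or span[0] > start_col or span[1] < end_col:
            continue  # does not cover the whole region between the primers
        ungapped = rec.aligned.translate(strip_gaps).upper()
        if ungapped in seen:
            continue  # identical to a sequence that was already kept
        seen.add(ungapped)
        kept.append(rec)
    return kept
```

Uniqueness here is judged on the ungapped, upper-cased sequence, so two records that differ only in gap placement count as duplicates; that is a design choice of this sketch rather than something the text specifies.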

Release 102

The SILVA alignment is 50,000 columns long so that it is compatible with 18S rRNA sequences as well as archaeal 16S rRNA sequences. The actual reference alignment that SILVA uses with their SINA aligner is called the SEED alignment. We do not know exactly what it contains. We have tried to duplicate it by identifying the unique sequences in the SSURef database (v102) that have a 100% quality score to the SEED alignment and that extend from the end of the traditional 8f/27f primer to the beginning of the traditional 1492r primer. We are providing separate datasets for bacterial, archaeal, and eukaryotic sequences. Each reference set contains an aligned sequence file (e.g. silva.bacteria.fasta), an unaligned sequence file (e.g. nogap.bacteria.fasta), and taxonomy outlines (e.g. silva.bacteria.silva.tax).
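The relationship between the aligned and unaligned files can be illustrated with a short Python sketch: every aligned sequence should have the same number of columns, and removing the gap characters gives the unaligned sequence. The function names are hypothetical, and the sketch assumes that gaps are written as '.' and '-'. It is not the code that was used to produce the nogap files.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def degap(aligned):
    """Remove the assumed gap characters ('.' and '-') from a sequence."""
    return aligned.replace(".", "").replace("-", "")


def check_and_degap(lines, expected_columns=50000):
    """Check that every sequence has expected_columns columns, then degap it."""
    result = []
    for name, aligned in read_fasta(lines):
        if len(aligned) != expected_columns:
            raise ValueError(
                f"{name}: {len(aligned)} columns, expected {expected_columns}"
            )
        result.append((name, degap(aligned)))
    return result
```

To try this on a downloaded reference, an open file handle for a file such as silva.bacteria.fasta can be passed as `lines`, since a Python file object iterates line by line.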
