RDP reference files

Version 9

The RDP publicly released version 9 of its training set in March 2012. We have modified the files that they make available on SourceForge to be compatible with mothur. To maintain a consistent six taxonomic levels, we have removed the various sub-classes, sub-orders, and sub-families:

  • 16S rRNA reference (RDP): A collection of 9,665 bacterial and 384 archaeal 16S rRNA gene sequences with an improved taxonomy compared to version 7 (there was no v.8 as far as we are aware).
  • 16S rRNA reference (PDS): The RDP reference with 119 mitochondrial 16S rRNA gene sequences added as members of the Rickettsiales and four 18S rRNA gene sequences added as members of the Eukarya.

You should be aware of several things when using the RDP training set. First, the taxonomies only go to the genus level; therefore, you will only be able to classify your sequences to the genus level. You can modify the training set to include species-level names and may be successful in classifying to the species level. Second, many of these sequences are of very poor quality: they contain a large number of ambiguous base calls or are very short.
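To gauge how many reference sequences are affected, the training set can be screened before use. The following is a minimal Python sketch, not part of mothur; the function names and the `max_ambig` and `min_length` thresholds are hypothetical and should be chosen for your own data.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the identifier, dropping any description after whitespace.
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def count_ambiguous(seq: str) -> int:
    """Count bases that are not A, C, G, T, or U."""
    return sum(1 for base in seq.upper() if base not in "ACGTU")


def flag_low_quality(
    records: Iterable[Tuple[str, str]],
    max_ambig: int = 0,      # hypothetical threshold
    min_length: int = 1200,  # hypothetical threshold
) -> Dict[str, List[str]]:
    """Map each flagged sequence name to the reasons it was flagged."""
    flagged: Dict[str, List[str]] = {}
    for name, seq in records:
        # Ignore alignment gap characters if the file happens to be aligned.
        seq = seq.replace("-", "").replace(".", "")
        reasons = []
        if count_ambiguous(seq) > max_ambig:
            reasons.append("ambiguous")
        if len(seq) < min_length:
            reasons.append("short")
        if reasons:
            flagged[name] = reasons
    return flagged
```

Passing the lines of an open FASTA file to `read_fasta` and the result to `flag_low_quality` returns a dictionary of the sequences that fail either check.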


Version 7

The RDP released version 7 of its training set in November 2011. In separate files they provide the reference data for 16S (Bacteria and Archaea) and 28S (Fungi) rRNA gene sequences and taxonomy. We have modified these files to be compatible with mothur. To maintain a consistent six taxonomic levels, we have removed the various sub-classes, sub-orders, and sub-families:

  • 16S rRNA reference (RDP): A collection of 9,662 bacterial and 384 archaeal 16S rRNA gene sequences with an improved taxonomy compared to version 6.
  • 16S rRNA reference (PDS): The RDP reference with three sequences reversed and 119 mitochondrial 16S rRNA gene sequences added as members of the Rickettsiales.
  • 28S rRNA reference (RDP): A collection of 8,506 reference 28S rRNA gene sequences from the Fungi that were curated by the Kuske lab.

You should be aware of several things when using the RDP training set. First, the taxonomies only go to the genus level; therefore, you will only be able to classify your sequences to the genus level. You can modify the training set to include species-level names and may be successful in classifying to the species level. Second, many of these sequences are of very poor quality: they contain a large number of ambiguous base calls or are very short. In the PDS version of the training set we have reversed three sequences that were in the wrong direction.
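As an illustration of what correcting a sequence's direction involves, here is a short Python sketch. It assumes that "reversed" means reverse-complemented, which the text above does not state explicitly; the function names are hypothetical, and the names of the sequences to flip must be supplied by the user.

```python
from typing import Dict, Iterable

# IUPAC nucleotide codes and their complements (U is treated like T).
_FROM = "ACGTURYKMSWBDHVN"
_TO = "TGCAAYRMKSWVHDBN"
_COMPLEMENT = str.maketrans(_FROM + _FROM.lower(), _TO + _TO.lower())


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def reorient(sequences: Dict[str, str], names_to_flip: Iterable[str]) -> Dict[str, str]:
    """Return a copy of `sequences` with the named entries reverse-complemented."""
    flip = set(names_to_flip)
    return {
        name: reverse_complement(seq) if name in flip else seq
        for name, seq in sequences.items()
    }
```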


Version 6

The RDP training set (version 6, released 03/02/2010) consists of 8,422 sequences (8,127 bacterial and 295 archaeal) and is based on Bergey's taxonomic outline. This training set is our modification of the files that they posted to SourceForge. Their archive also provides several other files, which mothur generates itself the first time you use the training set. Our archive consists of two files: a FASTA-formatted sequence file and a mothur-compatible taxonomy file. The only subtle manipulation we made was to remove the sub-taxonomic levels (e.g. sub-order) and to plug in incertae_sedis levels when a step in the taxonomy was missing. Thus taxonomic level 6 corresponds to the level of genus and level 1 corresponds to the level of kingdom.
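The manipulation described above can be sketched in Python as follows. This is an illustration, not the script used to build the archive: the input representation (a list of rank and name pairs), the function name, and the choice to name a missing level after its parent with an `_incertae_sedis` suffix are all assumptions. The output follows the semicolon-delimited lineage layout of a mothur taxonomy file.

```python
from typing import Iterable, Tuple

# The six levels kept, from level 1 (kingdom) to level 6 (genus).
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus")


def six_level_taxonomy(lineage: Iterable[Tuple[str, str]]) -> str:
    """Reduce a lineage of (rank, name) pairs to a six-level taxonomy string."""
    # Ranks outside RANKS (e.g. subclass, suborder) are never looked up,
    # so they are dropped implicitly.
    by_rank = {rank.lower(): name for rank, name in lineage}
    levels = []
    for rank in RANKS:
        name = by_rank.get(rank)
        if name is None:
            # Assumed placeholder naming: parent name plus "_incertae_sedis".
            parent = levels[-1] if levels else "unknown"
            if parent.endswith("_incertae_sedis"):
                name = parent
            else:
                name = parent + "_incertae_sedis"
        levels.append(name)
    return ";".join(levels) + ";"
```

Writing each sequence name, a tab, and the string returned by `six_level_taxonomy` on one line gives a file in which every sequence has exactly six levels.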
