

From mothur
Revision as of 17:18, 15 January 2009 by Westcott (Talk | contribs)


Validate output by making calculations by hand

Example Calculations


Estimating the fraction of sequences that belong to shared OTUs.

Just as the Chao1 richness estimator is a function of the number of OTUs observed once or twice in a sample (6), the estimators of the fraction of sequences in shared OTUs are functions of the number of shared OTUs that are observed at least once or twice in the communities being analyzed (8, 7). <math>{S_{Jest}}</math> is essentially a form of the equation for <math>J_{abund}</math> that is not corrected for the presence of unsampled OTUs.

<math>S_{Jest} = \frac{S_{A,B Chao}}{S1_{chao} + S2_{chao} - S_{A,B Chao}}</math>,

where

<math>S_{A,B Chao}</math> = estimated number of OTUs shared between A and B, using the <math>S_{A,B Chao}</math> estimator.

<math>S1_{chao}</math> = estimated number of OTUs in A, using the Chao estimator.

<math>S2_{chao}</math> = estimated number of OTUs in B, using the Chao estimator.

At distance 0.03, the <math>{S_{Jest}}</math> value is 0.544756.
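When checking a value by hand, the formula above reduces to one division once the three richness estimates are known. The following is a minimal sketch, not mothur's implementation; the function name `jest` and the input values are hypothetical and were chosen only to keep the arithmetic easy to follow.

```python
def jest(s_shared: float, s_a: float, s_b: float) -> float:
    """Jaccard similarity from richness estimates: shared / (A + B - shared)."""
    denominator = s_a + s_b - s_shared
    if denominator <= 0:
        raise ValueError("S_A + S_B - S_shared must be positive")
    return s_shared / denominator


# Hypothetical estimates (not taken from the Eckburg dataset).
print(jest(s_shared=10.0, s_a=20.0, s_b=30.0))  # 10 / (20 + 30 - 10) = 0.25
```

To reproduce the value reported for a given distance, substitute the shared-richness and Chao estimates that mothur reports for that distance.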

File Samples on the Eckburg 70.stool_compare Dataset

  • .shared

This file contains the frequency of sequences from each group found in each OTU. Each row consists of the distance being considered, the group name, the number of OTUs, and the abundance information, separated by tabs. In the abundance information, each successive number corresponds to a different OTU and gives the number of sequences from that group that clustered within that OTU. Note that OTU frequencies can only be compared within a single distance definition. Below is a link to the file used in the calculations.
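To check a .shared row by hand, the layout described above (distance, group name, number of OTUs, then one abundance per OTU, separated by tabs) can be read with a few lines of Python. This is a sketch under that description only; the helper names and the two example rows are hypothetical and do not come from the Eckburg file.

```python
def parse_shared_line(line: str):
    """Split one tab-separated .shared row into (distance, group, abundances)."""
    fields = line.rstrip("\n").split("\t")
    label, group, n_otus = fields[0], fields[1], int(fields[2])
    # Ignore empty fields left behind by a trailing tab.
    abundances = [int(x) for x in fields[3:] if x != ""]
    if len(abundances) != n_otus:
        raise ValueError(f"expected {n_otus} OTUs, found {len(abundances)}")
    return label, group, abundances


def observed_shared(a, b):
    """Count OTUs that contain at least one sequence from both groups."""
    return sum(1 for x, y in zip(a, b) if x > 0 and y > 0)


# Hypothetical rows for two groups at the same distance.
rows = [
    "0.03\ttissue\t4\t5\t0\t2\t1",
    "0.03\tstool\t4\t3\t1\t0\t1",
]
parsed = [parse_shared_line(r) for r in rows]
print(observed_shared(parsed[0][2], parsed[1][2]))  # OTUs 1 and 4 are shared: 2
```

Only rows with the same distance label should be compared, since OTU frequencies are not comparable across distance definitions.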


  • .sharedJest

The first line contains the column labels. The first column, sampled, gives the number of sequences that had been sampled when each <math>{S_{Jest}}</math> value was calculated. The sampling frequency was set to 500, so <math>{S_{Jest}}</math> is calculated at each distance after every 500 sequences selected, with a final calculation after all sequences have been sampled. The remaining labels in the first line combine the distance at which the calculations were made with the names of the groups compared. Each subsequent line starts with the number of sequences sampled, followed by the <math>{S_{Jest}}</math> value at each column's distance. For instance, at distance 0.01, after all 4392 sequences were sampled, <math>{S_{Jest}}</math> was 0.495593.

sampled  uniquetissuestool  0.00tissuestool  0.01tissuestool  0.02tissuestool  0.03tissuestool
1        0                  0                0                0                0
500      0.0103567          0.125911         0.588073         0.541865         0.67526
1000     0.0109122          0.0621116        0.365992         1.13299          0.380586
1500     0.00778901         0.101306         3.77652          1.28192          0.607009
2000     0.00661623         0.128395         0.470845         0.902212         0.589304
2500     0.00526837         0.165431         0.471355         0.733738         0.542975
3000     0.00404608         0.207792         0.739707         0.712481         0.553784
3500     0.00543292         0.154316         0.674743         0.601285         0.52953
4000     0.00587551         0.180477         0.665167         0.652636         0.485973
4392     0.00874493         0.220945         0.495593         0.659318         0.544756
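The final value for a given distance can be pulled out of a table like the one above with a short script. This is a sketch that assumes the file is whitespace-delimited with a single header line, as shown here; the function name `read_rarefaction_table` is hypothetical, and the excerpt reuses the header and last two rows of the table.

```python
def read_rarefaction_table(text: str):
    """Parse a whitespace-delimited table into a list of {column: value} rows."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        values = ln.split()
        # First column is the number of sequences sampled; the rest are floats.
        row = {header[0]: int(values[0])}
        for name, value in zip(header[1:], values[1:]):
            row[name] = float(value)
        rows.append(row)
    return rows


excerpt = """\
sampled uniquetissuestool 0.00tissuestool 0.01tissuestool 0.02tissuestool 0.03tissuestool
4000 0.00587551 0.180477 0.665167 0.652636 0.485973
4392 0.00874493 0.220945 0.495593 0.659318 0.544756
"""

final = read_rarefaction_table(excerpt)[-1]
print(final["sampled"], final["0.03tissuestool"])  # 4392 0.544756
```

The last row corresponds to the calculation made after all sequences were sampled, so it is the row to compare against a hand calculation.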


6. Chao, A. 1984. Non-parametric estimation of the number of classes in a population. Scand. J. Stat. 11:265-270.

7. Chao, A., R. L. Chazdon, R. K. Colwell, and T. J. Shen. 2006. Abundance-based similarity indices and their estimation when there are unseen species in samples. Biometrics 62:361-371.

8. Chao, A., R. L. Chazdon, R. K. Colwell, and T. J. Shen. 2005. A new statistical approach for assessing similarity of species composition with incidence and abundance data. Ecol. Lett. 8:148-159.