# Jest

Example Calculations

*.sharedJest

Estimating the fraction of sequences that belong to shared OTUs.

Just as the Chao1 richness estimator is a function of the number of OTUs observed once or twice in a sample (6), the estimator of the fraction of sequences in shared OTUs is a function of the number of shared OTUs observed once or twice in the communities being analyzed (7, 8). $S_{Jest}$ is essentially a form of the equation for $J_{abund}$ that is not corrected for the presence of unsampled OTUs.
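The Chao1 idea that the paragraph builds on can be sketched in a few lines of Python. This is a minimal sketch, assuming the classic form $S_{obs} + F_1^2/(2F_2)$ from (6), where $F_1$ and $F_2$ are the numbers of OTUs seen exactly once and twice. It also assumes the bias-corrected variant $S_{obs} + F_1(F_1-1)/(2(F_2+1))$ when no doubletons are present. Which variant mothur applies in each case is an assumption here, not something this page states.

```python
from collections import Counter


def chao1(abundances):
    """Chao1 richness estimate for one sample (sketch, not mothur's code).

    `abundances` is a list of per-OTU sequence counts; zero counts are ignored.
    Assumption: classic form when doubletons exist, bias-corrected form otherwise.
    """
    counts = [n for n in abundances if n > 0]
    s_obs = len(counts)              # OTUs actually observed
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]        # singletons and doubletons (0 if absent)
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # No doubletons: avoid dividing by zero with the bias-corrected variant.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
```

For example, a sample with OTU counts `[1, 1, 1, 2, 2, 5, 8]` has 7 observed OTUs, 3 singletons and 2 doubletons, giving 7 + 9/4 = 9.25.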

$S_{Jest} = \frac{S_{A,B\,Chao}}{S1_{chao} + S2_{chao} - S_{A,B\,Chao}}$,

where

$S_{A,B\,Chao}$ = number of OTUs shared between A and B, estimated with the shared Chao1 estimator;

$S1_{chao}$ = number of OTUs in A, estimated with the Chao1 estimator;

$S2_{chao}$ = number of OTUs in B, estimated with the Chao1 estimator.
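Once the shared estimate and the two single-sample Chao1 estimates are available, $S_{Jest}$ is a single Jaccard-style ratio. The following is a minimal Python sketch of that arithmetic. The function name and the example numbers are hypothetical, and the three estimates are taken as inputs rather than computed here.

```python
def s_jest(shared_chao, chao_a, chao_b):
    """S_Jest = shared / (A + B - shared), using estimated OTU counts.

    shared_chao: estimated number of OTUs shared by A and B
    chao_a, chao_b: Chao1 richness estimates for A and B
    """
    union = chao_a + chao_b - shared_chao  # estimated OTUs in A or B
    if union <= 0:
        raise ValueError("shared estimate must be smaller than chao_a + chao_b")
    return shared_chao / union


# Hypothetical estimates: 30 shared OTUs, 50 in A, 60 in B -> 30 / 80
print(s_jest(30.0, 50.0, 60.0))  # 0.375
```

Because the three estimates are computed independently, nothing forces the shared estimate to be smaller than either single-sample estimate. The ratio exceeds 1 whenever the shared estimate is larger than the average of the two single-sample estimates. This is presumably why a few entries early in the collector's curve in the table below are above 1.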

After all 4392 sequences have been sampled, the $S_{Jest}$ value at distance 0.03 is 0.544756.

Sample Files from the Eckburg 70.stool_compare Dataset

• .shared

This file contains the frequency of sequences from each group found in each OTU. Each row consists of the distance being considered, the group name, the number of OTUs, and the abundance information, all separated by tabs. In the abundance information, each number represents a different OTU and gives the number of sequences from that group that clustered into that OTU. Note that OTU frequencies can only be compared within a single distance definition. Below is a link to the file used in the calculations.
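The row layout described above can be read with a short Python sketch. It assumes exactly the column order given in the text (distance, group, number of OTUs, then one count per OTU, tab-separated). The function names and the example rows are hypothetical and not taken from the Eckburg file. The second function counts OTUs observed in both groups and refuses to compare rows from different distances, mirroring the note above.

```python
def parse_shared_line(line):
    """Split one tab-separated .shared row into (distance label, group, counts)."""
    fields = line.rstrip("\n").split("\t")
    label, group, num_otus = fields[0], fields[1], int(fields[2])
    abundances = [int(x) for x in fields[3:]]  # one count per OTU
    if len(abundances) != num_otus:
        raise ValueError(f"expected {num_otus} OTU counts, got {len(abundances)}")
    return label, group, abundances


def observed_shared_otus(row_a, row_b):
    """Count OTUs with at least one sequence in both groups (same distance only)."""
    label_a, _, counts_a = row_a
    label_b, _, counts_b = row_b
    if label_a != label_b:
        raise ValueError("OTU frequencies can only be compared within one distance")
    if len(counts_a) != len(counts_b):
        raise ValueError("rows at the same distance should list the same OTUs")
    return sum(1 for a, b in zip(counts_a, counts_b) if a > 0 and b > 0)


# Hypothetical rows for two groups at distance 0.03
row_a = parse_shared_line("0.03\ttissue\t5\t3\t0\t1\t2\t0")
row_b = parse_shared_line("0.03\tstool\t5\t1\t4\t0\t2\t0")
print(observed_shared_otus(row_a, row_b))  # 2 (the first and fourth OTUs)
```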

• .sharedJest

The first line contains the column labels. The first column, sampled, records how many sequences had been sampled when each $S_{Jest}$ value was calculated. The sampling frequency was set to 500, so $S_{Jest}$ is calculated at each distance after every 500 sequences are sampled, with a final calculation once all sequences have been sampled. The remaining labels in the first line combine the distance at which the calculations were made with the names of the groups compared (here, tissue and stool). Each additional line starts with the number of sequences sampled, followed by the $S_{Jest}$ value at each column's distance. For instance, at distance 0.01, after 4392 sequences were sampled, $S_{Jest}$ was 0.495593.

| sampled | uniquetissuestool | 0.00tissuestool | 0.01tissuestool | 0.02tissuestool | 0.03tissuestool |
|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 0 |
| 500 | 0.0103567 | 0.125911 | 0.588073 | 0.541865 | 0.67526 |
| 1000 | 0.0109122 | 0.0621116 | 0.365992 | 1.13299 | 0.380586 |
| 1500 | 0.00778901 | 0.101306 | 3.77652 | 1.28192 | 0.607009 |
| 2000 | 0.00661623 | 0.128395 | 0.470845 | 0.902212 | 0.589304 |
| 2500 | 0.00526837 | 0.165431 | 0.471355 | 0.733738 | 0.542975 |
| 3000 | 0.00404608 | 0.207792 | 0.739707 | 0.712481 | 0.553784 |
| 3500 | 0.00543292 | 0.154316 | 0.674743 | 0.601285 | 0.52953 |
| 4000 | 0.00587551 | 0.180477 | 0.665167 | 0.652636 | 0.485973 |
| 4392 | 0.00874493 | 0.220945 | 0.495593 | 0.659318 | 0.544756 |
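Reading this collector's-curve output amounts to splitting off the header and keying each row by the number of sequences sampled. This is a minimal Python sketch. It assumes the raw file has one header line and whitespace-separated (tab) columns whose labels contain no spaces, as in the sample above. `sample_points` is a hypothetical helper that reproduces the reporting schedule visible in the table: the first sequence, every 500 sequences, and the final total.

```python
def sample_points(total, freq):
    """Sample sizes at which a value is reported: 1, every `freq`, and `total`."""
    points = [1] + list(range(freq, total, freq))
    if points[-1] != total:
        points.append(total)  # final calculation once all sequences are sampled
    return points


def parse_collector_table(text):
    """Map sequences-sampled -> {column label: value} for a whitespace-separated table."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    table = {}
    for fields in rows:
        sampled = int(fields[0])
        table[sampled] = {col: float(val) for col, val in zip(header[1:], fields[1:])}
    return table


# Last three rows of the sample output above, written with tab separators.
SAMPLE = """\
sampled\tuniquetissuestool\t0.00tissuestool\t0.01tissuestool\t0.02tissuestool\t0.03tissuestool
3500\t0.00543292\t0.154316\t0.674743\t0.601285\t0.52953
4000\t0.00587551\t0.180477\t0.665167\t0.652636\t0.485973
4392\t0.00874493\t0.220945\t0.495593\t0.659318\t0.544756
"""

table = parse_collector_table(SAMPLE)
print(table[4392]["0.01tissuestool"])  # 0.495593
print(sample_points(4392, 500))        # [1, 500, 1000, ..., 4000, 4392]
```

Keying rows by the sampled count makes the lookup in the text ("at distance 0.01, after 4392 sequences") a single dictionary access.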


References

6. Chao, A. 1984. Non-parametric estimation of the number of classes in a population. Scand. J. Stat. 11:265-270.

7. Chao, A., R. L. Chazdon, R. K. Colwell, and T. J. Shen. 2006. Abundance-based similarity indices and their estimation when there are unseen species in samples. Biometrics 62:361-371.

8. Chao, A., R. L. Chazdon, R. K. Colwell, and T. J. Shen. 2005. A new statistical approach for assessing similarity of species composition with incidence and abundance data. Ecol. Lett. 8:148-159.