# Sharedchao

## Example Calculations

*.sharedChao

The example calculations below use data from the Eckburg 70.stool_compare files with an OTU definition of 0.03.

To estimate the richness of OTUs shared between two communities, a non-parametric estimator has been developed that is analogous to the single-community Chao1 richness estimator (6). This $S_{A,B Chao}$ estimator (11) is calculated as:

$S_{A,B Chao} = S_{12\left(obs\right)} + f_{11}\frac{f_{1+}f_{+1}}{4f_{2+}f_{+2}} + \frac{f_{1+}^{2}}{2f_{2+}} + \frac{f_{+1}^{2}}{2f_{+2}}$,

where,

$f_{11}$ = number of shared OTUs with exactly one individual observed in both A and B

$f_{1+}, f_{2+}$ = number of shared OTUs with exactly one or exactly two individuals, respectively, observed in A

$f_{+1}, f_{+2}$ = number of shared OTUs with exactly one or exactly two individuals, respectively, observed in B

$S_{12\left(obs\right)}$ = number of OTUs observed in both A and B.
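The formula and definitions above translate directly into code. Below is a minimal Python sketch. It assumes the two communities are given as equal-length lists of per-OTU sequence counts, where index i refers to the same OTU in both lists. The function name shared_chao is hypothetical. The formula as written is undefined when $f_{2+}$ or $f_{+2}$ is zero, so the sketch raises an error in that case rather than guessing at a correction.

```python
from typing import Sequence


def shared_chao(counts_a: Sequence[int], counts_b: Sequence[int]) -> float:
    """Estimate S_{A,B Chao} from per-OTU counts of two communities.

    Hypothetical helper: counts_a[i] and counts_b[i] must describe the same OTU.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both count vectors must cover the same OTUs")
    # Keep only OTUs observed in both communities; every f-value is
    # defined over these shared OTUs.
    shared = [(a, b) for a, b in zip(counts_a, counts_b) if a > 0 and b > 0]
    s12_obs = len(shared)                                 # S_12(obs)
    f11 = sum(1 for a, b in shared if a == 1 and b == 1)  # f_11
    f1p = sum(1 for a, _ in shared if a == 1)             # f_{1+}
    f2p = sum(1 for a, _ in shared if a == 2)             # f_{2+}
    fp1 = sum(1 for _, b in shared if b == 1)             # f_{+1}
    fp2 = sum(1 for _, b in shared if b == 2)             # f_{+2}
    if f2p == 0 or fp2 == 0:
        # The formula divides by f_{2+} and f_{+2}; no correction is assumed here.
        raise ValueError("f_{2+} or f_{+2} is zero; the formula is undefined")
    return (s12_obs
            + f11 * (f1p * fp1) / (4 * f2p * fp2)
            + f1p ** 2 / (2 * f2p)
            + fp1 ** 2 / (2 * fp2))
```

Consider a made-up pair of communities, `shared_chao([1, 1, 2, 2, 3, 0], [1, 2, 1, 2, 5, 4])`. It finds five shared OTUs, with $f_{11} = 1$ and $f_{1+} = f_{2+} = f_{+1} = f_{+2} = 2$, giving $5 + 0.25 + 1 + 1 = 7.25$.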

Calculating $S_{A,B Chao}$ requires $f_{11}$, the number of shared OTUs for which exactly one sequence was observed in each library. For our example case, $f_{11}$ is 2. Plugging this value, the other f-values, and $S_{12\left(obs\right)}$ into $S_{A,B Chao}$ yields 76.6667. This matches the final value (after 4392 sequences sampled) in the 0.03 column of the .sharedChao output.

## Sample Files from the Eckburg 70.stool_compare Dataset

• .shared

This file contains the frequency of sequences from each group found in each OTU. Each row consists of the distance being considered, the group name, the number of OTUs, and the abundance information, all separated by tabs. In the abundance information, each successive number corresponds to a different OTU. That number gives how many sequences from the group clustered within that OTU. Note that OTU frequencies can only be compared within a single distance definition.
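Reading a .shared file into per-group count vectors is the first step before any shared estimator can be computed. Below is a minimal Python sketch. It assumes the tab-separated row layout described above: distance label, group, number of OTUs, then one count per OTU. The function name parse_shared is hypothetical. Any line whose third field is not an integer, such as a header row, is skipped.

```python
from typing import Dict, List, Tuple


def parse_shared(text: str) -> Dict[Tuple[str, str], List[int]]:
    """Map (distance label, group) to that group's per-OTU sequence counts."""
    table: Dict[Tuple[str, str], List[int]] = {}
    for line in text.splitlines():
        fields = line.strip().split("\t")
        # Skip blank lines and any header row (third field not an integer).
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        label, group, num_otus = fields[0], fields[1], int(fields[2])
        counts = [int(x) for x in fields[3:3 + num_otus]]
        if len(counts) != num_otus:
            raise ValueError(
                f"{label}/{group}: expected {num_otus} counts, got {len(counts)}")
        table[(label, group)] = counts
    return table
```

OTU frequencies can only be compared within one distance definition. Both count vectors handed to an estimator should therefore come from the same label, for example `table[("0.03", group_a)]` and `table[("0.03", group_b)]`, where group_a and group_b stand in for the real group names.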

• .sharedChao

The first line contains the column labels. The first column, "sampled", gives the number of sequences that had been sampled when each $S_{A,B Chao}$ value was calculated. The frequency was set to 500, so $S_{A,B Chao}$ is calculated at each distance after every 500 sequences sampled, with a final calculation once all sequences have been sampled. The remaining labels in the first line combine the distance at which the calculation was made with the names of the groups compared. Each additional line starts with the number of sequences sampled, followed by the $S_{A,B Chao}$ value at each column's distance. For instance, at distance 0.01, after 4392 sequences were sampled, $S_{A,B Chao}$ was 152.768.
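The sampling schedule described above can be sketched in Python. This is an illustration under stated assumptions, not mothur's implementation. It assumes sequences are drawn one at a time, without replacement, from the pooled reads of both groups. It also assumes values are reported after 1 sequence, after every freq sequences, and once all sequences are drawn, which reproduces the "sampled" column of the table. The names rarefy_pair and estimator are hypothetical; estimator can be any function of two per-OTU count lists.

```python
import random
from typing import Callable, List, Sequence, Tuple

Estimator = Callable[[List[int], List[int]], float]


def rarefy_pair(counts_a: Sequence[int], counts_b: Sequence[int],
                freq: int, estimator: Estimator,
                seed: int = 0) -> List[Tuple[int, float]]:
    """Return (sequences sampled, estimate) pairs along one random draw order."""
    if len(counts_a) != len(counts_b):
        raise ValueError("both count vectors must cover the same OTUs")
    if freq < 1:
        raise ValueError("freq must be positive")
    # Expand counts into one token per sequence: (group, OTU index).
    pool = [("A", otu) for otu, c in enumerate(counts_a) for _ in range(c)]
    pool += [("B", otu) for otu, c in enumerate(counts_b) for _ in range(c)]
    random.Random(seed).shuffle(pool)  # assumed: draw without replacement
    total = len(pool)
    # Report after 1 sequence, every `freq` sequences, and at the very end.
    checkpoints = {1, total, *range(freq, total, freq)}
    sub_a = [0] * len(counts_a)
    sub_b = [0] * len(counts_b)
    rows: List[Tuple[int, float]] = []
    for n, (group, otu) in enumerate(pool, start=1):
        if group == "A":
            sub_a[otu] += 1
        else:
            sub_b[otu] += 1
        if n in checkpoints:
            # Pass copies so the estimator cannot alter the running counts.
            rows.append((n, estimator(list(sub_a), list(sub_b))))
    return rows
```

Passing an $S_{A,B Chao}$ implementation as estimator yields one column of a .sharedChao-style table for a single distance. With small samples, $f_{2+}$ or $f_{+2}$ is often zero, so that implementation has to decide how to handle the undefined case.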

| sampled | 0.00tissuestool | 0.01tissuestool | 0.02tissuestool | 0.03tissuestool |
|---:|---:|---:|---:|---:|
| 1 | 0 | 0 | 0 | 0 |
| 500 | 52.1857 | 50.25 | 67.5 | 43.5417 |
| 1000 | 86.0812 | 83.9 | 57.1833 | 62 |
| 1500 | 116.479 | 88.9489 | 78.6136 | 56.0167 |
| 2000 | 171.833 | 113.836 | 116.533 | 66.8958 |
| 2500 | 208.426 | 120.216 | 105.983 | 57.5714 |
| 3000 | 172.043 | 101.769 | 92.8631 | 69.25 |
| 3500 | 130.987 | 210.038 | 118.833 | 104.517 |
| 4000 | 210.174 | 141.034 | 107.885 | 59.8214 |
| 4392 | 233.296 | 152.768 | 110.364 | 76.6667 |
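A specific value, such as the 152.768 at distance 0.01 after 4392 sequences, can also be pulled out of a .sharedChao file programmatically. Below is a minimal Python sketch. It assumes the whitespace-separated file layout described above, with "sampled" as the first column label. The function name parse_shared_chao is hypothetical. Column labels such as 0.01tissuestool are kept verbatim rather than split into distance and group names.

```python
from typing import Dict


def parse_shared_chao(text: str) -> Dict[str, Dict[int, float]]:
    """Map each column label to {sequences sampled: S_{A,B Chao} estimate}."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows:
        raise ValueError("empty input")
    header, body = rows[0], rows[1:]
    if header[0] != "sampled":
        raise ValueError("expected the first column label to be 'sampled'")
    columns: Dict[str, Dict[int, float]] = {label: {} for label in header[1:]}
    for fields in body:
        if len(fields) != len(header):
            raise ValueError(
                f"row {fields[0]} has {len(fields)} fields, expected {len(header)}")
        sampled = int(fields[0])
        for label, value in zip(header[1:], fields[1:]):
            columns[label][sampled] = float(value)
    return columns
```

For example, when text holds the file contents, `parse_shared_chao(text)["0.01tissuestool"][4392]` returns 152.768.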