# Sharedace

Example Calculations

*.shared.ace

The example calculations below use data from the Eckburg 70.stool_compare files with an OTU definition of 0.03.

This calculator estimates the richness of OTUs shared between two communities. A non-parametric estimator of the number of OTUs shared between two communities has been developed that is analogous to the single-community ACE richness estimator (3). The $S_{A,B\,ACE}$ estimator (9) is calculated as:

$S_{A,B\,ACE} = S_{12 \left ( abund \right )} + \frac {S_{12 \left ( rare \right )}}{C_{12}} + \frac {1}{C_{12}} \left [ f_{\left ( rare \right )1+} {\Gamma}_1 + f_{\left ( rare \right )+1} {\Gamma}_2 + f_{11}{\Gamma}_3 \right ]$
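As a minimal sketch of how the final formula combines its pieces, the Python function below takes the already-computed components as arguments and adds the three terms. The function name `shared_ace` and its parameter names are hypothetical, chosen for illustration; this is not mothur's implementation.

```python
def shared_ace(s12_abund, s12_rare, c12, f_rare_1plus, f_rare_plus1, f11,
               gamma1, gamma2, gamma3):
    """Combine precomputed components into S_{A,B ACE} (illustrative sketch)."""
    if c12 <= 0:
        raise ValueError("C12 must be positive")
    # (1 / C12) * [f_(rare)1+ * G1 + f_(rare)+1 * G2 + f11 * G3]
    correction = (f_rare_1plus * gamma1 + f_rare_plus1 * gamma2
                  + f11 * gamma3) / c12
    # S12(abund) + S12(rare) / C12 + correction
    return s12_abund + s12_rare / c12 + correction


# Toy inputs, not taken from the Eckburg data
print(shared_ace(10, 4, 0.5, 1, 1, 0, 0.5, 0.5, 0.0))  # prints 20.0
```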

where,

$C_{12} = 1 - \frac {\sum_{i=1}^{S_{12\left ( rare \right )}} {\left \{Y_i I \left ( X_i = 1 \right ) + X_iI \left ( Y_i = 1 \right ) - I \left ( X_i = Y_i = 1 \right ) \right \}}} {T_{11}}$
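A minimal sketch of the $C_{12}$ formula in Python, assuming the shared rare OTUs are supplied as a list of `(X_i, Y_i)` abundance pairs (a hypothetical input layout); the function name `coverage_c12` is likewise illustrative.

```python
def coverage_c12(rare_pairs):
    """C12 over the S12(rare) shared rare OTUs, given as (X_i, Y_i) pairs."""
    t11 = sum(x * y for x, y in rare_pairs)
    if t11 == 0:
        raise ValueError("T11 is zero, so C12 is undefined")
    numerator = 0
    for x, y in rare_pairs:
        # Y_i I(X_i = 1) + X_i I(Y_i = 1) - I(X_i = Y_i = 1)
        numerator += (y * int(x == 1) + x * int(y == 1)
                      - int(x == 1 and y == 1))
    return 1 - numerator / t11


# Toy pairs: T11 = 9 and the numerator is 5, so C12 = 1 - 5/9
print(coverage_c12([(1, 2), (3, 1), (2, 2)]))
```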

${\Gamma}_1 = \frac{S_{12 \left (rare \right )} n_{rare} T_{21}}{C_{12}\left( n_{rare} - 1\right)T_{10}T_{11}} - 1$, ${\Gamma}_2 = \frac{S_{12 \left (rare \right )} m_{rare} T_{12}}{C_{12}\left( m_{rare} - 1\right)T_{01}T_{11}} - 1$

${\Gamma}_3 = \left[ \frac{S_{12\left( rare \right)}}{C_{12}}\right ]^2 \frac{n_{rare}m_{rare}T_{22}}{\left(n_{rare}-1\right)\left(m_{rare}-1\right)T_{10}T_{01}T_{11}} - \frac{S_{12 \left( rare \right)}T_{11}}{C_{12}T_{01}T_{10}}-{\Gamma}_1-{\Gamma}_2$
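The Γ-values can be sketched the same way. This hypothetical `gamma_values` function assumes $C_{12}$ and the T-values have already been computed and are passed in as a plain dictionary keyed by name (an illustrative layout); it applies the three formulas above term by term.

```python
def gamma_values(s12_rare, n_rare, m_rare, c12, t):
    """Return (Gamma1, Gamma2, Gamma3); t maps 'T10', 'T01', 'T11', 'T21', 'T12', 'T22'."""
    s = s12_rare / c12  # the recurring factor S12(rare) / C12
    g1 = s * n_rare * t["T21"] / ((n_rare - 1) * t["T10"] * t["T11"]) - 1
    g2 = s * m_rare * t["T12"] / ((m_rare - 1) * t["T01"] * t["T11"]) - 1
    g3 = (s ** 2 * n_rare * m_rare * t["T22"]
          / ((n_rare - 1) * (m_rare - 1) * t["T10"] * t["T01"] * t["T11"])
          - s * t["T11"] / (t["T01"] * t["T10"])
          - g1 - g2)
    return g1, g2, g3


# Toy inputs chosen so the arithmetic is easy to follow by hand
toy_t = {"T10": 1, "T01": 1, "T11": 1, "T21": 1, "T12": 1, "T22": 1}
print(gamma_values(2, 2, 2, 1.0, toy_t))  # prints (3.0, 3.0, 8.0)
```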

$T_{10} = \sum_{i=1}^{S_{12\left( rare \right)}} X_i$, $T_{01} = \sum_{i=1}^{S_{12\left( rare \right)}} Y_i$, $T_{11} = \sum_{i=1}^{S_{12\left( rare \right)}} X_i Y_i$, $T_{21} = \sum_{i=1}^{S_{12\left( rare \right)}} X_i \left( X_i - 1 \right) Y_i$

$T_{12} = \sum_{i=1}^{S_{12\left( rare \right)}} X_i \left( Y_i - 1 \right) Y_i$, $T_{22} = \sum_{i=1}^{S_{12\left( rare \right)}} {X_i \left( X_i - 1 \right) Y_i \left( Y_i - 1 \right)}$
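Each T-value is a sum over the same set of shared rare OTUs, so all six can be computed in one pass over the `(X_i, Y_i)` pairs. The sketch below is illustrative; the function name `t_values` and the dictionary keys are assumptions.

```python
def t_values(rare_pairs):
    """Compute the T-values over the shared rare OTUs, given as (X_i, Y_i) pairs."""
    return {
        "T10": sum(x for x, _ in rare_pairs),
        "T01": sum(y for _, y in rare_pairs),
        "T11": sum(x * y for x, y in rare_pairs),
        "T21": sum(x * (x - 1) * y for x, y in rare_pairs),
        "T12": sum(x * (y - 1) * y for x, y in rare_pairs),
        "T22": sum(x * (x - 1) * y * (y - 1) for x, y in rare_pairs),
    }


# Toy pairs, not taken from the Eckburg data
print(t_values([(1, 2), (3, 1), (2, 2)]))
# {'T10': 6, 'T01': 5, 'T11': 9, 'T21': 10, 'T12': 6, 'T22': 4}
```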

where,

$f_{11}$ = number of shared OTUs with one observed individual in A and B

$f_{1+}, f_{2+}$ = number of shared OTUs with one or two individuals observed in A

$f_{+1}, f_{+2}$ = number of shared OTUs with one or two individuals observed in B

$f_{\left(rare \right)1+}$ = number of OTUs with one individual found in A and less than or equal to 10 in B.

$f_{\left(rare \right)+1}$ = number of OTUs with one individual found in B and less than or equal to 10 in A.

$n_{rare}$ = number of sequences from A that fall in rare OTUs, i.e., OTUs containing 10 or fewer sequences.

$m_{rare}$ = number of sequences from B that fall in rare OTUs, i.e., OTUs containing 10 or fewer sequences.

$S_{12\left(rare\right)}$ = number of shared OTUs where both of the communities are represented by less than or equal to 10 sequences.

$S_{12\left(abund\right)}$ = number of shared OTUs where at least one of the communities is represented by more than 10 sequences.

$S_{12\left(obs\right)}$ = number of shared OTUs in A and B.
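The OTU counts defined above can be gathered from two aligned lists of per-OTU abundances, one for A and one for B. This sketch assumes the rare cutoff of 10 used in the definitions and a hypothetical function name `shared_otu_counts`; it leaves out $n_{rare}$ and $m_{rare}$.

```python
RARE_CUTOFF = 10  # an OTU is rare in a community if it has <= 10 sequences there


def shared_otu_counts(xs, ys, cutoff=RARE_CUTOFF):
    """Count shared, rare, and abundant OTUs from aligned abundances xs (A) and ys (B)."""
    shared = [(x, y) for x, y in zip(xs, ys) if x > 0 and y > 0]
    # S12(rare): both communities have <= cutoff sequences in the OTU
    rare = [(x, y) for x, y in shared if x <= cutoff and y <= cutoff]
    return {
        "S12_obs": len(shared),
        "S12_rare": len(rare),
        "S12_abund": len(shared) - len(rare),  # at least one side exceeds cutoff
        "f_rare_1plus": sum(1 for x, _ in rare if x == 1),
        "f_rare_plus1": sum(1 for _, y in rare if y == 1),
        "f11": sum(1 for x, y in rare if x == 1 and y == 1),
        "rare_pairs": rare,
    }


# Toy abundance lists, not taken from the Eckburg data
counts = shared_otu_counts([1, 12, 3, 0, 1], [1, 2, 1, 5, 20])
print(counts["S12_rare"], counts["S12_abund"])  # prints 2 2
```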

Evaluating $S_{A,B\,ACE}$ is considerably more involved. First, we determine that there are 23 rare shared OTUs and 37 abundant shared OTUs. Next, considering only the rare OTUs, we calculate $C_{12}$ as 0.845878 and obtain the following T-values:

$T_{10} = 93$

$T_{01} = 64$

$T_{11} = 279$

$T_{21} = 1444$

${T_{12}} = 988$

$T_{22} = 5440$

Calculating the Γ-values requires $f_{\left(rare \right)1+}$, $f_{\left(rare \right)+1}$, and $f_{11}$, which were 5, 8, and 2, respectively. Also, $n_{rare}$ and $m_{rare}$ were 185 and 167, respectively. The Γ-values are then ${\Gamma}_1 = 0.530409$, ${\Gamma}_2 = 0.523308$, and ${\Gamma}_3 = 0.151840$. This gives an $S_{A,B\,ACE}$ value of 72.3024, which appears in the 0.03 column of the final (4392) row of the .shared.ace output below.

File Samples on the Eckburg 70.stool_compare Dataset

• .shared

This file contains the frequency of sequences from each group found in each OTU. Each row consists of the distance being considered, the group name, the number of OTUs, and the abundance information, all separated by tabs. In the abundance information, each number represents a different OTU and gives the number of sequences from that group that clustered within that OTU. Note that OTU frequencies can only be compared within a single distance definition.
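A minimal sketch of reading rows in the layout just described, assuming each row holds exactly the tab-separated fields listed (distance label, group, number of OTUs, then one abundance per OTU) with no header line. `parse_shared_row` and `pair_groups` are hypothetical helpers, and the two example rows are made-up toy data. Pairing is only done when the distance labels match, mirroring the note that OTU frequencies are comparable only within one distance definition.

```python
def parse_shared_row(line):
    """Split one tab-separated .shared row into (label, group, abundances)."""
    fields = line.rstrip("\n").split("\t")
    label, group, num_otus = fields[0], fields[1], int(fields[2])
    abundances = [int(value) for value in fields[3:]]
    if len(abundances) != num_otus:
        raise ValueError(f"expected {num_otus} abundances, got {len(abundances)}")
    return label, group, abundances


def pair_groups(row_a, row_b):
    """Zip two groups' abundances OTU by OTU; both rows must use the same distance."""
    label_a, _, xs = row_a
    label_b, _, ys = row_b
    if label_a != label_b:
        raise ValueError("OTU frequencies can only be compared within one distance")
    return list(zip(xs, ys))


tissue = parse_shared_row("0.03\ttissue\t4\t1\t0\t12\t3")
stool = parse_shared_row("0.03\tstool\t4\t1\t5\t2\t0")
print(pair_groups(tissue, stool))  # prints [(1, 1), (0, 5), (12, 2), (3, 0)]
```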

• .shared.ace

The first line contains the column labels. The first column, sampled, gives the number of sequences that had been sampled when each $S_{A,B\,ACE}$ value was calculated. The sampling frequency was set to 500, so $S_{A,B\,ACE}$ is calculated at each distance after every 500 sequences are sampled, with a final calculation once all sequences have been sampled. The remaining labels in the first line give the distance at which each calculation was made, followed by the names of the groups compared. Each subsequent line starts with the number of sequences sampled, followed by the $S_{A,B\,ACE}$ value at each column's distance. For instance, at distance 0.01, after 4392 samples $S_{A,B\,ACE}$ was 136.599.

| sampled | 0.01tissuestool | 0.02tissuestool | 0.03tissuestool | 0.04tissuestool |
|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 |
| 500 | 44.2676 | 52.4249 | 43.9391 | 26.2499 |
| 1000 | 86.2691 | 53.7864 | 55.2556 | 60.1921 |
| 1500 | 114.238 | 106.452 | 45.6638 | 50.0418 |
| 2000 | 180.391 | 99.0382 | 57.2304 | 47.1769 |
| 2500 | 124.966 | 92.2403 | 48.1031 | 48.5068 |
| 3000 | 114.838 | 94.2194 | 56.2644 | 59.6396 |
| 3500 | 126.609 | 102.88 | 59.8571 | 71.1169 |
| 4000 | 134.213 | 98.837 | 56.6823 | 68.317 |
| 4392 | 136.599 | 86.5079 | 72.3024 | 62.117 |
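To pull individual values out of a .shared.ace file, a sketch like the following could be used. It assumes the layout described above (a header row beginning with `sampled`, then whitespace-separated values, one row per sampling point) and uses a hypothetical function name, `parse_shared_ace`. The excerpt embedded below copies two rows and two columns from the output shown.

```python
def parse_shared_ace(text):
    """Map each 'sampled' count to {column label: S_{A,B ACE} value}."""
    lines = [line for line in text.splitlines() if line.strip()]
    columns = lines[0].split()[1:]  # drop the leading "sampled" label
    table = {}
    for line in lines[1:]:
        parts = line.split()  # tolerates tabs or runs of spaces
        table[int(parts[0])] = {
            col: float(value) for col, value in zip(columns, parts[1:])
        }
    return table


excerpt = (
    "sampled\t0.01tissuestool\t0.03tissuestool\n"
    "500\t44.2676\t43.9391\n"
    "4392\t136.599\t72.3024\n"
)
table = parse_shared_ace(excerpt)
final_row = table[max(table)]  # last calculation, after all sequences are sampled
print(final_row["0.03tissuestool"])  # prints 72.3024
```

Using the largest `sampled` key picks out the final calculation, which is the row the worked example above refers to.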