
# Sharedace


## Revision as of 16:37, 22 January 2009

Validate output by making calculations by hand

**Example Calculations**

**\*.shared.ace**

Example calculations below will be performed using data from the Eckburg 70.stool_compare files with an OTU definition of 0.03.

This calculator estimates the richness of shared OTUs between two communities. A non-parametric estimator of the number of shared OTUs between two communities has been developed that is analogous to the ACE (3) single-community richness estimator. The <math>S_{A,B ACE}</math> estimator (9) is calculated as:

<math>S_{A,B ACE} = S_{12 \left ( abund \right )} + \frac {S_{12 \left ( rare \right )}}{C_{12}} + \frac {1}{C_{12}} \left [ f_{\left ( rare \right )1+} {\Gamma}_1 + f_{\left ( rare \right )+1} {\Gamma}_2 + f_{11}{\Gamma}_3 \right ]</math>

where,

<math>C_{12} = 1 - \frac {\sum_{i=1}^{S_{12\left ( rare \right )}} {\left \{Y_i I \left ( X_i = 1 \right ) + X_iI \left ( Y_i = 1 \right ) - I \left ( X_i = Y_i = 1 \right ) \right \}}} {T_{11}}</math>

<math>{\Gamma}_1 = \frac{S_{12 \left (rare \right )} n_{rare} T_{21}}{C_{12}\left( n_{rare} - 1\right)T_{10}T_{11}} - 1</math>, <math>{\Gamma}_2 = \frac{S_{12 \left (rare \right )} m_{rare} T_{12}}{C_{12}\left( m_{rare} - 1\right)T_{01}T_{11}} - 1</math>

<math>{\Gamma}_3 = \left[ \frac{S_{12\left( rare \right)}}{C_{12}}\right ]^2 \frac{n_{rare}m_{rare}T_{22}}{\left(n_{rare}-1\right)\left(m_{rare}-1\right)T_{10}T_{01}T_{11}} - \frac{S_{12 \left( rare \right)}T_{11}}{C_{12}T_{01}T_{10}}-{\Gamma}_1-{\Gamma}_2</math>

<math>T_{10} = \sum_{i=1}^{S_{12\left( rare \right)}} X_i </math>, <math>T_{01} = \sum_{i=1}^{S_{12\left( rare \right)}} Y_i </math>, <math>T_{11} = \sum_{i=1}^{S_{12\left( rare \right)}} X_i Y_i </math>, <math>T_{21} = \sum_{i=1}^{S_{12\left( rare \right)}} X_i \left( X_i - 1 \right) Y_i </math>

<math>T_{12} = \sum_{i=1}^{S_{12\left( rare \right)}} X_i \left( Y_i - 1 \right) Y_i </math>, <math>T_{22} = \sum_{i=1}^{S_{12\left( rare \right)}} {X_i \left( X_i - 1 \right) Y_i \left( Y_i - 1 \right)} </math>

where,

<math>f_{11}</math> = number of shared OTUs with one observed individual in A and B

<math>f_{1+}, f_{2+}</math> = number of shared OTUs with one or two individuals observed in A

<math>f_{+1}, f_{+2}</math> = number of shared OTUs with one or two individuals observed in B

<math>f_{\left(rare \right)1+}</math> = number of OTUs with one individual found in A and less than or equal to 10 in B.

<math>f_{\left(rare \right)+1}</math> = number of OTUs with one individual found in B and less than or equal to 10 in A.

<math>n_{rare}</math> = number of sequences from A that belong to OTUs containing fewer than 10 sequences.

<math>m_{rare}</math> = number of sequences from B that belong to OTUs containing fewer than 10 sequences.

<math>S_{12\left(rare\right)}</math> = number of shared OTUs where both of the communities are represented by less than or equal to 10 sequences.

<math>S_{12\left(abund\right)}</math> = number of shared OTUs where at least one of the communities is represented by more than 10 sequences.

<math>S_{12\left(obs\right)}</math> = number of shared OTUs in A and B.
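The T-values, <math>C_{12}</math>, and the OTU counts defined above can be sketched in Python as follows. This is a minimal illustration, not the program's actual implementation: the function name `rare_shared_summaries`, the dictionary keys, and the example counts are all made up for this sketch, and the rare cutoff of 10 is taken from the "less than or equal to 10" wording in the definitions. <math>n_{rare}</math> and <math>m_{rare}</math> are deliberately left out because their definition is not tied to shared OTUs in the text.

```python
def rare_shared_summaries(x, y, rare_max=10):
    """Summarise shared OTUs from per-OTU counts x (community A) and y (community B).

    Hypothetical helper: rare_max=10 follows the "less than or equal to 10"
    cutoff used in the definitions above.
    """
    # A shared OTU has at least one sequence in both communities.
    shared = [(xi, yi) for xi, yi in zip(x, y) if xi > 0 and yi > 0]
    # A rare shared OTU has at most rare_max sequences in each community.
    rare = [(xi, yi) for xi, yi in shared if xi <= rare_max and yi <= rare_max]

    stats = {
        "S12_obs": len(shared),
        "S12_rare": len(rare),
        "S12_abund": len(shared) - len(rare),
        "T10": sum(xi for xi, yi in rare),
        "T01": sum(yi for xi, yi in rare),
        "T11": sum(xi * yi for xi, yi in rare),
        "T21": sum(xi * (xi - 1) * yi for xi, yi in rare),
        "T12": sum(xi * (yi - 1) * yi for xi, yi in rare),
        "T22": sum(xi * (xi - 1) * yi * (yi - 1) for xi, yi in rare),
        "f_rare_1plus": sum(1 for xi, yi in rare if xi == 1),
        "f_rare_plus1": sum(1 for xi, yi in rare if yi == 1),
        "f11": sum(1 for xi, yi in rare if xi == 1 and yi == 1),
    }
    # Numerator of the coverage term:
    # Y_i I(X_i = 1) + X_i I(Y_i = 1) - I(X_i = Y_i = 1), summed over rare shared OTUs.
    singleton_mass = sum(
        yi * (xi == 1) + xi * (yi == 1) - (xi == 1 and yi == 1) for xi, yi in rare
    )
    # C12 is undefined when T11 is zero (no rare shared OTUs).
    stats["C12"] = 1 - singleton_mass / stats["T11"] if stats["T11"] else float("nan")
    return stats


# Made-up counts for six OTUs; the fifth OTU is absent from A, so it is not shared.
example = rare_shared_summaries([1, 2, 3, 12, 0, 4], [1, 1, 2, 4, 5, 3])
for name, value in example.items():
    print(name, value)
```

Keeping the summaries in one dictionary makes it easy to compare each intermediate value against a hand calculation before moving on to the Γ-values.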

Evaluating <math>S_{A,B ACE}</math> is considerably complicated. First, we determine that there are 23 rare shared OTUs and 37 abundant shared OTUs. Next, considering only the rare OTUs, we calculate <math>C_{12}</math> as 0.845878 and obtain the following T-values:

<math>T_{10} = 93</math>

<math>T_{01} = 64</math>

<math>T_{11} = 279</math>

<math>T_{21} = 1444</math>

<math>{T_{12}} = 988</math>

<math>T_{22} = 5440</math>

Next, calculating the Γ-values requires knowing <math>f_{\left(rare \right)1+}, f_{\left(rare \right)+1} \mbox{ and } f_{\left(rare \right)11}</math>, which were 5, 8, and 2, respectively. Also, <math> n_{rare} \mbox{ and } m_{rare}</math> were 185 and 167, respectively. Finally, the Γ-values are <math>{\Gamma}_1=0.530409, {\Gamma}_2 = 0.523308 \mbox{ and } {\Gamma}_3 = 0.151840</math>. This gives an <math>S_{A,B ACE}</math> value of 72.3024, as seen in the last row of the .shared.ace sample below.
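The Γ formulas and the final estimator can be written out as a short Python sketch for checking a hand calculation. The function names `gammas` and `shared_ace` and their argument names are assumptions made for this illustration; the formulas are transcribed from the equations above. The usage lines plug in the rounded intermediate values printed in the text, so the result should land near the reported value but need not reproduce it to every digit.

```python
def gammas(s_rare, c12, n_rare, m_rare, t10, t01, t11, t21, t12, t22):
    """Return (Gamma_1, Gamma_2, Gamma_3) as defined in the equations above."""
    g1 = s_rare * n_rare * t21 / (c12 * (n_rare - 1) * t10 * t11) - 1
    g2 = s_rare * m_rare * t12 / (c12 * (m_rare - 1) * t01 * t11) - 1
    g3 = (
        (s_rare / c12) ** 2
        * n_rare * m_rare * t22
        / ((n_rare - 1) * (m_rare - 1) * t10 * t01 * t11)
        - s_rare * t11 / (c12 * t01 * t10)
        - g1
        - g2
    )
    return g1, g2, g3


def shared_ace(s_abund, s_rare, c12, f_rare_1plus, f_rare_plus1, f11, g1, g2, g3):
    """Combine the pieces into the shared ACE richness estimate."""
    correction = f_rare_1plus * g1 + f_rare_plus1 * g2 + f11 * g3
    return s_abund + s_rare / c12 + correction / c12


# Final step using the rounded values reported in the text.
estimate = shared_ace(
    s_abund=37, s_rare=23, c12=0.845878,
    f_rare_1plus=5, f_rare_plus1=8, f11=2,
    g1=0.530409, g2=0.523308, g3=0.151840,
)
print(round(estimate, 4))
```

Splitting the Γ-values from the final sum mirrors the order of the hand calculation, so each stage can be compared separately against the numbers in the text.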

**File Samples on the Eckburg 70.stool_compare Dataset**

- .shared

This file contains the frequency of sequences from each group found in each OTU. Each row consists of the distance being considered, the group name, the number of OTUs, and the abundance information, separated by tabs. In the abundance information, each number represents a different OTU and gives the number of sequences in that group that clustered within that OTU. Note that OTU frequencies can only be compared within a distance definition. The files used in the calculations are available in 70.stool_compare.zip.
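As an illustration of the row layout just described, the following Python sketch splits one tab-separated row into its distance, group name, and per-OTU counts. The function name `parse_shared_line` and the sample rows are made up for this example and are not taken from the Eckburg files; the check that the OTU count matches the number of abundance values is an assumption about how the fields relate.

```python
def parse_shared_line(line):
    """Split one .shared row into (distance, group, counts).

    Assumes the layout described above: distance, group name,
    number of OTUs, then one abundance per OTU, separated by tabs.
    """
    fields = line.rstrip("\n").split("\t")
    distance, group, n_otus = fields[0], fields[1], int(fields[2])
    # Skip empty fields, which a trailing tab would produce.
    counts = [int(value) for value in fields[3:] if value != ""]
    if len(counts) != n_otus:
        raise ValueError(
            f"expected {n_otus} OTU counts for {group} at {distance}, got {len(counts)}"
        )
    return distance, group, counts


# Hypothetical rows for two groups at the same distance.
row_a = "0.03\ttissue\t4\t5\t0\t2\t1\n"
row_b = "0.03\tstool\t4\t3\t7\t0\t1\n"

dist_a, group_a, x = parse_shared_line(row_a)
dist_b, group_b, y = parse_shared_line(row_b)

# Counts are only comparable within one distance definition.
if dist_a == dist_b:
    shared = [(xi, yi) for xi, yi in zip(x, y) if xi > 0 and yi > 0]
    print(group_a, group_b, len(shared))
```

The distance is kept as a string so that labels can be compared exactly without floating-point rounding.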

- .shared.ace

The first line contains the column labels. The first column, sampled, gives the number of sequences sampled when each <math>S_{A,B ACE}</math> value was calculated. The frequency was set to 500, so <math>S_{A,B ACE}</math> is calculated at each of the distances after every 500 sequences sampled, with a final calculation after all sequences are sampled. The remaining labels in the first line combine the distance at which the calculations were made with the names of the groups compared. Each additional line starts with the number of sequences sampled, followed by the <math>S_{A,B ACE}</math> value at each column's distance. For instance, at distance 0.01, after 4392 samples <math>S_{A,B ACE}</math> was 136.599.

| sampled | 0.01tissuestool | 0.02tissuestool | 0.03tissuestool | 0.04tissuestool |
|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 |
| 500 | 44.2676 | 52.4249 | 43.9391 | 26.2499 |
| 1000 | 86.2691 | 53.7864 | 55.2556 | 60.1921 |
| 1500 | 114.238 | 106.452 | 45.6638 | 50.0418 |
| 2000 | 180.391 | 99.0382 | 57.2304 | 47.1769 |
| 2500 | 124.966 | 92.2403 | 48.1031 | 48.5068 |
| 3000 | 114.838 | 94.2194 | 56.2644 | 59.6396 |
| 3500 | 126.609 | 102.88 | 59.8571 | 71.1169 |
| 4000 | 134.213 | 98.837 | 56.6823 | 68.317 |
| 4392 | 136.599 | 86.5079 | 72.3024 | 62.117 |
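To look up a value such as the estimate at distance 0.03 after all 4392 sequences, the file can be read into a nested dictionary. The sketch below assumes the file holds one whitespace-separated row per line with the header first; the function name `parse_shared_ace` is hypothetical, and the sample text reuses the header and three of the rows shown above.

```python
def parse_shared_ace(text):
    """Map each 'sampled' count to a {column label: value} dictionary."""
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    table = {}
    for row in rows:
        sampled = int(row[0])
        # Pair every remaining column label with its value on this row.
        table[sampled] = {label: float(v) for label, v in zip(header[1:], row[1:])}
    return table


sample_text = """\
sampled 0.01tissuestool 0.02tissuestool 0.03tissuestool 0.04tissuestool
1 0 0 0 0
500 44.2676 52.4249 43.9391 26.2499
4392 136.599 86.5079 72.3024 62.117
"""

table = parse_shared_ace(sample_text)
print(table[4392]["0.03tissuestool"])
```

Keying the table by the number of sequences sampled makes it straightforward to follow how the estimate changes along the collector's curve for any one distance.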