# Difference between revisions of "Shannon"

## Revision as of 16:01, 14 January 2009

Validate output by making calculations by hand

**Example Calculations**

**\*.shannon**

These files give the classic Shannon-Weaver Index of diversity.

<math>H_{Shannon} = - \sum_{i=1}^{S_{obs}} \frac{S_i}{N} \ln \frac{S_i}{N} </math>

For the Amazonian dataset the Shannon Index is 4.35.

To obtain the 95% confidence interval we assume that the index is normally distributed and that its variance can be calculated as

<math>var\left ( H_{Shannon} \right ) = \frac {\sum_{i=1}^{S_{obs}} \frac{S_i}{N} \left ( \ln \frac{S_i}{N} \right )^2 - H_{Shannon}^{2}}{N} + \frac{S_{obs} - 1}{2N^{2}}</math>

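For the Amazonian dataset the first term of this expression contributes 0.0020 and the second 0.0043, for a total variance of approximately 0.0064. This gives a lower and upper bound to the 95% confidence interval of 4.20 and 4.51.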
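The hand calculation can be sketched in Python. This is a minimal sketch, assuming that the value of 4.35 refers to the 0.03 distance, whose rank-abundance counts (75, 6, 1 and 2 OTUs containing 1, 2, 3 and 4 sequences) are taken from the .sabund sample on this page. The function name `shannon_ci` and the use of 1.96 as the normal quantile are illustrative assumptions, not part of the program being validated.

```python
import math

def shannon_ci(rank_abundance, z=1.96):
    """Return (H, variance, lci, hci) for a list of rank-abundance counts.

    rank_abundance[k] is the number of OTUs that contain k + 1 sequences.
    """
    # Expand the counts into one abundance S_i per observed OTU.
    abundances = [size
                  for size, n_otus in enumerate(rank_abundance, start=1)
                  for _ in range(n_otus)]
    n = sum(abundances)      # N, total number of sequences
    s_obs = len(abundances)  # S_obs, number of observed OTUs

    # Shannon index: H = -sum(p_i * ln p_i), with p_i = S_i / N.
    h = -sum((s / n) * math.log(s / n) for s in abundances)

    # Variance: (sum(p_i * (ln p_i)^2) - H^2) / N + (S_obs - 1) / (2 N^2).
    sum_p_ln2 = sum((s / n) * math.log(s / n) ** 2 for s in abundances)
    variance = (sum_p_ln2 - h ** 2) / n + (s_obs - 1) / (2 * n ** 2)

    # Normal-approximation confidence interval around H.
    half_width = z * math.sqrt(variance)
    return h, variance, h - half_width, h + half_width

h, variance, lci, hci = shannon_ci([75, 6, 1, 2])
print(f"H = {h:.2f}, CI = {lci:.2f} to {hci:.2f}")
```

Under these assumptions the sketch should reproduce the index of 4.35 and the interval of 4.20 to 4.51 quoted above.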

**File Samples on the Amazonian Dataset**

- .sabund

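This file contains data for constructing a rank-abundance plot of the OTU data for each distance level. The first column contains the distance. The second column is the number of values that follow in the row, which equals the size of the largest OTU observed at that distance. The successive values in the row are the number of OTUs that were found once, twice, etc. For example, at distance 0.03 there are 75 OTUs found once, 6 found twice, 1 found three times and 2 found four times.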

    unique  2  94  2
    0       2  92  3
    0.01    2  88  5
    0.02    4  84  2  2  1
    0.03    4  75  6  1  2
    0.04    4  69  9  1  2
    0.05    4  55 13  3  2
    0.06    4  48 14  2  4
    0.07    4  44 16  2  4
    0.08    7  36 15  4  2  1  0  1
    0.09    7  36 12  4  3  0  0  2
    0.1     7  35 12  2  3  0  0  3
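To make the row layout concrete, here is a small sketch of reading one .sabund row. It assumes whitespace-separated fields and that the second field gives the number of counts that follow, which is what the sample rows suggest. `parse_sabund_line` is a hypothetical helper written for this illustration.

```python
def parse_sabund_line(line):
    """Split one .sabund row into (label, counts).

    counts[k] is the number of OTUs that contain k + 1 sequences.
    """
    fields = line.split()
    label = fields[0]
    n_values = int(fields[1])  # how many counts follow on this row
    counts = [int(x) for x in fields[2:2 + n_values]]
    if len(counts) != n_values:
        raise ValueError(f"expected {n_values} counts, found {len(counts)}")
    return label, counts

label, counts = parse_sabund_line("0.03 4 75 6 1 2")
n_otus = sum(counts)  # number of observed OTUs
n_seqs = sum(size * n for size, n in enumerate(counts, start=1))  # sequences
print(label, n_otus, n_seqs)  # prints: 0.03 84 98
```

Every row in the sample sums to the same 98 sequences, which is a quick consistency check when validating a file by hand.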

- .shannon

The first line contains the labels of all the columns. The first label, numsampled, is the number of sequences that had been sampled when each calculation was made. The frequency was set to 10, so the index is calculated at each of the distances after every 10 sequences sampled, with a final calculation once all sequences have been sampled. The remaining labels in the first line are the distances at which the calculations were made, each followed by lci (lower bound of the confidence interval) and hci (upper bound of the confidence interval). Note: the entire file is not shown below. Each additional line starts with the number of sequences sampled, followed by the Shannon index at each column's distance and its confidence interval. For instance, at distance 0.01, after 80 sequences were sampled the Shannon index was 4.33, the lci was 4.17 and the hci was 4.49.

    numsampled  0.01   lci    hci    0.02   lci    hci    0.03   lci    hci
    1           -0.00  -1.39  1.39   -0.00  -1.39  1.39   -0.00  -1.39  1.39
    10          2.30   1.86   2.74   2.30   1.86   2.74   2.30   1.86   2.74
    20          3.00   2.69   3.31   3.00   2.69   3.31   3.00   2.69   3.31
    30          3.40   3.15   3.65   3.40   3.15   3.65   3.40   3.15   3.65
    40          3.69   3.47   3.91   3.62   3.40   3.84   3.65   3.43   3.88
    50          3.88   3.69   4.08   3.86   3.66   4.06   3.86   3.66   4.06
    60          4.07   3.89   4.25   3.97   3.78   4.16   4.05   3.87   4.23
    70          4.21   4.04   4.38   4.08   3.90   4.27   4.12   3.95   4.30
    80          4.33   4.17   4.49   4.20   4.02   4.37   4.24   4.08   4.41
    90          4.44   4.29   4.59   4.33   4.17   4.50   4.32   4.17   4.48
    98          4.51   4.37   4.66   4.43   4.28   4.59   4.35   4.20   4.51
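Looking up a value such as the one quoted above can be sketched as follows. The sketch assumes a whitespace-separated file whose header repeats a distance label followed by lci and hci, as in the sample. `parse_shannon_table` is a hypothetical helper, and only two data rows are included to keep the example short.

```python
def parse_shannon_table(text):
    """Map numsampled -> {distance label: (shannon, lci, hci)}."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    # Distance labels occupy every third column after numsampled.
    distances = header[1::3]
    table = {}
    for row in rows:
        values = [float(x) for x in row[1:]]
        table[int(row[0])] = {
            dist: tuple(values[3 * i:3 * i + 3])
            for i, dist in enumerate(distances)
        }
    return table

sample = """\
numsampled 0.01 lci hci 0.02 lci hci 0.03 lci hci
80 4.33 4.17 4.49 4.20 4.02 4.37 4.24 4.08 4.41
98 4.51 4.37 4.66 4.43 4.28 4.59 4.35 4.20 4.51
"""
table = parse_shannon_table(sample)
print(table[80]["0.01"])  # prints: (4.33, 4.17, 4.49)
```

Keying the inner dictionary by the distance label as a string avoids floating-point comparisons when selecting a column.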