
# Difference between revisions of "Rarefaction"


## Revision as of 14:40, 14 January 2009

Validate output by making calculations by hand

**Example Calculations**

**\*.collect and \*.rarefaction**

These are the collector's curve and rarefaction curve data for the number of observed OTUs as a function of distance between sequences and the number of sequences sampled. This is merely a count of the number of OTUs observed at any given point in the sampling process.

In theory, the rarefaction curve should match the following expression:

<math>S_n = S_t - \left ( \frac{\sum_{i=1}^{S_t} \binom{N - N_i}{n} }{ \binom{N}{n} } \right ) </math>


where,

<math>S_n</math> = Average number of OTUs observed after drawing n individuals.

<math>S_{t}</math> = Total number of OTUs in sample of N total individuals.

<math>N_i</math> = Number of individuals in the i-th OTU.

Below is a comparison of the MOTHUR output and the theoretical values at a distance of 0.03 for the
Amazonian dataset, which contains 98 sequences in 84 OTUs:
75 singletons, 6 doubletons, 1 tripleton, and 2 quadrupletons.

| n | Theory | DOTUR | Diff. | %Error | n | Theory | DOTUR | Diff. | %Error |
|---|--------|-------|-------|--------|---|--------|-------|-------|--------|
| 1 | 1.000 | 1.000 | 0.000 | 0.000 | 50 | 45.620 | 45.634 | -0.014 | 0.031 |
| 2 | 1.996 | 1.995 | 0.001 | 0.049 | 51 | 46.461 | 46.477 | -0.016 | 0.035 |
| 3 | 2.987 | 2.986 | 0.001 | 0.024 | 52 | 47.299 | 47.317 | -0.017 | 0.037 |
| 4 | 3.974 | 3.971 | 0.003 | 0.069 | 53 | 48.136 | 48.154 | -0.019 | 0.038 |
| 5 | 4.956 | 4.954 | 0.002 | 0.045 | 54 | 48.970 | 48.995 | -0.025 | 0.051 |
| 6 | 5.935 | 5.934 | 0.001 | 0.010 | 55 | 49.802 | 49.822 | -0.019 | 0.038 |
| 7 | 6.909 | 6.909 | 0.000 | 0.002 | 56 | 50.633 | 50.647 | -0.014 | 0.028 |
| 8 | 7.880 | 7.881 | -0.001 | 0.014 | 57 | 51.461 | 51.478 | -0.016 | 0.032 |
| 9 | 8.846 | 8.847 | -0.001 | 0.013 | 58 | 52.288 | 52.305 | -0.018 | 0.034 |
| 10 | 9.808 | 9.809 | -0.001 | 0.011 | 59 | 53.112 | 53.126 | -0.014 | 0.027 |
| 11 | 10.767 | 10.768 | -0.001 | 0.012 | 60 | 53.935 | 53.946 | -0.011 | 0.020 |
| 12 | 11.721 | 11.721 | 0.000 | 0.001 | 61 | 54.755 | 54.767 | -0.012 | 0.022 |
| 13 | 12.672 | 12.675 | -0.003 | 0.028 | 62 | 55.574 | 55.586 | -0.012 | 0.021 |
| 14 | 13.619 | 13.623 | -0.004 | 0.031 | 63 | 56.391 | 56.401 | -0.009 | 0.017 |
| 15 | 14.562 | 14.566 | -0.004 | 0.027 | 64 | 57.206 | 57.213 | -0.007 | 0.012 |
| 16 | 15.502 | 15.510 | -0.008 | 0.051 | 65 | 58.020 | 58.026 | -0.006 | 0.010 |
| 17 | 16.438 | 16.446 | -0.008 | 0.046 | 66 | 58.832 | 58.840 | -0.008 | 0.014 |
| 18 | 17.371 | 17.378 | -0.007 | 0.040 | 67 | 59.642 | 59.647 | -0.005 | 0.009 |
| 19 | 18.300 | 18.305 | -0.005 | 0.027 | 68 | 60.450 | 60.448 | 0.002 | 0.003 |
| 20 | 19.225 | 19.233 | -0.007 | 0.037 | 69 | 61.256 | 61.260 | -0.004 | 0.006 |
| 21 | 20.148 | 20.158 | -0.010 | 0.049 | 70 | 62.061 | 62.057 | 0.004 | 0.007 |
| 22 | 21.066 | 21.079 | -0.013 | 0.061 | 71 | 62.865 | 62.861 | 0.004 | 0.006 |
| 23 | 21.982 | 21.995 | -0.012 | 0.057 | 72 | 63.666 | 63.667 | 0.000 | 0.000 |
| 24 | 22.894 | 22.904 | -0.010 | 0.042 | 73 | 64.467 | 64.469 | -0.003 | 0.004 |
| 25 | 23.804 | 23.814 | -0.010 | 0.042 | 74 | 65.265 | 65.273 | -0.008 | 0.013 |
| 26 | 24.710 | 24.719 | -0.010 | 0.039 | 75 | 66.062 | 66.070 | -0.008 | 0.012 |
| 27 | 25.613 | 25.623 | -0.011 | 0.041 | 76 | 66.857 | 66.862 | -0.005 | 0.007 |
| 28 | 26.512 | 26.521 | -0.008 | 0.032 | 77 | 67.651 | 67.654 | -0.002 | 0.004 |
| 30 | 28.303 | 28.313 | -0.010 | 0.034 | 79 | 69.235 | 69.241 | -0.007 | 0.010 |
| 31 | 29.194 | 29.204 | -0.010 | 0.033 | 80 | 70.024 | 70.035 | -0.011 | 0.015 |
| 32 | 30.082 | 30.093 | -0.010 | 0.034 | 81 | 70.812 | 70.824 | -0.011 | 0.016 |
| 33 | 30.967 | 30.984 | -0.016 | 0.052 | 82 | 71.599 | 71.612 | -0.013 | 0.018 |
| 34 | 31.850 | 31.863 | -0.013 | 0.041 | 83 | 72.384 | 72.391 | -0.007 | 0.010 |
| 35 | 32.729 | 32.745 | -0.015 | 0.047 | 84 | 73.168 | 73.175 | -0.007 | 0.009 |
| 36 | 33.606 | 33.618 | -0.012 | 0.035 | 85 | 73.950 | 73.958 | -0.008 | 0.010 |
| 37 | 34.481 | 34.498 | -0.017 | 0.050 | 86 | 74.731 | 74.746 | -0.015 | 0.020 |
| 38 | 35.352 | 35.371 | -0.019 | 0.052 | 87 | 75.511 | 75.530 | -0.019 | 0.026 |
| 39 | 36.221 | 36.241 | -0.020 | 0.055 | 88 | 76.289 | 76.302 | -0.013 | 0.017 |
| 40 | 37.088 | 37.109 | -0.021 | 0.056 | 89 | 77.066 | 77.081 | -0.015 | 0.020 |
| 41 | 37.952 | 37.973 | -0.022 | 0.057 | 90 | 77.842 | 77.856 | -0.014 | 0.018 |
| 42 | 38.813 | 38.835 | -0.021 | 0.055 | 91 | 78.616 | 78.619 | -0.003 | 0.004 |
| 43 | 39.672 | 39.687 | -0.014 | 0.036 | 92 | 79.389 | 79.391 | -0.002 | 0.003 |
| 44 | 40.529 | 40.542 | -0.013 | 0.032 | 93 | 80.161 | 80.161 | 0.000 | 0.000 |
| 45 | 41.383 | 41.391 | -0.008 | 0.020 | 94 | 80.931 | 80.930 | 0.002 | 0.002 |
| 46 | 42.235 | 42.245 | -0.010 | 0.023 | 95 | 81.700 | 81.700 | 0.000 | 0.000 |
| 47 | 43.085 | 43.093 | -0.009 | 0.020 | 96 | 82.468 | 82.473 | -0.005 | 0.006 |
| 48 | 43.932 | 43.945 | -0.013 | 0.029 | 97 | 83.235 | 83.237 | -0.003 | 0.003 |
| 49 | 44.777 | 44.792 | -0.015 | 0.034 | 98 | 84.000 | 84.000 | 0.000 | 0.000 |
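The Theory column can be checked by evaluating the rarefaction expression directly. The following is a minimal Python sketch, not part of MOTHUR or DOTUR; the function name `expected_otus` is hypothetical, and the abundance list is built from the OTU counts quoted above for distance 0.03 (75 singletons, 6 doubletons, 1 tripleton, 2 quadrupletons).

```python
from math import comb

def expected_otus(abundances, n):
    """Expected number of OTUs seen in a random subsample of n individuals.

    abundances: list with one entry per OTU giving its number of individuals (N_i).
    """
    N = sum(abundances)        # total number of individuals
    S_t = len(abundances)      # total number of OTUs
    # Sum over OTUs of the number of n-subsets that miss OTU i entirely.
    # math.comb returns 0 when n exceeds N - N_i, which is what the formula needs.
    missing = sum(comb(N - N_i, n) for N_i in abundances)
    return S_t - missing / comb(N, n)

# Amazonian dataset at distance 0.03, as described in the text.
abundances = [1] * 75 + [2] * 6 + [3] * 1 + [4] * 2

for n in (1, 2, 98):
    print(n, round(expected_otus(abundances, n), 3))
```

For n = 1, 2, and 98 this should agree with the Theory values 1.000, 1.996, and 84.000 in the table. Exact integer binomials are used so that no precision is lost before the final division.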

**File Samples on the Amazonian Dataset**

- .collect

The first line contains the column labels. The first column, numsequences, gives the number of sequences sampled when each calculation was made. The frequency was left at the default of 100, so the observed number of OTUs is calculated at each distance after every 100 sequences selected, with a final calculation after all sequences are sampled. The remaining labels in the first line are the distances at which the calculations were made. Each subsequent line starts with the number of sequences sampled, followed by the observed number of OTUs at each column's distance. For instance, at distance 0.03, 84.00 OTUs were observed after 98 sequences were sampled.

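| numsequences | unique | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 | 0.07 | 0.08 | 0.09 | 0.1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 98 | 96.00 | 93.00 | 89.00 | 84.00 | 81.00 | 73.00 | 68.00 | 66.00 | 59.00 | 57.00 | 55.00 |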
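To read values out of a .collect file programmatically, a small parser is enough. The sketch below assumes the file is whitespace-delimited with a single header row, as in the sample shown here; the function name `parse_collect` is hypothetical and the sample text is copied from this page, not read from disk.

```python
def parse_collect(text):
    """Parse .collect-style text into {numsequences: {column label: value}}."""
    rows = [line.split() for line in text.strip().splitlines()]
    header, data = rows[0], rows[1:]
    labels = header[1:]  # "unique", "0.01", "0.02", ...
    # Assumption: every data row has one value per label after the first field.
    return {
        int(row[0]): dict(zip(labels, (float(v) for v in row[1:])))
        for row in data
    }

sample = """\
numsequences unique 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1
1 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
98 96.00 93.00 89.00 84.00 81.00 73.00 68.00 66.00 59.00 57.00 55.00
"""

table = parse_collect(sample)
print(table[98]["0.03"])  # 84.0
```

Column labels are kept as strings so that a distance such as "0.1" can be looked up exactly as it appears in the header.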

- .rarefaction

The first line contains the column labels. The first column, numsampled, gives the number of sequences sampled when each calculation was made. The frequency was set to 10, so the observed number of OTUs is calculated at each distance after every 10 sequences selected, with a final calculation after all sequences are sampled. The remaining labels in the first line are the distances at which the calculations were made, each followed by lci (lower bound of the confidence interval) and hci (upper bound of the confidence interval). Note: the entire file is not shown below. Each subsequent line starts with the number of sequences sampled, followed by the average observed number of OTUs and the confidence interval bounds at each distance. For instance, at distance 0.01, after 80 sequences were sampled over the default of 1000 iterations, the average observed was 76.74, the lci was 74.65, and the hci was 78.84.

| numsampled | 0.01 | lci | hci | 0.02 | lci | hci | 0.03 | lci | hci |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 10 | 9.95 | 9.53 | 10.37 | 9.87 | 9.14 | 10.59 | 9.81 | 8.95 | 10.66 |
| 20 | 19.79 | 18.90 | 20.68 | 19.47 | 18.07 | 20.86 | 19.22 | 17.52 | 20.92 |
| 30 | 29.54 | 28.29 | 30.80 | 28.85 | 26.83 | 30.87 | 28.30 | 25.88 | 30.72 |
| 40 | 39.20 | 37.64 | 40.77 | 38.01 | 35.55 | 40.47 | 37.10 | 34.23 | 39.97 |
| 50 | 48.77 | 46.87 | 50.67 | 47.09 | 44.30 | 49.89 | 45.60 | 42.37 | 48.83 |
| 60 | 58.23 | 56.14 | 60.31 | 56.03 | 53.05 | 59.01 | 53.91 | 50.52 | 57.30 |
| 70 | 67.54 | 65.42 | 69.65 | 64.78 | 61.92 | 67.65 | 62.03 | 58.75 | 65.30 |
| 80 | 76.74 | 74.65 | 78.84 | 73.49 | 71.06 | 75.91 | 69.96 | 67.00 | 72.92 |
| 90 | 85.83 | 84.20 | 87.45 | 82.14 | 80.33 | 83.94 | 77.84 | 75.50 | 80.17 |
| 98 | 93.00 | 93.00 | 93.00 | 89.00 | 89.00 | 89.00 | 84.00 | 84.00 | 84.00 |
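The averaging over iterations described above can be illustrated with a small resampling sketch in Python. This is not MOTHUR's implementation: the function name `rarefy` is hypothetical, the abundances are those quoted on this page for distance 0.03, and the interval is computed as the mean plus or minus 1.96 standard deviations, which is an assumption made here for illustration and not a documented description of how lci and hci are derived.

```python
import random
from statistics import mean, stdev

def rarefy(abundances, step=10, iterations=1000, seed=0):
    """Return {n: (mean OTUs, lower, upper)} from repeated random orderings."""
    rng = random.Random(seed)
    # One OTU label per individual, e.g. a doubleton contributes two labels.
    pool = [otu for otu, count in enumerate(abundances) for _ in range(count)]
    total = len(pool)
    # Record at 1, every `step` sequences, and after all sequences are sampled.
    points = sorted(set([1] + list(range(step, total, step)) + [total]))
    counts = {n: [] for n in points}
    for _ in range(iterations):
        rng.shuffle(pool)
        seen = set()
        for drawn, otu in enumerate(pool, start=1):
            seen.add(otu)
            if drawn in counts:
                counts[drawn].append(len(seen))
    result = {}
    for n in points:
        m = mean(counts[n])
        half = 1.96 * stdev(counts[n])  # assumed interval, see lead-in
        result[n] = (m, m - half, m + half)
    return result

# Amazonian dataset at distance 0.03: 75 singletons, 6 doubletons,
# 1 tripleton, 2 quadrupletons.
abundances = [1] * 75 + [2] * 6 + [3] * 1 + [4] * 2
curve = rarefy(abundances)
for n, (avg, lo, hi) in curve.items():
    print(n, round(avg, 2), round(lo, 2), round(hi, 2))
```

The averages should fall close to the 0.03 column of the sample file, with small differences from run to run because the sampling is random. At the final point every ordering has seen all 84 OTUs, so the interval collapses to a single value, as in the last row of the file.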