
Jack

Revision as of 15:38, 14 January 2009 by Westcott (Talk | contribs)


Validate output by making calculations by hand

Example Calculations


*.jack

These files give the interpolated Jackknife estimate as described by Burnham and Overton (1). Because this calculation is complicated, it makes MOTHUR run much longer. Therefore, if you want the rarefied interpolated Jackknife estimate calculated, you must tell MOTHUR to do so with the "-j" flag at the command line. If it is not calculated, MOTHUR will still produce the *.r_jack* files, but they will be filled with zeros.


<math>S_{jack,k} = S_{obs} + \sum_{i=1}^k \left ( -1 \right )^{i+1} {k \choose i} n_i </math>


<math>var \left( S_{jack,k} \right ) = \sum_{i=1}^{n_t} \left ( a_{ik} \right )^2 n_i - S_{jack,k} </math>


<math>a_{ik} = \begin{cases} \left ( -1 \right )^{i+1} {k \choose i} + 1, & i = 1 \ldots k \\ 1, & i > k \end{cases}</math>


where,

Sobs = The number of observed OTUs
ni = The number of OTUs with i sequences
k = The order of the Jackknife estimate
nt = The number of sequences in the largest OTU
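
As a rough illustration of the formulas above, the following Python sketch computes the k-th order estimate and its variance from a table of abundance counts. The function names `a_coeff` and `jackknife`, the dictionary layout `{i: n_i}`, and the counts themselves are hypothetical and chosen only for this example; this is not MOTHUR's own code.

```python
from math import comb

def a_coeff(i, k):
    """Coefficient a_ik: (-1)^(i+1) * C(k, i) + 1 when i <= k, otherwise 1."""
    if i <= k:
        return (-1) ** (i + 1) * comb(k, i) + 1
    return 1

def jackknife(freq, k):
    """Return (S_jack_k, var) for abundance counts freq = {i: n_i}."""
    s_obs = sum(freq.values())  # observed number of OTUs
    s_jack = s_obs + sum(
        (-1) ** (i + 1) * comb(k, i) * freq.get(i, 0) for i in range(1, k + 1)
    )
    # Variance: sum of a_ik^2 * n_i over all abundance classes, minus S_jack_k
    var = sum(a_coeff(i, k) ** 2 * n for i, n in freq.items()) - s_jack
    return s_jack, var

# Hypothetical data: 10 singletons, 4 doubletons, 2 OTUs with 3 sequences,
# and 1 OTU with 5 sequences.
freq = {1: 10, 2: 4, 3: 2, 5: 1}
first_order = jackknife(freq, 1)
second_order = jackknife(freq, 2)
```

Storing only the non-empty abundance classes keeps the sums short, because every class above k contributes a coefficient of 1.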

To determine which order of the estimate to use, it is necessary to calculate the test statistic, Tk, for each order:


<math>T_k = \frac{S_{jack,k+1} - S_{jack,k}}{\left ( var \left( S_{jack,k+1} - S_{jack,k} | S \right )\right )^{0.5}}</math>


<math>var \left( S_{jack,k+1} - S_{jack,k} | S \right ) = \frac {S_{obs}}{S_{obs}-1} \left [ \sum_{i=1}^{n_t} \left ( b_{i}^2 n_i \right ) - \frac{\left ( S_{jack,k+1} - S_{jack,k} \right )^2 }{S_{obs}}\right ]</math>


where,

   <math>b_i = a_{i,k+1}-a_{i,k}</math>
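
The test statistic can be sketched in Python as follows. This sketch assumes that the denominator of Tk is the standard error of the difference, that is, the square root of the conditional variance. The names `a_coeff` and `t_statistic` and the example counts are hypothetical.

```python
from math import comb, sqrt

def a_coeff(i, k):
    """Coefficient a_ik: (-1)^(i+1) * C(k, i) + 1 when i <= k, otherwise 1."""
    if i <= k:
        return (-1) ** (i + 1) * comb(k, i) + 1
    return 1

def t_statistic(freq, k):
    """Return (T_k, conditional variance) for abundance counts freq = {i: n_i}."""
    s_obs = sum(freq.values())
    # b_i = a_{i,k+1} - a_{i,k}
    b = {i: a_coeff(i, k + 1) - a_coeff(i, k) for i in freq}
    # The difference S_jack,k+1 - S_jack,k equals the sum of b_i * n_i
    diff = sum(b[i] * n for i, n in freq.items())
    var = s_obs / (s_obs - 1) * (
        sum(b[i] ** 2 * n for i, n in freq.items()) - diff ** 2 / s_obs
    )
    # Assumption: divide by the standard error (square root of the variance)
    return diff / sqrt(var), var

# Hypothetical abundance counts {i: n_i}
freq = {1: 10, 2: 4, 3: 2, 5: 1}
t_1, var_1 = t_statistic(freq, 1)
```

For these counts the difference between the second- and first-order estimates is 6 and the conditional variance is 12.625.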


For each Tk value, calculate its two-sided p-value, Pk, from the standard normal distribution. Find the first k-value where Pk>0.05 and calculate c and d:


<math>c = \frac {0.05 - P_{k-1}}{P_k - P_{k-1}}</math>


<math>d_i = ca_{i,k} + \left( 1-c \right )a_{i,k-1}</math>
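
The order selection and the weight c can be sketched in Python as follows. The sketch assumes the p-values come from a standard normal distribution and does not cover the case where the first order already has a p-value above 0.05. The names `two_sided_p` and `select_order` are hypothetical; the Tk values are the rounded ones from the Amazonian example on this page.

```python
from math import erfc, sqrt

def two_sided_p(t):
    """Two-sided p-value of t under a standard normal distribution."""
    return erfc(abs(t) / sqrt(2))

def select_order(t_values, alpha=0.05):
    """Return (k, c) for the first order k whose p-value exceeds alpha.

    t_values maps each order k to its test statistic T_k.
    Returns None when no p-value exceeds alpha.
    """
    p_prev = None
    for k in sorted(t_values):
        p_k = two_sided_p(t_values[k])
        if p_k > alpha:
            if p_prev is None:
                raise ValueError("first order already exceeds alpha; not covered here")
            # c = (alpha - P_{k-1}) / (P_k - P_{k-1})
            return k, (alpha - p_prev) / (p_k - p_prev)
        p_prev = p_k
    return None

# Rounded T_k values from the Amazonian example table
t_values = {1: 13.91, 2: 8.89, 3: 5.77, 4: 3.36, 5: 1.54}
k, c = select_order(t_values)
```

Because the tabulated Tk values are rounded, the resulting p-values and c agree with the table only to about two decimal places.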


With c and d, calculate the interpolated Sjack and its standard error:


<math>S_{jack} = \sum_{i=1}^{n_t}d_i n_i</math>


<math>se \left ( S_{jack} \right ) = \left ( \sum_{i=1}^{n_t} \left ( d_{i}^2 n_i\right )-S_{jack}\right )^{0.5}</math>
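
A minimal Python sketch of the interpolation step follows. It assumes the weights on the two orders are c and (1 - c), which is what reproduces the worked example on this page. The function names, the counts, and the value c = 0.5 are hypothetical.

```python
from math import comb, sqrt

def a_coeff(i, k):
    """Coefficient a_ik: (-1)^(i+1) * C(k, i) + 1 when i <= k, otherwise 1."""
    if i <= k:
        return (-1) ** (i + 1) * comb(k, i) + 1
    return 1

def interpolated_jackknife(freq, k, c):
    """Return (S_jack, se) interpolated between orders k-1 and k."""
    # d_i = c * a_{i,k} + (1 - c) * a_{i,k-1}
    d = {i: c * a_coeff(i, k) + (1 - c) * a_coeff(i, k - 1) for i in freq}
    s_jack = sum(d[i] * n for i, n in freq.items())
    se = sqrt(sum(d[i] ** 2 * n for i, n in freq.items()) - s_jack)
    return s_jack, se

# Hypothetical abundance counts {i: n_i} and a hypothetical weight c
freq = {1: 10, 2: 4, 3: 2, 5: 1}
s_jack, se = interpolated_jackknife(freq, k=2, c=0.5)
```

With these counts the first- and second-order estimates are 27 and 33, so a weight of 0.5 gives an interpolated estimate of 30, halfway between them.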


For the Amazonian dataset, you can calculate the following:

   k     Sj,k    var      Tk       Pk
   1     159     150     13.91  <0.0001
   2     228     450     8.89   <0.0001
   3     292     938     5.77   <0.0001
   4     350     1700    3.36    0.0008
   5     399     2940    1.54    0.1235
   6     434     5250     


The p-value crosses 0.05 between orders 4 and 5, so you can calculate a c-value of 0.40 and an interpolated Sjack of 369.64 with a 95% confidence interval between 278.98 and 460.30. Note that programs like EstimateS and various microbial ecology papers present the first- and/or second-order Jackknife estimate. This method essentially uses a statistical procedure to determine which order results in the minimum bias (error).
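
The arithmetic of this example can be checked from the rounded table values alone. The Python sketch below uses the fact that the interpolated estimate is a weighted average of the two bracketing orders, with weights c and (1 - c). It also backs out the standard error implied by the reported interval, assuming the interval is the estimate plus or minus 1.96 standard errors. The variable names are hypothetical.

```python
# Values copied from the table and text above (rounded as printed)
s_jack_4, s_jack_5 = 350, 399
p_4, p_5 = 0.0008, 0.1235
ci_lower, ci_upper = 278.98, 460.30

# Interpolation weight between orders 4 and 5
c = (0.05 - p_4) / (p_5 - p_4)

# Weighted average of the two bracketing estimates
s_interp = c * s_jack_5 + (1 - c) * s_jack_4

# The interval should be centered on the interpolated estimate
ci_midpoint = (ci_lower + ci_upper) / 2

# Assumption: a normal-based 95% interval, estimate +/- 1.96 * se
se_implied = (ci_upper - ci_lower) / (2 * 1.96)
```

The weight comes out near 0.40 and the interpolated estimate near 369.6, matching the reported values to within the rounding of the table.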