# Jack

Example Calculations

*.jack

These files give the interpolated Jackknife estimate as described by Burnham and Overton (1).

$S_{jack,k} = S_{obs} + \sum_{i=1}^k \left ( -1 \right )^{i+1} {k \choose i} n_i$

$var \left( S_{jack,k} \right ) = \sum_{i=1}^{n_t} \left ( a_{ik} \right )^2 n_i - S_{jack,k}$

$a_{ik} = \begin{cases} \left ( -1 \right )^{i+1} {k \choose i} + 1, & i = 1 \ldots k \\ 1, & i > k \end{cases}$

where,

k = The order of the Jackknife estimate

$S_{obs}$ = The number of OTUs observed

$n_i$ = The number of OTUs that contain i sequences

$n_t$ = The number of sequences in the largest OTU.
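As a rough check of the formulas above, the following Python sketch computes $S_{jack,k}$ and its variance for the Amazonian data at distance 0.03. It assumes the abundance counts n = [75, 6, 1, 2] (singletons, doubletons, and so on, taken from the 0.03 row of the .sabund sample on this page) and lets every sum run over all abundance classes up to the size of the largest OTU. The function names `a`, `s_jack` and `var_jack` are made up for this illustration.

```python
from math import comb

def a(i, k):
    """Coefficient a_ik: (-1)^(i+1) * C(k, i) + 1 for i <= k, otherwise 1."""
    return (-1) ** (i + 1) * comb(k, i) + 1 if i <= k else 1

def s_jack(n, k):
    """k-th order Jackknife estimate; n[i-1] is the number of OTUs with i sequences."""
    s_obs = sum(n)
    # Only the first k abundance classes contribute to the correction term.
    return s_obs + sum((-1) ** (i + 1) * comb(k, i) * n[i - 1]
                       for i in range(1, min(k, len(n)) + 1))

def var_jack(n, k):
    """Variance of the k-th order estimate, summed over every abundance class."""
    return sum(a(i, k) ** 2 * n[i - 1] for i in range(1, len(n) + 1)) - s_jack(n, k)

n = [75, 6, 1, 2]  # assumed input: OTU counts by abundance at distance 0.03
for k in range(1, 7):
    print(k, s_jack(n, k), var_jack(n, k))
```

For k = 1 to 6 this gives estimates of 159, 228, 292, 350, 399 and 434 with variances of 150, 450, 938, 1700, 2940 and 5250.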

To determine which order of the estimate to use, it is necessary to calculate the test statistic, $T_k$, for each order:

$T_k = \frac{S_{jack,k+1} - S_{jack,k}}{\left ( var \left( S_{jack,k+1} - S_{jack,k} | S \right )\right )^{0.5}}$

$var \left( S_{jack,k+1} - S_{jack,k} | S \right ) = \frac {S_{obs}}{S_{obs}-1} \left [ \sum_{i=1}^{n_t} \left ( b_{i}^2 n_i \right ) - \frac{\left ( S_{jack,k+1} - S_{jack,k} \right )^2 }{S_{obs}}\right ]$

where,

$b_i = a_{i,k+1}-a_{i,k}$
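The test statistic can be sketched in Python as well. This example assumes the counts n = [75, 6, 1, 2] from the Amazonian data at distance 0.03, divides the difference between successive estimates by the square root of its conditional variance, and takes the two-sided p-value from a standard normal distribution through `math.erfc`. The choice of the normal distribution is an assumption made here because it reproduces the p-values in the Amazonian example on this page; the function names are illustrative only.

```python
from math import comb, erfc, sqrt

def a(i, k):
    """Coefficient a_ik: (-1)^(i+1) * C(k, i) + 1 for i <= k, otherwise 1."""
    return (-1) ** (i + 1) * comb(k, i) + 1 if i <= k else 1

def t_and_p(n, k):
    """Return (T_k, two-sided p-value) for abundance counts n."""
    s_obs = sum(n)
    b = [a(i, k + 1) - a(i, k) for i in range(1, len(n) + 1)]
    # S_jack,k equals sum_i a_ik * n_i, so the difference is sum_i b_i * n_i.
    diff = sum(bi * ni for bi, ni in zip(b, n))
    sum_b2 = sum(bi ** 2 * ni for bi, ni in zip(b, n))
    var = s_obs / (s_obs - 1) * (sum_b2 - diff ** 2 / s_obs)
    t = diff / sqrt(var)
    # Assumption: standard normal reference distribution for the p-value.
    p = erfc(abs(t) / sqrt(2))
    return t, p

n = [75, 6, 1, 2]  # assumed input: OTU counts by abundance at distance 0.03
for k in range(1, 6):
    t, p = t_and_p(n, k)
    print(k, round(t, 2), round(p, 4))
```

With these counts the statistic falls from about 13.91 at k = 1 to about 1.54 at k = 5, where the p-value first rises above 0.05.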

For each $T_k$ value, calculate its two-sided p-value, $P_k$. Find the first order k where $P_k > 0.05$ and calculate the interpolation weight c and the coefficients $d_i$:

$c = \frac {0.05 - P_{k-1}}{P_k - P_{k-1}}$

$d_i = ca_{i,k} + \left( 1-c \right )a_{i,k-1}$

With c and d, calculate the interpolated $S_{jack}$ and its standard error:

$S_{jack} = \sum_{i=1}^{n_t}d_i n_i$

$se \left ( S_{jack} \right ) = \left ( \sum_{i=1}^{n_t} \left ( d_{i}^2 n_i\right )-S_{jack}\right )^{0.5}$

For the Amazonian dataset, you can calculate the following:

k     Sj,k    var     Tk       Pk
1     159     150     13.91    <0.0001
2     228     450     8.89     <0.0001
3     292     938     5.77     <0.0001
4     350     1700    3.36     0.0008
5     399     2940    1.54     0.1235
6     434     5250


The p-value crosses 0.05 between orders 4 and 5, which gives a c-value of 0.40 and an interpolated $S_{jack}$ of 369.64 with a 95% confidence interval from 278.98 to 460.30 at distance 0.03. Note that programs like EstimateS and various microbial ecology papers present the first- and/or second-order Jackknife estimate. This method essentially uses a statistical procedure to determine which order results in the minimum bias (error).
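The interpolation step for this example can be sketched in Python. The sketch takes $P_4$ = 0.0008 and $P_5$ = 0.1235 from the table as given, weights the order-5 and order-4 coefficients by c and 1 - c, and assumes a normal approximation with z = 1.96 for the 95% confidence interval; that multiplier is an assumption chosen because it reproduces the reported interval. The counts n = [75, 6, 1, 2] are the OTU abundances at distance 0.03, and all names are illustrative.

```python
from math import comb, sqrt

def a(i, k):
    """Coefficient a_ik: (-1)^(i+1) * C(k, i) + 1 for i <= k, otherwise 1."""
    return (-1) ** (i + 1) * comb(k, i) + 1 if i <= k else 1

n = [75, 6, 1, 2]             # assumed input: OTU counts by abundance at 0.03
k = 5                         # first order with P_k > 0.05
p_prev, p_k = 0.0008, 0.1235  # P_4 and P_5 as tabulated (rounded values)

# Interpolation weight between orders k-1 and k.
c = (0.05 - p_prev) / (p_k - p_prev)

# Interpolated coefficients: weight c on order k, 1 - c on order k-1.
d = [c * a(i, k) + (1 - c) * a(i, k - 1) for i in range(1, len(n) + 1)]

s_jack = sum(di * ni for di, ni in zip(d, n))
se = sqrt(sum(di ** 2 * ni for di, ni in zip(d, n)) - s_jack)

# Assumption: normal-approximation 95% interval.
lci, hci = s_jack - 1.96 * se, s_jack + 1.96 * se
print(round(c, 2), round(s_jack, 2), round(lci, 2), round(hci, 2))
```

Because the tabulated p-values are rounded, the result agrees with the reported 369.64 (278.98 to 460.30) only to within about 0.01.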

File Samples on the Amazonian Dataset

• .sabund

This file contains data for constructing a rank-abundance plot of the OTU data for each distance level. The first column contains the distance and the second is the number of sequences in the largest OTU at that distance, which is also the number of values that follow. The successive values in the row are the number of OTUs that were found once, twice, etc.

unique	   2	94	2
0	   2	92	3
0.01	   2	88	5
0.02	   4	84	2	2	1
0.03	   4	75	6	1	2
0.04	   4	69	9	1	2
0.05	   4	55	13	3	2
0.06	   4	48	14	2	4
0.07	   4	44	16	2	4
0.08	   7	36	15	4	2	1	0	1
0.09	   7	36	12	4	3	0	0	2
0.1	   7	35	12	2	3	0	0	3
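A row of this file can be read with a few lines of Python. In the sample rows the second field always equals the number of abundance values that follow it, and the sketch relies on that; `parse_sabund_line` is a hypothetical helper name, not an existing function.

```python
def parse_sabund_line(line):
    """Split one .sabund row into its label and list of OTU counts by abundance."""
    fields = line.split()
    label = fields[0]
    n_classes = int(fields[1])  # number of abundance values that follow
    counts = [int(x) for x in fields[2:2 + n_classes]]
    return label, counts

label, n = parse_sabund_line("0.03\t4\t75\t6\t1\t2")

# Observed OTUs: total count across all abundance classes.
s_obs = sum(n)
# Sequences: each OTU in class i contributes i sequences.
n_seqs = sum(i * n_i for i, n_i in enumerate(n, start=1))
print(label, s_obs, n_seqs)
```

For the 0.03 row this yields 84 observed OTUs from 98 sequences, which matches the final numsampled value of 98 in the .jack sample.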


• .jack

The first line contains the labels of all the columns. The first label, numsampled, gives the number of sequences sampled when each calculation was made. The frequency was set to 10, so the estimate is calculated at each of the distances after every 10 sequences are sampled, with a final calculation done after all sequences have been sampled. The remaining labels in the first line are the distances at which the calculations were made, each followed by lci (lower bound of the confidence interval) and hci (upper bound of the confidence interval). Each additional line starts with the number of sequences sampled, followed by the Jackknife estimate at each column's distance and its confidence interval. For instance, at distance 0.01, after 80 sequences were sampled the Jackknife estimate was 730.34, the lci was 458.44 and the hci was 1002.23. Note: the entire file is not shown below.

numsampled	0.01	lci	hci	0.02	lci	hci	0.03	lci	hci
1	        2.00	-0.77	4.77	2.00	-0.77	4.77	2.00	-0.77	4.77
10		20.00	11.23	28.77	20.00	11.23	28.77	20.00	11.23	28.77
20		40.00	27.60	52.40	40.00	27.60	52.40	40.00	27.60	52.40
30		60.00	44.82	75.18	60.00	44.82	75.18	60.00	44.82	75.18
40		80.00	62.47	97.53	248.46	143.29	353.64	433.08	231.11	635.06
50		681.34	371.78	990.90	392.84	230.26	555.42	392.84	230.26	555.42
60		987.04	546.63	1427.46	398.62	242.78	554.46	570.40	338.19	802.62
70		781.20	467.09	1095.31	362.23	262.78	461.68	434.80	274.17	595.44
80		730.34	458.44	1002.23	466.63	335.66	597.61	372.02	273.01	471.03
90		723.29	470.37	976.20	551.43	405.32	697.53	409.60	300.44	518.75
98		705.48	471.31	939.66	623.98	464.53	783.42	369.64	278.98	460.30
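To pull values out of a file laid out like this, one option is to group the columns in threes under each distance label. The following Python sketch assumes whitespace-separated fields and the header pattern shown above (a distance followed by lci and hci); `parse_jack` is a hypothetical helper, and the sample text contains only two of the rows.

```python
def parse_jack(text):
    """Map numsampled -> {distance: (estimate, lci, hci)} for .jack-style text."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    # Every third label after numsampled is a distance: 0.01, 0.02, 0.03, ...
    distances = header[1::3]
    table = {}
    for row in rows:
        sampled = int(row[0])
        values = [float(x) for x in row[1:]]
        table[sampled] = {
            dist: tuple(values[3 * j:3 * j + 3])
            for j, dist in enumerate(distances)
        }
    return table

sample = """numsampled\t0.01\tlci\thci\t0.02\tlci\thci\t0.03\tlci\thci
80\t730.34\t458.44\t1002.23\t466.63\t335.66\t597.61\t372.02\t273.01\t471.03
98\t705.48\t471.31\t939.66\t623.98\t464.53\t783.42\t369.64\t278.98\t460.30"""

table = parse_jack(sample)
print(table[80]["0.01"])
print(table[98]["0.03"])
```

The first lookup returns the values quoted in the description (730.34, 458.44, 1002.23), and the second returns the final estimate at distance 0.03.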


References

1. Burnham, K. P., and W. S. Overton. 1979. Robust estimation of population size when capture probabilities vary among animals. Ecology 60:927-936.