# Chao

Example Calculations

*.chao

The Chao1 richness estimate is calculated as described by Chao (2, 4) and modified by Colwell (http://viceroy.eeb.uconn.edu/estimates).

$S_{chao1} = S_{obs} + \frac{{n_1}\left ({n_1}-1 \right )}{2\left ({n_2}+1 \right )}$, when n1 > 0 and n2 >= 0, or when n1 = 0 and n2 = 0.

$S_{chao1} = S_{obs} + \frac{{n_{1}^{2}}}{2{n_2}}$, when n1 = 0 and n2 > 0.

where,

$S_{chao1}$ = Richness Estimate

$S_{obs}$ = Observed number of species

$n_{1}$ = Number of OTUs with only one sequence

$n_{2}$ = Number of OTUs with only two sequences
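The case rules above can be sketched in a few lines of Python. This is only an illustration of the two formulas as written on this page, not mothur's own implementation; the function name `chao1` and the counts in the usage lines are hypothetical.

```python
def chao1(s_obs, n1, n2):
    """Chao1 richness estimate from the observed OTU count (s_obs),
    the number of singletons (n1) and the number of doubletons (n2)."""
    if n1 == 0 and n2 > 0:
        # Second formula: n1^2 / (2 * n2); safe because n2 > 0 here.
        return s_obs + n1 ** 2 / (2 * n2)
    # First formula (n1 > 0, or n1 == n2 == 0): the n2 + 1 term in the
    # denominator keeps the estimate defined when n2 == 0.
    return s_obs + n1 * (n1 - 1) / (2 * (n2 + 1))


# Hypothetical counts chosen for easy arithmetic: 10 + 4*3 / (2*2) = 13
print(chao1(10, 4, 1))
# With no singletons the estimate equals the observed richness.
print(chao1(10, 0, 3))
```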

To calculate the 95% confidence intervals, we assume that $S_{chao1} - S_{obs}$ follows a lognormal distribution and use the variance of the estimate:

$var\left ( S_{chao1} \right ) = \frac{{n_1}\left ({n_1}-1 \right )}{2\left ({n_2}+1 \right )} + \frac{{n_1}\left (2{n_1}-1 \right )^2}{4\left ({n_2}+1 \right )^2} + \frac{{n_1}^2{n_2}\left ({n_1}-1 \right )^2}{4\left ({n_2}+1 \right )^4}$, when n1>0 and n2>0

$var\left ( S_{chao1} \right ) = \frac{{n_1}\left ({n_1}-1 \right )}{2} + \frac{{n_1}\left (2{n_1}-1 \right )^2}{4} - \frac{{n_1}^4}{4S_{chao1}}$, when n1>0 and n2=0

$var\left ( S_{chao1} \right ) = S_{obs} exp \left (-N / S_{obs} \right )\left (1- exp \left (-N / S_{obs} \right )\right )$, when n1=0 and n2>=0, where N is the total number of sequences sampled
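As a rough sketch, the three variance cases can be written as one Python function. The function and parameter names are hypothetical, `n_seqs` stands for N (the total number of sequences), and the n1 = 0 branch is applied here for any n2, which is an assumption for the n2 = 0 case.

```python
import math


def chao1_variance(s_obs, n1, n2, n_seqs, s_chao1):
    """Variance of the Chao1 estimate, following the three cases above.

    s_chao1 is only needed for the n1 > 0, n2 == 0 case; n_seqs (N) is
    only needed for the n1 == 0 case.
    """
    if n1 > 0 and n2 > 0:
        return (
            n1 * (n1 - 1) / (2 * (n2 + 1))
            + n1 * (2 * n1 - 1) ** 2 / (4 * (n2 + 1) ** 2)
            + n1 ** 2 * n2 * (n1 - 1) ** 2 / (4 * (n2 + 1) ** 4)
        )
    if n1 > 0:  # n2 == 0
        return (
            n1 * (n1 - 1) / 2
            + n1 * (2 * n1 - 1) ** 2 / 4
            - n1 ** 4 / (4 * s_chao1)
        )
    # n1 == 0 (assumed to cover n2 == 0 as well)
    p = math.exp(-n_seqs / s_obs)
    return s_obs * p * (1 - p)


# Hypothetical counts: 3 + 12.25 + 2.25 = 17.5
print(chao1_variance(s_obs=10, n1=4, n2=1, n_seqs=20, s_chao1=13.0))
```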

$C = exp \left ( 1.96 \sqrt{\ln \left ( 1 + \frac{var\left ( S_{chao1} \right )}{\left ( S_{chao1} - S_{obs}\right )^2 }\right )} \right )$

$LCI_{95\%} = S_{obs} + \frac {S_{chao1} - S_{obs}}{C}$

$UCI_{95\%} = S_{obs} + C \left ( {S_{chao1} - S_{obs}} \right )$

where,

LCI = Lower bound of confidence interval

UCI = Upper bound of confidence interval

Revisiting the Amazon dataset shown below at distance 0.03, $S_{obs}$ was 84, n1 was 75, and n2 was 6, so $S_{chao1}$ would be 481.19 OTUs. With G = n1/n2 = 12.5, the variance of the estimate is 48,808.59 and the C value is 2.77. This gives a lower bound of 227.6 and an upper bound of 1,182.9 for the 95% confidence interval, a total range of 955.3.
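The confidence-interval arithmetic in this example can be checked with a short Python sketch. It takes the $S_{obs}$, $S_{chao1}$ and variance values quoted above as inputs rather than recomputing them, and it multiplies the square root by 1.96 (the normal quantile for a 95% interval), which reproduces the quoted C value. The function name `chao1_ci` is hypothetical.

```python
import math


def chao1_ci(s_obs, s_chao1, variance):
    """Return (C, LCI, UCI) for the 95% lognormal confidence interval."""
    diff = s_chao1 - s_obs
    if diff <= 0:
        # C divides by (S_chao1 - S_obs)^2, so it needs S_chao1 > S_obs.
        raise ValueError("S_chao1 must exceed S_obs")
    c = math.exp(1.96 * math.sqrt(math.log(1 + variance / diff ** 2)))
    lci = s_obs + diff / c
    uci = s_obs + c * diff
    return c, lci, uci


# Values quoted in the worked example at distance 0.03.
c, lci, uci = chao1_ci(s_obs=84, s_chao1=481.19, variance=48808.59)
print(f"C={c:.2f} LCI={lci:.1f} UCI={uci:.1f}")  # C=2.77 LCI=227.6 UCI=1182.9
```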

File Samples on the Amazonian Dataset

• .sabund

This file contains data for constructing a rank-abundance plot of the OTU data for each distance level. The first column contains the distance and the second is the number of abundance classes that follow, which is also the number of sequences in the largest OTU at that distance. The successive values in the row are the number of OTUs that were found once, twice, etc.

unique	   2	94	2
0	   2	92	3
0.01	   2	88	5
0.02	   4	84	2	2	1
0.03	   4	75	6	1	2
0.04	   4	69	9	1	2
0.05	   4	55	13	3	2
0.06	   4	48	14	2	4
0.07	   4	44	16	2	4
0.08	   7	36	15	4	2	1	0	1
0.09	   7	36	12	4	3	0	0	2
0.1	   7	35	12	2	3	0	0	3
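The quantities used by the Chao1 calculation can be read directly from one row of this file. The following Python sketch assumes whitespace-separated fields laid out as in the sample above; the function name `parse_sabund_line` and the returned keys are hypothetical.

```python
def parse_sabund_line(line):
    """Extract the label, S_obs, n1, n2 and N from one .sabund row."""
    fields = line.split()
    label = fields[0]
    n_classes = int(fields[1])
    # counts[i] is the number of OTUs containing exactly i + 1 sequences.
    counts = [int(x) for x in fields[2:2 + n_classes]]
    return {
        "label": label,
        "s_obs": sum(counts),
        "n1": counts[0] if n_classes >= 1 else 0,
        "n2": counts[1] if n_classes >= 2 else 0,
        "n_seqs": sum((i + 1) * c for i, c in enumerate(counts)),
    }


row = parse_sabund_line("0.03\t   4\t75\t6\t1\t2")
print(row)
# {'label': '0.03', 's_obs': 84, 'n1': 75, 'n2': 6, 'n_seqs': 98}
```

The 0.03 row gives the $S_{obs}$ of 84 with 75 singletons and 6 doubletons used in the worked example, and its 98 sequences match the final numsampled value in the .chao sample.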


• .chao

The first line contains the column labels. The first label, numsampled, is the number of sequences sampled when the calculation was made. The sampling frequency was set to 10, so the estimate is calculated at each distance after every 10 sequences are sampled, with a final calculation once all sequences have been sampled. The remaining labels are the distances at which the calculations were made, each followed by lci (lower bound of the confidence interval) and hci (upper bound of the confidence interval). Each subsequent line starts with the number of sequences sampled, followed by the chao1 estimate and its confidence bounds for each distance. For instance, at distance 0.01, after 80 sequences were sampled, chao1 was 754.56, the lci was 238.89 and the hci was 2912.80. Note: the entire file is not shown below.

numsampled	0.01	lci	hci	0.02	lci	hci	0.03	lci	hci
1		1.50	1.50	1.50	1.50	1.50	1.50	1.50	1.50	1.50
10		60.00	60.00	60.00	60.00	60.00	60.00	60.00	60.00	60.00
20		220.00	220.00	220.00	220.00	220.00	220.00	220.00	220.00	220.00
30		480.00	480.00	480.00	480.00	480.00	480.00	480.00	480.00	480.00
40		840.00	840.00	840.00	250.00	71.37	1384.70	395.25	66.40	4671.69
50		619.00	93.54	7343.69	395.56	104.42	2188.94	395.56	104.42	2188.94
60		892.75	124.84	10616.66	375.34	128.48	1451.58	574.44	143.54	3175.95
70		786.67	188.74	4345.72	636.61	158.46	3509.69	407.38	161.18	1277.30
80		754.56	238.89	2912.80	789.67	191.74	4348.72	529.96	204.60	1659.72
90		751.84	281.58	2352.82	1035.22	243.01	5701.27	519.01	224.67	1412.23
98		732.22	308.00	1993.50	1255.67	288.40	6914.90	481.19	227.57	1182.86
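A file with this layout can be loaded into a nested dictionary for lookup by sample size and distance. This is a minimal Python sketch that assumes the header repeats the pattern distance, lci, hci as shown above; the function name `parse_chao` is hypothetical.

```python
def parse_chao(text):
    """Map numsampled -> {distance: (chao1, lci, hci)} for a .chao file."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    # After "numsampled", every third label is a distance.
    distances = header[1::3]
    table = {}
    for ln in lines[1:]:
        fields = ln.split()
        values = [float(x) for x in fields[1:]]
        table[int(fields[0])] = {
            d: tuple(values[3 * i:3 * i + 3]) for i, d in enumerate(distances)
        }
    return table


sample = """numsampled\t0.01\tlci\thci\t0.02\tlci\thci\t0.03\tlci\thci
80\t\t754.56\t238.89\t2912.80\t789.67\t191.74\t4348.72\t529.96\t204.60\t1659.72
98\t\t732.22\t308.00\t1993.50\t1255.67\t288.40\t6914.90\t481.19\t227.57\t1182.86
"""
chao = parse_chao(sample)
print(chao[80]["0.01"])  # (754.56, 238.89, 2912.8)
print(chao[98]["0.03"])  # (481.19, 227.57, 1182.86)
```

Splitting on any whitespace rather than on tabs alone keeps the parser tolerant of rows where a tab has been replaced by a space.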


References

2. Chao, A. 1984. Non-parametric estimation of the number of classes in a population. Scand. J. Stat. 11:265-270.

4. Chao, A., M. C. Ma, and M. C. K. Yang. 1993. Stopping rules and estimation for recapture debugging with unequal failure rates. Biometrika 80:193-201.