# Mothur manual

**Introduction**

MOTHUR is a computer program that takes a distance matrix as its input file and assigns sequences to operational taxonomic units (OTUs) at every distance level that can be used to form OTUs, using the nearest, furthest, or average neighbor clustering algorithm. These are also called single-linkage, complete-linkage, and UPGMA, respectively. Once sequences are assigned to OTUs, the frequency data for each distance level are used to construct rarefaction and collector's curves for the number of species observed, Shannon's and Simpson's diversity indices, and the Chao1, ACE, Jackknife, and Bootstrap richness estimators as a function of sampling effort and the distance used to define an OTU.

MOTHUR also uses non-parametric estimators to estimate similarity between communities based on membership and structure. MOTHUR determines the number of individuals in each community that were sampled for each OTU. Next, it calculates collector's curves for the fraction of shared OTUs between the two communities (with and without correcting for unsampled individuals), the Jaccard and Sorenson indices, and the richness of OTUs shared between the two communities. Standard error values are calculated for the entire sequence collection. MOTHUR is freely available as C++ source code and as a Windows executable.
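To make the difference between the three clustering rules concrete, here is a minimal Python sketch. It is not MOTHUR's own code. The function name `cluster_otus`, the dictionary-based distance matrix, and the convention that clusters merge while their linkage distance is less than or equal to the cutoff are all assumptions made for illustration.

```python
from itertools import combinations


def cluster_otus(dist, cutoff, method="furthest"):
    """Agglomerative clustering of sequences into OTUs (illustrative only).

    dist   : dict mapping frozenset({a, b}) -> distance between sequences a and b
    cutoff : keep merging while the closest pair of clusters is <= cutoff (assumed rule)
    method : "nearest" (single-linkage), "furthest" (complete-linkage),
             or "average" (UPGMA)
    """
    names = sorted({name for pair in dist for name in pair})
    clusters = [frozenset([name]) for name in names]

    def linkage(c1, c2):
        # All pairwise distances between members of the two clusters.
        d = [dist[frozenset((a, b))] for a in c1 for b in c2]
        if method == "nearest":
            return min(d)
        if method == "furthest":
            return max(d)
        if method == "average":
            return sum(d) / len(d)
        raise ValueError(f"unknown method: {method}")

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        if linkage(c1, c2) > cutoff:
            break
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]

    return sorted(sorted(c) for c in clusters)


# Hypothetical distances among four sequences A-D.
example_dist = {
    frozenset(("A", "B")): 0.01,
    frozenset(("B", "C")): 0.03,
    frozenset(("A", "C")): 0.05,
    frozenset(("A", "D")): 0.20,
    frozenset(("B", "D")): 0.20,
    frozenset(("C", "D")): 0.20,
}

for method in ("nearest", "furthest", "average"):
    print(method, cluster_otus(example_dist, 0.03, method))
```

With these made-up distances and a cutoff of 0.03, the nearest neighbor rule places A, B, and C in one OTU because C is within 0.03 of B. The furthest and average neighbor rules leave C on its own, because C is 0.05 from A and its mean distance to the A-B cluster is 0.04.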

This manual is designed to achieve six goals:

1. Describe the differences among the three sequence assignment algorithms.
2. Show how to use MOTHUR.
3. Describe the output files.
4. Validate the output by making calculations by hand.
5. Answer frequently asked questions.
6. Provide tutorials.

If you have any questions, complaints, or praise, please do not hesitate to contact Dr. Patrick D. Schloss at pschloss@microbio.umass.edu

**References**

1. Burnham, K. P., and W. S. Overton. 1979. Robust estimation of population size when capture probabilities vary among animals. Ecology 60:927-936.

2. Chao, A. 1984. Non-parametric estimation of the number of classes in a population. Scand. J. Stat. 11:265-270.

3. Chao, A., and S. M. Lee. 1992. Estimating the number of classes via sample coverage. J. Am. Stat. Assoc. 87:210-217.

4. Chao, A., M. C. Ma, and M. C. K. Yang. 1993. Stopping rules and estimation for recapture debugging with unequal failure rates. Biometrika 80:193-201.

5. Smith, E. P., and G. van Belle. 1984. Nonparametric estimation of species richness. Biometrics 40:119-129.

7. Chao, A., R. L. Chazdon, R. K. Colwell, and T. J. Shen. 2006. Abundance-based similarity indices and their estimation when there are unseen species in samples. Biometrics 62:361-371.

8. Chao, A., R. L. Chazdon, R. K. Colwell, and T. J. Shen. 2005. A new statistical approach for assessing similarity of species composition with incidence and abundance data. Ecol. Lett. 8:148-159.

9. Chao, A., W. H. Hwang, Y. C. Chen, and C. Y. Kuo. 2000. Estimating the number of shared species in two communities. Stat. Sinica. 10:227-246.

11. Chao, A., T. J. Shen, and W. H. Hwang. 2006. The applications of Laplace's boundary-mode approximations to estimate species richness and shared species richness. Aust. N. Z. J. Stat. 48:117-128.

12. Yue, J. C., and M. K. Clayton. 2005. A similarity measure based on species proportions. Commun. Stat. Theor. M. 34:2123-2131.