The phylo.diversity command requires an input tree file. Two files will be output: .phylo.diversity and (if you set rarefy=T) .rarefaction. To run this tutorial, download AbRecovery.zip.

Default settings

For example:

mothur > phylo.diversity(tree=abrecovery.paup.nj, group=abrecovery.groups)

or with a count file:

mothur > phylo.diversity(tree=abrecovery.paup.nj, count=abrecovery.count_table)

Execution of phylo.diversity() will generate the file abrecovery.paup.1.phylodiv.summary, which looks like:

Groups    numSampled    phyloDiversity
C    74    1.7782
A    84    2.0530
B    84    2.4740	
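
If you want to work with the summary values outside of mothur, the table is simple enough to read with a few lines of Python. The following is only a sketch: it assumes the file is whitespace-delimited with a single header row, as in the example above, and the function name parse_summary is made up for illustration.

```python
def parse_summary(text):
    """Return {group: (numSampled, phyloDiversity)} from summary-file text."""
    rows = {}
    # Drop blank lines, then skip the header row.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for line in lines[1:]:
        group, num_sampled, pd_value = line.split()
        rows[group] = (int(num_sampled), float(pd_value))
    return rows


# Example text copied from the summary output shown above.
example = """Groups    numSampled    phyloDiversity
C    74    1.7782
A    84    2.0530
B    84    2.4740
"""

summary = parse_summary(example)
for group, (n, pd_value) in summary.items():
    print(group, n, pd_value)
```

To use it on a real file, pass in the file contents, for example parse_summary(open("abrecovery.paup.1.phylodiv.summary").read()).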



The name option allows you to enter a namefile with your treefile.

 mothur > phylo.diversity(tree=abrecovery.phylip.nj, group=abrecovery.groups, name=abrecovery.names)


The count file is similar to the name file in that it is used to represent the number of duplicate sequences for a given representative sequence. It can also contain group information.

 mothur > make.table(group=abrecovery.groups, name=abrecovery.names)
 mothur > phylo.diversity(tree=abrecovery.phylip.nj, count=abrecovery.count_table)


The groups parameter allows you to specify which of the groups in your groupfile you would like analyzed. The group names are separated by dashes. By default all groups are used.

mothur > phylo.diversity(tree=abrecovery.paup.nj, group=abrecovery.groups, groups=A-B)

In the file abrecovery.paup.1.phylodiv.summary you would see something like:

Groups    numSampled    phyloDiversity
A    84    2.0530
B    84    2.4740			


The rarefy parameter allows you to calculate the rarefaction data. The default is false. If you set rarefy=T, 1000 randomizations will be performed by default.

mothur > phylo.diversity(tree=abrecovery.paup.nj, group=abrecovery.groups, rarefy=T)

In the file abrecovery.paup.1.phylodiv.rarefaction you would see something like:

numSampled	C	A	B	
1	0.3187	0.2612	0.2537	
74	1.7782	1.9327	2.3498	
84	NA	2.0530	2.4740		
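
In this example, NA appears for group C at 84 sequences, which is consistent with C only having 74 sequences to sample. If you read the table in a script, you will probably want to skip those entries. Here is a hedged Python sketch, assuming a whitespace-delimited table with numSampled in the first column; parse_curve is a hypothetical helper name, not part of mothur.

```python
def parse_curve(text):
    """Return {group: {numSampled: value}}, leaving out NA entries."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    groups = lines[0].split()[1:]  # header row: numSampled, then group names
    curves = {g: {} for g in groups}
    for line in lines[1:]:
        fields = line.split()
        n = int(fields[0])
        for g, value in zip(groups, fields[1:]):
            if value != "NA":  # assumed to mean the group has fewer than n sequences
                curves[g][n] = float(value)
    return curves


# Example text copied from the rarefaction output shown above.
example = """numSampled\tC\tA\tB
1\t0.3187\t0.2612\t0.2537
74\t1.7782\t1.9327\t2.3498
84\tNA\t2.0530\t2.4740
"""

curves = parse_curve(example)
print(sorted(curves["C"]))  # sampling depths available for group C
```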


The collect parameter allows you to create a collector's curve. The default is false.

mothur > phylo.diversity(tree=abrecovery.paup.nj, group=abrecovery.groups, collect=T)

In the file abrecovery.paup.1.phylodiv.summary you would see something like:

numSampled	C	A	B	
1	0.3665	0.4120	0.2117	
74	1.7782	1.9658	2.3353	
84	NA	2.0530	2.4740


The summary parameter allows you to create a .summary file. The default is true.

mothur > phylo.diversity(tree=abrecovery.paup.nj, group=abrecovery.groups, summary=T)

In the file abrecovery.paup.1.phylodiv you would see something like:

Groups	numSampled	phyloDiversity
C	74	1.7782
A	84	2.0530
B	84	2.4740


For larger datasets you might not be interested in obtaining the data for every number of sequences sampled. For instance, if you have 100,000 sequences, you may only want to output the data every 100 sequences. Alternatively, if you only have 100 sequences, you may want to output all of the data. The default setting is to output data every 100 sequences. By altering the freq option you can set the frequency at which the data are output:

mothur > phylo.diversity(tree=abrecovery.paup.nj, group=abrecovery.groups, collect=T, freq=1)


mothur > phylo.diversity(tree=abrecovery.paup.nj, group=abrecovery.groups, collect=T,freq=10)

Or you can set the frequency as a proportion of the total sequences. For example, to output after every 10%:

 mothur > phylo.diversity(tree=abrecovery.paup.nj, group=abrecovery.groups, collect=T, freq=0.10)

The third command would generate data such as this:

numSampled    C    A    B    
1    0.3073    0.2236    0.1102    
8    1.0329    0.7658    0.8554    
16    1.1919    1.0696    1.3787    
24    1.2975    1.2211    1.4297    
32    1.3998    1.3733    1.6132    
40    1.5672    1.3934    1.7927    
48    1.6548    1.4878    1.8862    
56    1.6727    1.6068    1.9921    
64    1.7378    1.8582    2.1792    
72    1.7707    1.9093    2.3820    
74    1.7782    1.9499    2.4021    
80    NA    2.0316    2.4492    
84    NA    2.0530    2.4740		
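
The numSampled values in a table like this can be reproduced with a short Python sketch. This is inferred from the example output only, not from the mothur source code: it assumes a proportion is converted to a step by truncating freq times the largest group size (0.10 x 84 gives 8), and that 1 and each group's total are always reported. The function name sample_points is made up for illustration.

```python
def sample_points(group_sizes, freq):
    """Guess the numSampled values reported for a given freq setting."""
    largest = max(group_sizes.values())
    # Assumption: values below 1 are proportions of the largest group,
    # values of 1 or more are a plain count of sequences.
    step = int(largest * freq) if freq < 1 else int(freq)
    step = max(step, 1)
    points = {1}                                   # the first sequence
    points.update(range(step, largest + 1, step))  # every step-th sequence
    points.update(group_sizes.values())            # each group's final total
    return sorted(points)


# Group sizes taken from the summary file in this tutorial.
sizes = {"C": 74, "A": 84, "B": 84}
print(sample_points(sizes, 0.10))
```

With these group sizes and freq=0.10 the sketch gives the same numSampled column as the table above.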


To improve the accuracy of the calculations you can change the number of randomizations that are performed using the iters option; the default value is 1,000. Running 10,000 randomizations should take about 10 times as long as the default:

mothur > phylo.diversity(tree=abrecovery.paup.nj, group=abrecovery.groups, rarefy=T, iters=10000)


The scale parameter is used to indicate that you want your output scaled to the number of sequences sampled. The default is false.

mothur > phylo.diversity(tree=abrecovery.paup.nj, group=abrecovery.groups, collect=T, scale=t)

In the file abrecovery.paup.1.phylodiv you would see something like:

numSampled	C	A	B	
1	0.3053	0.4170	0.0921	
74	0.0240	0.0248	0.0325	
84	NA	0.0244	0.0295	
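
The scaled values for each group's final row appear to be the unscaled phylogenetic diversity divided by the number of sequences sampled. The following Python sketch checks that arithmetic against the summary values from this tutorial; the division rule is an assumption based on these example numbers, and scale_by_sampled is a hypothetical helper name.

```python
def scale_by_sampled(summary):
    """Divide each group's phyloDiversity by its numSampled, to 4 decimals."""
    return {group: round(pd_value / n, 4) for group, (n, pd_value) in summary.items()}


# (numSampled, phyloDiversity) pairs from the unscaled summary file.
summary = {"C": (74, 1.7782), "A": (84, 2.0530), "B": (84, 2.4740)}

scaled = scale_by_sampled(summary)
for group, value in scaled.items():
    print(group, f"{value:.4f}")
```

The results agree with the scaled table above: C at 74 sequences, and A and B at 84 sequences.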


The sampledepth parameter allows you to enter the number of sequences you want to sample.

mothur > phylo.diversity(tree=abrecovery.paup.nj, group=abrecovery.groups, sampledepth=50)

In the output you would see something like:

numSampled	A	B	C
1	0.2298	0.1765	0.2469
50	1.6520	2.1287	1.5145


If you're one of the cool kids, you get to use the processors option, which enables you to reduce the processing time by using multiple processors. You are able to use as many processors as your computer has with the following option:

mothur > phylo.diversity(tree=abrecovery.paup.nj, group=abrecovery.groups, processors=2)

Running this command on my laptop doesn't exactly cut the time in half, but it's pretty close. There is no software limit on the number of processors that you can use.