
Unifrac.unweighted


The unifrac.unweighted command implements the unweighted UniFrac algorithm, and the unifrac.weighted command implements the weighted version. Both methods are also available through the UniFrac website. The UniFrac methods are generic tests that describe whether two or more communities have the same structure. The significance of the test statistic indicates only the probability that the communities have the same structure by chance; the value does not indicate a level of similarity. The files discussed in this tutorial can be obtained by downloading the AbRecovery.zip file and decompressing it.
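
To make the statistic concrete, here is a minimal Python sketch (not mothur's implementation) of the unweighted UniFrac calculation: the fraction of the tree's total branch length that leads to leaves from only one group. The unweighted_unifrac function and the toy branch list are hypothetical illustrations.

def unweighted_unifrac(branches):
    # branches: iterable of (length, groups_below) pairs, where groups_below
    # is the set of group labels found among the leaves below that branch
    unique = 0.0  # branch length leading to leaves of a single group
    total = 0.0   # total branch length in the tree
    for length, groups_below in branches:
        total += length
        if len(groups_below) == 1:
            unique += length
    return unique / total  # 1.0 means the communities share no branches

# Toy example with two groups, A and B
branches = [(0.3, {"A"}), (0.2, {"B"}), (0.1, {"A", "B"}), (0.4, {"A", "B"})]
print(unweighted_unifrac(branches))  # 0.5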



Default settings

The unifrac.unweighted() command can only be run after a successful read.tree() command execution. To start this tutorial, enter the following command:

mothur > read.tree(tree=abrecovery.paup.nj, group=abrecovery.groups)

By default, the unifrac.unweighted() command will carry out the unweighted UniFrac test on each tree in the tree file. This algorithm can compare more than two treatments at a time. Therefore, this test will determine whether any of the groups within the group file have a significantly different structure than the other groups. Execute the command with default settings:

mothur > unifrac.unweighted()

This will produce:

Tree#	Groups	UWScore	UWSig
1	A-B-C	0.6818	<0.001

This means that the tree had a score of 0.6818 and that the significance of the score (i.e. the p-value) was less than 1 in 1,000. These data are also in the abrecovery.paup.nj.uwsummary file. Looking at the file abrecovery.paup.nj1.unweighted, you will see a table with the score of your tree for each comparison requested and the distribution of the scores for the 1,000 randomly labelled trees that were constructed:

A-B-CScore	A-B-CRandFreq	A-B-CRandCumul
0.420345	0.001		1.000
0.423251	0.001		0.999
0.423328	0.001		0.998
...
0.456161	0.001		0.579
0.456181	0.001		0.578
0.456275	0.001		0.577
...
0.515506	0.001		0.003
0.516054	0.001		0.002
0.517867	0.001		0.001

As the output to the screen indicated, this file tells you that if your comparison of A, B, and C had a score of 0.6818, then there were no randomly labelled trees with a score greater than or equal to 0.6818; therefore, your p-value would be less than 0.001 (one divided by the number of randomizations). Alternatively, if your comparison of A, B, and C had a score of 0.456181, this table would tell you that 1 of the 1,000 random trees had a score of exactly 0.456181 and that 578 of the 1,000 random trees (i.e. P=0.578) had a score of 0.456181 or larger.
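
In other words, the reported significance is simply the fraction of randomly labelled trees that score at least as high as your tree. Here is a Python sketch of that calculation; randomization_pvalue and the simulated null distribution are hypothetical stand-ins for the scores listed in the *.unweighted file.

import random

def randomization_pvalue(observed, random_scores):
    # fraction of random trees with a score >= the observed score
    hits = sum(1 for s in random_scores if s >= observed)
    if hits == 0:
        # no random tree scored as high, so report p < 1/n as in the output above
        return "<%.3f" % (1.0 / len(random_scores))
    return hits / len(random_scores)

# Stand-in null distribution; the real one comes from relabelling the tree
random.seed(1)
null = [random.gauss(0.46, 0.02) for _ in range(1000)]
print(randomization_pvalue(0.6818, null))    # <0.001
print(randomization_pvalue(0.456181, null))  # roughly 0.58 for this toy null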

If instead of loading abrecovery.paup.nj you had loaded abrecovery.paup.bnj and run unifrac.unweighted():

mothur > read.tree(tree=abrecovery.paup.bnj, group=abrecovery.groups)
mothur > unifrac.unweighted()

This will generate the abrecovery.paup.bnj.uwsummary file, but it will also generate 1,000 *.unweighted files (one for each tree you supplied) with contents similar to those observed in abrecovery.paup.nj1.unweighted.


Options

groups

Having demonstrated that the community structure of at least one of the three groups in the abrecovery.groups file was significantly different from the other two, you would now like to do pairwise comparisons. Note: you should not do pairwise comparisons if there is not a significant difference at the global level. As a conservative method to determine the significance of your pairwise p-values, you could divide the overall significance threshold (typically 0.05) by the number of comparisons that you will carry out. To do all of the possible pairwise comparisons you will set the groups option:

mothur > read.tree(tree=abrecovery.paup.nj, group=abrecovery.groups)
mothur > unifrac.unweighted(groups=all)
Tree#	Groups	UWScore	UWSig
1	A-B	0.6986	<0.001
1	A-C	0.7035	<0.001
1	B-C	0.7429	<0.001
1	A-B-C	0.6818	<0.001

or you could enter the following to get the same output:

mothur > unifrac.unweighted(groups=A-B-C)


Alternatively, to only compare two of the three groups you would enter:

mothur > unifrac.unweighted(groups=A-B)
Tree#	Groups	UWScore	UWSig
1	A-B	0.6986	<0.001

or

mothur > unifrac.unweighted(groups=A-C)
Tree#	Groups	UWScore	UWSig
1	A-C	0.7035	<0.001

or

mothur > unifrac.unweighted(groups=B-C)
Tree#	Groups	UWScore	UWSig
1	B-C	0.7429	<0.001

All of this tells you that the three groups harbor significantly different community structures from each other since the p-values are all less than 0.01667 (i.e. 0.05/3).
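
That threshold is a standard Bonferroni-style correction; a one-line Python sketch, assuming the three pairwise comparisons above:

alpha = 0.05        # overall significance threshold
n_comparisons = 3   # A-B, A-C, B-C
print(round(alpha / n_comparisons, 5))  # 0.01667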

iters

If you run the unifrac.unweighted() command multiple times, you will notice that while the score for your user tree doesn't change, its significance may change somewhat. This is because the testing procedure is based on a randomization process that becomes more accurate as you increase the number of randomizations. By default, unifrac.unweighted() will do 1,000 randomizations, so the smallest reportable p-value is 0.001 (one divided by the number of randomizations); with iters=10000 that floor drops to 0.0001. You can change the number of iterations with the iters option as follows:

mothur > unifrac.unweighted(iters=10000)

random

The random parameter allows you to shut off the comparison to random trees. The default is true, meaning your trees are compared with randomly labelled trees.

mothur > unifrac.unweighted(random=false)

distance

The distance parameter allows you to create a distance file from the results generated. The default is false.

mothur > unifrac.unweighted(distance=true)


name

The name parameter allows you to enter a names file with your tree.

mothur > read.tree(tree=abrecovery.paup.bnj, group=abrecovery.groups, name=abrecovery.names)
mothur > unifrac.unweighted()

Finer points

Missing names in tree or group file

If a name is missing from your tree or group file, mothur will warn you and return to the mothur prompt. Be sure that you don't have spaces in your sequence or group names.


Differences in implementation

The UniFrac website provides extensive resources for generating tree and PCA plots to compare sequence collections based on the UniFrac statistic. We have shown (Schloss 2008) that this is a problematic approach: the statistic is not necessarily proportional to the true similarity of the communities and may become a worse estimate of similarity with increased sampling. On philosophical and scientific grounds, we will not provide any downstream analysis that uses the UniFrac statistic. The statistic is sound for hypothesis testing, but not for comparative analyses.