We will be offering mothur and R workshops throughout 2019. Learn more.

Unifrac.weighted

From mothur
Revision as of 11:03, 21 April 2010 by Westcott (Talk | contribs)

Jump to: navigation, search

The unifrac.weighted comand implements the weighted UniFrac algorithm. The unifrac.unweighted command implements the unweighted version of the command. Both of these methods are available through the UniFrac website. The UniFrac methods are generic tests that describes whether two or more communities have the same structure. The significance of the test statistic can only indicate the probability that the communities have the same structure by chance. The value does not indicate a level of similarity. The files that we discuss in this tutorial can be obtained by downloading the AbRecovery.zip file and decompressing it.


unifrac.weighted()

The unifrac.weighted() command can only be run after a successful read.tree() command execution. To start this tutorial, enter the following command:

mothur > read.tree(tree=abrecovery.paup.nj, group=abrecovery.groups)

By default, the unifrac.weighted() command will carry out the weighted UniFrac test on each tree in the tree file. Since this version of the algorithm can only compare two treatments at a time there is no global test as there is for the parsimony and unweighted UniFrac algorithms. Therefore, this test will determine whether any of the groups within the group file have a significantly different structure than the other groups. Execute the command with default settings:

mothur > unifrac.weighted()

This will produce:

Tree#	Groups	WScore	WSig
1	A-B	0.3660	<0.001
1	A-C	0.3817	<0.001
1	B-C	0.4184	<0.001

This means that the tree had scores between 0.3630 and 0.4162 for each of the pairwise comparisons and that the significance of the score (i.e. p-value) was less than 1 in 1,000. If any of the p-values had been greater than 0.01667, then that comparison would not be considered significant. Instead, here all three comparisons are significant. These data are also in the abrecovery.paup.nj.wsummary file. Looking at the file abrecovery.paup.nj1.weighted you will see a table with the score of your tree with the different possible pairwise comparisons and the distribution information for the 1,000 randomly labelled trees that were constructed:

A-BScore	A-BRandFreq	A-BRandCumul	A-CRandFreq	A-CRandCumul	B-CRandFreq	B-CRandCumul
0.067772	0.000		1.000		0.001		1.000		0.000		1.000
0.071141	0.000		1.000		0.001		0.999		0.000		1.000
0.071259	0.000		1.000		0.001		0.998		0.000		1.000
...
0.128733	0.000		0.597		0.001		0.404		0.000		0.618
0.128744	0.000		0.597		0.001		0.403		0.000		0.618
0.128749	0.001		0.597		0.000		0.402		0.000		0.618
0.128816	0.000		0.596		0.001		0.402		0.000		0.618
0.128875	0.001		0.596		0.000		0.401		0.000		0.618
...
0.240668	0.000		0.002		0.000		0.000		0.001		0.001
0.244803	0.001		0.002		0.000		0.000		0.000		0.000
0.265212	0.001		0.001		0.000		0.000		0.000		0.000

As the output to the screen indicated, this file tells you that if your comparison between A and B had a score 0.3630 then there were no randomly-labelled trees with a score greater than or equal to 0.3630; therefore, your p-value would be less than 0.001 (one divided by the number of randomizations). Alternatively, if your comparison between A and B had a score of 0.128749, this table would tell you that 1 of the 1,000 random trees had a score of 0.128749 and that 597 of the 1,000 random trees (i.e. P=0.597) had a score of 0.128749 or larger.

If instead of loading abrecovery.paup.nj you had instead loaded abrecovery.paup.bnj and run unifrac.weighted():

mothur > read.tree(tree=abrecovery.paup.bnj, group=abrecovery.groups)
mothur > unifrac.weighted()

This will generate the abrecovery.paup.nj.wsummary file, but it will also generate 1,000 *.weighted files (one for each tree you supplied) with contents similar to that observed in abrecovery.paup.nj1.weighted.


The groups option

Although the weighted.unifrac() command will do all of the pairwise comparisons for you by default, you can specify those comparisons that you are most interested in. For example,

mothur > unifrac.weighted(groups=A-B)
Tree#	Groups	WScore	WSig
1	A-B	0.366	<0.001

or

mothur > unifrac.weighted(groups=A-C)
Tree#	Groups	WScore	WSig
1	A-C	0.3817	<0.001

or

mothur > unifrac.weighted(groups=B-C)
Tree#	Groups	WScore	WSig
1	B-C	0.4184	<0.001

You should notice that these scores and significance levels are the same as what you would get by running the basic unifrac.weighted() command. For the sake of keeping command options parallel between unifrac.weighted(), unifrac.unweighted(), and parsimony(), these two commands will yield the same output as the default unifrac.weighted():

mothur > unifrac.weighted(groups=A-B-C)
Tree#	Groups	WScore	WSig
1	A-B	0.3660	<0.001
1	A-C	0.3817	<0.001
1	B-C	0.4184	<0.001

or

mothur > unifrac.weighted(groups=all)
Tree#	Groups	WScore	WSig
1	A-B	0.3660	<0.001
1	A-C	0.3817	<0.001
1	B-C	0.4184	<0.001


The iters option

If you run the unifrac.weighted() command multiple times, you will notice that while the score for your user tree doesn't change, it's significance may change some. This is because the testing procedure is based on a randomization process that becomes more accurate as you increase the number of randomizations. By default, unifrac.weighted() will do 1,000 randomizations. You can change the number of iterations with the iters option as follows:

mothur > unifrac.weighted(iters=10000)

The random option

The random parameter allows you to shut off the comparison to random trees. The default is true, meaning compare your trees with randomly generated trees.

mothur > unifrac.weighted(random=false)

The distance option

The distance parameter allows you to create a distance file from the results generated. The default is false.

mothur > unifrac.weighted(distance=true)

The name option

The name parameter allows you to enter a namesfile with your tree.

mothur > read.tree(tree=abrecovery.paup.bnj, group=abrecovery.groups, abrecovery.names)
mothur > unifrac.weighted()


Missing names in tree or group file

If you are missing a name from your tree or groups file mothur will warn you and return to the mothur prompt. Be sure that you don't have spaces in your sequence or group names.


Differences in implementation

The UniFrac website provides extensive resources for generating tree and PCA plots to compare sequence collections based on the UniFrac statistic. We have shown Schloss (2008) that this is a problematic approach. The statistic is not necessarily proportional to the true similarity of the communities and may become a worse estimate of similarity with increased sampling. On philosophical and scientific grounds, we will not provide any downstream analysis that uses the UniFrac statistic. The statistic is sound for hypothesis testing, but not for comparative analyses.