We will be offering an R workshop December 18-20, 2019. Learn more.


From mothur
Revision as of 16:04, 22 May 2009 by Westcott (Talk | contribs) (The jumble option)

Jump to: navigation, search

The traditional way that ecologists use rarefaction is not to randomize the sampling order within a sample, rather between samples. For instance, if we wanted to know the number of OTUs in the human colon, we might sample from various sites within the colon, and sequence a bunch of 16S rRNA genes. By determining the number of OTUs in each sample and comparing the composition of those samples it is possible to determine how well you have sampled the biodiversity within the individual. mothur has the ability to generate data for inter-sample rarefaction curves for the number of observed speciees. For this tutorial you should download and decompress Patient70Data.zip


To execute the rarefaction.shared() command you first need to have run the read.otu() command with the list and group options. For example:

mothur > read.otu(list=patient70.fn.list, group=patient70.sites.groups)

By default, the rarefaction.shared() command will perform 1,000 randomizations of the order in which the samples are considered. So if you run rarefaction.shared() multiple times, you will get slightly different results. To get the default behavior enter:

mothur > rarefaction.shared()

This will result in output to the screen looking like:

unique	1
0.00	2
0.01	3
0.02	4
0.03	5
0.04	6
0.05	7
0.06	8
0.07	9
0.08	10
0.09	11
0.10	12

The left column indicates the label for each line in the data set and the right column indicates the row number in the data set. Execution of rarefaction.shared() will generate the file patient70.fn.shared.rarefaction, which looks like:

samples	unique		lci		hci		0.00		lci		hci		...	
1	443.7500	164.7345	722.7655	182.8480	86.3489		279.3471	...
2	850.0090	490.6902	1209.3278	308.3940	187.8304	428.9576	...
3	1248.1000	855.0973	1641.1027	413.4350	291.1591	535.7109	...
4	1634.4600	1251.6258	2017.2942	502.5470	392.0305	613.0635	...
5	2002.1400	1655.6566	2348.6234	582.0000	485.6954	678.3046	...
6	2375.1600	2107.6512	2642.6688	650.0620	578.2567	721.8673	...
7	2742.0000	2742.0000	2742.0000	713.0000	713.0000	713.0000	...

The actual file has many more columns, but this should give you the idea. The first column is the number of samples that have been considered. Since their were 7 group names in the group file, rarefaction.shared() rarefied the data over the 7 groups. The rest of the columns are in triplets. The first column of the triplet is the average number of OTUs observed after sampling 1, 2, 3, etc. groups. The second and third columns of each triplet are the bounds on the 95% confidence interval of average. If you look closely, you will find that the average for 1 sample is the average number of OTUs observed across all of the samples. Also, if you look at the data in the row for 7 samples you will see that the bounds on the 95% confidence interval are the same as the average. Finally, the richness observed for sample 7 is the total number of OTUs you would observe if you processed patient70.fn.list through summary.single().

The line & label options

There may only be a couple of lines in your OTU data that you are interested in generating the rarefaction curve for. There are three options. You could: (i) manually delete the lines you aren't interested in from your list file; (ii) use the label option; and (iii) use the line option. To use the label option with the rarefaction.shared() command you need to know the labels you are interested in. If you want the rarefaction curve data for the lines labeled unique, 0.03, 0.05 and 0.10 you would enter:

mothur > rarefaction.shared(label=unique-0.03)


mothur > rarefaction.shared(line=1-5)

In the file patient70.fn.shared.rarefaction you would see something like:

numsampled	unique		lci		hci		0.03		lci		hci
1		437.9010	156.7862	719.0158	58.6599 	32.9568 	84.3630
2		844.5540	484.9140	1204.1940	76.2709 	52.1988 	100.3430
3		1234.6800	850.1886	1619.1714	86.6658 	64.5309 	108.8007
4		1617.0100	1239.6273	1994.3927	94.2071 	74.2022 	114.2120
5		2003.0800	1661.2946	2344.8654	100.1300	82.7736 	117.4864
6		2376.2700	2105.6267	2646.9133	105.6120	92.6939 	118.5301
7		2742.0000	2742.0000	2742.0000	111.0000	111.0000	111.0000

The iters option

To improve the accuracy of the calculations you can change the number of randomizations that are performed using the iters option; the default value is 1,000. Running 10,000 randomization should take 10-times as long as the default:

mothur > rarefaction.shared(iters=10000)

The jumble option

Obviously, the goal of rarefaction is to randomize across the samples; however, if you just want a collector's curve across the samples you can use the jumble option:

mothur > rarefaction.shared(jumble=0)

The groups option

If you had started this tutorial with the following comamnds:

mothur > read.otu(list=patient70.fn.list, group=patient70.sites.groups)
mothur > get.group()

You would have seen that there were 7 groups here: 70A-70F and 70S. The sequences from 70S were collected from Patient 70's stool sample those from samples 70A-70F were from their mucosa. These 7 groups would yield 21 lies if you ran the rarefaction.shared() command; however, if you were only interested in the comparisons between each mucosa site you could use the groups option:

mothur > rarefaction.shared(groups=70A-70B-70C-70D-70E-70F)

Alternatively, if you want all of the pairwise comparisons you can either not include the group option or set it equal to "all".

mothur > rarefaction.shared(calc=sharedsobs, groups=all)