

From mothur
Revision as of 17:25, 29 March 2018 by Westcott (processors)


The summary.shared command produces a summary file that contains the calculator value for each line in the OTU data and for all possible comparisons between the different groups in the group file. This can be useful if you aren't interested in generating collector's or rarefaction curves for your multi-sample data analysis. It is still worth looking at the collector's curves for the calculators you are interested in to determine how sensitive the values are to sampling. If the values are not sensitive to sampling, you can trust them; otherwise, you need to keep sampling. For this tutorial you should download and decompress Patient70Data.zip.

Default settings

First you will need to make a shared file from your list and group files.

mothur > make.shared(list=patient70.fn.list, group=patient70.tissue_stool.groups)

The summary data for multi-sample calculators are generated by default with the following command:

mothur > summary.shared(shared=patient70.fn.shared)

This will result in output to the screen looking like:

unique	1
0.00	2
0.01	3
0.02	4
0.03	5
0.04	6
0.05	7
0.06	8
0.07	9
0.08	10
0.09	11
0.10	12

The left column indicates the label for each line in the data set and the right column indicates the row number in the data set. In sons, the summary data was provided in a file ending in "sons.ltt" and was only generated after the collector's curves were generated. Now, in mothur, all of this data is contained within a single "shared.summary" file. In this case data was written to the file patient70.fn.shared.summary, which looks like:

label	comparison	sharedsobs	sharedchao	sharedace	JAbund		SorAbund	Jclass		SorClass
unique	stool	tissue	73.000000	161.449997	108.60603	0.150565	0.261723	0.026613	0.051847
0.00	stool	tissue	124.000000	237.481247	254.53860	0.489131	0.656935	0.174402	0.297006
0.01	stool	tissue	94.000000	162.892853	135.36864	0.736210	0.848066	0.367188	0.537143
0.02	stool	tissue	76.000000	110.477272	86.50789	0.892669	0.943291	0.554745	0.713615
0.03	stool	tissue	60.000000	75.916664	72.30236	0.926541	0.961870	0.545455	0.705882

Again, the first column contains the label for the row in the data set you are analyzing. The second and third columns give the names of the two groups in the pairwise comparison represented by the row. The remaining columns are labeled with the calculator that was used to generate the data. For instance, the column labeled sharedsobs contains the number of OTUs that were observed to be shared between the two groups for each line in the list file. This is just a snippet of the file; there are 11 calculators that are calculated by default.
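The column layout described above can be read with a few lines of Python. The following is a minimal sketch, not part of mothur: parse_summary and implied_sorclass are hypothetical helper names, and the text is the snippet shown above with the tabs collapsed to single spaces. As a sanity check it assumes the classic Jaccard and Sorensen similarity definitions (Jclass = shared / union, SorClass = 2 * shared / (S_A + S_B)), which the values in this snippet happen to be consistent with; other mothur versions may define or report these calculators differently, so the calculator pages remain the authority.

```python
# Snippet copied from the summary file shown above (whitespace-separated).
SUMMARY_TEXT = """\
label comparison sharedsobs sharedchao sharedace JAbund SorAbund Jclass SorClass
unique stool tissue 73.000000 161.449997 108.60603 0.150565 0.261723 0.026613 0.051847
0.00 stool tissue 124.000000 237.481247 254.53860 0.489131 0.656935 0.174402 0.297006
0.01 stool tissue 94.000000 162.892853 135.36864 0.736210 0.848066 0.367188 0.537143
0.02 stool tissue 76.000000 110.477272 86.50789 0.892669 0.943291 0.554745 0.713615
0.03 stool tissue 60.000000 75.916664 72.30236 0.926541 0.961870 0.545455 0.705882
"""


def parse_summary(text):
    """Return one dict per row: label, the two group names, calculator values."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    records = []
    for fields in rows:
        label, group_a, group_b, *values = fields
        # The last len(values) header names are the calculator columns.
        calc_names = [name.lower() for name in header[-len(values):]]
        record = {"label": label, "groups": (group_a, group_b)}
        record.update(zip(calc_names, map(float, values)))
        records.append(record)
    return records


def implied_sorclass(shared, jclass):
    """SorClass implied by sharedsobs and Jclass (assumed classic definitions)."""
    union = round(shared / jclass)  # Jclass = shared / union
    total = union + shared          # S_A + S_B = union + shared
    return 2 * shared / total       # SorClass = 2 * shared / (S_A + S_B)


records = parse_summary(SUMMARY_TEXT)
for rec in records:
    implied = implied_sorclass(rec["sharedsobs"], rec["jclass"])
    print(rec["label"], rec["groups"], round(implied, 6), rec["sorclass"])
```

Matching calculator names to the trailing header fields sidesteps the fact that the header has a single "comparison" entry for the two group-name columns.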



If you don't want to see all of the default calculators, you can tell mothur which ones to use in the summary file:

mothur > summary.shared(shared=patient70.fn.shared, calc=sharedsobs-sharedchao-jest)

This would generate the patient70.fn.shared.summary file:

label	A	B		sharedsobs	sharedchao	Jest
unique	stool	tissue		73.000000	161.449997	0.008066
0.00	stool	tissue		124.000000	237.481247	0.219289
0.01	stool	tissue		94.000000	162.892853	0.546228
0.02	stool	tissue		76.000000	110.477272	0.665435


There may only be a couple of lines in your OTU data that you are interested in summarizing. There are two options. You could: (i) manually delete the lines you aren't interested in from your list or shared file; or (ii) use the label option. To use the label option with the summary.shared command you need to know the labels you are interested in. If you want the summary data for the lines labeled unique, 0.03, 0.05 and 0.10 you would enter:

mothur > summary.shared(shared=patient70.fn.shared, label=unique-0.03-0.05-0.10, calc=sharedsobs-sharedchao)

Opening patient70.fn.shared.summary you would see the output as:

label	A	B		sharedsobs	sharedchao
unique	stool	tissue		73.000000	161.449997
0.03	stool	tissue		60.000000	75.916664
0.05	stool	tissue		51.000000	63.312500
0.10	stool	tissue		28.000000	33.416668
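The calc and label options both take hyphen-separated lists. If you generate mothur commands from a script, that convention can be captured in a small helper. This is only an illustrative sketch: build_summary_shared is a hypothetical function, not part of mothur, and it covers only the options discussed on this page.

```python
def build_summary_shared(shared, calc=None, label=None, groups=None):
    """Assemble a summary.shared command string (hypothetical helper)."""
    options = [f"shared={shared}"]
    # List-valued options are joined with hyphens, as in the examples above.
    if label:
        options.append("label=" + "-".join(label))
    if calc:
        options.append("calc=" + "-".join(calc))
    if groups:
        options.append("groups=" + "-".join(groups))
    return "summary.shared(" + ", ".join(options) + ")"


command = build_summary_shared(
    "patient70.fn.shared",
    calc=["sharedsobs", "sharedchao"],
    label=["unique", "0.03", "0.05", "0.10"],
)
print(command)
```

The string it prints is the same command entered at the mothur prompt in the label example above.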


If you had started this tutorial with the following commands:

mothur > make.shared(list=patient70.fn.list, group=patient70.sites.groups)
mothur > get.group(shared=patient70.fn.shared)

You would have seen that there are 7 groups: 70A-70F and 70S. The sequences from 70S were collected from Patient 70's stool sample; those from samples 70A-70F were from their mucosa. These 7 groups would yield 21 pairwise comparisons if you ran the summary.shared command. If you were only interested in the comparisons between each mucosa site and the stool sample, you could use the groups option:

mothur > summary.shared(shared=patient70.fn.shared, calc=sharedsobs, groups=70A-70S)
mothur > summary.shared(shared=patient70.fn.shared, calc=sharedsobs, groups=70B-70S)
mothur > summary.shared(shared=patient70.fn.shared, calc=sharedsobs, groups=70C-70S)
mothur > summary.shared(shared=patient70.fn.shared, calc=sharedsobs, groups=70D-70S)
mothur > summary.shared(shared=patient70.fn.shared, calc=sharedsobs, groups=70E-70S)
mothur > summary.shared(shared=patient70.fn.shared, calc=sharedsobs, groups=70F-70S)

Alternatively, if you want all of the pairwise comparisons you can either leave out the groups option or set it equal to "all".

mothur > summary.shared(shared=patient70.fn.shared, calc=sharedsobs, groups=all)
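The count of 21 comparisons and the six mucosa-versus-stool commands can be reproduced with a short script. This is a sketch for illustration only; the group names are the seven listed above, and the script simply prints command strings that you would still run in mothur yourself.

```python
from itertools import combinations

groups = ["70A", "70B", "70C", "70D", "70E", "70F", "70S"]

# Every unordered pair of groups: 7 * 6 / 2 = 21 comparisons.
all_pairs = list(combinations(groups, 2))
print(len(all_pairs))

# Only the comparisons between each mucosa site and the stool sample.
stool = "70S"
mucosa = [g for g in groups if g != stool]
commands = [
    f"summary.shared(shared=patient70.fn.shared, calc=sharedsobs, groups={site}-{stool})"
    for site in mucosa
]
for command in commands:
    print(command)
```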


The sharedsobs and sharedchao calculators not only do the pairwise estimates, but also estimate the shared richness of all the groups in your file. This calculation is RAM intensive. If your RAM is limited and you have a large number of groups this may result in a crash, so by default the all parameter is set to false. To calculate the shared richness of all your groups, set the all parameter to true.

mothur > summary.shared(shared=patient70.fn.shared, calc=sharedsobs-sharedchao, all=true)
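To make the difference between the pairwise and the all-groups value concrete, here is a minimal sketch of the observed shared richness only. It assumes that an OTU counts as shared when it has a nonzero abundance in every group being compared, uses a made-up toy table rather than the Patient 70 data, and does not attempt the sharedchao estimate. The helper names are hypothetical.

```python
# Toy abundance table (made-up counts, not the Patient 70 data).
counts = {
    "70A": {"Otu1": 5, "Otu2": 0, "Otu3": 2},
    "70B": {"Otu1": 1, "Otu2": 4, "Otu3": 3},
    "70S": {"Otu1": 7, "Otu2": 2, "Otu3": 0},
}


def observed(group_counts):
    """OTUs with at least one sequence in this group."""
    return {otu for otu, n in group_counts.items() if n > 0}


def shared_observed(counts, groups):
    """Number of OTUs observed in every one of the given groups."""
    otu_sets = [observed(counts[g]) for g in groups]
    return len(set.intersection(*otu_sets))


pairwise = shared_observed(counts, ["70A", "70B"])
all_groups = shared_observed(counts, ["70A", "70B", "70S"])
print(pairwise, all_groups)
```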


The distance parameter allows you to indicate that you would like a distance file created for each calculator at each label. The default is false.

mothur > summary.shared(shared=patient70.fn.shared, distance=true)


The subsample parameter allows you to enter the per-group sample size, or you can set subsample=T and mothur will use the size of your smallest group.


The iters parameter allows you to choose the number of times you would like to run the subsample.
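The idea behind subsample and iters can be sketched as follows. This is not mothur's implementation: it assumes sampling without replacement down to the size of the smallest group, uses the observed number of shared OTUs as the only calculator, averages it over the iterations, and runs on made-up counts. All function names are hypothetical.

```python
import random


def subsample(group_counts, size, rng):
    """Draw `size` sequences without replacement; return OTU -> count."""
    pool = [otu for otu, n in group_counts.items() for _ in range(n)]
    drawn = rng.sample(pool, size)
    result = {}
    for otu in drawn:
        result[otu] = result.get(otu, 0) + 1
    return result


def mean_shared_sobs(counts_a, counts_b, iters, seed=0):
    """Average observed shared OTUs over `iters` random subsamples."""
    rng = random.Random(seed)
    # Analogous to subsample=T: use the size of the smallest group.
    size = min(sum(counts_a.values()), sum(counts_b.values()))
    total = 0
    for _ in range(iters):
        sub_a = subsample(counts_a, size, rng)
        sub_b = subsample(counts_b, size, rng)
        total += len(set(sub_a) & set(sub_b))
    return size, total / iters


# Made-up counts for illustration only.
stool = {"Otu1": 40, "Otu2": 25, "Otu3": 10, "Otu4": 5}
tissue = {"Otu1": 12, "Otu2": 0, "Otu3": 6, "Otu4": 2}
size, mean_shared = mean_shared_sobs(stool, tissue, iters=100)
print(size, mean_shared)
```

Averaging over more iterations reduces the noise that a single random subsample introduces, at the cost of a proportionally longer run.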


The output parameter allows you to indicate whether you want the distance file created by summary.shared to be in lower triangle or square format. The options are lt and square; lt is the default.

mothur > summary.shared(shared=patient70.fn.shared, distance=true, output=square)
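The difference between the two layouts is easiest to see side by side. The sketch below assumes a PHYLIP-style layout (a first line with the number of groups, then one tab-separated row per group); the exact file mothur writes may differ in details such as spacing or precision. The distances are made-up values and format_distances is a hypothetical helper.

```python
def format_distances(names, dist, output="lt"):
    """Return a PHYLIP-style distance matrix as text (assumed layout)."""
    lines = [str(len(names))]
    for i, row_name in enumerate(names):
        if output == "lt":
            cols = names[:i]  # only the groups listed before this one
        elif output == "square":
            cols = names      # every group, including the diagonal
        else:
            raise ValueError("output must be 'lt' or 'square'")
        values = [
            0.0 if col == row_name else dist[frozenset((row_name, col))]
            for col in cols
        ]
        lines.append("\t".join([row_name] + [f"{v:.6f}" for v in values]))
    return "\n".join(lines)


# Made-up distances for illustration only.
names = ["70A", "70B", "70S"]
dist = {
    frozenset(("70A", "70B")): 0.25,
    frozenset(("70A", "70S")): 0.5,
    frozenset(("70B", "70S")): 0.75,
}
lt_text = format_distances(names, dist, output="lt")
square_text = format_distances(names, dist, output="square")
print(lt_text)
print(square_text)
```

The lower triangle form stores each pairwise value once, while the square form repeats it on both sides of a zero diagonal.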


The processors option allows you to reduce the processing time by using multiple processors. By default, mothur detects the number of available processors and uses all of them.

mothur > summary.shared(shared=patient70.fn.shared, processors=2)

Running this command on my laptop doesn't exactly cut the time in half, but it's pretty close. There is no software limit on the number of processors that you can use.