Revision as of 14:35, 15 January 2009 by Westcott (Talk | contribs)

Validate output by making calculations by hand

Example Calculations


Estimating the fraction of shared OTUs between two communities. Incidence-based measures of community similarity, such as the classic Sørensen similarity index (<math>S_{class}</math>), calculate the ratio of shared OTUs to the total number of OTUs in the individual communities:

<math>S_{class} = \frac{2S_{12}}{S_1 + S_2}</math>,

where

<math>S_1, S_2</math> = number of OTUs observed or estimated in communities A and B, respectively, and

<math>S_{12}</math> = number of OTUs shared between A and B.

At distance 0.03, 89 OTUs were observed in A and 81 in B, and 60 OTUs were shared between the two libraries. Therefore <math>S_{class} = \frac{2 \times 60}{89 + 81} = \frac{120}{170} = 0.705882</math>, which matches the final value in the 0.03 column of the .sharedSorclass file shown below.
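The hand calculation above can be reproduced with a few lines of Python. This is a minimal sketch, not mothur's own code; the function name `sorensen_classic` is chosen here for illustration only.

```python
def sorensen_classic(s1: int, s2: int, s12: int) -> float:
    """Classic Sorensen similarity: 2 * S12 / (S1 + S2).

    s1, s2 -- number of OTUs in communities A and B
    s12    -- number of OTUs shared between A and B
    """
    if s1 + s2 == 0:
        raise ValueError("at least one community must contain an OTU")
    return 2 * s12 / (s1 + s2)


# Values quoted in the text for distance 0.03.
value = sorensen_classic(89, 81, 60)
print(f"{value:.6f}")  # 0.705882
```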

File Samples on the Eckburg 70.stool_compare Dataset

  • .shared

This file contains the number of sequences from each group found in each OTU. Each row consists of the distance being considered, the group name, the number of OTUs, and the abundance information, all separated by tabs. In the abundance information, each subsequent number represents a different OTU, and its value is the number of sequences from that group that clustered within that OTU. Note that OTU frequencies can only be compared within a single distance definition. Below is a link to the file used in the calculations.
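As a rough illustration of the row layout described above, the sketch below parses two tab-separated rows and derives <math>S_{class}</math> from their abundance vectors. The two rows are made-up example data (not taken from the Eckburg file), and the helper names `parse_shared_row` and `sorensen_from_abundances` are hypothetical.

```python
def parse_shared_row(line: str):
    """Split one .shared row into (distance label, group name, abundances)."""
    fields = line.rstrip("\n").split("\t")
    label, group, n_otus = fields[0], fields[1], int(fields[2])
    abundances = [int(x) for x in fields[3:3 + n_otus]]
    if len(abundances) != n_otus:
        raise ValueError("row has fewer abundance values than its OTU count")
    return label, group, abundances


def sorensen_from_abundances(a, b):
    """Classic Sorensen index from two per-OTU abundance vectors."""
    s1 = sum(1 for x in a if x > 0)   # OTUs present in the first group
    s2 = sum(1 for y in b if y > 0)   # OTUs present in the second group
    s12 = sum(1 for x, y in zip(a, b) if x > 0 and y > 0)  # shared OTUs
    return 2 * s12 / (s1 + s2)


# Made-up rows: distance, group, number of OTUs, then one count per OTU.
rows = [
    "0.03\ttissue\t5\t3\t0\t1\t2\t0",
    "0.03\tstool\t5\t1\t4\t0\t2\t0",
]
(_, _, tissue), (_, _, stool) = (parse_shared_row(r) for r in rows)
print(sorensen_from_abundances(tissue, stool))  # 2 * 2 / (3 + 3)
```

Both rows carry the same distance label, in keeping with the note that OTU frequencies are only comparable within one distance definition.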


  • .sharedSorclass

The first line contains the column labels. The first column, sampled, gives the number of sequences that had been sampled when each <math>S_{class}</math> value was calculated. The sampling frequency was set to 500, so <math>S_{class}</math> is calculated at each of the distances after every 500 sequences sampled, with a final calculation after all sequences are sampled. The remaining labels in the first line give the distance at which the calculations were made followed by the names of the groups compared. Each additional line starts with the number of sequences sampled, followed by the <math>S_{class}</math> value at each column's distance. For instance, at distance 0.01, after 4392 sequences sampled, <math>S_{class}</math> was 0.541311.

sampled  uniquetissuestool  0.00tissuestool  0.01tissuestool  0.02tissuestool  0.03tissuestool
1        0                  0                0                0                0
500      0.0454545          0.201613         0.357616         0.545455         0.609756
1000     0.0643432          0.194226         0.411765         0.609929         0.576923
1500     0.0673527          0.2079           0.448133         0.699386         0.651163
2000     0.0609137          0.228873         0.492424         0.729282         0.666667
2500     0.0544057          0.241379         0.510345         0.731183         0.675497
3000     0.0479281          0.260116         0.51634          0.727273         0.666667
3500     0.0495222          0.262162         0.518519         0.724638         0.679012
4000     0.0462784          0.269182         0.542274         0.720379         0.678788
4392     0.0463128          0.279126         0.541311         0.71028          0.705882
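
The check against the hand calculation can also be scripted. The sketch below is one possible approach, assuming the file is whitespace-delimited as shown above; it embeds only the header and the last two rows of the table, and the helper name `final_value` is hypothetical.

```python
SUMMARY = """\
sampled uniquetissuestool 0.00tissuestool 0.01tissuestool 0.02tissuestool 0.03tissuestool
4000 0.0462784 0.269182 0.542274 0.720379 0.678788
4392 0.0463128 0.279126 0.541311 0.71028 0.705882
"""


def final_value(text: str, label: str) -> float:
    """Return the value in column `label` from the last non-empty row."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, last = lines[0], lines[-1]
    return float(last[header.index(label)])


reported = final_value(SUMMARY, "0.03tissuestool")
by_hand = 2 * 60 / (89 + 81)  # S1 = 89, S2 = 81, S12 = 60 at distance 0.03

# The file reports six decimal places, so compare with a matching tolerance.
print(abs(reported - by_hand) < 1e-6)  # True
```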