
Read.otu

Revision as of 10:46, 24 November 2009 by Westcott (Talk | contribs)


There are four types of OTU data that can be read into mothur using the read.otu command. The first three describe a single sample; the shared format describes multiple samples:

  • Rank-abundance data (*.rabund file): Contains the number of sequences within each OTU
  • Species-abundance data (*.sabund file): Contains the number of OTUs with a certain number of individuals
  • List data (*.list): Contains the names of the sequences within each OTU separated by commas
  • Shared rank-abundance data (*.shared file): Contains the number of sequences within each OTU for each group

The easiest way to input these data into mothur is to have mothur generate them. If you have just run the cluster() command, the data are already stored in memory and do not need to be read in. However, if you are defining OTUs using something other than distances between sequences, you can generate your own OTUs and data files to input into mothur. For examples of what these different files look like, please download and decompress AmazonData.zip.
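If you define OTUs yourself, a short script can write them in the list and rabund layouts described in the following sections. The Python sketch below is only an illustration and is not part of mothur: the function `write_otu_lines` and the label `mydef` are hypothetical, and it assumes OTUs should be written from most to least abundant so that the rabund line is a rank ordering and both lines list OTUs in the same order.

```python
def write_otu_lines(label, otus):
    """Return (list_line, rabund_line) for one OTU definition.

    Hypothetical helper: each OTU is a list of sequence names. OTUs are
    sorted from most to least abundant (an assumption) so both lines agree.
    """
    if any(ch.isspace() for ch in label):
        raise ValueError("labels must not contain spaces")
    ordered = sorted(otus, key=len, reverse=True)
    header = [label, str(len(ordered))]  # label, then the number of OTUs
    # List line: names within an OTU joined by commas, OTUs separated by tabs.
    list_line = "\t".join(header + [",".join(otu) for otu in ordered])
    # Rabund line: the number of sequences in each OTU.
    rabund_line = "\t".join(header + [str(len(otu)) for otu in ordered])
    return list_line, rabund_line


otus = [["seqC"], ["seqA", "seqB", "seqD"], ["seqE", "seqF"]]
list_line, rabund_line = write_otu_lines("mydef", otus)
print(list_line)    # mydef, 3, seqA,seqB,seqD, seqE,seqF, seqC (tab-separated)
print(rabund_line)  # mydef, 3, 3, 2, 1 (tab-separated)
```

Each OTU definition you want mothur to see becomes one such line in the corresponding file.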


Inputting rank-abundance data

The data within the file 98_sq_phylip_amazon.fn.rabund represents the rank ordering of the number of individuals/sequences within each OTU. Each line of the file represents a different OTU definition. When mothur generates this file, the first column is a label, which is the distance definition of the OTU. There is nothing sacred about these labels except that they must not include spaces. The second column contains the number of OTUs in that row. Subsequent columns contain the number of individuals/sequences in each of the OTUs. The number of these subsequent columns must equal the number in the second column or bad things will happen.
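The consistency rule above is easy to check before handing a hand-made file to mothur. This Python sketch (the function name `parse_rabund_line` and the example line are hypothetical, not mothur code) splits one rabund line and verifies that the number of count columns matches the second column:

```python
def parse_rabund_line(line):
    """Split a rabund line into (label, per-OTU counts), checking the OTU count."""
    fields = line.split()  # columns are whitespace-separated
    label, num_otus = fields[0], int(fields[1])
    counts = [int(x) for x in fields[2:]]
    if len(counts) != num_otus:
        raise ValueError(
            f"{label}: second column says {num_otus} OTUs but found {len(counts)}"
        )
    return label, counts


label, counts = parse_rabund_line("0.03 4 5 3 1 1")
print(label, counts, sum(counts))  # 0.03 [5, 3, 1, 1] 10
```

The sum of the counts is the total number of sequences represented at that OTU definition.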

To read this file into memory type:

mothur > read.otu(rabund=98_sq_phylip_amazon.fn.rabund)


Inputting species-abundance data

The data within the file 98_sq_phylip_amazon.fn.sabund represents the number of OTUs with a specified number of individuals/sequences. Each line of the file represents a different OTU definition. When mothur generates this file, the first column is a label, which is the distance definition of the OTU. There is nothing sacred about these labels except that they must not include spaces. The second column contains the number of sequences in the dominant OTU(s). Subsequent columns contain the number of OTUs with 1, 2, 3, etc. individuals/sequences. The number of these subsequent columns must equal the number in the second column or bad things will happen.
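Because a sabund line summarizes the same information as a rabund line, one can be derived from the other. The Python sketch below (hypothetical helper `rabund_to_sabund`, not mothur's implementation) takes the per-OTU counts from a rabund line and builds the matching sabund line, which shows why the second column (the abundance of the dominant OTU) also fixes how many columns follow:

```python
from collections import Counter


def rabund_to_sabund(label, counts):
    """Build a sabund line from the per-OTU counts of a rabund line."""
    max_abund = max(counts)  # sequences in the dominant OTU(s)
    freq = Counter(counts)   # abundance -> number of OTUs with that abundance
    # One column per abundance 1..max_abund, so exactly max_abund columns follow.
    cols = [freq[n] for n in range(1, max_abund + 1)]
    return " ".join([label, str(max_abund)] + [str(c) for c in cols])


print(rabund_to_sabund("0.03", [5, 3, 1, 1]))  # 0.03 5 2 0 1 0 1
```

Here two OTUs hold one sequence, none hold two, one holds three, none hold four and one holds five.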

To read this file into memory type:

mothur > read.otu(sabund=98_sq_phylip_amazon.fn.sabund)


Inputting list data for a single sample

The data within the file 98_sq_phylip_amazon.fn.list represents the names of the sequences in each OTU. Each line of the file represents a different OTU definition. When mothur generates this file, the first column is a label, which is the distance definition of the OTU. There is nothing sacred about these labels except that they must not include spaces. The second column contains the number of OTUs. Subsequent columns contain the names of the sequences in each OTU. Sequences in the same OTU are separated by commas and OTUs are separated by a space or tab. The number of these subsequent columns must equal the number in the second column or bad things will happen.
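The list layout can be read back with a few lines of code. This Python sketch (hypothetical function `parse_list_line`, not part of mothur) splits columns on any whitespace, splits each OTU on commas, and applies the same column-count check:

```python
def parse_list_line(line):
    """Return (label, OTUs) where each OTU is a list of sequence names."""
    fields = line.split()  # OTUs may be separated by spaces or tabs
    label, num_otus = fields[0], int(fields[1])
    otus = [field.split(",") for field in fields[2:]]  # names joined by commas
    if len(otus) != num_otus:
        raise ValueError(
            f"{label}: second column says {num_otus} OTUs but found {len(otus)}"
        )
    return label, otus


label, otus = parse_list_line("0.03\t3\tseqA,seqB,seqD seqE,seqF\tseqC")
print(label, [len(otu) for otu in otus])  # 0.03 [3, 2, 1]
```

The OTU sizes recovered this way are exactly the counts that a rabund line for the same label would hold.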

To read this file into memory type:

mothur > read.otu(list=98_sq_phylip_amazon.fn.list)


Inputting list data for multiple samples

The Amazonian data set actually represents sequences from two soil samples, one collected at a pasture site and one at a rainforest site. The file amazon.groups shows which sequences belong to the pasture and rainforest sites. There is no limit to the number of groups you can have. To read in the list file for subsequent multi-sample analyses, and to parse it into separate files for each group, you must include the group option when running read.otu with the list option:

mothur > read.otu(list=98_sq_phylip_amazon.fn.list, group=amazon.groups)

This command will generate two types of files: a shared file (98_sq_phylip_amazon.fn.shared) and a rabund file for each group (98_sq_phylip_amazon.fn.pasture.rabund and 98_sq_phylip_amazon.fn.rainforest.rabund). These new rabund files can be read in using the read.otu(rabund=...) command and processed using any of the single-sample analyses. This parsing was not previously possible; one had to generate separate distance matrices for each group and run DOTUR on each matrix. Because clustering can change when different sequences are included or excluded, sequences could wind up in OTUs different from those they were in when all of the data was considered together.
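The advantage described above comes from counting every group against one pooled set of OTUs. The Python sketch below illustrates that idea; it is not mothur's code. The helper names `parse_groups` and `split_by_group` are hypothetical, and the group file is assumed to hold two whitespace-separated columns per line (sequence name, then group name):

```python
from collections import Counter


def parse_groups(text):
    """Map sequence name -> group (assumes two whitespace-separated columns)."""
    mapping = {}
    for line in text.splitlines():
        if line.strip():
            name, group = line.split()
            mapping[name] = group
    return mapping


def split_by_group(otus, seq_to_group):
    """Count, for every group, how many of its sequences fall in each OTU.

    OTU membership comes from the single pooled clustering, so every group
    is described against the same OTU definitions.
    """
    groups = sorted(set(seq_to_group.values()))
    table = {g: [0] * len(otus) for g in groups}
    for i, otu in enumerate(otus):
        for group, n in Counter(seq_to_group[s] for s in otu).items():
            table[group][i] = n
    return table


groups_text = """seqA\tpasture
seqB\trainforest
seqC\tpasture
seqD\tpasture
seqE\trainforest
seqF\trainforest
"""
otus = [["seqA", "seqB", "seqD"], ["seqE", "seqF"], ["seqC"]]
table = split_by_group(otus, parse_groups(groups_text))
for group, counts in table.items():
    print(group, counts)  # pasture [2, 0, 1], then rainforest [1, 2, 0]
```

Each row of this table holds the kind of per-group, per-OTU counts that the shared and per-group rabund files summarize; the exact column layout mothur writes may differ from this sketch.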


Inputting shared data for multiple samples

After running the read.otu command with both the list and group options you can read back in the shared file at a later time. To do this, you need to execute the following command:

mothur > read.otu(shared=98_sq_phylip_amazon.fn.shared)

You can then proceed with the other multi-sample analyses.


Options

label

There may be only a couple of lines in your OTU data that you are interested in summarizing. You have two options: (i) manually delete the lines you aren't interested in from your rabund, sabund, list, or shared file; or (ii) use the label option. If you only want to read in the data for the lines labeled unique, 0.03, 0.05 and 0.10, you would enter:

mothur > read.otu(list=98_sq_phylip_amazon.fn.list, label=unique-0.03-0.05-0.10)

Then when you run:

mothur > summary.single(calc=sobs-chao)

Only the lines that you are interested in are analyzed. To get all of the lines, give the line or label option the value "all":

mothur > read.otu(list=98_sq_phylip_amazon.fn.list, label=all)
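The label option lists the wanted labels separated by hyphens, and "all" keeps everything. This Python sketch (hypothetical function `select_labels`, not mothur's implementation) mimics that selection on the lines of an OTU file, matching each line's first column:

```python
def select_labels(lines, label_option):
    """Keep only lines whose first column is one of the hyphen-separated labels."""
    if label_option == "all":
        return list(lines)
    wanted = set(label_option.split("-"))  # "unique-0.03" -> {"unique", "0.03"}
    return [line for line in lines if line.split()[0] in wanted]


rabund_lines = [
    "unique 4 2 1 1 1",
    "0.01 3 3 1 1",
    "0.03 2 4 1",
    "0.05 1 5",
]
for line in select_labels(rabund_lines, "unique-0.03-0.05-0.10"):
    print(line)  # unique..., 0.03..., 0.05... (0.01 is skipped; 0.10 is absent)
```

Labels that are requested but missing from the file, such as 0.10 here, simply select nothing in this sketch.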

groups

You can use the groups parameter to specify which groups you want included in your analysis:

mothur > read.otu(list=98_sq_phylip_amazon.fn.list, group=amazon.groups, groups=forest-pasture)
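Like the label option, the groups option takes hyphen-separated names. The Python sketch below only illustrates what restricting an analysis to some groups means for the OTUs of one list line; the helper `restrict_to_groups` is hypothetical, and dropping OTUs that end up empty is an assumption of this sketch rather than documented mothur behavior:

```python
def restrict_to_groups(otus, seq_to_group, groups_option):
    """Keep only sequences from the chosen groups; drop OTUs left empty (assumption)."""
    keep = set(groups_option.split("-"))  # e.g. "pasture-rainforest"
    kept = ([s for s in otu if seq_to_group[s] in keep] for otu in otus)
    return [otu for otu in kept if otu]


seq_to_group = {
    "seqA": "pasture", "seqB": "rainforest", "seqC": "pasture",
    "seqD": "pasture", "seqE": "rainforest", "seqF": "rainforest",
}
otus = [["seqA", "seqB", "seqD"], ["seqE", "seqF"], ["seqC"]]
print(restrict_to_groups(otus, seq_to_group, "pasture"))
# [['seqA', 'seqD'], ['seqC']]
```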