Get.oturep

Revision as of 19:02, 27 January 2010 by Westcott (Talk | contribs)

While the bin.seqs command reports the OTU number for all sequences, the get.oturep command generates a fasta-formatted sequence file containing only a representative sequence for each OTU. A file is generated for each OTU definition. For this tutorial, download and decompress AmazonData.zip.



Default settings

To run the get.oturep command, a distance matrix must be stored in memory, and you must provide a fasta-formatted file and a list file whose sequence names are complementary to the names in the distance matrix. For example:

mothur > read.dist(phylip=98_sq_phylip_amazon.dist)
mothur > get.oturep(fasta=amazon.fasta, list=98_sq_phylip_amazon.fn.list)

These commands will generate 12 fasta-formatted files, one per OTU definition in the list file. Each file contains one sequence per OTU for that OTU definition. If there are three or more sequences in an OTU, the representative is the sequence with the minimum distance to the other sequences in the OTU. For example, the file 98_sq_phylip_amazon.fn.0.10.fasta contains the following output:

>U68589|1|1
TAATACGTAGGGTGCAAGCGTTGCCCGGGTTTATTGGGCGTAAAGGGCGCGTAGGCG...
>U68590|2|1
TAATACGGGGGGAGCAAGCGTTGTTCGGATTTACTGGGCGTAAAGGGCGCGTATGCG...
>U68591|3|1
CGTAGGGTCCAAGCGTTATCCGGAATTACTGGGCGTAAAGAGTTGCGTAGGTGGCAT...
>U68687|4|2
TAATACAGAGGTCCCAAGCGTTGTTCGGATTCACTGGGCGTAAAGGGTGCGTAGGTG...
>U68592|4|1
TAATACAGAGGTCCCGAGCGTTGTTCGGATTCACTGGGCGTAAAGGGTGCGTAGGTG...
>U68593|5|1
TAATACGTAAGGACCAAGCGTTGTTCGGATTTACTGGGCGTAAAGGGCGCGTANGCG...
>U68594|6|3
TAATCCCAAGGGTGCAANCGTTACTCGGAATTACTGGGCGTAAAGCGTGCGTAGGTG...
>U68636|7|2
TAATACAGAGGTCTCAAGCGTTGTTCGGATTCATTGGGCGTAAAGGGCGCGTAGGCG...
>U68614|8|3
TAATACGTANGTGGCAAGCGTTGTCCGGAGTTACTGGGTGTAAAGGGCGTGTAGGCG...
...
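
The selection rule for OTUs with three or more sequences can be illustrated with a short Python sketch. This is not mothur's implementation: it assumes "minimum distance to the other sequences" means the smallest summed distance, it simply returns the first member for OTUs with fewer than three sequences, and the names pick_representative, seqA, seqB and seqC as well as the distance values are hypothetical.

```python
from typing import Dict, FrozenSet, List

# Pairwise distances keyed by an unordered pair of sequence names.
Distances = Dict[FrozenSet[str], float]


def pick_representative(otu: List[str], dist: Distances) -> str:
    """Return the member with the smallest summed distance to the rest of the OTU."""
    if len(otu) < 3:
        # Assumption: with one or two members there is no unique minimum,
        # so this sketch returns the first sequence listed.
        return otu[0]

    def total_distance(name: str) -> float:
        # Sum the distances from this sequence to every other OTU member.
        return sum(dist[frozenset((name, other))] for other in otu if other != name)

    return min(otu, key=total_distance)


# Hypothetical OTU of three sequences with made-up distances.
example_dist: Distances = {
    frozenset(("seqA", "seqB")): 0.02,
    frozenset(("seqA", "seqC")): 0.08,
    frozenset(("seqB", "seqC")): 0.03,
}
print(pick_representative(["seqA", "seqB", "seqC"], example_dist))  # prints seqB
```

Here seqB is chosen because its summed distance (0.05) is lower than that of seqA (0.10) or seqC (0.11).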


The representative sequences keep the names given in the input fasta file, with the OTU number and the number of sequences represented appended, each separated by a vertical bar (|), as follows:

>name|OTU#|sequences represented
sequence....
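
If you need to post-process these files, the header format above can be split into its three fields. The following Python sketch assumes the header always ends in two integer fields; RepHeader and parse_rep_header are hypothetical names, not part of mothur.

```python
from typing import NamedTuple


class RepHeader(NamedTuple):
    name: str  # sequence name as given in the input fasta file
    otu: int   # OTU number
    size: int  # number of sequences represented


def parse_rep_header(line: str) -> RepHeader:
    """Split a '>name|OTU#|sequences represented' header into its three fields."""
    # Split from the right so that a '|' inside the sequence name is preserved.
    name, otu, size = line.lstrip(">").strip().rsplit("|", 2)
    return RepHeader(name, int(otu), int(size))


print(parse_rep_header(">U68687|4|2"))
# prints RepHeader(name='U68687', otu=4, size=2)
```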

Options

name

A names file, which lists the sequence names that are identical to a reference sequence (such as the one used with the read.dist command), may be provided to the get.oturep command so that the fasta and list files are complementary. The following commands illustrate this:

mothur > read.dist(column=96_lt_column_amazon.dist, name=amazon.names)
mothur > get.oturep(fasta=amazon.unique.fasta, list=98_sq_phylip_amazon.fn.list)


label

You may be interested in running only a few of the lines in your OTU data through get.oturep. There are two options: (i) manually delete the lines you aren't interested in from your list file, or (ii) use the label option. To read in only the data for the lines labeled unique, 0.03, 0.05 and 0.10, enter:

mothur > get.oturep(fasta=amazon.fasta, list=98_sq_phylip_amazon.fn.list, label=unique-0.03-0.05-0.10)
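
For comparison, option (i) can be scripted rather than done by hand. The Python sketch below assumes that each line of the list file begins with its label followed by a tab; this layout is an assumption and is not stated on this page. The function name select_labels and the shortened sample lines are hypothetical.

```python
from typing import Iterable, List


def select_labels(list_lines: Iterable[str], label_option: str) -> List[str]:
    """Keep only the list-file lines whose label appears in a dash-separated label string."""
    wanted = set(label_option.split("-"))
    # Assumption: the label is the first tab-separated field on each line.
    return [line for line in list_lines if line.split("\t", 1)[0] in wanted]


# Hypothetical, heavily shortened list-file lines (label, OTU count, OTUs).
lines = [
    "unique\t3\tseqA\tseqB\tseqC",
    "0.01\t2\tseqA,seqB\tseqC",
    "0.03\t1\tseqA,seqB,seqC",
]
kept = select_labels(lines, "unique-0.03-0.05-0.10")
print([line.split("\t", 1)[0] for line in kept])  # prints ['unique', '0.03']
```

Labels that are requested but absent from the file (0.05 and 0.10 in this sample) are silently skipped in this sketch.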

sorted

The sorted option lets you control how the output of the get.oturep command is ordered. You can sort by sequence name (name), bin number (number), bin size (size) or group (group). By default, the output is not sorted.

mothur > get.oturep(fasta=amazon.fasta, list=98_sq_phylip_amazon.fn.list, label=unique-0.03-0.05-0.10, sorted=name)
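
To make the name, number and size orderings concrete, here is a minimal Python sketch that sorts a few of the representative headers shown earlier. It sorts in ascending order, which is an assumption, since this page does not state the direction mothur uses. Sorting by group is omitted because it needs group information not shown here, and sort_reps and SORT_KEYS are hypothetical names.

```python
from typing import List, Tuple

# (sequence name, OTU number, sequences represented), as in 'name|OTU#|size'.
Rep = Tuple[str, int, int]

SORT_KEYS = {
    "name": lambda rep: rep[0],
    "number": lambda rep: rep[1],
    "size": lambda rep: rep[2],
}


def sort_reps(reps: List[Rep], sorted_by: str) -> List[Rep]:
    """Return representatives ordered by the chosen field (ascending in this sketch)."""
    return sorted(reps, key=SORT_KEYS[sorted_by])


reps = [("U68614", 8, 3), ("U68589", 1, 1), ("U68636", 7, 2)]
print(sort_reps(reps, "name"))
# prints [('U68589', 1, 1), ('U68614', 8, 3), ('U68636', 7, 2)]
```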

large

The large parameter allows you to indicate that your distance matrix is too large to fit in RAM. The default value is false.

mothur > get.oturep(fasta=amazon.fasta, list=98_sq_phylip_amazon.fn.list, label=unique-0.03-0.05-0.10, large=true)