From mothur
Revision as of 20:13, 27 January 2010 by Westcott

While the bin.seqs command reports the OTU number for all sequences, the get.oturep command generates a fasta-formatted sequence file containing only a representative sequence for each OTU. A .rep.fasta and a .rep.names file are generated for each OTU definition. For this tutorial, download and decompress AmazonData.zip.

Default settings

To run the get.oturep command you must provide a distance file, either a phylip-formatted distance matrix or a column-formatted distance matrix. You must also provide a fasta-formatted file and a list file whose sequence names match the names in the distance matrix. For example,

mothur > get.oturep(phylip=98_sq_phylip_amazon.dist, fasta=amazon.fasta, list=98_sq_phylip_amazon.fn.list)


mothur > get.oturep(column=98_lt_column_amazon.dist, name=amazon.names, 
                    fasta=amazon.fasta, list=98_sq_phylip_amazon.fn.list)

Either command will generate 12 fasta-formatted files (one per OTU definition in the list file), each containing as many sequences as there are OTUs for that OTU definition. If there are three or more sequences in an OTU, the representative sequence is the one with the minimum distance to the other sequences in the OTU. For example, the file 98_sq_phylip_amazon.fn.0.10.fasta contains the following output:


The .rep.names file will look like:

U68589	U68589
U68590	U68590
U68591	U68591
U68687	U68687,U68592
U68593	U68593
U68594	U68594
U68636	U68636,U68631,U68666,U68595
U68614	U68617,U68614,U68596
U68605	U68605,U68597
U68673	U68645,U68678,U68619,U68673,U68667,U68641,U68598
U68599	U68599
U68600	U68600
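The first column of each line above is the representative chosen by the minimum-distance rule described earlier. The following is a minimal Python sketch of that rule, not mothur's actual implementation. It assumes that "minimum distance" means the smallest summed distance to the other members, and that an OTU with fewer than three sequences simply returns its first name. The function name, the dictionary layout and the distances are all hypothetical.

```python
def pick_representative(members, dist):
    """Return the member with the smallest summed distance to the others.

    members: list of sequence names in one OTU.
    dist: dict mapping frozenset({a, b}) -> distance (hypothetical layout).
    """
    if len(members) < 3:
        # Assumption: with one or two sequences there is no unique minimum,
        # so this sketch returns the first name listed.
        return members[0]

    def total_distance(name):
        # Sum of distances from `name` to every other member of the OTU.
        return sum(dist[frozenset((name, other))]
                   for other in members if other != name)

    return min(members, key=total_distance)


# Hypothetical distances for a three-sequence OTU (not taken from AmazonData).
example_dist = {
    frozenset(("seqA", "seqB")): 0.02,
    frozenset(("seqA", "seqC")): 0.03,
    frozenset(("seqB", "seqC")): 0.06,
}
representative = pick_representative(["seqA", "seqB", "seqC"], example_dist)
```

Here seqA has the smallest summed distance (0.05), so it is the one selected.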

The representative sequences keep the names given in the input fasta file, with the OTU number and the number of sequences represented appended, separated by vertical bars (|), as follows:

>name|OTU#|sequences represented
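As an illustration of how the .rep.names lines and the header format relate, here is a small Python sketch that reads one names line and builds the matching header. It assumes the two columns of the names file are tab-separated and that the member list is comma-separated. The helper names and the OTU number 8 in the example are hypothetical.

```python
def parse_names_line(line):
    """Split a .rep.names line into (representative, list of members)."""
    representative, members = line.rstrip("\n").split("\t")
    return representative, members.split(",")


def make_header(name, otu_number, n_sequences):
    """Build a header of the form >name|OTU#|sequences represented."""
    return f">{name}|{otu_number}|{n_sequences}"


def parse_header(header):
    """Recover (name, OTU number, sequences represented) from a header."""
    # rsplit keeps any '|' inside the sequence name itself intact.
    name, otu_number, n_sequences = header.lstrip(">").rsplit("|", 2)
    return name, int(otu_number), int(n_sequences)


rep, members = parse_names_line("U68614\tU68617,U68614,U68596")
header = make_header(rep, 8, len(members))  # OTU number 8 is hypothetical
```

Parsing the header with parse_header gives back the name, the OTU number and the count as separate fields.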



A names file indicating which sequence names are identical to a reference sequence, such as the one used with the read.dist command, may be supplied to the get.oturep command so that the fasta and list files are complementary. The following commands illustrate this:

mothur > read.dist(column=98_lt_column_amazon.dist, name=amazon.names)
mothur > get.oturep(fasta=amazon.unique.fasta, list=98_sq_phylip_amazon.fn.list)


You may be interested in running only a few lines of your OTU data through get.oturep. There are two options: (i) manually delete the lines you aren't interested in from your list file, or (ii) use the label option. If you only want to read in the data for the lines labeled unique, 0.03, 0.05 and 0.10 you would enter:

mothur > get.oturep(fasta=amazon.fasta, list=98_sq_phylip_amazon.fn.list, label=unique-0.03-0.05-0.10)
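For comparison, option (i) can be approximated outside mothur. The Python sketch below keeps only the lines of a list file whose label appears in a dash-separated label string. It assumes each list-file line begins with its label followed by a tab. The function name and the example lines are hypothetical and are not taken from AmazonData.

```python
def select_labels(list_lines, label_option):
    """Keep list-file lines whose leading label is named in label_option."""
    wanted = set(label_option.split("-"))
    # Assumption: the label is the first tab-separated field on each line.
    return [line for line in list_lines if line.split("\t", 1)[0] in wanted]


# Hypothetical list-file lines: label, number of OTUs, then the OTUs.
example_lines = [
    "unique\t3\tseqA\tseqB\tseqC",
    "0.01\t2\tseqA,seqB\tseqC",
    "0.03\t1\tseqA,seqB,seqC",
]
kept = select_labels(example_lines, "unique-0.03")
```

Using the label option inside mothur avoids editing the list file at all, which is usually the safer choice.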


You may want to sort the output of the get.oturep command. The sorted option sets the sort key: name (sequence name), number (bin number), size (bin size) or group. By default the output is not sorted.

mothur > get.oturep(fasta=amazon.fasta, list=98_sq_phylip_amazon.fn.list, label=unique-0.03-0.05-0.10, sorted=name)
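To make the sort keys concrete, here is a rough Python sketch of ordering representative records by name, number or size. It is an illustration only and does not reproduce mothur's code. The record type is hypothetical, the OTU numbers are made up, sorting by size is assumed to put the largest bin first, and the group key is left out because it needs group information not shown on this page.

```python
from typing import NamedTuple, Optional


class Rep(NamedTuple):
    name: str    # representative sequence name
    number: int  # bin (OTU) number, hypothetical here
    size: int    # number of sequences represented


def sort_reps(reps, sorted_by: Optional[str] = None):
    """Return reps ordered by the given key; None leaves the order unchanged."""
    if sorted_by is None:
        return list(reps)
    if sorted_by == "name":
        return sorted(reps, key=lambda r: r.name)
    if sorted_by == "number":
        return sorted(reps, key=lambda r: r.number)
    if sorted_by == "size":
        # Assumption: largest bins first.
        return sorted(reps, key=lambda r: r.size, reverse=True)
    raise ValueError(f"unsupported sort key: {sorted_by}")


reps = [Rep("U68673", 10, 7), Rep("U68614", 8, 3), Rep("U68636", 7, 4)]
by_name = [r.name for r in sort_reps(reps, "name")]
by_size = [r.name for r in sort_reps(reps, "size")]
```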


The large parameter allows you to indicate that your distance matrix is too large to fit in RAM. The default value is false.

mothur > get.oturep(fasta=amazon.fasta, list=98_sq_phylip_amazon.fn.list, label=unique-0.03-0.05-0.10, large=true)