While the bin.seqs command reports the OTU number for every sequence, the get.oturep command generates a fasta-formatted sequence file containing only one representative sequence for each OTU. A .rep.fasta file and a .rep.names file are generated for each OTU definition. For this tutorial, download and decompress AmazonData.zip.
To run the get.oturep command, either a distance matrix must already be stored in memory (for example, read in with read.dist) or you must provide the distance file yourself, as either a phylip-formatted or a column-formatted distance matrix. You must also provide a fasta-formatted file and a list file whose sequence names are complementary to (that is, match) the names in the distance matrix. Any of the following will work:
mothur > read.dist(phylip=98_sq_phylip_amazon.dist)
mothur > get.oturep(fasta=amazon.fasta, list=98_sq_phylip_amazon.fn.list)
mothur > get.oturep(phylip=98_sq_phylip_amazon.dist, fasta=amazon.fasta, list=98_sq_phylip_amazon.fn.list)
mothur > get.oturep(column=98_lt_column_amazon.dist, name=amazon.names, fasta=amazon.fasta, list=98_sq_phylip_amazon.fn.list)
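Before running get.oturep, it can help to confirm that every name in your fasta and list files actually occurs in the distance matrix. The Python sketch below does this for a column-formatted file. It assumes each line holds two sequence names and a distance separated by whitespace; `names_in_column_dist` and `missing_from_matrix` are hypothetical helpers (not part of mothur), and the names in the demo are toy values.

```python
from typing import Iterable, List, Set


def names_in_column_dist(lines: Iterable[str]) -> Set[str]:
    """Collect every sequence name in a column-formatted distance file.

    Assumption: each line looks like 'nameA nameB distance' (whitespace-separated).
    """
    names: Set[str] = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        names.update(fields[:2])  # both sequence names on the line
    return names


def missing_from_matrix(matrix_names: Set[str], other_names: Iterable[str]) -> List[str]:
    """Return, sorted, the fasta or list names that never appear in the matrix."""
    return sorted(set(other_names) - matrix_names)


# Toy distances; in practice pass an open file object instead of a list.
dist_lines = ["U68589 U68590 0.12", "U68589 U68591 0.20", "U68590 U68591 0.15"]
matrix = names_in_column_dist(dist_lines)
print(missing_from_matrix(matrix, ["U68589", "U68590", "U68999"]))  # ['U68999']
```

An empty result means every name was found; any names it returns are worth checking before running the command.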
Each of the get.oturep commands above will generate 12 fasta-formatted files (one per OTU definition in the list file), each containing one representative sequence per OTU in that definition. If an OTU contains three or more sequences, the representative sequence is the one with the minimum distance to the other sequences in the OTU. For example, the file 98_sq_phylip_amazon.fn.0.10.fasta contains the following output:
>U68589|1|1
TAATACGTAGGGTGCAAGCGTTGCCCGGGTTTATTGGGCGTAAAGGGCGCGTAGGCG...
>U68590|2|1
TAATACGGGGGGAGCAAGCGTTGTTCGGATTTACTGGGCGTAAAGGGCGCGTATGCG...
>U68591|3|1
CGTAGGGTCCAAGCGTTATCCGGAATTACTGGGCGTAAAGAGTTGCGTAGGTGGCAT...
>U68687|4|2
TAATACAGAGGTCCCAAGCGTTGTTCGGATTCACTGGGCGTAAAGGGTGCGTAGGTG...
>U68592|4|1
TAATACAGAGGTCCCGAGCGTTGTTCGGATTCACTGGGCGTAAAGGGTGCGTAGGTG...
>U68593|5|1
TAATACGTAAGGACCAAGCGTTGTTCGGATTTACTGGGCGTAAAGGGCGCGTANGCG...
>U68594|6|3
TAATCCCAAGGGTGCAANCGTTACTCGGAATTACTGGGCGTAAAGCGTGCGTAGGTG...
>U68636|7|2
TAATACAGAGGTCTCAAGCGTTGTTCGGATTCATTGGGCGTAAAGGGCGCGTAGGCG...
>U68614|8|3
TAATACGTANGTGGCAAGCGTTGTCCGGAGTTACTGGGTGTAAAGGGCGTGTAGGCG...
...
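The selection rule for representatives can be illustrated with a short Python sketch. It assumes the pairwise distances for one OTU are already in a dictionary and reads "minimum distance to the other sequences" as the smallest summed distance; mothur's exact criterion, tie-breaking, and handling of two-sequence OTUs may differ. `pick_representative` and the 1.0 default for missing pairs are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def pick_representative(members: List[str], dist: Dict[Pair, float]) -> str:
    """Return the OTU member with the smallest summed distance to the others.

    Assumptions (not taken from mothur): distances are symmetric, a missing
    pair counts as 1.0, and ties keep the member listed first.
    """
    def d(a: str, b: str) -> float:
        # Look the pair up in either order.
        return dist.get((a, b), dist.get((b, a), 1.0))

    best, best_total = members[0], float("inf")
    for candidate in members:
        total = sum(d(candidate, other) for other in members if other != candidate)
        if total < best_total:  # strict '<' keeps the earlier member on ties
            best, best_total = candidate, total
    return best


toy = {("A", "B"): 0.02, ("A", "C"): 0.03, ("B", "C"): 0.04}
print(pick_representative(["A", "B", "C"], toy))  # A
```

With these toy distances A sums to 0.05, B to 0.06, and C to 0.07, so A is chosen. A single-member OTU simply returns that member.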
The representative sequences keep the names given in the input fasta file, with the OTU number and the number of sequences represented appended, each separated by a vertical bar (|):
>name|OTU#|number of sequences represented
sequence...
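If you post-process these files, the three header fields can be split apart as in the hypothetical Python helper below. It assumes exactly the name|OTU#|count layout described above; splitting from the right keeps any vertical bar that may already be part of the original sequence name.

```python
from typing import NamedTuple


class RepHeader(NamedTuple):
    name: str
    otu: int
    represented: int


def parse_rep_header(line: str) -> RepHeader:
    """Split a '>name|OTU#|count' header line from a representative fasta file."""
    if not line.startswith(">"):
        raise ValueError(f"not a fasta header: {line!r}")
    # rsplit with maxsplit=2 leaves any '|' inside the name untouched
    name, otu, count = line[1:].strip().rsplit("|", 2)
    return RepHeader(name, int(otu), int(count))


print(parse_rep_header(">U68594|6|3"))  # RepHeader(name='U68594', otu=6, represented=3)
```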
A names file, which lists the sequences that are identical to a reference sequence (such as the one used with the read.dist command), can be supplied so that the fasta and list files are complementary. The following commands illustrate this:
mothur > read.dist(column=96_lt_column_amazon.dist, name=amazon.names)
mothur > get.oturep(fasta=amazon.unique.fasta, list=98_sq_phylip_amazon.fn.list)
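The Python sketch below shows how a names file lets a count of unique sequences be expanded back to the total number of sequences they stand for. It assumes a two-column layout: the unique (reference) name, a tab, then a comma-separated list of identical names that includes the reference itself. `read_names` and `otu_size` are hypothetical helpers, and the names are toy values.

```python
from typing import Dict, Iterable, List


def read_names(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each unique sequence to the names of the identical sequences it represents.

    Assumption: 'unique<TAB>name1,name2,...' per line, unique name included in the list.
    """
    mapping: Dict[str, List[str]] = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        unique, members = line.rstrip("\n").split("\t")
        mapping[unique] = members.split(",")
    return mapping


def otu_size(otu: List[str], names: Dict[str, List[str]]) -> int:
    """Count all sequences in an OTU listed by unique names; unknown names count once."""
    return sum(len(names.get(u, [u])) for u in otu)


names = read_names(["U68589\tU68589,U68600,U68601\n", "U68590\tU68590\n"])
print(otu_size(["U68589", "U68590"], names))  # 4
```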
You may be interested in only a few of the lines (OTU definitions) in your OTU data. There are two ways to run just those lines through get.oturep(): (i) manually delete the lines you aren't interested in from your list file, or (ii) use the label option. If you only want to read in the data for the lines labeled unique, 0.03, 0.05, and 0.10, you would enter:
mothur > get.oturep(fasta=amazon.fasta, list=98_sq_phylip_amazon.fn.list, label=unique-0.03-0.05-0.10)
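To make the effect of the label option concrete, the Python sketch below filters list-file lines the same way. It assumes each line starts with the label, then the number of OTUs, then one whitespace-separated field per OTU whose member names are comma-separated; some mothur versions may also write a header row beginning with "label", which the sketch skips. `select_labels` is a hypothetical helper and the data are toy values.

```python
from typing import Iterable, Iterator, List, Tuple


def select_labels(list_lines: Iterable[str], label_option: str) -> Iterator[Tuple[str, List[List[str]]]]:
    """Yield (label, OTUs) for list-file lines whose label was requested.

    label_option uses the dash-separated form of the label parameter,
    e.g. 'unique-0.03-0.05-0.10'.
    """
    wanted = set(label_option.split("-"))
    for line in list_lines:
        fields = line.split()
        if not fields or fields[0] == "label":
            continue  # skip blank lines and a header row, if the file has one
        label, otus = fields[0], fields[2:]  # fields[1] is the number of OTUs
        if label in wanted:
            yield label, [otu.split(",") for otu in otus]


list_lines = [
    "unique\t3\tU68589\tU68590\tU68591",
    "0.01\t2\tU68589,U68590\tU68591",
    "0.10\t1\tU68589,U68590,U68591",
]
for label, otus in select_labels(list_lines, "unique-0.03-0.05-0.10"):
    print(label, len(otus))  # prints "unique 3", then "0.10 1"
```

Labels that are requested but absent from the file (0.03 and 0.05 here) are simply not yielded.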
You may want to sort the output of the get.oturep command. The sorted option lets you sort it by sequence name (name), bin number (number), bin size (size), or group (group). By default, the output is not sorted.
mothur > get.oturep(fasta=amazon.fasta, list=98_sq_phylip_amazon.fn.list, label=unique-0.03-0.05-0.10, sorted=name)
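The orderings offered by the sorted option can be mimicked in Python as a rough sketch. This is not mothur's implementation: the sketch assumes that no value keeps the input order and that size puts the largest bins first (mothur's direction may differ), and it leaves out group because that needs a group file. `Rep` and `sort_reps` are hypothetical names; the records reuse headers from the sample output above.

```python
from typing import List, NamedTuple


class Rep(NamedTuple):
    name: str
    otu: int
    size: int


def sort_reps(reps: List[Rep], sorted_by: str = "") -> List[Rep]:
    """Order representative records by 'name', 'number', or 'size'.

    Assumptions: '' keeps input order; 'size' is descending; ties keep input
    order because Python's sort is stable.
    """
    keys = {
        "name": lambda r: r.name,
        "number": lambda r: r.otu,
        "size": lambda r: -r.size,  # negate so the largest bins come first
    }
    if not sorted_by:
        return list(reps)
    if sorted_by not in keys:
        raise ValueError(f"unsupported sort option: {sorted_by}")
    return sorted(reps, key=keys[sorted_by])


reps = [Rep("U68614", 8, 3), Rep("U68589", 1, 1), Rep("U68594", 6, 3)]
print([r.name for r in sort_reps(reps, "name")])  # ['U68589', 'U68594', 'U68614']
```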
The large parameter allows you to indicate that your distance matrix is too large to fit in RAM. The default value is false.
mothur > get.oturep(fasta=amazon.fasta, list=98_sq_phylip_amazon.fn.list, label=unique-0.03-0.05-0.10, large=true)
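One general way to cope with a matrix that does not fit in RAM is to stream the distance file and keep only the distances needed to choose representatives, namely those between members of the same OTU. The Python sketch below illustrates that idea for a column-formatted file; it is not a description of how mothur implements the large option, and `within_otu_distances` is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Tuple


def within_otu_distances(dist_lines: Iterable[str], otus: List[List[str]]) -> Dict[Tuple[str, str], float]:
    """Stream 'nameA nameB distance' lines, keeping only pairs that share an OTU.

    Memory then grows with the number of within-OTU pairs instead of the
    whole matrix. A file object can be passed and is read line by line.
    """
    otu_of = {name: i for i, otu in enumerate(otus) for name in otu}
    kept: Dict[Tuple[str, str], float] = {}
    for line in dist_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        a, b, d = fields[0], fields[1], float(fields[2])
        if a in otu_of and otu_of.get(b) == otu_of[a]:
            kept[(a, b)] = d
    return kept


toy_otus = [["U68589", "U68590"], ["U68591"]]
toy_lines = ["U68589 U68590 0.01", "U68589 U68591 0.20", "U68590 U68591 0.30"]
print(within_otu_distances(toy_lines, toy_otus))  # {('U68589', 'U68590'): 0.01}
```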