We will be offering a mothur workshop March 30-April 1. Learn more.

Frequently asked questions

From mothur
Revision as of 19:04, 25 November 2015 by Westcott (Talk | contribs) (What commands make trees and how are they different?)

Jump to: navigation, search

If you have any questions, please feel free to email us or to use the discussion tab. If you have an answer, feel free to contribute!

How do I cite mothur?

Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.

The original tools:

Schloss, PD & Handelsman, J. (2006) Introducing SONS, a tool for OTU-based comparisons of membership and structure between microbial communities. Applied and Environmental Microbiology. 72:6773-9.

Schloss, PD & Handelsman, J. (2006). Introducing TreeClimber, a test to compare community structures. Applied and Environmental Microbiology. 72: 2379-84.

Schloss, PD & Handelsman, J. (2005). Introducing DOTUR, a computer program for defining operational taxonomic units and estimating species richness. Applied and Environmental Microbiology. 71:1501-6.

Schloss, PD, Larget, BR, & Handelsman J. (2004). Integration of microbial ecology and statistics: a test to compare gene libraries. Applied and Environmental Microbiology. 70:5485-92.

Why does the command window open and close quickly when I double click on the mothur?

This is because you have not given mothur the input files and you will get an error message quickly followed by the screen closing. Please see the section on how to run mothur from a command prompt.

Why doesn’t mothur do…?

If you would like to see something added to mothur, please let us know! It may take me a while to get around to implementing the feature, but we are generally reasonable and welcome everyone's expertise that would like to contribute source code.

Do you have an example analysis?

Yes, Schloss_SOP and MiSeq_SOP highlight some of the things you can do with mothur.

Do you do workshops?

Yes! Please see our workshops page for more information.

Why is data missing for some distance levels?

Perhaps the second most commonly asked question is why there isn't a line for distance 0.XX. If you notice the previous example the distances jump from 0.003 to 0.006. Where are 0.004 and 0.005? mothur only outputs data if the clustering has been updated for a distance. So if you don't have data at your favorite distance, that means that nothing changed between the previous distance and the next one. Therefore if you want OTU data for a distance of 0.005 in this case, you would use the data from 0.003. But... many of the commands that use a label option are smart (e.g. read.otu). So, if you say you want data at the 0.03 cutoff, then you can set label=0.03 and mothur will figure out what data to give you.

Aren't the 'unique' and '0.00' distance levels the same?

Perhaps the most commonly asked question is why the cluster command produces data for both the "unique" and "0.00" lines. Aren't they the same? No. The "unique" line represents data for the situation where all of the sequences in an OTU are identical; the "0.00" line represents data for the situation where all of the sequences in an OTU have pairwise distances less than 0.0049. We made the decision that because there is error in everything, we should round these distances as well and not apply a hard cutoff at 0.01, 0.02, etc. But... if you don't buy into this and want a hard cutoff you can set the hard=T option in cluster.

Mothur crashes when I read my distance file

There are two common causes for this, file size and format.

File Size:

  • The cluster command load your distance matrix into RAM, and your distance file is most likely too large to fit in RAM. There are two options to help with this. The first is to use a cutoff. By using a cutoff mothur will only load distances that are below the cutoff. If that is still not enough, there is a command called cluster.split, http://www.mothur.org/wiki/cluster.split which divides the distance matrix, and clusters the smaller pieces separately. You may also be able to reduce the size of the original distance matrix by using the commands outline in the Schloss SOP, http://www.mothur.org/wiki/Schloss_SOP.

Wrong Format:

  • This error can be caused by trying to read a column formatted distance matrix using the phylip parameter. By default, the dist.seqs command generates a column formatted distance matrix. To make a phylip formatted matrix set the dist.seqs command parameter output to lt.

Why do I have such a large distance matrix

Mothur Blog - Why do I have such a large distance matrix?

Mothur can't find my input files

The most common cause of mothur not finding a file is because you double clicked on the mothur executable to run mothur. This will open a terminal or command prompt window in your home directory. Mothur will then look for the input files in the home directory instead of in the directory where mothur's executable is located. If this is the cause, you either put the input files in your home directory, give complete file names, or open a terminal or command prompt window, cd into the directory where the mothur executable is located and run mothur by using "./mothur" for mac or "mothur" for windows.

I installed the latest version, but I am still running an older version

We often see this issue when you have an older version of mothur installed in your path. You can find out where by opening a terminal window and running:

yourusername$ which mothur

for example:

yourusername$ which mothur

When you find the location of the older version, you can delete it or move it out of your path.

yourusername$ mv path_to_old_version/mothur new_location

for example:
yourusername$ mv /usr/local/bin/mothur /Users/yourusername/desktop/old_version_mothur

Why is my dataset only clustering to "unique"?

We typically see this type of thing when sequences are very divergent from each other (e.g. protein coding sequence) or when the sequences have not been screened to insure that they overlap the same region followed by using filter.seqs to make sure that they start and stop in the same alignment space. Sometimes increasing the cutoff in the dist.seqs and cluster commands can help with this issue.

Why does the cutoff change when I cluster with average neighbor?

This is a product of using the average neighbor algorithm with a sparse distance matrix. When you run cluster, the algorithm looks for pairs of sequences to merge in the rows and columns that are getting merged together. Let's say you set the cutoff to 0.05. If one cell has a distance of 0.03 and the cell it is getting merged with has a distance above 0.05 then the cutoff is reset to 0.03, because it's not possible to merge at a higher level and keep all the data. All of the sequences are still there from multiple phyla. Incidentally, although we always see this, it is a bigger problem for people that include sequences that do not fully overlap.

How do I make a tree?

Mothur has two commands that create trees: clearcut and tree.shared.

The clearcut commands creates a phylogenetic tree that represents how sequences relate. The clearcut program written by Initiative for Bioinformatics and Evolutionary Studies (IBEST) at the University of Idaho. For more information about clearcut please refer to http://bioinformatics.hungry.com/clearcut/

The tree.shared command will generate a newick-formatted tree file that describes the dissimilarity (1-similarity) among multiple groups. Groups are clustered using the UPGMA algorithm using the distance between communities as calculated using any of the calculators describing the similarity in community membership or structure.

How do I know "who" is in an OTU in a shared file?

You can run the get.otulist command on the list file you used to generate the shared file. You want to be sure you are comparing the same distances. ie 98_lt_phylip_amazon.an.0.03.otulist would relate to the 0.03 distance in your shared file. Also, if you subsample your data set and want to compare things, be sure to subsample the list and group file and then create the shared file to make sure you are working with the same sequences.

sub.sample(list=yourListFile, group=yourGroupFile, persample=t)
make.shared(list=yourSubsampledListFile, group=yourSubsampledGroupFile, label=0.03)
get.otulist(list=yourSubsampledListFile, label=0.03)

How do I know "who" is in the OTUs represented in the venn picture?

You can run the get.sharedseqs command. Be sure to pay close attention to the "unique" and "shared" parameters, Get.sharedseqs#unique_.26_shared.

What are mothur's file types?

Mothur uses and creates many file types. Fasta, name, group, design, count, list, rabund, sabund, shared, relabund, oligos, taxonomy, phylip, column, flow, qfile and tree.

How do I select certain sequences or groups of sequences?

Mothur has several "get" and "remove" commands: get.seqs, get.lineage, get.groups, remove.seqs, remove.lineage and remove.groups.

Mothur reports a "bad_alloc" error in the shhh.flows command

This error indicates your computer is running out of memory. The shhh.flows command is very memory intensive. This error is most commonly caused by trying to process a dataset too large, using multiple processors, or failing to run trim.flows before shhh.flows. If you are running our 32bit version, your memory usage is limited to 4G. If you have more than 4G of RAM and are running a 64bit OS, using our 64bit version may resolve your issue. If you are using multiple processors, try running the command with processors=1, the more processors you use the more memory is required. Running trim.flows with an oligos file, and then shhh.flows with the file option may also resolve the issue. If for some reason you are unable to run shhh.flows with your data, a good alternative is to use the trim.seqs command using a 50-bp sliding window and to trim the sequence when the average quality score over that window drops below 35. Our results suggest that the sequencing error rates by this method are very good, but not quite as good as by shhh.flows and that the resulting sequences tend to be a bit shorter.

File Mismatches - "[ERROR]: yourSequence is in fileA but not in fileB, please correct."

The most common reason this occurs is because you forgot to include a name or count file on a command, or accidentally included the wrong one due to a typo. Mothur has a "current" option, which allows you to set file parameters to "current". For example, if fasta=current mothur will use the last fasta file given or created. The current option was designed to help avoid typo mistakes due to mothur's long filenames. Another reason this might occur is a process failing when you are using multiple processors. If a process dies, a file can be incomplete which would cause a mismatch error downstream.