
Talk:Cluster


Suggestion

Another suggestion from a user: I would be interested in knowing the number of OTUs (clusters) produced at a particular cutoff. Perhaps these numbers could be reported in the output or in the .sabund file?


Actually, you can do this with the calc=sobs calculator in the summary.single command. --Pat Schloss 20:31, 7 August 2009 (EDT)
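
For anyone who wants to check the counts outside mothur, here is a minimal Python sketch that tallies the OTUs per cutoff label. It assumes a tab-separated list-style file in which the first column is the label, the second is the OTU count, and each later column is one OTU given as comma-separated sequence names. The function name `otus_per_label` and the example rows are hypothetical, not part of mothur.

```python
def otus_per_label(lines):
    """Return {label: number of OTUs} for the rows of a list-style file.

    Assumed layout (tab separated): label, OTU count, then one column per OTU.
    """
    counts = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank rows and a header row, if the file carries one.
        if len(fields) < 3 or fields[0] == "label":
            continue
        # Count the OTU columns directly instead of trusting column two.
        otus = [f for f in fields[2:] if f]
        counts[fields[0]] = len(otus)
    return counts


# Made-up rows for illustration only.
example = [
    "unique\t4\tseqA\tseqB\tseqC\tseqD",
    "0.03\t3\tseqA,seqB\tseqC\tseqD",
    "0.10\t2\tseqA,seqB,seqC\tseqD",
]
print(otus_per_label(example))
```

With real data, the lines would come from opening the file and passing the file object to the function.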


Problem using average neighbor with cutoffs

When I use the average neighbor method, I have trouble retrieving data for the cutoffs I specify. For example, I'd like to use the following sequence of commands to build and use a distance matrix limited to distances below 0.20: dist.seqs(fasta=xxxx, cutoff=0.20), read.dist(column=xxxxx, name=xxxxx), cluster(method=average). The following sequence of commands works: dist.seqs(fasta=xxxx), read.dist(column=xxxxx, name=xxxxx), cluster(method=average), but then I cannot limit the amount of data that is created (I'd like to apply a cutoff of 0.20 at some point, preferably in the dist.seqs command). If I add this cutoff to the dist.seqs command only, the cluster output only retrieves data through the 0.02 level. The same happens if I also add the cutoff=0.20 option to either the read.dist or the cluster command. However, if I simply change the cluster method to furthest neighbor (and keep the 0.20 cutoff in dist.seqs), the problem disappears. Help please! We have large files, and running these commands without limits really challenges our resources. Thanks!


I'm not surprised that you're losing distances, but I am surprised that you're losing so many - I would expect you to get to around 0.10 when you set the cutoff to 0.20. This happens because the average neighbor algorithm averages the distances between the sequences being merged. Because some distances are not "seen" when a cutoff is used (anything above the cutoff is left out of the matrix), mothur adjusts the cutoff down to the level at which all of the needed distances can still be seen. Thus, the cutoff changes during clustering. This doesn't happen with furthest or nearest neighbor because there is no averaging involved. Are you filtering your sequences so they overlap over the same region prior to calculating the distances? We typically see this big drop because people don't filter and get wonky distances. --Pat Schloss 07:39, 14 December 2010 (EST)
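
To make the averaging issue concrete, here is a toy Python sketch of average-linkage clustering on a sparse set of distances. It is not mothur's code. The function `average_neighbor`, the four sequences, and their distances are made up for illustration, and the rule used to lower the cutoff (drop to the one known distance whenever its partner is missing) is an assumption chosen to mimic the behaviour described above.

```python
def average_neighbor(dist, cutoff):
    """Toy average-linkage clustering on a sparse distance dict.

    dist maps frozenset({a, b}) -> distance; pairs above the cutoff are absent.
    Returns (list of (merged_name, merge_distance), effective_cutoff).
    """
    size = {name: 1 for pair in dist for name in pair}
    d = dict(dist)
    merges = []
    while d:
        pair, smallest = min(d.items(), key=lambda kv: kv[1])
        if smallest > cutoff:
            break  # nothing left at or below the (possibly lowered) cutoff
        a, b = sorted(pair)
        merged = a + "+" + b
        merges.append((merged, smallest))
        del d[pair]
        for k in list(size):
            if k in (a, b):
                continue
            da = d.pop(frozenset((a, k)), None)
            db = d.pop(frozenset((b, k)), None)
            if da is not None and db is not None:
                # Both distances are known: size-weighted average.
                total = size[a] + size[b]
                d[frozenset((merged, k))] = (size[a] * da + size[b] * db) / total
            elif da is not None or db is not None:
                # One distance is missing, so the average cannot be computed.
                # Assumed rule: lower the cutoff to the known distance.
                known = da if da is not None else db
                cutoff = min(cutoff, known)
        size[merged] = size.pop(a) + size.pop(b)
    return merges, cutoff


def key(x, y):
    return frozenset((x, y))


# Hypothetical distances among four sequences.
full = {key("A", "B"): 0.02, key("A", "C"): 0.10, key("B", "C"): 0.25,
        key("C", "D"): 0.15, key("A", "D"): 0.30, key("B", "D"): 0.30}
# What is left after discarding everything above 0.20.
sparse = {p: v for p, v in full.items() if v <= 0.20}

print(average_neighbor(sparse, 0.20))  # ([('A+B', 0.02)], 0.1)
print(average_neighbor(full, 0.20))    # ([('A+B', 0.02), ('C+D', 0.15)], 0.2)
```

In the sparse run, merging A and B needs the B-C distance, which was discarded, so the effective cutoff falls to 0.10 and the C-D merge at 0.15 is never reported. With the full matrix, every average can be computed and the cutoff stays at 0.20.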