

From mothur
Revision as of 16:35, 15 March 2016 by Mbakker (rearranged sections into more logical order)


The sub.sample command can be used to normalize your data or to create a smaller set from your original set. It accepts the following file types as input: fasta, list, shared, rabund and sabund. It generates a new file that contains a random sampling of your original file.

fasta option

To use the fasta option, follow this example:

mothur > sub.sample(fasta=esophagus.unique.fasta)

This generates the file esophagus.unique.subsample.fasta.
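As a rough illustration of what sampling a fasta file means, the sketch below draws a fixed number of records uniformly at random without replacement and keeps them in file order. It is not mothur's implementation; the function names `read_fasta` and `subsample_fasta` are hypothetical, and the assumption that a sequence name is the first token after `>` is this sketch's own.

```python
import random


def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Assumption: the sequence name is the first token after '>'.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def subsample_fasta(records, size, seed=None):
    """Keep `size` records chosen uniformly at random, without replacement.

    The surviving records are returned in their original file order.
    """
    if size > len(records):
        # Drawing more sequences than exist would duplicate sequence names.
        raise ValueError("size may not exceed the number of sequences")
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(records)), size))
    return [rec for i, rec in enumerate(records) if i in chosen]


fasta_text = ">s1\nACGT\n>s2\nACGA\n>s3\nTTGA\n>s4\nGGCA\n"
subset = subsample_fasta(read_fasta(fasta_text), size=2, seed=1)
print([name for name, _ in subset])
```

Passing a seed makes the draw reproducible in this sketch; whether and how mothur exposes its own random seed is not covered here.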

name option

If you have a names file associated with the fasta file, you may use the name option:

mothur > sub.sample(fasta=esophagus.unique.fasta, name=esophagus.names)

This generates the file esophagus.subsample.names.
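When a names file is supplied, each representative can stand for many identical sequences, so a fair sample should draw individual sequences rather than representatives. The sketch below shows that idea under stated assumptions: a two-column, tab-separated names file (`representative<TAB>name1,name2,...`), and hypothetical helpers `parse_names` and `subsample_names`. It keeps the original representative as the key, which is a simplification and not necessarily how mothur writes its output names file.

```python
import random


def parse_names(text):
    """Map each representative to all sequence names it stands for.

    Assumes the two-column layout: representative<TAB>name1,name2,...
    """
    names = {}
    for line in text.splitlines():
        if line.strip():
            rep, members = line.strip().split("\t")
            names[rep] = members.split(",")
    return names


def subsample_names(names, size, seed=None):
    """Sample `size` individual sequences, then regroup them by representative."""
    # Expand duplicates so every redundant read has the same chance of being drawn.
    pool = [(rep, member) for rep, members in names.items() for member in members]
    if size > len(pool):
        raise ValueError("size may not exceed the total number of sequences")
    rng = random.Random(seed)
    regrouped = {}
    for rep, member in rng.sample(pool, size):
        regrouped.setdefault(rep, []).append(member)
    # Simplification: the original representative stays the key because every
    # member shares its sequence; this sketch does not choose a new one.
    return regrouped


names = parse_names("a\ta,b,c\nd\td\ne\te,f\n")
sampled = subsample_names(names, size=4, seed=0)
print(sampled)
```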

list option

To use the list option, follow this example:

mothur > sub.sample(list=esophagus.fn.list)

This generates the file esophagus.fn.subsample.list.
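For a list file, sampling sequences also reshapes the OTUs: an OTU keeps only its sampled members and disappears if none survive. The sketch below shows that for a single distance level, assuming a list-file row of label, OTU count, then comma-joined OTUs separated by tabs. The helpers `subsample_otus` and `format_list_line` are hypothetical and are not mothur's code.

```python
import random


def subsample_otus(otus, size, seed=None):
    """Keep a random subset of `size` sequence names across all OTUs.

    `otus` is one distance level of a list file: a list of OTUs, each a list
    of sequence names. OTUs that lose all of their members are dropped.
    """
    all_names = [name for otu in otus for name in otu]
    if size > len(all_names):
        raise ValueError("size may not exceed the number of sequences")
    rng = random.Random(seed)
    keep = set(rng.sample(all_names, size))
    sampled = [[name for name in otu if name in keep] for otu in otus]
    return [otu for otu in sampled if otu]


def format_list_line(label, otus):
    """Render one list-file row: label, number of OTUs, then comma-joined OTUs."""
    return "\t".join([label, str(len(otus))] + [",".join(otu) for otu in otus])


otus = [["a", "b", "c"], ["d"], ["e", "f"]]
sampled = subsample_otus(otus, size=3, seed=2)
print(format_list_line("0.03", sampled))
```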

taxonomy option

The taxonomy parameter allows you to provide a taxonomy file associated with your fasta or list parameter.

mothur > sub.sample(fasta=esophagus.unique.fasta, name=esophagus.names, taxonomy=esophagus.rdp.taxonomy)

count option

The count file is similar to the name file in that it is used to represent the number of duplicate sequences for a given representative sequence. It can also contain group information.

mothur > sub.sample(fasta=esophagus.unique.fasta, count=esophagus.count_table)

Or, if you clustered using a count file, be sure to include it with your list file:

mothur > sub.sample(list=esophagus.fn.unique_list, count=esophagus.count_table)
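With a count file, the same weighting happens through counts instead of name lists: a representative with a count of 10 should be ten times as likely to be drawn as one with a count of 1. The sketch below rarefies a `{representative: total}` mapping using Python's `random.sample` with its `counts` argument (Python 3.9 or later). The function name is hypothetical, and the sketch ignores any per-group columns the count table may also carry.

```python
import random
from collections import Counter


def subsample_counts(counts, size, seed=None):
    """Rarefy a {representative: total} count table to `size` sequences.

    Sampling is without replacement over individual sequences. Representatives
    that are never drawn are left out of the result. Per-group columns of a
    real count table are not handled by this sketch.
    """
    reps = list(counts)
    if size > sum(counts.values()):
        raise ValueError("size may not exceed the total count")
    rng = random.Random(seed)
    # counts= treats each representative as repeated that many times (Python 3.9+).
    drawn = rng.sample(reps, size, counts=[counts[r] for r in reps])
    return dict(Counter(drawn))


table = {"a": 5, "b": 1, "c": 2}
print(subsample_counts(table, size=4, seed=3))
```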

group option

The group option may be used with a fasta file or list file to generate a new group file to match your sampled list or fasta file.

mothur > sub.sample(list=esophagus.fn.list, group=esophagus.groups)


mothur > sub.sample(fasta=esophagus.fasta, group=esophagus.groups)


mothur > sub.sample(fasta=esophagus.unique.fasta, name=esophagus.names, group=esophagus.groups)
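Matching a group file to the sampled sequences amounts to keeping only the rows whose sequence name survived sampling. The sketch below assumes a two-column group file (`sequence name<TAB>group`); the helper names `parse_groups`, `subset_companion` and `format_groups` are hypothetical and only illustrate the filtering step.

```python
def parse_groups(text):
    """Read a two-column group file (sequence name<TAB>group) into a dict."""
    groups = {}
    for line in text.splitlines():
        if line.strip():
            name, group = line.split()
            groups[name] = group
    return groups


def subset_companion(mapping, sampled_names):
    """Keep only the entries whose sequence name survived subsampling."""
    keep = set(sampled_names)
    return {name: value for name, value in mapping.items() if name in keep}


def format_groups(groups):
    """Write the filtered mapping back out in the same two-column layout."""
    return "\n".join(f"{name}\t{group}" for name, group in groups.items())


groups = parse_groups("s1\tA\ns2\tB\ns3\tA\n")
sampled_groups = subset_companion(groups, ["s1", "s3"])
print(format_groups(sampled_groups))
```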

shared option

To use the shared option, follow this example:

mothur > sub.sample(shared=esophagus.unique.fn.shared)

This generates the file esophagus.unique.fn.subsample.shared.
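For a shared file, each sample (group) is rarefied separately to the same size, which by default is the smallest sample. The sketch below illustrates that for one distance label, assuming the table is already parsed into `{group: [OTU counts]}`. It uses `random.sample` with `counts` (Python 3.9+). The helper names are hypothetical; this sketch only handles a size no larger than each sample's total and keeps OTU columns that end up all zero, which may differ from mothur's behavior.

```python
import random


def rarefy_row(counts, size, rng):
    """Draw `size` reads without replacement from one sample's OTU counts."""
    if size > sum(counts):
        raise ValueError("this sketch only handles size <= sample total")
    # Each OTU index is repeated according to its count before sampling.
    drawn = rng.sample(range(len(counts)), size, counts=counts)
    out = [0] * len(counts)
    for otu in drawn:
        out[otu] += 1
    return out


def subsample_shared(rows, size=None, seed=None):
    """Rarefy every sample in a shared table to the same number of reads.

    With size=None, use the smallest sample total as the default.
    """
    if size is None:
        size = min(sum(counts) for counts in rows.values())
    rng = random.Random(seed)
    return {group: rarefy_row(counts, size, rng) for group, counts in rows.items()}


rows = {"A": [5, 0, 3], "B": [2, 2, 0], "C": [10, 1, 1]}
print(subsample_shared(rows, seed=4))
```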

rabund option

To use the rabund option, follow this example:

mothur > sub.sample(rabund=esophagus.unique.fn.rabund)

This generates the file esophagus.unique.fn.subsample.rabund.

sabund option

To use the sabund option, follow this example:

mothur > sub.sample(sabund=esophagus.unique.fn.sabund)

This generates the file esophagus.unique.fn.subsample.sabund.
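The rabund and sabund formats describe the same OTU abundances in two ways: rabund lists each OTU's size, while sabund counts how many OTUs have 1, 2, 3, ... sequences. The sketch below converts between them and rarefies a rabund vector, with the label and count columns of each file row already stripped. The helper names are hypothetical, and dropping OTUs that are never drawn is an assumption of this sketch.

```python
import random


def sabund_to_rabund(sabund):
    """sabund[i] is the number of OTUs containing exactly i + 1 sequences."""
    return [size for size, n_otus in enumerate(sabund, start=1) for _ in range(n_otus)]


def rabund_to_sabund(rabund):
    """Count how many OTUs have each abundance; zero-abundance OTUs are ignored."""
    sabund = [0] * max(rabund, default=0)
    for abundance in rabund:
        if abundance > 0:
            sabund[abundance - 1] += 1
    return sabund


def subsample_rabund(rabund, size, seed=None):
    """Draw `size` sequences without replacement and return the new OTU sizes."""
    if size > sum(rabund):
        raise ValueError("this sketch only handles size <= total sequences")
    rng = random.Random(seed)
    drawn = rng.sample(range(len(rabund)), size, counts=rabund)  # Python 3.9+
    out = [0] * len(rabund)
    for otu in drawn:
        out[otu] += 1
    # Assumption: OTUs that were never drawn are dropped from the output.
    return [a for a in out if a > 0]


rabund = sabund_to_rabund([2, 1, 0, 1])
print(rabund_to_sabund(subsample_rabund(rabund, 5, seed=0)))
```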

groups option

The groups option may only be used if you have provided a group file with the list or fasta option, or if you are using a shared file. The groups parameter allows you to specify which of the groups in your group file you would like included. The group names are separated by dashes.

To use the groups option, follow this example:

mothur > sub.sample(shared=esophagus.unique.fn.shared, groups=B-C)

This generates the file esophagus.unique.fn.subsample.shared, containing only a sampling from the groups B and C.
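The dash-separated `groups=B-C` value works as a simple filter on the samples. The sketch below parses such a value and keeps the matching rows of a parsed shared table (`{group: [OTU counts]}`). Both helper names are hypothetical; note that a group name that itself contains a dash could not be expressed in this notation.

```python
def parse_dash_list(value):
    """Split a dash-separated option value such as 'B-C' into ['B', 'C']."""
    return [item for item in value.split("-") if item]


def select_groups(rows, groups_option):
    """Keep only the rows whose group is named in `groups_option`."""
    wanted = set(parse_dash_list(groups_option))
    return {group: counts for group, counts in rows.items() if group in wanted}


rows = {"A": [1, 0], "B": [2, 3], "C": [0, 4]}
print(select_groups(rows, "B-C"))
```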

label option

The label option can be used with all file types except fasta. It allows you to select which distance levels you would like to sample; like group names, the labels are separated by dashes.

To use the label option, follow this example:

mothur > sub.sample(shared=esophagus.unique.fn.shared, label=unique-0.03)

This generates the file esophagus.unique.fn.subsample.shared, containing a sampling at distances unique and 0.03.
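Selecting distance levels comes down to keeping the file rows whose first column matches one of the requested labels. The sketch below does that for tab-separated shared-file lines, assuming the first line is a header. The helper name `select_labels` is hypothetical.

```python
def select_labels(lines, label_option, has_header=True):
    """Keep rows whose first tab-separated field is a requested label.

    `label_option` is dash-separated, e.g. 'unique-0.03'. A header line,
    if present, is always kept.
    """
    wanted = set(label_option.split("-"))
    header, body = (lines[:1], lines[1:]) if has_header else ([], lines)
    return header + [line for line in body if line.split("\t", 1)[0] in wanted]


shared_lines = [
    "label\tGroup\tnumOtus\tOtu1",
    "unique\tA\t1\t5",
    "0.01\tA\t1\t5",
    "0.03\tA\t1\t5",
]
for line in select_labels(shared_lines, "unique-0.03"):
    print(line)
```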

size option

The size parameter allows you to select the size of your sampling. By default for a shared file, size is set to the size of your smallest sample. For all other file types, size defaults to 10% of the number of sequences in your original file. For list and fasta files, size may not exceed the number of sequences, as this would produce duplicate sequence names; for shared, rabund and sabund files, size may exceed the number of sequences originally represented.

To use the size option, follow this example:

mothur > sub.sample(shared=esophagus.unique.fn.shared, size=200)
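The default-size rules above can be summarized in a few lines. In this sketch, rounding 10% down with integer division is an assumption, as are the helper names `default_size` and `check_size`; the check only enforces the list/fasta limit stated above.

```python
def default_size(file_type, totals):
    """Return the default sample size described above.

    file_type: 'shared', 'list', 'fasta', 'rabund' or 'sabund'.
    totals: for 'shared', the number of sequences in each group; for other
            file types, a one-element list with the total sequence count.
    """
    if file_type == "shared":
        return min(totals)  # size of the smallest sample
    return sum(totals) // 10  # 10%, rounded down (assumption)


def check_size(file_type, size, n_sequences):
    """Reject sizes that would duplicate sequence names in list/fasta output."""
    if file_type in ("list", "fasta") and size > n_sequences:
        raise ValueError("size may not exceed the number of sequences")
    return size


print(default_size("shared", [120, 80, 300]), default_size("fasta", [1000]))
```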

persample option

The persample parameter allows you to indicate that you want to select a subsample of the same size from each of your groups; default=false. It is only used with list and fasta files when a group file is given. With persample=false, a random set of sequences of the size you select is chosen, but the number of sequences from each group may differ.
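The difference between the two persample settings can be seen in a small sketch: with persample on, each group is sampled separately to the same size; with it off, sequences are drawn from the pooled set, so per-group counts vary. The input is a parsed group file (`{sequence name: group}`), and the helper name `subsample_by_group` is hypothetical rather than mothur's own code.

```python
import random


def subsample_by_group(groups, size, persample, seed=None):
    """Return the set of sampled sequence names.

    groups: {sequence name: group}, as read from a group file.
    persample=True: draw `size` names from each group separately.
    persample=False: draw `size` names from the pooled set of all groups.
    """
    rng = random.Random(seed)
    names = sorted(groups)
    if not persample:
        # random.sample raises ValueError if size exceeds the pool.
        return set(rng.sample(names, size))
    by_group = {}
    for name in names:
        by_group.setdefault(groups[name], []).append(name)
    picked = set()
    for group, members in sorted(by_group.items()):
        if size > len(members):
            raise ValueError(f"group {group} has fewer than {size} sequences")
        picked.update(rng.sample(members, size))
    return picked


groups = {"s1": "A", "s2": "A", "s3": "A", "s4": "B", "s5": "B", "s6": "B"}
print(sorted(subsample_by_group(groups, 2, persample=True, seed=0)))
print(sorted(subsample_by_group(groups, 3, persample=False, seed=0)))
```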

Revisions

  • 1.22.0 Improved speed of sub.sample command on list and fasta files.
  • 1.22.0 Now outputs a unique fasta file and names file.
  • 1.28.0 Added count option.
  • 1.30.0 Added taxonomy option.