chimera.slayer

The chimera.slayer command reads a fasta file and a reference file and reports potentially chimeric sequences. The developers of the original algorithm suggest using a special template reference set (the "gold" set). We provide a silva-based alignment of this dataset with our silva reference files. This command is modeled after ChimeraSlayer, written by the Broad Institute. Additional documentation can be found at the Broad Institute’s website.

Default Settings

The fasta and reference parameters are required, unless you are using reference=self. You may enter multiple fasta files by separating them with dashes, for example fasta=ex.align-abrecovery.align.

mothur > chimera.slayer(fasta=stool.trim.unique.good.filter.unique.precluster.fasta, reference=silva.gold.filter.fasta)

The output to the screen looks like:

Checking sequences from /Users/SarahsWork/Desktop/release/stool.trim.unique.good.filter.unique.precluster.fasta ...
Reading sequences from /Users/SarahsWork/Desktop/release/silva.gold.filter.fasta...Done.

Only reporting sequence supported by 90% of bootstrapped results.
M23Fcsw_269061 yes
M24Fcsw_22155  yes
M24Fcsw_21866  yes
...

Output in stool.trim.unique.good.filter.unique.precluster.slayer.chimeras:

Name   LeftParent  RightParent DivQLAQRB   PerIDQLAQRB BootStrapA  DivQLBQRA   PerIDQLBQRA BootStrapB  Flag    LeftWindow  RightWindow
F21Fcsw_12128  no
F11Fcsw_6529   no
F11Fcsw_112161 S000530395  7000004128191037    0.946746    89.8876 0   1.04142 98.8764 81  no  0-185   187-369 
F11Fcsw_56988  no
F11Fcsw_63768  no
F21Fcsw_11639  no
M11Fcsw_34015  no
...

Note:

  • DivQLAQRB = divergence from the query sequence to the left side of parent A and the right side of parent B
  • PerIDQLAQRB = similarity of the query to the left side of parent A and the right side of parent B
  • DivQLBQRA = divergence from the query sequence to the right side of parent A and the left side of parent B
  • PerIDQLBQRA = similarity of the query to the right side of parent A and the left side of parent B
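The report layout above can be read with a few lines of Python. This is a minimal sketch, not part of mothur: it assumes the columns are separated by whitespace, that sequences with no candidate parents appear as two-field rows (name and flag), and that full rows carry all twelve columns from the header. The function names parse_slayer_line and parse_slayer_report are hypothetical.

```python
# Column names copied from the header line of the *.slayer.chimeras report.
COLUMNS = ["Name", "LeftParent", "RightParent", "DivQLAQRB", "PerIDQLAQRB",
           "BootStrapA", "DivQLBQRA", "PerIDQLBQRA", "BootStrapB", "Flag",
           "LeftWindow", "RightWindow"]
NUMERIC = ["DivQLAQRB", "PerIDQLAQRB", "BootStrapA",
           "DivQLBQRA", "PerIDQLBQRA", "BootStrapB"]


def parse_slayer_line(line):
    """Turn one report line into a dict (hypothetical helper)."""
    fields = line.split()  # assumption: whitespace-separated columns
    if len(fields) == 2:
        # Short row: only the sequence name and the yes/no flag.
        return {"Name": fields[0], "Flag": fields[1]}
    if len(fields) != len(COLUMNS):
        raise ValueError("unexpected number of columns: %r" % line)
    row = dict(zip(COLUMNS, fields))
    for key in NUMERIC:
        row[key] = float(row[key])
    return row


def parse_slayer_report(text):
    """Parse the whole report, skipping blank lines and the header."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[:2] == ["Name", "LeftParent"]:
            continue
        rows.append(parse_slayer_line(line))
    return rows


sample = """Name LeftParent RightParent DivQLAQRB PerIDQLAQRB BootStrapA DivQLBQRA PerIDQLBQRA BootStrapB Flag LeftWindow RightWindow
F21Fcsw_12128 no
F11Fcsw_112161 S000530395 7000004128191037 0.946746 89.8876 0 1.04142 98.8764 81 no 0-185 187-369
"""
rows = parse_slayer_report(sample)
print(rows[1]["Name"], rows[1]["DivQLBQRA"], rows[1]["BootStrapB"], rows[1]["Flag"])
```

Keeping short rows as two-key dicts makes it easy to tell sequences that were never compared against a parent pair from those that were compared and rejected.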

You may also set reference=self. In this case, the more abundant sequences will be used as potential parents.

mothur > chimera.slayer(fasta=stool.trim.unique.good.filter.unique.precluster.fasta, name=stool.trim.unique.good.filter.unique.precluster.names, reference=self)

or with a count file:

mothur > chimera.slayer(fasta=stool.trim.unique.good.filter.unique.precluster.fasta, count=stool.trim.unique.good.filter.unique.precluster.count_table, reference=self)

Options

count

If you are using reference=self and provide a count file, mothur will use the more abundant sequences from the same sample to check the query sequence.

mothur > chimera.slayer(fasta=stool.trim.unique.good.filter.unique.precluster.fasta, count=stool.trim.unique.good.filter.unique.precluster.count_table, reference=self)
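The idea behind reference=self can be illustrated with a short Python sketch. It is not mothur's implementation: it only assumes that, within one sample, every sequence that is strictly more abundant than the query is a potential parent. The function candidate_parents and the sequence names seqA, seqB, and seqC are hypothetical.

```python
def candidate_parents(query, abundance):
    """Return the names of sequences more abundant than the query.

    abundance maps sequence name -> count within a single sample
    (assumption: counts come from the count file for that sample).
    The result is ordered from most to least abundant.
    """
    query_count = abundance[query]
    more_abundant = [name for name, n in abundance.items() if n > query_count]
    return sorted(more_abundant, key=lambda name: (-abundance[name], name))


# Hypothetical per-sample counts.
counts = {"seqA": 120, "seqB": 40, "seqC": 3}
print(candidate_parents("seqC", counts))
print(candidate_parents("seqA", counts))
```

Under this assumption the most abundant sequence in a sample has no candidate parents, so it can never be called chimeric against the sample itself.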

ksize

The ksize parameter allows you to set the kmer size used in the kmer search, default=7.

realign

The realign parameter allows you to realign the query to the potential parents. Choices are true or false, default true.

window

The window parameter allows you to specify the window size for searching for chimeras, default=50.

mothur > chimera.slayer(fasta=ex.align, reference=silva.gold.align, window=400)

increment

The increment parameter allows you to specify how far you move each window while finding chimeric sequences, default=5.

mothur > chimera.slayer(fasta=ex.align, reference=silva.gold.align, increment=25)
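To see how window and increment interact, the following Python sketch lists the start positions of a sliding window. It is only an illustration: it assumes windows start at position 0 and must fit entirely within the sequence, which this page does not specify. The function window_starts is hypothetical.

```python
def window_starts(length, window=50, increment=5):
    """List start positions of windows of size `window` moved by `increment`.

    Assumption: a window is only used if it fits entirely in the sequence.
    """
    if window <= 0 or increment <= 0:
        raise ValueError("window and increment must be positive")
    if length < window:
        return []
    return list(range(0, length - window + 1, increment))


# A 70-column alignment with the default settings.
print(window_starts(70))
# The same alignment with a larger step between windows.
print(window_starts(70, window=50, increment=10))
```

A smaller increment examines more windows per sequence, and a window larger than the sequence leaves nothing to examine under this assumption.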

match & mismatch & numwanted & parents

The match parameter allows you to reward matched bases while selecting potential parents, default is 5. The mismatch parameter allows you to penalize mismatched bases while selecting potential parents, default is -4. The numwanted parameter allows you to specify how many potential parents you would like each query sequence compared with, default=15. The parents parameter allows you to select the number of potential parents to investigate from the numwanted best matches after rating them, default is 3.
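The rating step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not mothur's code: sequences are assumed to be aligned to the same length, positions where either sequence has a gap ("-" or ".") are assumed to be ignored, and the candidates passed in are assumed to be the numwanted best matches from the kmer search. The functions match_score and select_parents and the candidate names are hypothetical.

```python
def match_score(query, candidate, match=5, mismatch=-4):
    """Score two aligned sequences with a match reward and mismatch penalty."""
    if len(query) != len(candidate):
        raise ValueError("sequences must be aligned to the same length")
    score = 0
    for q, c in zip(query.upper(), candidate.upper()):
        if q in "-." or c in "-.":
            continue  # assumption: gap positions do not affect the score
        score += match if q == c else mismatch
    return score


def select_parents(query, candidates, parents=3, match=5, mismatch=-4):
    """Keep the `parents` best-scoring candidates (name -> aligned sequence)."""
    ranked = sorted(
        candidates,
        key=lambda name: (-match_score(query, candidates[name], match, mismatch), name),
    )
    return ranked[:parents]


query = "ACGTACGT"
candidates = {"candA": "ACGTACGT", "candB": "ACGTACGA", "candC": "AC-TACGA"}
print({name: match_score(query, seq) for name, seq in candidates.items()})
print(select_parents(query, candidates, parents=2))
```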

minsim & mincov

The minsim parameter allows you to specify a minimum similarity between the query and the parent fragments, default=90, meaning 90%. The mincov parameter allows you to specify minimum coverage of closest matches found in template and the query. Default is 70, meaning 70%.

iters & minbs & minsnp

The iters parameter allows you to specify the number of bootstrap iterations to run, default=1000. The minbs parameter allows you to specify the minimum bootstrap support for calling a sequence chimeric, default=90, meaning 90%. The minsnp parameter allows you to specify the percentage of SNPs to sample on each side of the breakpoint when computing bootstrap support, default=10, meaning 10%.

divergence

The divergence parameter allows you to set a cutoff for chimera determination, default is 1.007.
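One plausible reading of how the divergence and minbs cutoffs combine is sketched below in Python. This is an assumption, not a statement of mothur's exact rule: it treats a sequence as chimeric when, for either parent orientation, the divergence value reaches the divergence cutoff and the bootstrap support reaches minbs. The function is_chimeric is hypothetical; the input values are taken from the F11Fcsw_112161 row of the example report on this page.

```python
def is_chimeric(div_a, bs_a, div_b, bs_b, divergence=1.007, minbs=90):
    """Apply both cutoffs to each orientation (assumed decision rule).

    div_a / bs_a correspond to DivQLAQRB / BootStrapA,
    div_b / bs_b correspond to DivQLBQRA / BootStrapB.
    """
    orientation_a = div_a >= divergence and bs_a >= minbs
    orientation_b = div_b >= divergence and bs_b >= minbs
    return orientation_a or orientation_b


# Values from the F11Fcsw_112161 row, which the report flags as "no".
print(is_chimeric(0.946746, 0, 1.04142, 81))
# The same row evaluated with a lower bootstrap requirement.
print(is_chimeric(0.946746, 0, 1.04142, 81, minbs=80))
```

Under this reading, the example row exceeds the divergence cutoff in one orientation but falls short of 90% bootstrap support, which agrees with its "no" flag.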

trim

The trim parameter allows you to output a new fasta file containing your sequences with the chimeric ones trimmed to include only their longest piece, default=F.

split

The split parameter was an idea we had to help detect tri- and quadmeras, but it increases the number of false positives. When split=T, if a sequence comes back as non-chimeric, mothur will test the two sides to see if they are chimeric. By default, split=F.

removechimeras

The removechimeras parameter allows you to remove the chimeras from your files instead of just flagging them. Default=t.

dereplicate

The dereplicate parameter can be used when checking for chimeras by group. If dereplicate is false (the default), a sequence that one group finds to be chimeric is treated as chimeric in all groups. If you set dereplicate=t, a sequence that is found to be chimeric is removed only from its group, not from the entire dataset.

Note: When you set dereplicate=t, mothur generates a new count table with the chimeras removed and counts adjusted by sample.
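The difference between the two dereplicate settings can be illustrated with a small Python sketch. It is not mothur's implementation: it represents a count table as a nested dictionary and assumes a separate mapping that records which groups flagged each sequence. The function remove_chimeras and the names seq1, seq2, A, and B are hypothetical.

```python
def remove_chimeras(counts, chimeric_in, dereplicate=False):
    """Return a new count table with chimeras removed.

    counts:      {sequence: {group: count}}
    chimeric_in: {sequence: set of groups that flagged it as chimeric}
    """
    result = {}
    for seq, by_group in counts.items():
        flagged = chimeric_in.get(seq, set())
        if not flagged:
            result[seq] = dict(by_group)  # never flagged: keep unchanged
        elif dereplicate:
            # dereplicate=t: drop the sequence only from the flagging groups.
            kept = {g: n for g, n in by_group.items() if g not in flagged}
            if kept:
                result[seq] = kept
        # dereplicate=f: flagged in any group means removed from all groups.
    return result


counts = {"seq1": {"A": 10, "B": 2}, "seq2": {"A": 5, "B": 7}}
chimeric_in = {"seq1": {"B"}}  # seq1 was flagged only in group B
print(remove_chimeras(counts, chimeric_in, dereplicate=False))
print(remove_chimeras(counts, chimeric_in, dereplicate=True))
```

With dereplicate=f the sketch drops seq1 entirely, while with dereplicate=t it keeps the 10 copies of seq1 observed in group A.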

For a detailed example: Dereplicate example

The name option allows you to provide a name file.

We DO NOT recommend using the name file. Instead we recommend using a count file. The count file reduces the time and resources needed to process commands. It is a smaller file and can contain group information.

The group parameter allows you to provide a group file.

We DO NOT recommend using the name / group file combination. Instead we recommend using a count file. The count file reduces the time and resources needed to process commands. It is a smaller file and can contain group information.

Revisions

  • 1.22.0 - Added the group option for use with reference=self.
  • 1.23.0 - Parallelized by group for all operating systems.
  • 1.28.0 - added count parameter.
  • 1.29.0 - added the dereplicate parameter.
  • 1.30.0 - with count file and dereplicate=t will create a *.pick.count_table file.
  • 1.30.0 - Bug Fix: dereplicate=t, remove.seqs(dups=f) was not removing all redundant chimeras.
  • 1.32.1 - Bug Fix: count table and dereplicate=t caused total=0 error message. - https://forum.mothur.org/viewtopic.php?f=4&t=2620
  • 1.33.0 - improved work balance load between processors when processing by group.
  • 1.38.0 - Removes save option.
  • 1.40.0 - Removes processors option
  • 1.47.0 - Adds removechimeras parameter to chimera commands to auto remove chimeras from files. #795
  • 1.47.0 - Removes blast. #801
  • 1.47.0 - Removes search and blastlocation parameters from chimera.slayer. #801