
Chimera.pintail


The pintail method of the chimera.seqs command uses the Pintail approach to check sequences for chimeras.


Algorithm

Default settings

The fasta and template parameters are required.

mothur > chimera.seqs(fasta=ex.align, template=core_set_aligned.imputed.fasta, method=pintail)

The output to the screen looks like:

mothur > chimera.seqs(fasta=ex.align, template=core_set_aligned.imputed.fasta, method=pintail)
Reading sequences and template file... Done.
Finding closest sequence in template to each sequence... Done.
Getting conservation... Calculating probability of conservation for your template sequences.
This can take a while...  I will  output the frequency of the highest base in each position 
to a .freq file so that you can input them using the conservation parameter next time you run this command.
Providing the .freq file will improve speed.    Done.
Finding window breaks... Done.
Calculating observed distance... Done.
Finding variability... Done.
Calculating alpha... Done.
Calculating expected distance... Done.
Finding deviation... Done.
Calculating quantiles for your template.  This can take a while...  
I will output the quantiles to a .quan file that you can input them using the quantile
parameter next time you run this command. 
Providing the .quan file will dramatically improve speed.    
Processing sequence 0
Processing sequence 1
Processing sequence 2
Processing sequence 3
Processing sequence 4
Processing sequence 5
...
...
Processing sequence 4936
Processing sequence 4937
Done.
gi|11093940|MNF8|AF293012	div: 10.5368	stDev: 2.87809	chimera flag: Yes
gi|11093937|MNF2|AF293009	div: 11.3816	stDev: 3.34803	chimera flag: Yes
gi|11093930|MNH4|AF293002	div: 17.3625	stDev: 4.33016	chimera flag: Yes


Opening ex.pintail.chimeras you would see:

gi|11093941|MNA3|AF293013	div: 10.5578	stDev: 2.739	chimera flag: No
Observed	1.66667	1	0.666667	0.666667	0.666667	0.666667 ...	
Expected	2.23191	2.3867	2.42569	2.44031	2.48789	2.43876	2.43876	2.43933	...
gi|11093940|MNF8|AF293012	div: 10.5368	stDev: 2.87809	chimera flag: Yes
Observed	1.66667	1	0.666667	0.666667	0.666667	1       1 ...	
Expected	2.22692	2.38135	2.42026	2.43484	2.48233	2.4333	2.4333	...
...
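
The summary lines shown above can be pulled out of the output file with a short script. The following is a minimal Python sketch, assuming the tab-separated layout seen in the example output; the names PintailResult and parse_pintail_summary are hypothetical helpers, not part of mothur.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class PintailResult(NamedTuple):
    name: str
    div: float
    st_dev: float
    is_chimera: bool


# Matches summary lines of the form seen in the example output:
# <name> <TAB> div: <number> <TAB> stDev: <number> <TAB> chimera flag: Yes|No
SUMMARY_RE = re.compile(
    r"^(?P<name>\S+)\s+div:\s*(?P<div>\S+)\s+stDev:\s*(?P<sd>\S+)"
    r"\s+chimera flag:\s*(?P<flag>Yes|No)\s*$"
)


def parse_pintail_summary(lines: Iterable[str]) -> Iterator[PintailResult]:
    """Yield one record per summary line; Observed/Expected rows are skipped."""
    for line in lines:
        match = SUMMARY_RE.match(line)
        if match is None:
            continue  # not a summary line (e.g. an Observed or Expected row)
        yield PintailResult(
            name=match["name"],
            div=float(match["div"]),
            st_dev=float(match["sd"]),
            is_chimera=(match["flag"] == "Yes"),
        )
```

Passing an open file handle for ex.pintail.chimeras to parse_pintail_summary and keeping the records where is_chimera is true gives the list of flagged sequence names.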

Options

conservation

You can supply a file containing the frequency information for your template file to increase speed. mothur will generate this file for you if you do not provide one, but it takes a long time.

mothur > chimera.seqs(fasta=ex.align, template=core_set_aligned.imputed.fasta, method=pintail,
conservation=core_set_aligned.imputed.freq)
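
The screen output describes the frequency information as "the frequency of the highest base in each position" of the template. The following Python sketch illustrates that calculation on a small alignment. It is an illustration only: the function name is hypothetical, counting gaps in the denominator is an assumption, and it does not read or write mothur's .freq file format.

```python
from collections import Counter
from typing import List, Sequence


def highest_base_frequency(aligned_seqs: Sequence[str]) -> List[float]:
    """For each alignment column, return the fraction of sequences that
    carry the most common base in that column."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    freqs = []
    for column in zip(*aligned_seqs):
        # Count only real bases; gap characters are ignored here.
        counts = Counter(b.upper() for b in column if b.upper() in "ACGTU")
        top = max(counts.values(), default=0)
        # Assumption: the denominator is the total number of sequences.
        freqs.append(top / len(aligned_seqs))
    return freqs
```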

quantile

You can supply a file containing the quantile information for your template file to increase speed. mothur can generate this file for you, but it takes a VERY long time. Note that the quantile file must match the settings you run with: use the quantile file generated with the filter, the mask, or both the mask and filter, as appropriate.

mothur > chimera.seqs(fasta=ex.align, template=core_set_aligned.imputed.fasta,
method=pintail, quantile=core_set_aligned.imputed.pintail.quan)

filter

By default the filter parameter is set to false. If you set it to true, a 50% soft vertical filter will be applied.

mothur > chimera.seqs(fasta=ex.align, template=core_set_aligned.imputed.fasta,
method=pintail, filter=t, quantile=core_set_aligned.imputed.pintail.filtered.quan)

With the filter...

gi|11093941|MNA3|AF293013	div: 6.57051	stDev: 4.66118	chimera flag: Yes
gi|11093940|MNF8|AF293012	div: 6.41026	stDev: 4.87026	chimera flag: Yes
gi|11093939|MNB2|AF293011	div: 7.3718	stDev: 5.37411	chimera flag: Yes
gi|11093937|MNF2|AF293009	div: 6.46965	stDev: 5.58825	chimera flag: Yes
gi|11093930|MNH4|AF293002	div: 9.31174	stDev: 4.0618	chimera flag: Yes
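
As a rough illustration of what a 50% soft vertical filter does to an alignment, here is a Python sketch. It assumes that "vertical" means dropping columns made up entirely of gaps and that "soft" means dropping columns whose dominant base occurs in fewer than 50% of the sequences. The function names are hypothetical and this is not mothur's implementation.

```python
from typing import List, Sequence

GAP_CHARS = ".-"


def soft_vertical_filter_columns(aligned_seqs: Sequence[str],
                                 soft: float = 0.5) -> List[int]:
    """Return the indices of the alignment columns that survive the filter."""
    n_seqs = len(aligned_seqs)
    keep = []
    for index, column in enumerate(zip(*aligned_seqs)):
        bases = [b.upper() for b in column if b not in GAP_CHARS]
        if not bases:
            continue  # vertical filter: the column contains only gaps
        dominant = max(bases.count(b) for b in set(bases))
        if dominant / n_seqs >= soft:
            keep.append(index)  # soft filter: dominant base is common enough
    return keep


def apply_filter(seq: str, keep: Sequence[int]) -> str:
    """Keep only the selected columns of one aligned sequence."""
    return "".join(seq[i] for i in keep)
```

Because filtering changes which columns are compared, the quantile file has to be the one generated for the filtered template, as in the command above.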


mask

By default no mask is applied. You can set mask to a file containing your own mask, or use mask=default to apply the ecoli mask.

mothur > chimera.seqs(fasta=ex.align, template=core_set_aligned.imputed.fasta,
method=pintail, mask=default, quantile=core_set_aligned.imputed.pintail.masked.quan)

With the ecoli mask...

gi|11093941|MNA3|AF293013	div: 9.66184	stDev: 7.37292	chimera flag: Yes
gi|11093940|MNF8|AF293012	div: 9.10973	stDev: 7.06876	chimera flag: Yes
gi|11093939|MNB2|AF293011	div: 9.4548	stDev: 5.99675	chimera flag: Yes
gi|11093937|MNF2|AF293009	div: 10.4336	stDev: 6.98824	chimera flag: Yes

mothur > chimera.seqs(fasta=ex.align, template=core_set_aligned.imputed.fasta,
method=pintail, mask=default, filter=t, quantile=core_set_aligned.imputed.pintail.filtered.masked.quan)

With the ecoli mask and filter applied...

gi|11093941|MNA3|AF293013	div: 6.58106	stDev: 5.81978	chimera flag: Yes
gi|11093940|MNF8|AF293012	div: 6.42055	stDev: 5.91301	chimera flag: Yes
gi|11093939|MNB2|AF293011	div: 7.38363	stDev: 4.57108	chimera flag: Yes
gi|11093937|MNF2|AF293009	div: 6.47482	stDev: 4.74574	chimera flag: Yes
gi|11093930|MNH4|AF293002	div: 9.31929	stDev: 4.54857	chimera flag: Yes


window

The window parameter sets the length of sequence in each window analyzed. By default it is set to 300. Note that changing the window size will require new quantile files to be made.

mothur > chimera.seqs(fasta=ex.align, template=core_set_aligned.imputed.fasta,
method=pintail, window=200)

increment

The increment parameter sets how far the window slides along the sequence at each step. For the pintail algorithm the default is 25. Note that changing the increment will require new quantile files to be made.

mothur > chimera.seqs(fasta=ex.align, template=core_set_aligned.imputed.fasta,
method=pintail, increment=50)
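
To see how window and increment interact, the following Python sketch lists the start positions of a simple sliding window. It is a simplified illustration, not mothur's "Finding window breaks" step: the function name is hypothetical, positions are 0-based, and dropping any trailing partial window is an assumption.

```python
from typing import List


def window_starts(seq_length: int, window: int = 300,
                  increment: int = 25) -> List[int]:
    """Return the 0-based start position of every full window."""
    if window <= 0 or increment <= 0:
        raise ValueError("window and increment must be positive")
    if seq_length < window:
        return []  # the sequence is too short for even one full window
    return list(range(0, seq_length - window + 1, increment))
```

For a 400-position sequence the defaults give windows starting at 0, 25, 50, 75 and 100, while window=200 with increment=50 gives starts at 0, 50, 100, 150 and 200. Either parameter changes which windows are compared, which is consistent with the note that new quantile files are needed after changing them.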

processors

To speed up processing, the chimera.seqs command can be run on multiple processors by setting the processors parameter. By default the processors parameter is 1.

mothur > chimera.seqs(fasta=ex.align, template=core_set_aligned.imputed.fasta,
method=pintail, quantile=core_set_aligned.imputed.pintail.quan, processors=2)


This method was written using the algorithms described in the paper "At Least 1 in 20 16S rRNA Sequence Records Currently Held in the Public Repositories is Estimated To Contain Substantial Anomalies" by Kevin E. Ashelford, Nadia A. Chuzhanova, John C. Fry, Antonia J. Jones, and Andrew J. Weightman.