We will be offering mothur and R workshops throughout 2019. Learn more.

Difference between revisions of "Chimera.pintail"

From mothur
Jump to: navigation, search
(Created page with 'Use Pintail approach .... __TOC__ ==Algorithm== ==Default settings== The fasta and template parameters are required. mothur > chimera.seqs(fasta=ex.align, template=core_…')
 
Line 10: Line 10:
 
The fasta and template parameters are required.  
 
The fasta and template parameters are required.  
  
  mothur > chimera.seqs(fasta=ex.align, template=core_set_aligned.imputed.fasta, method=pintail)
+
  mothur > chimera.pintail(fasta=ex.align, template=core_set_aligned.imputed.fasta)
  
 
The output to the screen looks like:
 
The output to the screen looks like:
  
  mothur > chimera.seqs(fasta=ex.align, template=core_set_aligned.imputed.fasta, method=pintail)
+
  mothur > chimera.seqs(fasta=ex.align, template=core_set_aligned.imputed.fasta)
 
  Reading sequences and template file... Done.
 
  Reading sequences and template file... Done.
 
  Finding closest sequence in template to each sequence... Done.
 
  Finding closest sequence in template to each sequence... Done.
Line 42: Line 42:
 
  Processing sequence 4937
 
  Processing sequence 4937
 
  Done.
 
  Done.
  gi|11093940|MNF8|AF293012 div: 10.5368 stDev: 2.87809 chimera flag: Yes
+
  gi|11093941|MNA3|AF293013 div: 29.9873 stDev: 6.01747 chimera flag: Yes
  gi|11093937|MNF2|AF293009 div: 11.3816 stDev: 3.34803 chimera flag: Yes
+
  gi|11093935|MNC9|AF293007 div: 23.9793 stDev: 4.81477 chimera flag: Yes
  gi|11093930|MNH4|AF293002 div: 17.3625 stDev: 4.33016 chimera flag: Yes
+
gi|11093933|MNA5|AF293005 div: 23.8932 stDev: 4.81141 chimera flag: Yes
 +
  gi|11093930|MNH4|AF293002 div: 32.3988 stDev: 7.03789 chimera flag: Yes
 +
gi|11093924|MNF4|AF292996 div: 26.593 stDev: 5.26257 chimera flag: Yes
  
  
 
Opening ex.pintail.chimeras you would see:
 
Opening ex.pintail.chimeras you would see:
  
  gi|11093941|MNA3|AF293013 div: 10.5578 stDev: 2.739 chimera flag: No
+
  gi|11093941|MNA3|AF293013 div: 29.9873 stDev: 6.01747 chimera flag: Yes
  Observed 1.66667 1 0.666667 0.666667 0.666667 0.666667 ...
+
  Observed 9.66667 10.6667 12 10.6667 12.3333 12.3333 11.3333 11 ...
  Expected 2.23191 2.3867 2.42569 2.44031 2.48789 2.43876 2.43876 2.43933 ...
+
  Expected 6.72447 7.19081 7.3083 7.35233 7.49571 7.34767 7.34767 7.3494 ...
  gi|11093940|MNF8|AF293012 div: 10.5368 stDev: 2.87809 chimera flag: Yes
+
  gi|11093940|MNF8|AF293012 div: 31.3688 stDev: 5.98965 chimera flag: No
  Observed 1.66667 1 0.666667 0.666667 0.666667 1      1 ...
+
  Observed 9.66667 10.6667 12 10.6667 12.3333 12.3333 11.3333 11 ...
  Expected 2.22692 2.38135 2.42026 2.43484 2.48233 2.4333 2.4333 ...
+
  Expected 7.03297 7.5207 7.64357 7.68963 7.83958 7.68475 7.68475 7.68656 ...
 
  ...
 
  ...
 +
  
 
==Options==
 
==Options==
Line 62: Line 65:
 
You can upload a file containing the frequency information for your template file to increase speed. Mothur will generate this for you but it takes a long time.
 
You can upload a file containing the frequency information for your template file to increase speed. Mothur will generate this for you but it takes a long time.
  
  mothur > chimera.seqs(fasta=ex.align, template=core_set_aligned.imputed.fasta, method=pintail,
+
  mothur > chimera.pintail(fasta=ex.align, template=core_set_aligned.imputed.fasta,
 
  conservation=core_set_aligned.imputed.freq)
 
  conservation=core_set_aligned.imputed.freq)
  
 
===quantile===
 
===quantile===
You can upload a file containing the quantiles information for your template file to increase speed.  Mothur can generate this for you but it takes a VERY long time.  Note that when you use the filter, mask or mask and filter you need to select the appropriate quantile file.
+
You can upload a file containing the quantiles information for your template file to increase speed.  Mothur can generate this for you but it takes a VERY long time.  Note that when you use the filter, mask or mask and filter you need to select the appropriate quantile file.  The filter parameter makes the quantile file generated specific to the query set you are analyzing.  
  
  mothur > chimera.seqs(fasta=ex.align, template=core_set_aligned.imputed.fasta,
+
  mothur > chimera.pintail(fasta=ex.align, template=core_set_aligned.imputed.fasta,
method=pintail, quantile=core_set_aligned.imputed.pintail.quan)
+
  quantile=core_set_aligned.imputed.pintail.quan)
  
 
===filter===
 
===filter===
By default the filter parameter is set to false, but if you set it to true a 50% soft vertical filter will be applied.
+
By default the filter parameter is set to false, but if you set it to true a 50% soft vertical filter will be applied. Filtering makes the quantile file specific to the query set you are analyzing. 
  
  mothur > chimera.seqs(fasta=ex.align, template=core_set_aligned.imputed.fasta,
+
  mothur > chimera.pintail(fasta=ex.align, template=core_set_aligned.imputed.fasta, filter=t,
method=pintail, filter=t, quantile=core_set_aligned.imputed.pintail.filtered.quan)
+
    conservation=core_set_aligned.imputed.freq, quantile=core_set_aligned.imputed.pintail.filtered.ex.quan)
  
 
With the filter...
 
With the filter...
gi|11093941|MNA3|AF293013 div: 6.57051 stDev: 4.66118 chimera flag: Yes
+
  gi|11093937|MNF2|AF293009 div: 16.3738 stDev: 5.44241 chimera flag: Yes
gi|11093940|MNF8|AF293012 div: 6.41026 stDev: 4.87026 chimera flag: Yes
+
  gi|11093934|MND1|AF293006 div: 15.6675 stDev: 5.29495 chimera flag: Yes
gi|11093939|MNB2|AF293011 div: 7.3718 stDev: 5.37411 chimera flag: Yes
+
  gi|11093937|MNF2|AF293009 div: 6.46965 stDev: 5.58825 chimera flag: Yes
+
  gi|11093930|MNH4|AF293002 div: 9.31174 stDev: 4.0618 chimera flag: Yes
+
 
+
  
 
===mask===
 
===mask===
 
By default there is no mask applied, but you can set it to a file containing your mask or mask=default will apply the ecoli mask.
 
By default there is no mask applied, but you can set it to a file containing your mask or mask=default will apply the ecoli mask.
  
  mothur > chimera.seqs(fasta=ex.align, template=core_set_aligned.imputed.fasta,
+
  mothur > chimera.pintail(fasta=ex.align, template=core_set_aligned.imputed.fasta, filter=t, mask=default,
method=pintail, mask=default, quantile=core_set_aligned.imputed.pintail.masked.quan)
+
    conservation=core_set_aligned.imputed.freq, quantile=core_set_aligned.imputed.pintail.filtered.ex.masked.quan)
 
+
With the ecoli mask...
+
gi|11093941|MNA3|AF293013 div: 9.66184 stDev: 7.37292 chimera flag: Yes
+
gi|11093940|MNF8|AF293012 div: 9.10973 stDev: 7.06876 chimera flag: Yes
+
gi|11093939|MNB2|AF293011 div: 9.4548 stDev: 5.99675 chimera flag: Yes
+
gi|11093937|MNF2|AF293009 div: 10.4336 stDev: 6.98824 chimera flag: Yes
+
  
 
With the ecoli mask and filter applied...
 
With the ecoli mask and filter applied...
  
  gi|11093941|MNA3|AF293013 div: 6.58106 stDev: 5.81978 chimera flag: Yes
+
  gi|11093934|MND1|AF293006 div: 15.6675 stDev: 4.26498 chimera flag: Yes
gi|11093940|MNF8|AF293012 div: 6.42055 stDev: 5.91301 chimera flag: Yes
+
gi|11093939|MNB2|AF293011 div: 7.38363 stDev: 4.57108 chimera flag: Yes
+
gi|11093937|MNF2|AF293009 div: 6.47482 stDev: 4.74574 chimera flag: Yes
+
gi|11093930|MNH4|AF293002 div: 9.31929 stDev: 4.54857 chimera flag: Yes
+
  
 
+
  mothur > chimera.pintail(fasta=ex.align, template=core_set_aligned.imputed.fasta,
  mothur > chimera.seqs(fasta=ex.align, template=core_set_aligned.imputed.fasta,
+
 
  method=pintail, mask=default, filter=t, quantile=core_set_aligned.imputed.pintail.filtered.masked.quan)
 
  method=pintail, mask=default, filter=t, quantile=core_set_aligned.imputed.pintail.filtered.masked.quan)
  
Line 112: Line 100:
 
The window parameter is used to determine the length of sequence you want in each window analyzed. By default it is set to 300. Note, changing the window size will require new quantile files to be made.
 
The window parameter is used to determine the length of sequence you want in each window analyzed. By default it is set to 300. Note, changing the window size will require new quantile files to be made.
  
  mothur > chimera.seqs(fasta=ex.align, template=core_set_aligned.imputed.fasta,
+
  mothur > chimera.pintail(fasta=ex.align, template=core_set_aligned.imputed.fasta,
 
  method=pintail, window=200)
 
  method=pintail, window=200)
  
Line 118: Line 106:
 
The increment parameter is used to slide the window along the sequence.  For the pintail algorithm the default is 25. Note, changing the increment will require new quantile files to be made.
 
The increment parameter is used to slide the window along the sequence.  For the pintail algorithm the default is 25. Note, changing the increment will require new quantile files to be made.
  
  mothur > chimera.seqs(fasta=ex.align, template=core_set_aligned.imputed.fasta,
+
  mothur > chimera.pintail(fasta=ex.align, template=core_set_aligned.imputed.fasta,
 
  method=pintail, increment=50)
 
  method=pintail, increment=50)
  
 
===processors===
 
===processors===
To speed up your processing the chimera.seqs command can be run with multiple processors by using the processors parameter. By default the processors parameter is 1.
+
To speed up your processing the chimera.seqs command can be run with multiple processors by using the processors parameter. By default the processors parameter is 1.  If you are using the mpi-enabled version, processors is set to the number of processors you have running.
  
  mothur > chimera.seqs(fasta=ex.align, template=core_set_aligned.imputed.fasta,
+
  mothur > chimera.pintail(fasta=ex.align, template=core_set_aligned.imputed.fasta,
 
  method=pintail, quantile=core_set_aligned.imputed.pintail.quan, processors=2)
 
  method=pintail, quantile=core_set_aligned.imputed.pintail.quan, processors=2)
  
Line 131: Line 119:
 
"At Least 1 in 20 16S rRNA Sequence Records Currently Held in the Public Repositories is Estimated To Contain Substantial Anomalies" paper  
 
"At Least 1 in 20 16S rRNA Sequence Records Currently Held in the Public Repositories is Estimated To Contain Substantial Anomalies" paper  
 
by Kevin E. Ashelford 1, Nadia A. Chuzhanova 3, John C. Fry 1, Antonia J. Jones 2 and Andrew J. Weightman 1.
 
by Kevin E. Ashelford 1, Nadia A. Chuzhanova 3, John C. Fry 1, Antonia J. Jones 2 and Andrew J. Weightman 1.
 +
 +
 +
[[Category:Commands]]

Revision as of 14:52, 20 April 2010

Use Pintail approach ....


Algorithm

Default settings

The fasta and template parameters are required.

mothur > chimera.pintail(fasta=ex.align, template=core_set_aligned.imputed.fasta)

The output to the screen looks like:

mothur > chimera.seqs(fasta=ex.align, template=core_set_aligned.imputed.fasta)
Reading sequences and template file... Done.
Finding closest sequence in template to each sequence... Done.
Getting conservation... Calculating probability of conservation for your template sequences.
This can take a while...  I will  output the frequency of the highest base in each position 
to a .freq file so that you can input them using the conservation parameter next time you run this command.
Providing the .freq file will improve speed.    Done.
Finding window breaks... Done.
Calculating observed distance... Done.
Finding variability... Done.
Calculating alpha... Done.
Calculating expected distance... Done.
Finding deviation... Done.
Calculating quantiles for your template.  This can take a while...  
I will output the quantiles to a .quan file that you can input them using the quantile
parameter next time you run this command. 
Providing the .quan file will dramatically improve speed.    
Processing sequence 0
Processing sequence 1
Processing sequence 2
Processing sequence 3
Processing sequence 4
Processing sequence 5
...
...
Processing sequence 4936
Processing sequence 4937
Done.
gi|11093941|MNA3|AF293013	div: 29.9873	stDev: 6.01747	chimera flag: Yes
gi|11093935|MNC9|AF293007	div: 23.9793	stDev: 4.81477	chimera flag: Yes
gi|11093933|MNA5|AF293005	div: 23.8932	stDev: 4.81141	chimera flag: Yes
gi|11093930|MNH4|AF293002	div: 32.3988	stDev: 7.03789	chimera flag: Yes
gi|11093924|MNF4|AF292996	div: 26.593	stDev: 5.26257	chimera flag: Yes


Opening ex.pintail.chimeras you would see:

gi|11093941|MNA3|AF293013	div: 29.9873	stDev: 6.01747	chimera flag: Yes
Observed	9.66667	10.6667	12	10.6667	12.3333	12.3333	11.3333	11 ...	
Expected	6.72447	7.19081	7.3083	7.35233	7.49571	7.34767	7.34767	7.3494 ...	
gi|11093940|MNF8|AF293012	div: 31.3688	stDev: 5.98965	chimera flag: No
Observed	9.66667	10.6667	12	10.6667	12.3333	12.3333	11.3333	11 ...	
Expected	7.03297	7.5207	7.64357	7.68963	7.83958	7.68475	7.68475	7.68656	...
...


Options

conservation

You can upload a file containing the frequency information for your template file to increase speed. Mothur will generate this for you but it takes a long time.

mothur > chimera.pintail(fasta=ex.align, template=core_set_aligned.imputed.fasta,
conservation=core_set_aligned.imputed.freq)

quantile

You can upload a file containing the quantiles information for your template file to increase speed. Mothur can generate this for you but it takes a VERY long time. Note that when you use the filter, mask or mask and filter you need to select the appropriate quantile file. The filter parameter makes the quantile file generated specific to the query set you are analyzing.

mothur > chimera.pintail(fasta=ex.align, template=core_set_aligned.imputed.fasta,
 quantile=core_set_aligned.imputed.pintail.quan)

filter

By default the filter parameter is set to false, but if you set it to true a 50% soft vertical filter will be applied. Filtering makes the quantile file specific to the query set you are analyzing.

mothur > chimera.pintail(fasta=ex.align, template=core_set_aligned.imputed.fasta, filter=t, 
    conservation=core_set_aligned.imputed.freq, quantile=core_set_aligned.imputed.pintail.filtered.ex.quan)

With the filter...

gi|11093937|MNF2|AF293009	div: 16.3738	stDev: 5.44241	chimera flag: Yes
gi|11093934|MND1|AF293006	div: 15.6675	stDev: 5.29495	chimera flag: Yes

mask

By default there is no mask applied, but you can set it to a file containing your mask or mask=default will apply the ecoli mask.

mothur > chimera.pintail(fasta=ex.align, template=core_set_aligned.imputed.fasta, filter=t, mask=default, 
    conservation=core_set_aligned.imputed.freq, quantile=core_set_aligned.imputed.pintail.filtered.ex.masked.quan)

With the ecoli mask and filter applied...

gi|11093934|MND1|AF293006	div: 15.6675	stDev: 4.26498	chimera flag: Yes
mothur > chimera.pintail(fasta=ex.align, template=core_set_aligned.imputed.fasta,
method=pintail, mask=default, filter=t, quantile=core_set_aligned.imputed.pintail.filtered.masked.quan)

window

The window parameter is used to determine the length of sequence you want in each window analyzed. By default it is set to 300. Note, changing the window size will require new quantile files to be made.

mothur > chimera.pintail(fasta=ex.align, template=core_set_aligned.imputed.fasta,
method=pintail, window=200)

increment

The increment parameter is used to slide the window along the sequence. For the pintail algorithm the default is 25. Note, changing the increment will require new quantile files to be made.

mothur > chimera.pintail(fasta=ex.align, template=core_set_aligned.imputed.fasta,
method=pintail, increment=50)

processors

To speed up your processing the chimera.seqs command can be run with multiple processors by using the processors parameter. By default the processors parameter is 1. If you are using the mpi-enabled version, processors is set to the number of processors you have running.

mothur > chimera.pintail(fasta=ex.align, template=core_set_aligned.imputed.fasta,
method=pintail, quantile=core_set_aligned.imputed.pintail.quan, processors=2)


This method was written using the algorithms described in the "At Least 1 in 20 16S rRNA Sequence Records Currently Held in the Public Repositories is Estimated To Contain Substantial Anomalies" paper by Kevin E. Ashelford 1, Nadia A. Chuzhanova 3, John C. Fry 1, Antonia J. Jones 2 and Andrew J. Weightman 1.