Difference between revisions of "Chimera.bellerophon"

Revision as of 14:30, 20 April 2010

Use Bellerophon approach to create a sorted priority score list of potentially chimeric sequences.


Algorithm

  1. If filter==true, apply a 50% soft filter and generate a filter.align file [this may require pulling filter methods into their own classes]; otherwise no filtering (the Bellerophon server does 50% filtering)
  2. Read sequences into RAM as a vector of Sequence objects.
  3. Find the average midpoint of all sequences in the alignment.
  4. Define “left” as positions [1-midpoint] and right as [midpoint-end]
  5. Generate a “Preference” structure with string (sequence name) and float (preference score) entries
  6. Generate “vector<Preference> preferences(n)” where n corresponds to the number of sequences and the sequence name is stored with each accession.
  7. Calculate (use preference instead of col):
    <math>col\left[i\right]=\sum_j^{N}\left|dm^{left}\left[i\right]\left[j\right]-dm^{right}\left[i\right]\left[j\right]\right|</math>
    Where i is the sequence you are on and j is all the other sequences. If correction=T, then dm=sqrt(distance); if correction=F, then don’t transform the distances. The distance calculator should be “eachgap”. This step should be parallelized.
  8. Sum across all preferences[i] to get dme
  9. Recalculate each preferences[i] value as:
    <math>{preference}\left[i\right]=\frac{dme}{dme-2 * col\left[i\right]}</math>
  10. Sort the preferences values from high to low.
  11. Output the sorted list to *.chimera as well as the accession id for the closest sequence on the left and right
  12. Output to the screen:
    • average number of letters on either side of midpoint
    • number of sequences with a preference score above 1.000
    • min, 2.5 percentile, 25 percentile, median, 75 percentile, 97.5 percentile and max preference scores (see output of summary.seqs)
    • sequences with a preference score above the 95th percentile are reported as chimeric.
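
Steps 7 through 10 can be sketched in Python as follows. This is a minimal illustration of the formulas above, not mothur's implementation: the function names are hypothetical, the midpoint is passed in rather than computed, and eachgap_distance is a simplified stand-in for the "eachgap" calculator (fraction of differing columns, with every gap column counted as a difference and columns that are gaps in both sequences skipped).

```python
import math

GAPS = "-."  # characters treated as alignment gaps (assumption)


def eachgap_distance(a, b):
    """Simplified stand-in for an "eachgap" distance between two aligned strings."""
    compared = 0
    diffs = 0
    for x, y in zip(a, b):
        if x in GAPS and y in GAPS:
            continue  # column is a gap in both sequences: ignore it
        compared += 1
        if x != y:
            diffs += 1  # a gap against a base counts as one difference
    return diffs / compared if compared else 0.0


def preference_scores(seqs, midpoint, correction=True):
    """Return (name, preference) pairs sorted from high to low.

    seqs maps sequence names to aligned strings of equal length.
    """
    def dist(a, b):
        d = eachgap_distance(a, b)
        return math.sqrt(d) if correction else d

    # Step 7: col[i] = sum over j of |dm_left[i][j] - dm_right[i][j]|
    col = {}
    for i, si in seqs.items():
        total = 0.0
        for j, sj in seqs.items():
            if i == j:
                continue
            left = dist(si[:midpoint], sj[:midpoint])
            right = dist(si[midpoint:], sj[midpoint:])
            total += abs(left - right)
        col[i] = total

    # Step 8: dme is the sum of all col values
    dme = sum(col.values())

    # Step 9: preference[i] = dme / (dme - 2 * col[i])
    prefs = {}
    for i in seqs:
        denom = dme - 2 * col[i]
        prefs[i] = dme / denom if denom != 0 else float("inf")

    # Step 10: sort from high to low
    return sorted(prefs.items(), key=lambda kv: kv[1], reverse=True)


# Toy alignment: X has the left half of A and the right half of B.
example = {
    "A": "AAAAAAAA",
    "A2": "AAAAAAAC",
    "B": "CCCCCCCC",
    "B2": "CCCCCCCC",
    "X": "AAAACCCC",
}
ranked = preference_scores(example, midpoint=4, correction=False)
for name, score in ranked:
    print(name, round(score, 4))
```

In this toy alignment the artificial chimera X receives the highest preference score, because its left half matches one group of sequences and its right half matches the other.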


Default settings

The only required parameter is fasta. The default settings for chimera.bellerophon are filter=F, window=1/4 length of seq, increment=25, correction=T, processors=1.

mothur > chimera.bellerophon(fasta=ex.align)

The output to the screen looks like:

Reading sequences from ex.align...Done.
Processing sliding window: 10
Processing sliding window: 20
Processing sliding window: 30
Processing sliding window: 40
Processing sliding window: 50
Processing sliding window: 60
Processing sliding window: 70
Processing sliding window: 80
Processing sliding window: 90
Processing sliding window: 100
Processing sliding window: 110
Processing sliding window: 120
Processing sliding window: 130
Processing sliding window: 140
Processing sliding window: 150
Processing sliding window: 153
gi|11093939|MNB2|AF293011 is a suspected chimera at breakpoint 2195
It's score is 1.31537 with suspected left parent gi|11093938|MNC2|AF293010 and right parent gi|11093938|MNC2|AF293010
Sequence with preference score above 1.31537: 1
Minimum:	0.635111
2.5%-tile:	0.635111
25%-tile:	0.866302
Median: 	1.00136
75%-tile:	1.13515
97.5%-tile:	1.31537
Maximum:	1.31537
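
A summary of this kind could be computed from the list of preference scores roughly as shown below. This is only a sketch: the rule mothur uses to pick percentile values is not stated on this page, so the truncating index rule here is an assumption, and summarize_scores is a hypothetical name.

```python
def summarize_scores(scores):
    """Return min, selected percentiles, and max of a non-empty list of scores.

    Percentiles are taken by truncating p * n to an index into the sorted
    list (an assumed rule, not necessarily the one mothur applies).
    """
    ordered = sorted(scores)
    n = len(ordered)

    def at(p):
        index = min(int(p * n), n - 1)  # clamp so p = 1.0 stays in range
        return ordered[index]

    return {
        "Minimum": ordered[0],
        "2.5%-tile": at(0.025),
        "25%-tile": at(0.25),
        "Median": at(0.5),
        "75%-tile": at(0.75),
        "97.5%-tile": at(0.975),
        "Maximum": ordered[-1],
    }


# Usage with made-up scores 1.0 .. 100.0
summary = summarize_scores([float(i) for i in range(1, 101)])
for label, value in summary.items():
    print(f"{label}:\t{value}")
```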


Opening ex.bellerophon.chimera you would see:

Name	Score	Left	Right
gi|11093939|MNB2|AF293011	1.31537	gi|11093938|MNC2|AF293010	gi|11093938|MNC2|AF293010
gi|11093930|MNH4|AF293002	1.29437	gi|11093925|MNG7|AF292997	gi|11093927|MND8|AF292999
gi|11093937|MNF2|AF293009	1.23542	gi|11093929|MNC12|AF293001	gi|11093927|MND8|AF292999
gi|11093926|MNH2|AF292998	1.21045	gi|11093924|MNF4|AF292996	gi|11093925|MNG7|AF292997
...

Options

filter

By default the filter parameter is set to false, but if you set it to true a 50% soft filter will be applied.

mothur > chimera.bellerophon(fasta=ex.align, filter=t)


window

By default the window parameter is set to 1/4 of the sequence length, but you may set it to any value up to half the sequence length.

mothur > chimera.bellerophon(fasta=ex.align, window=200)

increment

The increment parameter determines how far the 2 windows "slide" each iteration. By default the increment parameter is 25, but you may set it as high as the sequence length minus twice the window.

mothur > chimera.bellerophon(fasta=ex.align, increment=100)
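
One way to picture how window and increment interact is the sketch below. It assumes the two windows sit side by side, meet at a candidate breakpoint, and move together by increment columns per iteration; window_positions is a hypothetical helper and the exact coordinates mothur uses may differ.

```python
def window_positions(seq_length, window, increment=25):
    """Yield (left_start, breakpoint, right_end) for each iteration.

    The left window covers [left_start, breakpoint) and the right window
    covers [breakpoint, right_end); both are `window` columns wide.
    """
    if window < 1 or 2 * window > seq_length:
        raise ValueError("window must be between 1 and half the sequence length")
    if increment < 1:
        raise ValueError("increment must be at least 1")

    last_start = seq_length - 2 * window  # furthest the pair can slide
    for left_start in range(0, last_start + 1, increment):
        breakpoint_col = left_start + window
        yield left_start, breakpoint_col, breakpoint_col + window


# Usage: a 1000-column alignment with window=250 and the default increment
positions = list(window_positions(1000, 250, 25))
print(len(positions), positions[0], positions[-1])
```

Under these assumptions, the pair of windows can slide at most sequence length minus twice the window in total, which matches the upper limit on increment given above.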

correction

By default the correction parameter is set to true, meaning that the square root of each distance is used instead of the distance value itself.

mothur > chimera.bellerophon(fasta=ex.align, correction=f)

processors

By default the processors parameter is set to 1. If you are running the MPI-enabled version, processors is set to the number of processes you have running.

mothur > chimera.bellerophon(fasta=ex.align, processors=2)