The seq.error command reads a fasta file and searches for sequencing errors by comparing each sequence to a reference file. Using this command to assess the error rate requires that your dataset include one or more mock community samples of known composition. The error rate is defined as 1 - (sum of bases in query - sum of mismatches to reference) / (sum of bases in query), which is equivalent to the sum of mismatches divided by the sum of bases in the query.
seq.error(fasta=, count=, reference=, aligned=F)
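The error rate definition given above can be illustrated with a minimal Python sketch. It is not mothur's implementation: the function name `error_rate` is hypothetical, and it assumes the query and reference are already aligned to the same length, contain no gaps, and differ only by substitutions.

```python
def error_rate(query: str, reference: str) -> float:
    """Return the error rate of a query against an equal-length reference.

    Assumption: both strings are ungapped and aligned position by position,
    so only substitutions are counted as mismatches.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    if not query:
        raise ValueError("query is empty")

    total_bases = len(query)
    mismatches = sum(1 for q, r in zip(query, reference) if q != r)

    # 1 - (bases - mismatches) / bases, which simplifies to mismatches / bases
    return 1 - (total_bases - mismatches) / total_bases


# One substitution in ten bases
rate = error_rate("ACGTACGTAC", "ACGTACGTAA")
```

Here `rate` is approximately 0.1 (one mismatch over ten query bases), up to floating-point rounding.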
The aligned parameter allows you to specify whether your query and reference sequences are aligned. Default=TRUE.
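With aligned=F the sequences are degapped before they are compared (see the 1.30.0 bug fix note in the revision list). As a rough illustration of what degapping means, the sketch below strips the gap characters `-` and `.` from a sequence. The helper name `degap` is hypothetical, and this is only a sketch of the idea, not mothur's code.

```python
def degap(sequence: str) -> str:
    """Remove alignment gap characters ('-' and '.') from a sequence."""
    return sequence.replace("-", "").replace(".", "")


# An aligned sequence with leading, trailing, and internal gaps
unaligned = degap("..AC-GT--A..")
```

After this call `unaligned` holds only the bases, `ACGTA`.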
The name parameter allows you to provide a name file associated with your fasta file, so you can include the redundant sequences in your error analysis. If you include a name file, do not also include a count file.
The count parameter allows you to provide a count file associated with your fasta file, so you can include the redundant sequences in your error analysis. If you include a count file, do not also include a name file.
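To show why including the redundant sequences matters, here is a hedged Python sketch in which each unique sequence contributes to the overall error rate in proportion to its abundance. The record layout `(bases, mismatches, count)` and the function name `weighted_error_rate` are assumptions made for illustration; mothur's name and count files have their own formats, and its internal calculation may differ.

```python
def weighted_error_rate(records):
    """Overall error rate across unique sequences, weighted by abundance.

    records: iterable of (bases, mismatches, count) tuples, one per unique
    sequence, where count is how many reads that unique sequence represents.
    """
    total_bases = 0
    total_mismatches = 0
    for bases, mismatches, count in records:
        # Each redundant copy contributes the same bases and mismatches
        total_bases += bases * count
        total_mismatches += mismatches * count
    if total_bases == 0:
        raise ValueError("no bases to evaluate")
    return total_mismatches / total_bases


# Hypothetical data: 98 error-free reads and 2 reads with one mismatch each
records = [(250, 0, 98), (250, 1, 2)]
overall = weighted_error_rate(records)
```

With these made-up numbers the overall rate is 2 mismatches over 25,000 bases. Ignoring the counts would instead give 1 mismatch over 500 bases, which overstates the error rate.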
seq.error runs a chimera check on the query file, based on the input reference. The ignorechimeras parameter gives you the option of ignoring probable chimeras when calculating the error rate. Default=TRUE.
The threshold parameter allows you to ignore distances greater than some limit of interest.
The processors parameter allows you to run the command with multiple processors. By default, processors is 1. Use of multiple processors is not available for Windows users.
If the save parameter is set to true, the reference sequences will be saved in memory. To clear them later, use the clear.memory command. Default=F.
Output files are:
Revisions:
- 1.22.0 First introduced.
- 1.30.0 Added count parameter.
- 1.30.0 Bug fix: aligned=f was not degapping the sequences.