
Difference between revisions of "Get.seqs"

From mothur
(Algorithm)
Line 1:
- Given a list of accession numbers (i.e. sequence names) and one or more file formats, generate a new file that contains only those sequences.
+ The [[get.seqs]] command takes a list of sequence names and either a [[fasta file | fasta]], [[name file | name]], [[group file | group]], or [[align.report file]] to generate a new file that contains only the sequences in the list.  This command may be used in conjunction with the [[list.seqs]] command to help screen a sequence collection.  To complete this analysis, you need to download the folder compressed in the [[Media:Esophagus.zip | Esophagus.zip]] archive.
  __NOTOC__
  ==Options==
- accnos, fasta, name, group, alignreport; each takes a file name
+ To run get.seqs, you must provide the accnos option and at least one other option.  The command will generate a *.pick.* file.
+ ===accnos option===
+ To generate an accnos file, you could use the [[list.seqs]] command.  Here you should generate a text file containing the following lines:
- ==Required==
- accnos and one of fasta/name/group/alignreport
+ 59_10_1
+ 59_10_10
+ 59_10_11
+ Save the file as esophagus.accnos
- ==Output==
- <nowiki>*.pick.*</nowiki>
+ ===fasta option===
+ To use the fasta option, follow this example:
- ==Algorithm==
- # read accnos file into a set<string> container, close the file
- # read through the file to be parsed and for each entry, if the sequence name is in the set<string> container:
- ** spit the data out to the new file and delete the entry from the set<string> container
- ** otherwise do nothing
+ mothur > get.seqs(accnos=esophagus.accnos, fasta=esophagus.fasta)
+ This generates the file esophagus.pick.fasta, which contains the following lines:
+ >59_10_1
+ TGCAAGTCGAACGATGAAGCCTAGCTTG...
+ >59_10_10
+ TGCAAGTAGAACGCTGAAGAGAGGAGCT...
+ >59_10_11
+ TGCAAGTCGAACGAAACTTTCTTACACC...
+ ===name option===
+ To use the name option, follow this example (assuming you have used [[unique.seqs]] on esophagus.fasta):
+ mothur > get.seqs(accnos=esophagus.accnos, name=esophagus.names)
+ This generates the file esophagus.pick.names, which contains the following lines:
+ 59_10_10        59_10_10
+ 59_10_11        59_10_11
+ 59_10_1 59_10_1
+ ===group option===
+ To use the group option, follow this example:
+ mothur > get.seqs(accnos=esophagus.accnos, group=esophagus.groups)
+ This generates the file esophagus.pick.groups, which contains the following lines:
+ 59_10_1 C
+ 59_10_10 C
+ 59_10_11 C
+ === alignreport option===
+ To use the alignreport option, follow this example:
+ mothur > get.seqs(accnos=esophagus.accnos, alignreport=esophagus.align.report)
+ This generates the file esophagus.pick.align.report, which contains the following lines:
+ QueryName QueryLength TemplateName TemplateLength SearchMethod SearchScore AlignmentMethod QueryStart QueryEnd TemplateStart TemplateEnd PairwiseAlignmentLength GapsInQuery GapsInTemplate LongestInsert SimBtwnQuery&Template
+ 59_10_1 869 176825 1527 kmer 69.84 needleman 1 869 5914 870 1 6 1 93.79
+ 59_10_10 868 196718 1542 kmer 78.05 needleman 1 868 49 916 870 2 2 0 95.29
+ 59_10_11 870 97946 1560 kmer 92.12 needleman 1 870 51 920 870 0 0 0 99.08

Revision as of 13:40, 30 July 2009

The get.seqs command takes a list of sequence names and a fasta, name, group, or align.report file, and generates a new file that contains only the sequences in the list. This command may be used in conjunction with the list.seqs command to help screen a sequence collection. To follow the examples on this page, download and extract the Esophagus.zip archive.


Options

To run get.seqs, you must provide the accnos option and at least one other option. The command will generate a *.pick.* file.
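The output names used on this page (esophagus.pick.fasta, esophagus.pick.align.report, and so on) suggest a simple naming pattern. The Python sketch below reproduces that pattern for the example files only; the function `pick_name` and the idea of treating `.align.report` as a single compound extension are assumptions inferred from those examples, not a description of mothur's own code.

```python
import os

# Assumption: ".align.report" behaves as one compound extension, as the
# esophagus.pick.align.report example on this page suggests.
COMPOUND_EXTENSIONS = (".align.report",)


def pick_name(filename):
    """Return a *.pick.* style name for an input file name (hypothetical helper)."""
    for ext in COMPOUND_EXTENSIONS:
        if filename.endswith(ext):
            return filename[: -len(ext)] + ".pick" + ext
    # Otherwise insert ".pick" in front of the last extension.
    root, ext = os.path.splitext(filename)
    return root + ".pick" + ext


for name in ("esophagus.fasta", "esophagus.names",
             "esophagus.groups", "esophagus.align.report"):
    print(name, "->", pick_name(name))
```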

accnos option

You can generate an accnos file with the list.seqs command. For this example, create a text file containing the following lines:

59_10_1
59_10_10
59_10_11

Save the file as esophagus.accnos
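An accnos file is just one sequence name per line. As a rough illustration of how such a list could be derived from FASTA headers outside mothur, here is a minimal Python sketch. The function `list_names` is hypothetical, and it assumes the sequence name is the first whitespace-delimited token after the `>` character.

```python
def list_names(fasta_lines):
    """Collect sequence names from FASTA header lines (hypothetical helper)."""
    names = []
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:  # skip a bare ">" with no name
                names.append(tokens[0])
    return names


# Header and (truncated) sequence lines as shown on this page.
example = [
    ">59_10_1", "TGCAAGTCGAACGATGAAGCCTAGCTTG...",
    ">59_10_10", "TGCAAGTAGAACGCTGAAGAGAGGAGCT...",
    ">59_10_11", "TGCAAGTCGAACGAAACTTTCTTACACC...",
]

# One name per line, which is the layout of esophagus.accnos.
accnos_text = "\n".join(list_names(example)) + "\n"
print(accnos_text, end="")
```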


fasta option

To use the fasta option, follow this example:

mothur > get.seqs(accnos=esophagus.accnos, fasta=esophagus.fasta)

This generates the file esophagus.pick.fasta, which contains the following lines:

>59_10_1
TGCAAGTCGAACGATGAAGCCTAGCTTG...
>59_10_10
TGCAAGTAGAACGCTGAAGAGAGGAGCT...
>59_10_11
TGCAAGTCGAACGAAACTTTCTTACACC...
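Conceptually, the fasta option keeps a record only if its name is in the accnos list. The Python sketch below illustrates that idea with a set lookup; it is not mothur's implementation. The function `get_seqs_fasta` and the record named `other_seq` are hypothetical, added only to show a sequence being filtered out.

```python
def get_seqs_fasta(wanted, fasta_lines):
    """Keep only FASTA records whose name appears in `wanted`."""
    wanted = set(wanted)  # set membership tests are fast
    kept = []
    keep = False  # whether the record currently being read is wanted
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            keep = bool(tokens) and tokens[0] in wanted
        if keep:
            kept.append(line)
    return kept


accnos = ["59_10_1", "59_10_10", "59_10_11"]
fasta = [
    ">59_10_1", "TGCAAGTCGAACGATGAAGCCTAGCTTG...",
    ">other_seq", "ACGT...",  # hypothetical record, not in the accnos list
    ">59_10_10", "TGCAAGTAGAACGCTGAAGAGAGGAGCT...",
    ">59_10_11", "TGCAAGTCGAACGAAACTTTCTTACACC...",
]

picked = get_seqs_fasta(accnos, fasta)
print("\n".join(picked))
```

Because the `keep` flag persists until the next header line, sequences that span several lines are carried along with their header.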


name option

To use the name option, follow this example (assuming you have used unique.seqs on esophagus.fasta):

mothur > get.seqs(accnos=esophagus.accnos, name=esophagus.names)

This generates the file esophagus.pick.names, which contains the following lines:

59_10_10	59_10_10
59_10_11	59_10_11
59_10_1	59_10_1
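For a name file, the selection can be pictured as keeping each row whose first column is in the accnos list, in the order the rows appear in the input. The Python sketch below makes that simplifying assumption; the function `pick_rows` and the `other_seq` row are hypothetical, and the sketch does not look at names listed only in the second column.

```python
def pick_rows(wanted, rows):
    """Keep whitespace-delimited rows whose first column is in `wanted`."""
    wanted = set(wanted)
    kept = []
    for row in rows:
        columns = row.split()
        if columns and columns[0] in wanted:
            kept.append(row)
    return kept


accnos = ["59_10_1", "59_10_10", "59_10_11"]
names_rows = [
    "59_10_10\t59_10_10",
    "other_seq\tother_seq",  # hypothetical row, not in the accnos list
    "59_10_11\t59_10_11",
    "59_10_1\t59_10_1",
]

picked_names = pick_rows(accnos, names_rows)
print("\n".join(picked_names))
```

Rows come out in input-file order rather than accnos order, which is consistent with the listing above.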


group option

To use the group option, follow this example:

mothur > get.seqs(accnos=esophagus.accnos, group=esophagus.groups)

This generates the file esophagus.pick.groups, which contains the following lines:

59_10_1	C
59_10_10	C
59_10_11	C


alignreport option

To use the alignreport option, follow this example:

mothur > get.seqs(accnos=esophagus.accnos, alignreport=esophagus.align.report)

This generates the file esophagus.pick.align.report, which contains the following lines:

QueryName	QueryLength	TemplateName	TemplateLength	SearchMethod	SearchScore	AlignmentMethod	QueryStart	QueryEnd	TemplateStart	TemplateEnd	PairwiseAlignmentLength	GapsInQuery	GapsInTemplate	LongestInsert	SimBtwnQuery&Template
59_10_1	869	176825	1527	kmer	69.84	needleman	1	869	5914	870	1	6	1	93.79	
59_10_10	868	196718	1542	kmer	78.05	needleman	1	868	49	916	870	2	2	0	95.29	
59_10_11	870	97946	1560	kmer	92.12	needleman	1	870	51	920	870	0	0	0	99.08
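Unlike the other formats, the align.report output shown above starts with a header row, which is kept regardless of the accnos list. The Python sketch below illustrates one way to filter such a table while preserving its header. The function `pick_report` is hypothetical, the table is abbreviated to its first three columns for readability, and the `other_seq` row is invented purely to show a row being dropped.

```python
def pick_report(wanted, lines):
    """Keep the header line plus rows whose first column is in `wanted`."""
    wanted = set(wanted)
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    kept = [header]  # the header is always carried over
    for row in rows:
        columns = row.split("\t")
        if columns[0] in wanted:
            kept.append(row)
    return kept


accnos = ["59_10_1", "59_10_10", "59_10_11"]
report = [
    "QueryName\tQueryLength\tTemplateName",  # first three columns only
    "59_10_10\t868\t196718",
    "other_seq\t0\t0",  # hypothetical row, not in the accnos list
    "59_10_11\t870\t97946",
]

picked_report = pick_report(accnos, report)
print("\n".join(picked_report))
```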