barcodes and primers only partially removed during make.contigs

afioredonno
Posts: 4
Joined: Wed Jan 27, 2016 11:49 am

Post by afioredonno » Wed Feb 28, 2018 5:23 pm

Hello,

I don't know whether this is a bug in the program or a problem with my oligos file; I've checked the file and can't find anything wrong with it. I'm running mothur v1.39.5 on Ubuntu 14.04 LTS, and there is enough free space on the hard disk. This is the sixth time I've used mothur for metabarcoding (it's the best pipeline I know), and I can't understand what's going on. The sequences are MiSeq v3 paired-end reads, and the amplicons are 300-400 bp long. Both the forward and reverse primers were barcoded.
This is the command:
make.contigs(ffastq=Oom11F_R1.fastq, rfastq=Oom11F_R2.fastq, oligos=OligoOom.txt, pdiffs=1, bdiffs=0, processors=12, rename=T, checkorient=T)
This is the oligo file, tab-separated:
primer GCGGAAGGATCATTACCAC TCTTCATCGDTGTGCGAGC
barcode GCTTCTAG ATAGCTTG AEF001
barcode GCTTCTAG CCTTAATG AEF002
barcode GCTTCTAG CACATGCT AEF003
barcode GCTTCTAG TGTCATGC AEF004
barcode GCTTCTAG CGATAAGG AEF005
barcode GCTTCTAG TTACGCGA AEF006
barcode GCTTCTAG TGGAGCTT AEF007
barcode GCTTCTAG GATACTGC AEF008 etc. (150 samples + mock community)
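A quick sanity check on an oligos file like the one above can rule out formatting problems before looking at mothur itself. The following is a minimal Python sketch, not part of mothur: the function name `parse_oligos` is hypothetical, and it assumes whitespace-separated `primer` and paired `barcode` lines as shown in this post. It collects the primer pair and the barcode pairs, and raises an error if the same barcode pair is assigned to two samples.

```python
def parse_oligos(text):
    """Parse paired primer/barcode lines (hypothetical helper, not mothur code).

    Returns (primers, barcodes): a list of (forward, reverse) primer pairs and
    a dict mapping (forward_barcode, reverse_barcode) to a sample name.
    """
    primers = []
    barcodes = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # splits on tabs or spaces
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        kind = fields[0].lower()
        if kind == "primer" and len(fields) >= 3:
            primers.append((fields[1].upper(), fields[2].upper()))
        elif kind == "barcode" and len(fields) >= 4:
            pair = (fields[1].upper(), fields[2].upper())
            if pair in barcodes:
                raise ValueError(
                    f"line {line_number}: barcode pair {pair} already used "
                    f"for sample {barcodes[pair]}"
                )
            barcodes[pair] = fields[3]
        else:
            raise ValueError(f"line {line_number}: unrecognized entry: {line!r}")
    return primers, barcodes


# The first lines of the oligos file from this post.
EXAMPLE = """\
primer\tGCGGAAGGATCATTACCAC\tTCTTCATCGDTGTGCGAGC
barcode\tGCTTCTAG\tATAGCTTG\tAEF001
barcode\tGCTTCTAG\tCCTTAATG\tAEF002
barcode\tGCTTCTAG\tCACATGCT\tAEF003
"""

if __name__ == "__main__":
    primers, barcodes = parse_oligos(EXAMPLE)
    print(len(primers), "primer pair(s),", len(barcodes), "barcode pair(s)")
```

A file that passes this check has no duplicate barcode pairs and no malformed lines, which narrows the search to how the reads themselves are processed.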
After make.contigs (which works well: good overlap, 13 million assembled sequences), all sequences in the fasta file have been renamed according to the sample name, but not all of them have had the barcodes and primers removed!
This one is OK, it starts after the F primer and ends before the reverse:
>1_SEF045
ACCTAAAAACTTTCCACGTGAACTGTCGTTATTTGTTGTGCGCTCTCTGCGGTGTCGGTGGCGTCTGCTGGCTTTGTTGCTGGCGGGTGCGAGCCGGATGCGGAGGCTGAACGAAGGTCGAGTTGCTTTGCTCTCGGCTGACTTATTTTTCAAACCCAATACCAAACTTACTGATTATACTGTGAGAACGAAAGTTCTTGCTTTTAACTAGATAACAACTTTCAGCAGTGGATGTCTAG
The following one still has both barcodes corresponding to HEF041 (the first and last 8 characters) as well as the F and R primers:
>1_HEF041
CATCTTGAGCGGAAGGATCATTACCACACCAAAAAACACCCCACGTGAATGTATTCTGTATGAGGCTTGTGCTGCTCTTAGGGGCGGCTAGCCGAAGGTTTCGCAAGAGACCGATGTATTTTTAATCCCTTTTATTAAATGACTGATCAAAAACTGCAGACAGAAATGTGTGCATTCAATTGAAATACAACTTTCAACAGTGGATGTCTAGGCTCGCACAACGATGAAGATACACAGT
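Whether a given contig still carries its barcodes and primers can be tested directly. The sketch below is illustrative Python, not mothur's implementation, and the helper names are hypothetical. It assumes the contig is in forward orientation, so an untrimmed contig starts with forward barcode + forward primer and ends with the reverse complement of reverse barcode + reverse primer. The degenerate base `D` in the reverse primer is handled with IUPAC codes. The barcodes used here are read off the first and last 8 characters of the 1_HEF041 contig, with the reverse barcode reverse-complemented on the assumption that the oligos file lists it in R2 orientation.

```python
# IUPAC nucleotide codes: which concrete bases each symbol may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse-complement a sequence, including IUPAC ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]


def matches(pattern, text):
    """True if text has the pattern's length and fits it position by position."""
    return len(pattern) == len(text) and all(
        base in IUPAC.get(symbol, "") for symbol, base in zip(pattern, text)
    )


def leftover_oligos(contig, fbarcode, rbarcode, fprimer, rprimer):
    """Report whether barcode+primer is still present at each end of a contig."""
    head = fbarcode + fprimer
    tail = reverse_complement(rbarcode + rprimer)
    return {
        "head": matches(head, contig[:len(head)]),
        "tail": matches(tail, contig[-len(tail):]),
    }


# Start and end of the 1_HEF041 contig from this post (middle omitted).
HEF041_ENDS = (
    "CATCTTGAGCGGAAGGATCATTACCACACCAAAAAACACCCCACGTG"
    + "CAACAGTGGATGTCTAGGCTCGCACAACGATGAAGATACACAGT"
)

if __name__ == "__main__":
    print(leftover_oligos(
        HEF041_ENDS,
        fbarcode="CATCTTGA",
        rbarcode="ACTGTGTA",  # assumption: reverse complement of the last 8 bases
        fprimer="GCGGAAGGATCATTACCAC",
        rprimer="TCTTCATCGDTGTGCGAGC",
    ))
```

Running the same check over every record in the fasta file, with each sample's barcodes, would give the number of trimmed versus untrimmed contigs per sample. That shows whether the problem is confined to particular barcode pairs.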
Running trim.seqs doesn't fix the problem: it discards all the sequences that were already trimmed, of course, which are the majority, and I end up with only 5 million sequences. Just let me know if I should send the oligos file and part of the fastq files. Any idea what's going on, and is there a solution?
I will really appreciate your answer... I have 8 runs to analyze!
Thanks,
Anna Maria

westcott
Posts: 1729
Joined: Thu Sep 03, 2009 7:47 am

Re: barcodes and primers only partially removed during make.contigs

Post by westcott » Tue Mar 06, 2018 9:00 am

Could you send your oligos file and a set of the fastq files that contain a problem read to mothur.bugs@gmail.com so I can track down the issue for you?

westcott
Posts: 1729
Joined: Thu Sep 03, 2009 7:47 am

Re: barcodes and primers only partially removed during make.contigs

Post by westcott » Mon Mar 12, 2018 5:04 pm

Let's look at one read:

@M02442:176:000000000-BL946:1:1101:10035:1896 1:N:0:ACAGTG
GCTATTGCGCGGAAGGATCATTACCACACCAAAAAAACTTTCCACGTGAACCGTTGTAATTATGTTCTGTGCTTTCCTTCGGGAAGGCTGAACGAAGGTAAGCCGCTTTATTGTGGCTTGCCGACGTACTTTTCAAACCCATTTACTTAATACAGAACTATACTCCGAAAACGAAAGTCTTTGGTTTTAATCAATAACAACTTTCAGCAGTGGATGTCTAGGCTCGCACAACGATGAAGATCGCGTAA
+
CCCCCGGGGGGGGGGGFDGFGGGGGGGGGGG@EEFGGGGGGGGGGGGFFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGGGCCFGGGGGGGGGGGGGGGGGGGGGGGGGG>FGGGGGGGGGGGGGFGEGGGGGGGGGGGGGF9DEGFGGGGFGDGGGEGGGGGGGGGGGGGGFGGFFGGGGFFGGGGGGGGGGFGGGGGG6?FFGGGGFGBFFF9GEFFDFAFC6B<>F@AB>>?

@M02442:176:000000000-BL946:1:1101:10035:1896 2:N:0:ACAGTG
TTACGCGATCTTCATCGTTGTGCGAGCCTAGACATCCACTGCTGAAAGTTGTTATTGATTAAAACCAAAGACTTTCGTTTTCGGAGTATAGTTCTGTATTAAGTAAATGGGTTTGAAAAGTACGTCGGCAAGCCACAATAAAGCGGCTTACCTTCGTTCAGCCTTCCCGAAGGAAAGCACAGAACATAATTACAACGGTTCACGTGGAAAGTTTTTTTGGTGTGGTAATGATCCTTCCGCGCAATAGC
+
CCCCCGGGGGGGGFGGGGGGGGGGGGGGGCDDGGFGFGGGGGGGGGDFGFGFFGFFGGGGGGG<FFG<FGGGFFEGGGG?FGGCGGGGGGGGGGGGGGGGGFGFGGGGGGGGGGGF<FGGGGGEGGGGGGGFGGGGGF,EEGGGGGGGGEDEAFFGGGGGF9FGGGGGGD@CGFFFGGFGFGCGGCFFGGFGD>FGGFFFFAF>0@C?FGBAFFFFF@@>A=21@@?CCECEFF4?E;<?>B0>@EFF

The matching lines in the oligos file:
primer GCGGAAGGATCATTACCAC TCTTCATCGDTGTGCGAGC
barcode GCTATTGC TTACGCGA HEF005

@M02442:176:000000000-BL946:1:1101:10035:1896 1:N:0:ACAGTG
GCTATTGCGCGGAAGGATCATTACCACACCAAAAAAACTTTCCACGTGAACCGTTGTAATTATGTTCTGTGC...

@M02442:176:000000000-BL946:1:1101:10035:1896 2:N:0:ACAGTG
TTACGCGATCTTCATCGTTGTGCGAGCCTAGACATCCACTGCTGAAAGTTGTTATTGATTAAAACCAAAG....

R2 is flipped (reverse-complemented), and the two reads are aligned, resulting in:

@M02442:176:000000000-BL946:1:1101:10035:1896 1:N:0:ACAGTG
...........................ACCAAAAAAACTTTCCACGTGAACCGTTGTAATTATGTTCTGTGCTTTCCTTCGGGAAGGCTGAACGAAGGTAAGCCGCTTTATTGTGGCTTGCCGACGTACTTTTCAAACCCATTTACTTAATACAGAACTATACTCCGAAAACGAAAGTCTTTGGTTTTAATCAATAACAACTTTCAGCAGTGGATGTCTAGGCTCGCACAACGATGAAGATCGCGTAA

@M02442:176:000000000-BL946:1:1101:10035:1896 2:N:0:ACAGTG
GCTATTGCGCGGAAGGATCATTACCACACCAAAAAAACTTTCCACGTGAACCGTTGTAATTATGTTCTGTGCTTTCCTTCGGGAAGGCTGAACGAAGGTAAGCCGCTTTATTGTGGCTTGCCGACGTACTTTTCAAACCCATTTACTTAATACAGAACTATACTCCGAAAACGAAAGTCTTTGGTTTTAATCAATAACAACTTTCAGCAGTGGATGTCTAG---------------------------

The reads are then assembled, resulting in:

GCTATTGCGCGGAAGGATCATTACCACACCAAAAAAACTTTCCACGTGAACCGTTGTAATTATGTTCTGTGCTTTCCTTCGGGAAGGCTGAACGAAGGTAAGCCGCTTTATTGTGGCTTGCCGACGTACTTTTCAAACCCATTTACTTAATACAGAACTATACTCCGAAAACGAAAGTCTTTGGTTTTAATCAATAACAACTTTCAGCAGTGGATGTCTAGGCTCGCACAACGATGAAGATCGCGTAA
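The flip step above can be checked by hand. This short Python sketch (illustrative only, with hypothetical variable names) reverse-complements the start of R2 and confirms that it is exactly the end of the assembled contig. It also confirms that the barcode and primer occupy the first and last 27 bases (8 + 19), which is the region a trimmed contig should be missing. The reverse primer is written as it was sequenced in this read, with the degenerate `D` resolved to `T`.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def flip(seq):
    """Reverse-complement a plain A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Read fragments copied from the example read in this post.
R2_START = "TTACGCGATCTTCATCGTTGTGCGAGCCTAGACATCCACTG"      # first 41 bases of R2
CONTIG_START = "GCTATTGCGCGGAAGGATCATTACCACACCAAAAAAACTTTCC"  # start of the contig
CONTIG_END = "CAGTGGATGTCTAGGCTCGCACAACGATGAAGATCGCGTAA"    # last 41 bases of the contig

# Oligos for sample HEF005: barcode followed by primer, as they appear in each read.
FORWARD_OLIGOS = "GCTATTGC" + "GCGGAAGGATCATTACCAC"
REVERSE_OLIGOS = "TTACGCGA" + "TCTTCATCGTTGTGCGAGC"  # primer with D read as T

if __name__ == "__main__":
    # Flipping the start of R2 reproduces the end of the assembled contig.
    print(flip(R2_START) == CONTIG_END)
    # The untrimmed contig begins with the forward barcode + primer...
    print(CONTIG_START.startswith(FORWARD_OLIGOS))
    # ...and ends with the flipped reverse barcode + primer.
    print(CONTIG_END.endswith(flip(REVERSE_OLIGOS)))
    # Number of bases that trimming should remove from each end.
    print(len(FORWARD_OLIGOS), len(REVERSE_OLIGOS))
```

All three comparisons hold for this read, so the oligos sit where expected in the untrimmed assembly. The open question in the thread is why they are removed from some contigs and not others.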

I can see barcodes and primers on the assembled read, but I think it's related to your dataset. Pat, what are your thoughts?

pschloss
Site Admin
Posts: 3149
Joined: Wed Sep 02, 2009 3:40 pm
Location: University of Michigan

Re: barcodes and primers only partially removed during make.contigs

Post by pschloss » Tue Mar 13, 2018 10:34 am

That looks right to me - do these barcodes look familiar, Anna Maria?

afioredonno
Posts: 4
Joined: Wed Jan 27, 2016 11:49 am

Re: barcodes and primers only partially removed during make.contigs

Post by afioredonno » Tue Mar 13, 2018 12:02 pm

It appears that I need to clarify my request: the barcodes you identified are correct and correspond to the file I sent. Mothur assembles the reads, recognizes the primers and barcodes, and renames the sequences according to the oligos file.

The problem is that after this step, the barcodes and the primers should be removed from the sequences.
This happens ONLY FOR SOME OF THE SEQUENCES: apparently at least 5 million still have their barcodes and primers (checked by running trim.seqs).
This is what I can't explain.

Do you also get this odd mixture when running make.contigs on the files I sent you (R1 and R2 fastq and the oligos file)?
Thank you for your attention,
Anna Maria
