pca timeout

Please report bugs you encounter in mothur using this forum.
Zak38
Posts: 3
Joined: Tue Dec 12, 2017 6:51 pm

Post by Zak38 » Tue Dec 12, 2017 7:01 pm

I am trying the pca command on 4 different data sets. Each data set contains 14 groups. Two of the data sets use v4 Illumina tag sequences and two use v6 Illumina tag sequences. For each region (v4 and v6), one data set has 10,000 sequences per group and the other has 230,000 sequences per group. When I run pca on the 10,000-sequence data sets, both v4 and v6 work well. However, at 230,000 sequences per group, only the v4 set works. The v6 run has timed out twice now, once after 5 days and once after 10 days, while the v4 set took only a few hours. This is the opposite of everything else I have run in mothur, where the v4 runs consistently take longer than the v6 runs because the v4 sequences average about 250 bp versus roughly 60 bp for the v6 tag. Is there an algorithmic reason this is occurring (more difficulty with shorter reads?), or some other problem I haven't found yet (such as a typo in my batch file)?

Thanks,
Zak

pschloss
Site Admin
Posts: 3137
Joined: Wed Sep 02, 2009 3:40 pm
Location: University of Michigan

Re: pca timeout

Post by pschloss » Mon Dec 18, 2017 10:00 am

I would strongly discourage the use of PCA. Instead, calculate a distance matrix with dist.shared using a calculator such as Bray-Curtis or ThetaYC, and run that distance matrix through PCoA. There is an example of this on the MiSeq SOP wiki page.

Pat
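
To illustrate the distance-matrix step suggested above, here is a minimal Python sketch of a ThetaYC distance (1 minus the Yue and Clayton theta) computed between every pair of samples. It is not mothur code: in mothur, dist.shared does this calculation itself, and the result would then be passed to PCoA. The sample names and OTU counts are hypothetical toy values, and the ordination step is not shown.

```python
from itertools import combinations


def theta_yc_distance(a, b):
    """Return 1 - theta_YC for two samples given as lists of OTU counts."""
    total_a, total_b = sum(a), sum(b)
    # Convert raw counts to relative abundances within each sample.
    pa = [x / total_a for x in a]
    pb = [y / total_b for y in b]
    cross = sum(x * y for x, y in zip(pa, pb))
    denom = sum(x * x for x in pa) + sum(y * y for y in pb) - cross
    return 1.0 - cross / denom


def distance_matrix(samples):
    """Build a symmetric pairwise distance lookup keyed by (name, name)."""
    names = sorted(samples)
    dist = {(n, n): 0.0 for n in names}
    for n1, n2 in combinations(names, 2):
        d = theta_yc_distance(samples[n1], samples[n2])
        dist[(n1, n2)] = dist[(n2, n1)] = d
    return names, dist


# Hypothetical toy data: three samples, four OTUs (columns are OTUs).
counts = {
    "sampleA": [10, 5, 0, 0],
    "sampleB": [8, 6, 1, 0],
    "sampleC": [0, 0, 7, 9],
}

names, dist = distance_matrix(counts)
for n1, n2 in combinations(names, 2):
    print(f"{n1} vs {n2}: {dist[(n1, n2)]:.3f}")
```

Samples that share no OTUs (sampleA and sampleC here) come out at the maximum distance of 1, while samples with similar relative abundances come out close to 0.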

Post by Zak38 » Wed Dec 20, 2017 1:52 pm

Thanks Pat. I will try that. Could you be a little more specific as to why you discourage PCA?

Zak

Post by pschloss » Thu Dec 21, 2017 9:11 am

PCA essentially uses R² as a distance between samples. This weights double zeros the same as double ones. In other words, if an OTU is missing from both of the samples being compared, that shared absence inflates the similarity between them. Other metrics that are widely used in ecology (e.g. Bray-Curtis and ThetaYC) ignore these double zeros. PCA is more appropriate for comparing communities based on their metadata.

Pat
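
The double-zero effect described above can be seen in a small Python sketch. It uses the Pearson correlation as a stand-in for the correlation-based similarity behind PCA (an assumption for illustration, not mothur's pca implementation) and compares it with Bray-Curtis before and after padding two samples with OTUs that are absent from both. The counts are hypothetical toy values.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length count vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity computed on raw counts."""
    shared = sum(min(a, b) for a, b in zip(x, y))
    return 1.0 - 2.0 * shared / (sum(x) + sum(y))


# Hypothetical toy samples with four OTUs each.
sample_a = [10, 0, 5, 3]
sample_b = [0, 8, 4, 3]

# Add 50 OTUs that are absent from both samples (double zeros).
padding = [0] * 50
padded_a = sample_a + padding
padded_b = sample_b + padding

r_raw = pearson_r(sample_a, sample_b)
r_padded = pearson_r(padded_a, padded_b)
bc_raw = bray_curtis(sample_a, sample_b)
bc_padded = bray_curtis(padded_a, padded_b)

print(f"Pearson r:   raw={r_raw:.3f}  padded={r_padded:.3f}")
print(f"Bray-Curtis: raw={bc_raw:.3f}  padded={bc_padded:.3f}")
```

The correlation rises once the shared absences are added, even though nothing about the observed OTUs has changed, whereas the Bray-Curtis value stays the same because OTUs with zero counts in both samples contribute nothing to it.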

Post by Zak38 » Thu Dec 21, 2017 10:26 am

Thank you very much for that, Pat.

Zak
