Subsampling 3 different groups at same level

walker92
Posts: 11
Joined: Wed Dec 28, 2016 8:04 am

Post by walker92 » Fri Apr 20, 2018 1:31 pm

I'm hoping to get an opinion on subsampling.
I have three sources of digesta that I've run separately. They were all subsampled to 11000 like in this example: sub.sample(shared=colon.an.shared, size=11000).

They could have been subsampled like this, based on the lowest reasonable number of sequences I have to deal with:
sub.sample(shared=colon.an.shared, size=11000)
sub.sample(shared=small.an.shared, size=9898)
sub.sample(shared=cecum.an.shared, size=8751)

My plan was to subsample everything to a single depth (11000), even though I'm not currently comparing between digesta sources (mostly because my runs get killed on my HPCC when I try to run all samples together instead of as three groups).

I do lose some mice from the analysis when I subsample to 11000.

Should I subsample to 11000, 9898, and 8751 and not lose mice from the analysis? Or should I subsample all to 11000 to keep consistency?

Thanks!

pschloss
Site Admin
Posts: 3149
Joined: Wed Sep 02, 2009 3:40 pm
Location: University of Michigan

Re: Subsampling 3 different groups at same level

Post by pschloss » Mon Apr 23, 2018 9:32 am

I'm not 100% clear on your question, but 8571 sequences per mouse is a lot. If I had the choice between 8571 sequences with all of my samples and 11000 sequences with a few samples dropped, I'd go with 8571 and use all of my samples.

Pat

walker92

Re: Subsampling 3 different groups at same level

Post by walker92 » Tue Apr 24, 2018 7:52 am

Hi Pat,
Thanks for the reply. Let me clarify. I have three different digesta sources: small intestine, cecum, and colon. My lowest number of seqs in the colon is 11000. My lowest reasonable number of seqs in cecum is 8571, and the lowest in small intestine is 6457. Out of 147 samples from the small intestine, I have about 10 samples with <5000 seqs. Initially, I subsampled all to 11,000 to keep things consistent, and I was hoping to analyze all three digesta sources against each other, but 441 samples is a pretty big data set to run and my runs get killed by the HPCC administrator.

If I subsample to 11,000 seqs in the small intestine, I lose about 15 mice from analysis. I lose 1-2 mice from the analysis for cecum and colon (there are 1-2 mice from each group with <100 seqs).
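The trade-off described above (a deeper cutoff drops more mice) can be tallied before committing to a depth. Below is a minimal Python sketch of that count, not a mothur command: the function name `samples_retained` and the per-sample read counts are made up for illustration and are not from the thread.

```python
def samples_retained(read_counts, depth):
    """Return the names of samples with at least `depth` reads."""
    return sorted(name for name, n in read_counts.items() if n >= depth)


# Hypothetical per-sample read counts, invented for this example.
example_counts = {
    "mouse01": 15200,
    "mouse02": 11000,
    "mouse03": 8571,
    "mouse04": 6457,
    "mouse05": 90,
}

# Compare how many samples survive each candidate subsampling depth.
for depth in (11000, 8571, 6457):
    kept = samples_retained(example_counts, depth)
    print(f"depth={depth}: keep {len(kept)} of {len(example_counts)} samples")
```

Running the same tally on the real per-sample totals for each digesta source would show exactly which mice each cutoff removes.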

By your response, I think you understood my question?

kmitchell
Posts: 501
Joined: Wed Mar 10, 2010 8:19 pm

Re: Subsampling 3 different groups at same level

Post by kmitchell » Tue Apr 24, 2018 11:11 am

I'd use the same number for all of them so you can compare them more easily. I'd also go below the lowest number you want to include, because subsampling 11000 to 8700 is going to show more variability than subsampling 8700 to 8700.

What is your HPC admin complaining about? Processors? RAM? Wall time? Once you have OTUs, subsampling, even for beta diversity, shouldn't take that much power/memory/time.
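To make the "same number for all of them" idea concrete, here is a rough Python sketch of subsampling every sample to one common depth by drawing reads without replacement and dropping samples that are too shallow. It only illustrates the concept and is not mothur's implementation; the function names, OTU labels, and counts are all hypothetical.

```python
import random
from collections import Counter


def subsample_counts(otu_counts, depth, seed=None):
    """Draw `depth` reads without replacement from one sample's OTU counts.

    Returns None when the sample has fewer than `depth` reads.
    """
    total = sum(otu_counts.values())
    if total < depth:
        return None  # too shallow: this sample would be dropped
    rng = random.Random(seed)
    # Expand the counts into one label per read, then sample from that pool.
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    return Counter(rng.sample(reads, depth))


def subsample_all(samples, depth, seed=0):
    """Subsample every sample to the same depth, skipping shallow ones."""
    result = {}
    for i, (name, counts) in enumerate(sorted(samples.items())):
        picked = subsample_counts(counts, depth, seed=seed + i)
        if picked is not None:
            result[name] = picked
    return result


# Hypothetical OTU tables, invented for this example.
samples = {
    "colon_m1": {"Otu001": 60, "Otu002": 30, "Otu003": 20},
    "cecum_m1": {"Otu001": 40, "Otu002": 45},
    "small_m1": {"Otu001": 5, "Otu003": 3},
}

even = subsample_all(samples, depth=80)
for name, counts in even.items():
    print(name, dict(counts))
```

With a depth of 80, the two deeper samples end up with exactly 80 reads each and the shallow one is left out, which is the same keep-or-drop decision being discussed for the mice.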

walker92

Re: Subsampling 3 different groups at same level

Post by walker92 » Tue Apr 24, 2018 3:46 pm

Thanks kmitchell,

For the whole lot of 447 samples my wall time expires. When I run by digesta source, I usually can get a run completed in <4 days. I'll have to try to subsample as you suggested.

kmitchell

Re: Subsampling 3 different groups at same level

Post by kmitchell » Thu May 31, 2018 12:41 pm

It's taking 4 days to subsample? I'd be surprised by even 2 days for hundreds of mouse samples (I run on 32 cores). Are you following the SOP, including the unique.seqs and pre.cluster steps? What is this data? MiSeq 2x250 or something else?
