We will be offering an R workshop December 18-20, 2019. Learn more.
The effects of eutrophication on the structure of bacterial biofilms
- 1 Introduction:
- 2 Preparing the Data for Mothur:
- 3 OTU – Based Approaches:
- 3.1 Reading Distance Matrices:
- 3.2 Clustering Sequences to Determine the Number of OTUs:
- 3.3 Reading OTU Data:
- 3.4 Single Sample Analyses – Rarefaction Curves:
- 3.5 Single Sample Analyses – Community Richness:
- 3.6 Multiple Sample Analyses/Comparison – Community Richness:
- 3.7 Multiple Sample Analyses/Comparison – Visualizing Community Differences:
- 4 Hypothesis Testing Approaches:
- 5 Conclusions:
Biofilms are the dominate form of microbial life in aquatic ecosystems and are responsible for performing a wide variety of ecosystem services including nutrient and organic matter processing and retention. Understanding how eutrophication impacts these communities is essential for ecosystem managers as many aquatic ecosystems are being enriched by anthropogenic activities. This study investigated the effects of eutrophication on biofilm community structure and diversity.
Heterotrophic microbial biofilms were grown in the dark in fifteen experimental stream channel mesocosms. The channels were lined with removable unglazed ceramic tiles, supplied with river water to ensure colonization by natural bacterial populations, and continuously enriched with a stochiometrically balanced solution of dissolved organic carbon (DOC), nitrate and phosphate to create a productivity gradient. The enrichment levels were based on the background concentration of DOC in the stream water and included no enrichment in control channels, two (2X), four (4X), eight (8X) and ten times (10X) the background DOC concentration (1.5 mg l-1 DOC). Three replicate channels were used for each level of enrichment. After three weeks of growth, the time necessary to produce a mature biofilm, samples were collected to assess the impacts of the enrichment on biofilm community structure and diversity through the extraction, amplification, sequencing, and analysis of the bacterial 16S rRNA gene. The files used to conduct this analysis can be downloaded here Media:BiofilmEu.zip.
Preparing the Data for Mothur:
Initial Data Preparation: Preprocessing and Alignment
Raw sequences were imported into CodonCode where the sequences were reviewed for quality and low quality sequences were removed from the database. 2096 high quality partial (forward reads) 16S rRNA gene sequences were exported as a .fasta file
Next the sequences were aligned using the Align tool on the http://greengenes.lbl.gov website. The .fasta file created in CodonCode was imported and the following settings were applied:
- Significant match requirements:
- Change minimum length to 300
- Minimum percent identity to 70
- Files you desire:
- Select first two (1. tab delimitated file summarizing alignment fate etc., and 2. sequence file of successfully aligned sequences)
- Formatting options:
- Select ‘do not remove alignment gap characters’
- Select ‘fasta’ not ‘fasta with description’ as preferred format
- Enter your email address
- Select ‘Process File’
These commands generated a .tgz file, that when decompressed, created a folder with a fasta file of all the aligned sequences and an excel file specifying which sequences would not align.
Creating a Filter and a Distance Matrix:
The aligned fasta file was imported into ARB using the following import options:
- Import - fasta_wgap.ift
- Type - DNA
- Use found names
Next, a filter was created in ARB to ensure all of the sequences were the same length (~550 bp) using the following steps:
- Under SAI → Create SAI using… → Filter by base frequency
- Under Config → Column filter ___ to ___ (used 445, 3773)
- Export filter → name
Finally, a distance matrix was created using the following steps:
- Under Sequence → Compare using distance matrix → Apply filter → Calculate full matrix
- Save matrix as *.dist file
Creating a Phylogenetic Tree:
A phylogenetic tree was created by adding the 2096 partial sequences to a tree containing ~ 9000 full length 16S rRNA gene sequences compiled by Philip Hugenholtz. This tree can be downloaded as an ARB file from the Ribosomal Database Project → Resources → Philip Hugenholtz's 16S rRNA ARB Database, at http://rdp.cme.msu.edu/misc/resources.jsp.
The aligned sequences were added to the Hugenholtz data base by importing a filtered fasta file of all of the sequences using the following steps:
- Import sequences: File → Import sequences and fields (ARB) → Mark imported sequences
- Add Sequences to tree: Tree → Add species to existing tree → ARB parsimony (quick add marked) → Filter (lane mask) → Weights (none) → Go (Hint: the Hugenholtz tree contains a sample named ISOT4(31… The parentheses in this name will cause problems in the .ntree file so it should be removed.)
The tree was then exported by:
- Tree → Tree admin
- Export → select ‘Newick tree format’, ‘save branch lengths’, and deselect ‘keep group names’ (these names will give mothur problems).
OTU – Based Approaches:
Reading Distance Matrices:
The distance matrix from all of the biofilm samples (Mt_AllSamp_12JUN09_F.dist) created in ARB was read into mothur using the following command:
mothur > read.dist(phylip= BiofilmEu_AllSamp.dist)
Clustering Sequences to Determine the Number of OTUs:
Running the read.dist command stored the distance matrix in memory, which was then used to cluster the sequences into OTUs by running the following command:
mothur > cluster()
The default (furthest neighbor, fn) clustering of the 2096 biofilm sequences produced three files: Mt_allsamp_12jun09_F.fn.rabund (rank abundance file), Mt_allsamp_12jun09_F.fn.sabund (species abundance file), and Mt_allsamp_12jun09_F.fn.list (a list file containing the sequences in each OTU). The following data are the first two columns from the .list file created for all of the biofilm data, indicating that: 1) a total of 479 OTUs were identified at the species level (> 97% identity) while 65 phyla level (> 80% identity) OTUs were found, 2) 253 sequences were identical (2096 sequences were clustered and 1843 unique sequences were found), and 3) all of the sequences had > 59% identify.
DistanceCutoff AllSamp Cont 2X 4X 8X 10X unique 1843 397 398 401 370 337 0 1401 355 302 339 289 264 0.01 812 303 174 195 162 155 0.03 479 235 110 116 101 90 0.05 318 183 78 86 66 63 0.2 65 50 25 21 12 13 0.41 1 1 1 1 1 1
Reading OTU Data:
Next, the OTU .list data for the 2096 samples created using the cluster() command was read back into mothur using the following command:
mothur > read.otu(list= BiofilmEu_AllSamp.fn.list, group= BiofilmEu_AllSamp.groups)
The group option reads a .groups file in mothur that was created using BBEdit software. This tab delimited file contains two columns, one with each of the 2096 sequence names followed by a column designating which experimental treatment this sample was collected from (control, 2X, 4X, 8X or 10X enrichment), allowing mothur to parse the sequences by treatment. Running this command generates a .fn.shared file and 5 separate list files, one for each treatment.
Single Sample Analyses – Rarefaction Curves:
Rarefaction curves were generated for each enrichment treatment to determine the richness of samples from the various treatments and whether we had sequenced an adequate number of clones to estimate the richness of the samples from each treatment. The .list files generated in the previous analysis step were individually read into mothur and were then used to generate these rarefaction curves using the following command.
mothur > read.otu(list= BiofilmEu_AllSamp.fn.Cont.list) mothur > rarefaction.single(freq=20)
The addition of the (freq=20) option directed mothur to print results for every 20 samples instead of the default 100 samples, allowing the creation of the smoother rarefaction curve figures seen below.
The rarefaction curves shown above indicate that: 1) the most diverse samples are from the control group where no nutrient or carbon enrichment was applied, 2) samples from the enrichments have similar levels of diversity, 3) the enrichment curves have begun to level off, indicating near complete sampling of diversity, and 4) the control curve is still increasing significantly implying more sampling is necessary to completely sample the species level diversity in these samples.
Single Sample Analyses – Community Richness:
The following commands were used to generate summary files containing various richness predictors including the Chao1 estimates reported below.
mothur > read.otu(list=BiofilmEu_AllSamp.fn.cont.list) mothur > summary.single()
The first command reads the previously created .list file into mothur, while the second command generates the richness summary files. These commands were run a total of five times, once for each treatment. The results of the richness predictors seen below indicate that the species level (> 97% identity) richness of 582 OTUs for the control channels was ~2-3 times that of the treatment channels which all had Chao1 richness estimates of ~ 200 OTUs.
Cont 2x 4x 8x 10x Label Chao Chao_Range Chao Chao_Range Chao Chao_Range Chao Chao_Range Chao Chao_Range 0.03 582 457-778 211 161-301 176 146-235 203 150-314 192 138-307 0.05 426 326-598 111 93-153 125 103-177 113 84-191 108 81-174
Multiple Sample Analyses/Comparison – Community Richness:
The following commands were used to estimate the overlap of OTUs between the bacterial communities from the various treatment channels:
mothur > read.otu(shared=BiofilmEu_AllSamp.fn.shared) mothur > summary.shared()
The first command reads the previously created ‘shared’ file (created from the ‘mothur > read.otu(list= Mt_allsamp_12jun09_F.fn.list, group= Environ_ByTreatment.groups)’ command) into mothur, while the second command generates a summary file exploring the OTU richness shared between samples.
Comparison Shared OTUs Cont - 2X 27 Cont - 4X 24 Cont - 8X 8 Cont - 10X 5 2X - 4X 34 2X - 8X 16 2X - 10X 9 4X - 8X 44 4X - 10X 33 8X - 10X 43
In each case, as the enrichment level increased the shared OTUs between samples decreased, indicating increasing enrichment levels created divergent bacterial communities.
Multiple Sample Analyses/Comparison – Visualizing Community Differences:
A Venn Diagram was constructed to show overlap between the control, 4X, and 10X treatments using the following commands:
mothur > read.otu(shared=BiofilmEu_Venn.fn.shared) mothur > summary.shared() mothur > venn(label=0.03, calc=sharedsobs-sharedchao)
This mt_venndiag…file was created from a distance matrix that only included data from the three treatments listed above. The first command reads the shared file into mothur while the second command generates the Venn Diagrams seen seen to the right for shared Chao1 and OTUs richness at the 97% identity level.
Additionally, two heatmaps were generated for further visualization of the OTU data from the various communities using the following commands:
mothur > read.otu(list=BiofilmEu_AllSamp.fn.list, group=BiofilmEu_AllSamp.groups ) mothur > heatmap() mothur > heatmap(scale=linear)
The first command reads the list file for all of the sequence data into mothur, the second command generates heatmaps with log10 scaling, while the last command generates heatmaps with the scaling option turned off. See the figure below for the log10 version at a distance of 0.03.
- The heatmap clearly shows the control treatment bacterial community was the most dissimilar
- Additionally, the 2X community appears to be distinct when compared to the 4X, 8X and 10X communities
Hypothesis Testing Approaches:
A variety of analyses are available in mothur to test whether different communities have the same structure. I applied the parsimony and UniFrac methods to the biofilm data to determine whether eutrophication led to distinct bacterial communities.
The parsimony test which has been implemented previously in both the UniFrac and TreeClimber programs assesses whether two or more communities have the same structure by determining differences in the phylogenetic arrangement of sequences from the communities. The following commands were executed to read the phylogenetic tree created above into mothur and to conduct a global significance test to determine whether any of the eutrophication treatment communities had a different structure from the rest of the treatments.
mothur > read.tree(tree=BiofilmEu_AllSamp.ntree) mothur > parsimony()
The abbreviated global results below indicate that at least one of the eutrophication treatment groups is significantly different from the others.
Groups ParsSig Cont-2X-4X-8X-10X <0.0010
These results were further explored using the (group=all) command which provides the following p-values for all of the pairwise comparisons:
Groups ParsScore ParsSig Cont-2X 82 <0.0010 Cont-4X 62 <0.0010 Cont-8X 35 <0.0010 Cont-10X 25 <0.0010 2X-4X 142 <0.0010 2X-8X 51 <0.0010 2X-10X 37 <0.0010 4X-8X 153 <0.0010 4X-10X 117 <0.0010 8X-10X 228 <0.0010
Each of these comparisons has a p-value of less than 0.001, however, these values must be corrected for multiple comparisons by comparing them to 0.05/10 = 0.005 (10 is the total number of comparisons between 5 groups). In this case each of the p-values is lower than 0.005 indicating each of the groups are significantly different from one another.
Unweighted UniFrac Analysis:
The unweighted UniFrac metric can also be used to assess whether microbial communities are significantly different. The simplified logic behind UniFrac is as follows: all samples from all communities of interest are used to create a phylogenetic tree, the fraction of the branch length that leads to sequences from a single community is used to determine similarities/differences between communities, and a random permutation test is used to assess the probability that the distance between communities is a product of chance. The following commands are similar to those performed with the parsimony analysis. They read the phylogenetic tree created above into mothur and conduct a global significance test to determine whether any of the eutrophication treatment communities had a different structure from the rest of the treatments.
mothur > read.tree(tree= BiofilmEu_AllSamp.ntree) mothur > unifrac.unweighted()
This produces the global result seen below, indicating that at least one of the eutrophication communities is significantly different from the others.
Groups UWScore UWSig Cont-2X-4X-8X-10X 0.818215 <0.0010
These results can be further explored using the following code:
mothur > unifrac.unweighted(group=all)
Groups UWScore UWSig Cont-2X 0.875703 <0.0010 Cont-4X 0.894106 <0.0010 Cont-8X 0.938222 <0.0010 Cont-10X 0.944787 <0.0010 2X-4X 0.836707 <0.0010 2X-8X 0.911103 <0.0010 2X-10X 0.917776 <0.0010 4X-8X 0.832491 <0.0010 4X-10X 0.851481 <0.0010 8X-10X 0.769891 <0.0010
As with the parsimony analysis these results need to be corrected for multiple comparisons by comparing them to 0.05/10 = 0.005 (10 is the total number of comparisons between 5 groups). In this case each of the p-values is lower than 0.005 indicating each of the groups are significantly different from one another.
Analysis of 16S rRNA gene data collected from bacterial biofilm communities that were grown under varying levels of eutrophication indicates eutrophication significantly impacted community diversity and composition. OTU diversity in the control treatment was approximately twice that found in the enrichment treatments. Additionally, each of the control, 2X, 4X, 8X, and 10X treatments had a distinct bacterial community as seen visually using Venn diagrams and heatmaps, and statistically using parsimony and UniFrac analysis.