1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661
|
.. _tutorial:
==========================
QIIME Overview Tutorial
==========================
Introduction
-------------
This tutorial explains how to use the **QIIME** (Quantitative Insights Into Microbial Ecology) Pipeline to process data from high-throughput 16S rRNA sequencing studies. If you have not already installed qiime, please see the section `Installing Qiime <../install/index.html>`_ first. The purpose of this pipeline is to provide a start-to-finish workflow, beginning with multiplexed sequence reads and finishing with taxonomic and phylogenetic profiles and comparisons of the samples in the study. With this information in hand, it is possible to determine biological and environmental factors that alter microbial community ecology in your experiment.
As an example, we will use data from a study of the response of mouse gut microbial communities to fasting (Crawford et al., 2009). To make this tutorial run quickly on a personal computer, we will use a subset of the data generated from 5 animals kept on the control ad libitum fed diet, and 4 animals fasted for 24 hours before sacrifice. At the end of our tutorial, we will be able to compare the community structure of control vs. fasted animals. In particular, we will be able to compare taxonomic profiles for each sample type, differences in diversity metrics within the samples and between the groups, and perform comparative clustering analysis to look for overall differences in the samples.
In this walkthrough, text like the following: ::
print_qiime_config.py
denotes the command-line invocation of scripts. You can find full usage information for each script by passing the -h option (help) and/or by reading the full description in the `Documentation <../documentation/index.html>`_. Execute all tutorial commands from within the :file:`qiime_tutorial` directory, which can be downloaded from here: `QIIME Tutorial files <http://bmf.colorado.edu/QIIME/qiime_tutorial-v1.4.0.zip>`_.
To process our data, we will perform the following analyses, each of which is described in more detail below:
* Filter the DNA sequence reads for quality and assign multiplexed reads to starting samples by nucleotide barcode .
* Pick Operational Taxonomic Units (OTUs) based on sequence similarity within the reads, and pick a representative sequence from each OTU.
* Assign the OTU to a taxonomic identity using reference databases.
* Align the OTU sequences and create a phylogenetic tree.
* Calculate diversity metrics for each sample and compare the types of communities, using the taxonomic and phylogenetic assignments.
* Generate UPGMA and PCoA plots to visually depict the differences between the samples, and dynamically work with these graphs to generate publication quality figures.
Essential Files
----------------
All the files you will need for this tutorial are here (http://bmf.colorado.edu/QIIME/qiime_tutorial-v1.4.0.zip). Descriptions of these files are below.
Sequences (.fna)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is the 454-machine generated FASTA file. Using the Amplicon processing software on the 454 FLX standard, each region of the PTP plate will yield a fasta file of form :file:`1.TCA.454Reads.fna`, where "1" is replaced with the appropriate region number. For the purposes of this tutorial, we will use the fasta file :file:`Fasting_Example.fna`.
Quality Scores (.qual)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is the 454-machine generated quality score file, which contains a score for each base in each sequence included in the FASTA file. Like the fasta file mentioned above, the Amplicon processing software will generate one of these files for each region of the PTP plate, named :file:`1.TCA.454Reads.qual`, etc. For the purposes of this tutorial, we will use the quality scores file :file:`Fasting_Example.qual`.
Mapping File (Tab-delimited .txt)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The mapping file is generated by the user. This file contains all of the information about the samples necessary to perform the data analysis. At a minimum, the mapping file should contain the name of each sample, the barcode sequence used for each sample, the linker/primer sequence used to amplify the sample, and a Description column. In general, you should also include in the mapping file any metadata that relates to the samples (for instance, health status or sampling site) and any additional information relating to specific samples that may be useful to have at hand when considering outliers (for example, what medications a patient was taking at time of sampling). Of note: the sample names may only contain alphanumeric characters (A-z) and the dot (.). Full format specifications can be found in the Documentation (`File Formats <../documentation/file_formats.html>`_).
For the purposes of this tutorial, we will use the mapping file :file:`Fasting_Map.txt`. The contents of the mapping file are shown here - as you can see, a nucleotide barcode sequence is provided for each of the 9 samples, as well as metadata related to treatment group and date of birth, and general run descriptions about the project. Fasting_Map.txt file contents:
.. note::
* #SampleID BarcodeSequence LinkerPrimerSequence Treatment DOB Description
* #Example mapping file for the QIIME analysis package. These 9 samples are from a study of the effects of
* #exercise and diet on mouse cardiac physiology (Crawford, et al, PNAS, 2009).
* PC.354 AGCACGAGCCTA YATGCTGCCTCCCGTAGGAGT Control 20061218 Control_mouse__I.D._354
* PC.355 AACTCGTCGATG YATGCTGCCTCCCGTAGGAGT Control 20061218 Control_mouse__I.D._355
* PC.356 ACAGACCACTCA YATGCTGCCTCCCGTAGGAGT Control 20061126 Control_mouse__I.D._356
* PC.481 ACCAGCGACTAG YATGCTGCCTCCCGTAGGAGT Control 20070314 Control_mouse__I.D._481
* PC.593 AGCAGCACTTGT YATGCTGCCTCCCGTAGGAGT Control 20071210 Control_mouse__I.D._593
* PC.607 AACTGTGCGTAC YATGCTGCCTCCCGTAGGAGT Fast 20071112 Fasting_mouse__I.D._607
* PC.634 ACAGAGTCGGCT YATGCTGCCTCCCGTAGGAGT Fast 20080116 Fasting_mouse__I.D._634
* PC.635 ACCGCAGAGTCA YATGCTGCCTCCCGTAGGAGT Fast 20080116 Fasting_mouse__I.D._635
* PC.636 ACGGTGAGTGTC YATGCTGCCTCCCGTAGGAGT Fast 20080116 Fasting_mouse__I.D._636
.. _checkmapping:
Check Mapping File
--------------------------------------------------------------------
Before beginning with QIIME, you should ensure that your mapping file is formatted correctly with the `check_id_map.py <../scripts/check_id_map.html>`_ script. Type: ::
check_id_map.py -m Fasting_Map.txt -o mapping_output -v
If verbose (-v) is enabled, this utility will display a message indicating whether or not problems were found in the mapping file. Errors and warnings will the output to a log file, which will be present in the specified (-o) output directory. Errors will cause fatal problems with subsequent scripts and must be corrected before moving forward. Warnings will not cause fatal problems, but it is encouraged that you fix these problems as they are often indicative of typos in your mapping file, invalid characters, or other unintended errors that will impact downstream analysis. A :file:`corrected_mapping.txt` file will also be created in the output directory, which will have a copy of the mapping file with invalid characters replaced by underscores, or a message indicating that no invalid characters were found.
.. _assignsamples:
Assign Samples to Multiplex Reads
--------------------------------------------------------------------
The next task is to assign the multiplexed reads to samples based on their nucleotide barcode. Also, this step performs quality filtering based on the characteristics of each sequence, removing any low quality or ambiguous reads. The script for this step is `split_libraries.py <../scripts/split_libraries.html>`_. A full description of parameters for this script are described in the `Documentation <../documentation/index.html>`_. For this tutorial, we will use default parameters (minimum quality score = 25, minimum/maximum length = 200/1000, no ambiguous bases allowed and no mismatches allowed in the primer sequence). Type: ::
split_libraries.py -m Fasting_Map.txt -f Fasting_Example.fna -q Fasting_Example.qual -o split_library_output
This invocation will create three files in the new directory :file:`split_library_output/`:
* :file:`split_library_log.txt` : This file contains the summary of splitting, including the number of reads detected for each sample and a brief summary of any reads that were removed due to quality considerations.
* :file:`histograms.txt` : This tab delimited file shows the number of reads at regular size intervals before and after splitting the library.
* :file:`seqs.fna` : This is a fasta formatted file where each sequence is renamed according to the sample it came from. The header line also contains the name of the read in the input fasta file and information on any barcode errors that were corrected.
A few lines from the :file:`seqs.fna` file are shown below:
.. note::
* >PC.634_1 FLP3FBN01ELBSX orig_bc=ACAGAGTCGGCT new_bc=ACAGAGTCGGCT bc_diffs=0
* CTGGGCCGTGTCTCAGTCCCAATGTGGCCGTTTACCCTCTCAGGCCGGCTACGCATCATCGCC....
* >PC.634_2 FLP3FBN01EG8AX orig_bc=ACAGAGTCGGCT new_bc=ACAGAGTCGGCT bc_diffs=0
* TTGGACCGTGTCTCAGTTCCAATGTGGGGGCCTTCCTCTCAGAACCCCTATCCATCGAAGGCTT....
* >PC.354_3 FLP3FBN01EEWKD orig_bc=AGCACGAGCCTA new_bc=AGCACGAGCCTA bc_diffs=0
* TTGGGCCGTGTCTCAGTCCCAATGTGGCCGATCAGTCTCTTAACTCGGCTATGCATCATTGCCTT....
* >PC.481_4 FLP3FBN01DEHK3 orig_bc=ACCAGCGACTAG new_bc=ACCAGCGACTAG bc_diffs=0
* CTGGGCCGTGTCTCAGTCCCAATGTGGCCGTTCAACCTCTCAGTCCGGCTACTGATCGTCGACT....
.. _pickotusandrepseqs:
Picking Operational Taxonomic Units (OTUs) through making OTU table
--------------------------------------------------------------------
Here we will be running the `pick_otus_through_otu_table.py <../scripts/pick_otus_through_otu_table.html>`_ workflow, which performs a series of small steps by calling a series of other scripts automatically. This workflow consists of the following steps:
1. Picking OTUs (for more information, refer to `pick_otus.py <../scripts/pick_otus.html>`_)
2. Picking a representative sequence set, one sequence from each OTU (for more information, refer to `pick_rep_set.py <../scripts/pick_rep_set.html>`_)
3. Aligning the representative sequence set (for more information, refer to `align_seqs.py <../scripts/align_seqs.html>`_)
4. Assigning taxonomy to the representative sequence set (for more information, refer to `assign_taxonomy.py <../scripts/assign_taxonomy.html>`_)
5. Filtering the alignment prior to tree building - removing positions which are all gaps, or not useful for phylogenetic inference (for more information, refer to `filter_alignment.py <../scripts/filter_alignment.html>`_)
6. Building a phylogenetic tree (for more information, refer to `make_phylogeny.py <../scripts/make_phylogeny.html>`_)
7. Building an OTU table (for more information, refer to `make_otu_table.py <../scripts/make_otu_table.html>`_)
Using the output from split_libraries.py (the seqs.fna file), run the following command: ::
pick_otus_through_otu_table.py -i split_library_output/seqs.fna -o otus
Optionally, we could denoise the sequences based on clustering the flowgram sequences. For a single library/sff file we can simply use the workflow script `pick_otus_through_otu_tables.py <../scripts/pick_otus_through_otu_table.html>`_, by providing the script with the sff file and the metadata mapping file. For multiple sff files refer to the special purpose tutorial `Denoising of 454 Data Sets <denoising_454_data.html>`_.
The results of `pick_otus_through_otu_table.py` are in :file:`otus/`, and a description of the steps performed and the results follow:
.. _pickotusseqsim:
Step 1. Pick OTUs based on Sequence Similarity within the Reads
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
At this step, all of the sequences from all of the samples will be clustered into Operational Taxonomic Units (OTUs) based on their sequence similarity. OTUs in QIIME are clusters of sequences, frequently intended to represent some degree of taxonomic relatedness. For example, when sequences are clustered at 97% sequence similarity with uclust, each resulting cluster is typically thought of as representing a species. This model and the current techniques for picking OTUs are known to be flawed, however, in that 97% OTUs do not match what humans have called species for many microbes. Determining exactly how OTUs should be defined, and what they represent, is an active area of research.
`pick_otus_through_otu_table.py` assigns sequences to OTUs at 97% similarity by default. Further information on how to view and change default behavior will be discussed later.
.. _pickrepseqsforotu:
Step 2. Pick Representative Sequences for each OTU
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Since each OTU may be made up of many related sequences, we will pick a representative sequence from each OTU for downstream analysis. This representative sequence will be used for taxonomic identification of the OTU and phylogenetic alignment. QIIME uses the OTU file created above and extracts a representative sequence from the fasta file by one of several methods.
In the :file:`otus/rep_set/` directory, QIIME has created two new files - the log file :file:`seqs_rep_set.log` and the fasta file :file:`seqs_rep_set.fasta` containing one representative sequence for each OTU. In this fasta file, the sequence has been renamed by the OTU, and the additional information on the header line reflects the sequence used as the representative:
.. note::
* >0 PC.636_424
* CTGGGCCGTATCTCAGTCCCAATGTGGCCGGTCGACCTCTC....
* >1 PC.481_321
* TTGGGCCGTGTCTCAGTCCCAATGTGGCCGTCCGCCCTCTC....
.. _assigntax:
Step 3. Assign Taxonomy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
A primary goal of the QIIME pipeline is to assign high-throughput sequencing reads to taxonomic identities using established databases. This provides information on the microbial lineages found in microbial samples. By default, QIIME uses the RDP classifier to assign taxonomic data to each representative sequence from step 2, above.
In the directory :file:`otus/rdp_assigned_taxonomy/`, there will be a log file and a text file. The text file contains a line for each OTU considered, with the RDP taxonomy assignment and a numerical confidence of that assignment (1 is the highest possible confidence). For some OTUs, the assignment will be as specific as a bacterial species, while others may be assignable to nothing more specific than the bacterial domain. Below are the first few lines of the text file and the user should note that the taxonomic assignment and confidence numbers from their run may not coincide with the output shown below, due to the RDP classification algorithm:
.. note::
* 41 PC.356_347 Root;Bacteria 0.980
* 63 PC.635_130 Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae" 0.960
* 353 PC.634_150 Root;Bacteria;Proteobacteria;Deltaproteobacteria 0.880
* 18 PC.355_1011 Root;Bacteria;Bacteroidetes;Bacteroidetes;Bacteroidales;Rikenellaceae;Alistipes 0.990
.. _alignotuseq:
Step 4. Align OTU Sequences
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Alignment of the sequences and phylogeny inference is necessary only if phylogenetic tools such as UniFrac_ will be subsequently invoked. Alignments can either be generated de novo using programs such as MUSCLE, or through assignment to an existing alignment with tools like PyNAST_. For small studies such as this tutorial, either method is possible. However, for studies involving many sequences (roughly, more than 1000), the de novo aligners are very slow and assignment with PyNAST_ is preferred. Since this is one of the most computationally intensive bottlenecks in the pipeline, large studies benefit greatly from parallelization of this task (described in detail in the `Documentation <../documentation/index.html>`_): When using PyNAST_ as an aligner (the default), QIIME must know the location of a template alignment. Most QIIME installations use the greengenes file 'core_set_aligned.fasta.imputed' by default.
After aligning the sequences, a log file and an alignment file are created in the directory :file:`otus/pynast_aligned_seqs/`.
.. _filteraln:
Step 5. Filter Alignment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Before inferring a phylogenetic tree relating the sequences, it is beneficial to filter the sequence alignment to removed columns comprised of only gaps, and locations known to be excessively variable. Most QIIME installations use a lanemask file named either lanemask_in_1s_and_0s.txt or lanemask_in_1s_and_0s by default. After filtering, a filtered alignment file is created in the directory :file:`otus/pynast_aligned_seqs/`.
.. _maketree:
Step 6. Make Phylogenetic Tree
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The filtered alignment file produced in the directory :file:`otus/pynast_aligned_seqs/` is then used to build a phylogenetic tree using a tree-building program.
The Newick format tree file is written to :file:`rep_set.tre`, which is located in the :file:`otus/` directory . This file can be viewed in a tree visualization software, and is necessary for UniFrac_ diversity measurements and other phylogenetically aware analyses (described below). The tree obtained can be visualized with programs such as FigTree, which was used to visualize the phylogenetic tree obtained from :file:`rep_set.tre`.
.. image:: ../images/ tree.png
:align: center
.. _makeotutable:
Step 7. Make OTU Table
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Using taxonomic assignments (step 3) and the OTU map (step 1) QIIME assembles a readable matrix of OTU abundance in each sample with meaningful taxonomic identifiers for each OTU.
The result of this step is :file:`otu_table.txt`, which is located in the :file:`otus/` directory. The first few lines of :file:`otu_table.txt` are shown below (OTUs 1-9), where the first column contains the OTU number, the last column contains the taxonomic assignment for the OTU, and 9 columns between are for each of our 9 samples. The value of each *i,j* entry in the matrix is the number of times OTU *i* was found in the sequences for sample *j*.
.. note ::
| #Full OTU Counts
| #OTU ID PC.354 PC.355 PC.356 PC.481 PC.593 PC.607 PC.634 PC.635 PC.636 Consensus Lineage
| 0 0 0 0 0 0 0 0 1 0 Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae"
| 1 0 0 0 0 0 1 0 0 0 Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae"
| 2 0 0 0 0 0 0 0 0 1 Root;Bacteria;Bacteroidetes;Bacteroidetes;Bacteroidales;Porphyromonadaceae;Parabacteroides
| 3 2 1 0 0 0 0 0 0 0 Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae";"Lachnospiraceae Incertae Sedis"
| 4 1 0 0 0 0 0 0 0 0 Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae"
| 5 0 0 0 0 0 0 0 0 1 Root;Bacteria;Firmicutes;"Clostridia";Clostridiales
| 6 0 0 0 0 0 0 0 1 0 Root;Bacteria;Actinobacteria;Actinobacteria
| 7 0 0 2 0 0 0 0 0 1 Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Ruminococcaceae"
| 8 1 1 0 2 4 0 0 0 0 Root;Bacteria;Firmicutes;"Bacilli";"Lactobacillales";Lactobacillaceae;Lactobacillus
| 9 0 0 2 0 0 0 0 0 0 Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae"
.. _perlibrarystats:
View statistics of the OTU table
--------------------------------------------------------------------
To view the number of sequence reads which were assigned to the otu table (otus/otu_table.txt), type::
per_library_stats.py -i otus/otu_table.txt
The output shows that there are relatively few sequences in this tutorial example, but the sequences present are fairly evenly distributed among the 9 microbial communities.
.. note ::
| Num samples: 9
|
| Seqs/sample summary:
| Min: 146
| Max: 150
| Median: 148.0
| Mean: 148.111111111
| Std. dev.: 1.4487116456
| Median Absolute Deviation: 1.0
| Default even sampling depth in
| core_qiime_analyses.py (just a suggestion): 146
|
| Seqs/sample detail:
| PC.355: 146
| PC.481: 146
| PC.636: 147
| PC.354: 148
| PC.635: 148
| PC.593: 149
| PC.607: 149
| PC.356: 150
| PC.634: 150
.. _makeheatmap:
Make OTU Heatmap
--------------------------------------------------------------------
The QIIME pipeline includes a very useful utility to generate images of the OTU table. The script is `make_otu_heatmap_html.py <../scripts/make_otu_heatmap_html.html>`_. Type::
make_otu_heatmap_html.py -i otus/otu_table.txt -o otus/OTU_Heatmap/
An html file is created in the directory :file:`otus/OTU_Heatmap/`. You can open this file with any web browser, and will be prompted to enter a value for "Filter by Counts per OTU". Only OTUs with total counts at or above this threshold will be displayed. The OTU heatmap displays raw OTU counts per sample, where the counts are colored based on the contribution of each OTU to the total OTU count present in that sample (blue: contributes low percentage of OTUs to sample; red: contributes high percentage of OTUs). Leave the filter value unchanged, and click the "Sample ID" button, and a graphic will be generated like the figure below. For each sample, you will see in a heatmap the number of times each OTU was found in that sample. You can mouse over any individual count to get more information on the OTU (including taxonomic assignment). Within the mouseover, there is a link for the terminal lineage assignment, so you can easily search Google for more information about that assignment.
.. image:: ../images/ heatmap.png
:align: center
Alternatively, you can click on one of the counts in the heatmap and a new pop-up window will appear. The pop-up window uses a Google Visualization API called Magic-Table. Depending on which table count you clicked on, the pop-up window will put the clicked-on count in the middle of the pop-up heatmap as shown below. For the following example, the table count with the red arrow mouseover is the same one being focused on using the Magic-Table.
.. image:: ../images/ fisheyeheatmap.png
:align: center
On the original heatmap webpage, select the "Taxonomy" button instead: you will generate a heatmap keyed by taxon assignment, which allows you to conveniently look for organisms and lineages of interest in your study. Again, mousing over an individual count will show additional information for that OTU and sample.
.. image:: ../images/ taxheatmap.png
:align: center
.. _makeotunetwork:
Make OTU Network
----------------------------------------------
An alternative to viewing the OTU table as a heatmap is to create an OTU network, using the following command.::
make_otu_network.py -m Fasting_Map.txt -i otus/otu_table.txt -o otus/OTU_Network
To visualize the network, we use the Cytoscape_ program (which you can run by calling cytoscape from the command line -- you may need to call this beginning either with a capital or lowercase 'C' depending on your version of Cytoscape), where each red circle represents a sample and each white square represents an OTU. The lines represent the OTUs present in a particular sample (blue for controls and green for fasting). For more information about opening the files in Cytoscape_ please refer to the `Cytoscape Usage <../scripts/cytoscape_usage.html>`_.
.. image:: ../images/ network.png
:align: center
.. _summarizetaxa:
Summarize Communities by Taxonomic Composition
----------------------------------------------------------------------------
You can group OTUs by samples or categories (when "-c" option is passed) by different taxonomic levels (division, class, family, etc.) with the workflow script `summarize_taxa_through_plots.py <../scripts/summarize_taxa_through_plots.html>`_. Note that this process depends directly on the method used to assign taxonomic information to OTUS (see `Assigning Taxonomy`__ above). Type:
__ assigntax_
::
summarize_taxa_through_plots.py -i otus/otu_table.txt -o wf_taxa_summary -m Fasting_Map.txt
The script will generate a new table grouping sequences by taxonomic assignment at various levels, for example the phylum level table at: :file:`wf_taxa_summary/otu_table_L3.txt`. The value of each *i,j* entry in the matrix is the count of the number of times all OTUs belonging to the taxon *i* (for example, Phylum Actinobacteria) were found in the sequences for sample *j*.
.. note::
| #Full OTU Counts
| Taxon PC.354 PC.355 PC.356 PC.481 PC.593 PC.607 PC.634 PC.635 PC.636
| Root;Bacteria;Actinobacteria 0.0 0.0 0.0 1.0 0.0 2.0 3.0 1.0 1.0
| Root;Bacteria;Bacteroidetes 7.0 38.0 15.0 19.0 30.0 40.0 86.0 54.0 90.0
| Root;Bacteria;Deferribacteres 0.0 0.0 0.0 0.0 0.0 3.0 5.0 2.0 7.0
| Root;Bacteria;Firmicutes 136.0 102.0 115.0 117.0 65.0 66.0 37.0 63.0 34.0
| Root;Bacteria;Other 5.0 6.0 18.0 9.0 49.0 35.0 14.0 27.0 14.0
| Root;Bacteria;Proteobacteria 0.0 0.0 0.0 0.0 5.0 3.0 2.0 0.0 1.0
| Root;Bacteria;TM7 0.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0
| Root;Bacteria;Verrucomicrobia 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
| Root;Other 0.0 0.0 2.0 0.0 0.0 0.0 0.0 1.0 0.0
.. _maketaxacharts:
To view the resulting charts, open the area or bar chart html file located in the :file:`wf_taxa_summary/taxa_summary_plots` folder. The following chart shows the taxa assignments for each sample as an area chart. You can mouseover the plot to see which taxa are contributing to the percentage shown.
.. image:: ../images/areachart1.png
:align: center
The following chart shows the taxa assignments for each sample as a bar chart.
.. image:: ../images/barchart1.png
:align: center
.. _compalphadivrarecurves:
Compute Alpha Diversity within the Samples and Generate Rarefaction Curves
---------------------------------------------------------------------------
Community ecologists typically describe the microbial diversity within their study. This diversity can be assessed within a sample (alpha diversity) or between a collection of samples (beta diversity). Here, we will determine the level of alpha diversity in our samples using a series of scripts from the QIIME pipeline. To perform this analysis, we will use the :file:`alpha_rarefaction.py` workflow script. This script performs the following steps:
1. Generate rarefied OTU tables (for more information, refer to `multiple_rarefactions.py <../scripts/multiple_rarefactions.html>`_)
2. Compute measures of alpha diversity for each rarefied OTU table (for more information, refer to `alpha_diversity.py <../scripts/alpha_diversity.html>`_)
3. Collate alpha diversity results (for more information, refer to `collate_alpha.py <../scripts/collate_alpha.html>`_)
4. Generate alpha rarefaction plots (for more information, refer to `make_rarefaction_plots.py <../scripts/make_rarefaction_plots.html>`_)
Although we could run this workflow with the (sensible) default parameters, this provides an excellent opportunity to illustrate the use of custom parameters. To see what measures of alpha diversity will be computed by default, type: ::
alpha_diversity.py -h
You should see, among other information:
.. note ::
| -m METRICS, --metrics=METRICS
| Alpha-diversity metric(s) to use. A comma-separated
| list should be provided when multiple metrics are
| specified. [default:
| PD_whole_tree,chao1,observed_species]
to also use the shannon index, create a custom parameters file by typing: ::
echo "alpha_diversity:metrics shannon,PD_whole_tree,chao1,observed_species" > alpha_params.txt
Then run the workflow, which requires the OTU table (-i) and phylogenetic tree (-t) from `above`__, and the custom parameters file we just created:
__ pickotusandrepseqs_
::
alpha_rarefaction.py -i otus/otu_table.txt -m Fasting_Map.txt -o wf_arare/ -p alpha_params.txt -t otus/rep_set.tre
Descriptions of the steps involved in alpha_rarefaction.py follow:
.. _rareotutable:
Step 1. Rarify OTU Table
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The directory :file:`wf_arare/rarefaction/` will contain many text files named :file:`rarefaction_##_#.txt`; the first set of numbers represents the number of sequences sampled, and the last number represents the iteration number. If you opened one of these files, you would find an OTU table where for each sample the sum of the counts equals the number of samples taken.
.. _computealphadiv:
Step 2. Compute Alpha Diversity
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The rarefaction tables are the basis for calculating diversity metrics, which reflect the diversity within the sample based on the abundance of various taxa within a community. The QIIME pipeline allows users to conveniently calculate more than two dozen different diversity metrics. The full list of available metrics is available here: `alpha-diversity metrics <../scripts/alpha_diversity_metrics.html>`_. Every metric has different strengths and limitations - technical discussion of each metric is readily available online and in ecology textbooks, but it is beyond the scope of this document. By default, QIIME calculates three metrics:
#. Chao1 metric estimates the species richness.
#. The Observed Species metric is simply the count of unique OTUs found in the sample.
#. Phylogenetic Distance (PD_whole_tree) is the only phylogenetic metric used, and requires a phylogenetic tree.
In addition, :file:`alpha_params.txt` specified above adds the shannon index to the list of alpha diversity measures calculated by QIIME.
The result of this step produces several text files with the results of the alpha diversity computations performed on the rarefied OTU tables. The results are located in the :file:`wf_arare/alpha_div/` directory.
.. _collateotutable:
Step 3. Collate Rarified OTU Tables
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The output directory :file:`wf_arare/alpha_div/` will contain one text file :file:`alpha_rarefaction_##_#` for every file input from :file:`wf_arare/rarefaction/`, where the numbers represent the number of samples and iterations as before. The content of this tab delimited file is the calculated metrics for each sample. To collapse the individual files into a single combined table, the workflow uses the script `collate_alpha.py <../scripts/collate_alpha.html>`_.
In the newly created directory :file:`wf_arare/alpha_div_collated/`, there will be one matrix for every alpha diversity metric used. This matrix will contain the metric for every sample, arranged in ascending order from lowest number of sequences per sample to highest. A portion of the :file:`observed_species.txt` file are shown below:
.. note::
* Sequences per sample iteration PC.354 PC.355 PC.356 PC.481 PC.593
* alpha_rarefaction_21_0.txt 21 0 14.0 16.0 18.0 18.0 13.0
* alpha_rarefaction_21_1.txt 21 1 15.0 17.0 18.0 20.0 12.0
* alpha_rarefaction_21_2.txt 21 2 15.0 16.0 21.0 19.0 13.0
* alpha_rarefaction_21_3.txt 21 3 10.0 19.0 18.0 21.0 13.0
* alpha_rarefaction_21_4.txt 21 4 14.0 18.0 16.0 15.0 12.0
* ...
.. _generaterarecurves:
Step 4. Generate Rarefaction Curves
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
QIIME creates plots of alpha diversity vs. simulated sequencing effort, known as rarefaction plots, using the script `make_rarefaction_plots.py <../scripts/make_rarefaction_plots.html>`_. This script takes a mapping file and any number of rarefaction files generated by `collate_alpha.py <../scripts/collate_alpha.html>`_ and creates rarefaction curves. Each curve represents a sample and can be colored by the sample metadata supplied in the mapping file.
This step generates a :file:`wf_arare/alpha_rarefaction_plots/rarefaction_plots.html` that can be opened with a web browser, in addition to other files. The :file:`wf_arare/alpha_rarefaction_plots/average_tables/` folder, which contains the rarefaction averages for each diversity metric, so the user can optionally plot the rarefaction curves in another application, like MS Excel. The :file:`wf_arare/alpha_rarefaction_plots/average_plots/` folder contains the average plots for each metric and category and the :file:`wf_arare/alpha_rarefaction_plots/html_plots/` folder contains all the images used in the html page generated.
Viewing Alpha Diversity Results
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To view the rarefaction plots, open the file :file:`wf_arare/alpha_rarefaction_plots/rarefaction_plots.html` in a web browser, typically by double-clicking on it. Once the browser window is open, select the metric `PD_whole_tree` and the category `Treatment`, to reveal a plot like the figure below. You can also turn on/off lines in the plot by (un)checking the box next to each label in the legend, or click on the triangle next to each label in the legend to see all the samples that contribute to that category. Below each plot is a table displaying average values for each measure of alpha diversity for each group of samples the specified category.
.. image:: ../images/ rarecurve.png
:align: center
.. _compbetadivgenpcoa:
Compute Beta Diversity and Generate Beta Diversity Plots
--------------------------------------------------------
Beta diversity represents the explicit comparison of microbial (or other) communities based on their composition. Beta-diversity metrics thus assess the differences between microbial communities. The fundamental output of these comparisons is a square matrix where a "distance" or dissimilarity is calculated between every pair of community samples, reflecting the dissimilarity between those samples. The data in this distance matrix can be visualized with analyses such as Principal Coordinate Analysis (PCoA) and hierarchical clustering. Like alpha diversity, there are many possible metrics which can be calculated with the QIIME pipeline - the full list of options can be found here `beta diversity metrics <../scripts/beta_diversity_metrics.html>`_. Here, we will calculate beta diversity between our 9 microbial communities using the default beta diversity metrics of weighted and unweighted unifrac, which are phylogenetic measures used extensively in recent microbial community sequencing projects. To perform this analysis, we will use the `beta_diversity_through_plots.py <../scripts/beta_diversity_through_plots.html>`_ workflow script. This script performs the following steps:
1. Rarify OTU table (for more information, refer to `single_rarefaction.py <../scripts/single_rarefaction.html>`_)
2. Make preferences file (for more information, refer to `make_prefs_file.py <../scripts/make_prefs_file.html>`_)
3. Compute Beta Diversity (for more information, refer to `beta_diversity.py <../scripts/beta_diversity.html>`_)
4. Generate Principal Coordinates (for more information, refer to `principal_coordinates.py <../scripts/principal_coordinates.html>`_)
5. Generate 3D PCoA plots (for more information, refer to `make_3d_plots.py <../scripts/make_3d_plots.html>`_)
6. Generate 2D PCoA plots (for more information, refer to `make_2d_plots.py <../scripts/make_2d_plots.html>`_)
7. Make Distance Histograms (for more information, refer to `make_distance_histograms.py <../scripts/make_distance_histograms.html>`_)
To run the workflow, type the following command, which defines the input OTU table "-i" and tree file "-t" (from `pick_otus_through_otu_table.py <../scripts/pick_otus_through_otu_table.html>`_), the user-defined mapping file "-m", the output directory "-o", and the number of sequences per sample (sequencing depth) as 146: ::
beta_diversity_through_plots.py -i otus/otu_table.txt -m Fasting_Map.txt -o wf_bdiv_even146/ -t otus/rep_set.tre -e 146
Descriptions of the steps involved in `beta_diversity_through_plots.py` follow:
.. _compbetadiv:
Step 1. Rarify OTU Table to Remove Sample Heterogeneity
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To remove sample heterogeneity, we can perform rarefaction on our OTU table. Rarefaction is an ecological approach that allows users to standardize the data obtained from samples with different sequencing efforts, and to compare the OTU richness of the samples using this standardized platform. For instance, if one of your samples yielded 10,000 sequence counts, and another yielded only 1,000 counts, the species diversity within those samples may be much more influenced by sequencing effort than underlying biology. The approach of rarefaction is to randomly sample the same number of OTUs from each sample, and use this data to compare the communities at a given level of sampling effort.
The 9 communities in the tutorial data contain the following numbers of sequences per sample (see perlibrarystats_):
.. note ::
| Num samples: 9
|
| Seqs/sample summary:
| Min: 146
| Max: 150
| Median: 148.0
| Mean: 148.111111111
| Std. dev.: 1.4487116456
| Median Absolute Deviation: 1.0
| Default even sampling depth in
| core_qiime_analyses.py (just a suggestion): 146
|
| Seqs/sample detail:
| PC.355: 146
| PC.481: 146
| PC.636: 147
| PC.354: 148
| PC.635: 148
| PC.593: 149
| PC.607: 149
| PC.356: 150
| PC.634: 150
Because all samples have at least 146 sequences, a rarefaction level of 146 (specified by `-e 146` above), allows us to compare all 9 samples at equal sequencing depth. Any samples containing fewer than 146 sequences would have been removed from these beta diversity analyses.
Step 2. Make Preferences File
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In order to generate the PCoA plots, we want to generate a preferences file, which defines the colors for each of the samples or for a particular category within a mapping column. For more information on making a preferences file, please refer to `make_prefs_file.py <../scripts/make_prefs_file.html>`_. The prefs file allows, among other things, different PCoA plots to share the same color scheme.
Step 3. Compute Beta Diversity
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Beta-diversity metrics assess the differences between microbial communities. By default, QIIME calculates both weighted and unweighted unifrac, which are phylogenetically aware measures of beta diversity.
The resulting distance matrices ( :file:`wf_bdiv_even146/unweighted_unifrac_dm.txt` and :file:`wf_bdiv_even146/weighted_unifrac_dm.txt`) are the basis for later analysis steps (principal coordinate analysis, hierarchical clustering, and distance histograms)
Step 4. Generate Principal Coordinates
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Principal Coordinate Analysis (PCoA) is a technique that helps to extract and visualize a few highly informative components of variation from complex, multidimensional data. This is a transformation that maps the samples present in the distance matrix to a new set of orthogonal axes such that a maximum amount of variation is explained by the first principal coordinate, the second largest amount of variation is explained by the second principal coordinate, etc. The principal coordinates can be plotted in two or three dimensions to provide an intuitive visualization of the data structure and look at differences between the samples, and look for similarities by sample category.
The files :file:`wf_bdiv_even146/unweighted_unifrac_pc.txt` and :file:`wf_bdiv_even146/weighted_unifrac_pc.txt` list every sample in the first column, and the subsequent columns contain the value for the sample against the noted principal coordinate. At the bottom of each Principal Coordinate column, you will find the eigenvalue and percent of variation explained by the coordinate.
Step 5. Generate 3D PCoA Plots
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
QIIME allows for the inspection of PCoA plots in three dimensions. html files are created in :file:`wf_bdiv_even146/unweighted_unifrac_3d...` and :file:`wf_bdiv_even146/weighted_unifrac_3d...` directories. For the "Treatment" column, all samples with the same "Treatment" will get the same color. For our tutorial, the five control samples are all blue and the four control samples are all green. This lets you easily visualize "clustering" by metadata category. The 3d visualization software allows you to rotate the axes to see the data from different perspectives. By default, the script will plot the first three dimensions in your file. Other combinations can be viewed using the "Views:Choose viewing axes" option in the KiNG viewer (may require the installation of kinemage software). The first 10 components can be viewed using "Views:Parallel coordinates" option or typing "/".
.. image:: ../images/ pcoa2.png
:align: center
Step 6. Generate 2D PCoA Plots
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The two dimensional plot will be rendered as a html file which can be opened with a standard web browser. The html file created in directories :file:`wf_bdiv_even146/unweighted_unifrac_2d...` shows a plot for each combination of the first three principal coordinates. You can view the name for each sample by holding your mouse cursor over the data point.
.. image:: ../images/ pcoa1.png
:align: center
:width: 900px
.. _gendisthist:
Step 7. Generate Distance Histograms
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Distance Histograms are a way to compare samples from different categories and see which categories tend to have larger/smaller beta diversity than others.
For each of these groups of distances a histogram is made. The output is an HTML file which is defined by the beta-diversity metric used (e.g., :file:`wf_bdiv_even146/unweighted_unifrac_histograms/unweighted_unifrac_dm_distance_histograms.html`). Within the HTML you can look at all the distance histograms individually, and compare them between each other. Within the webpage, the user can mouseover and/or select the checkboxes in the right panel to turn on/off the different distances within/between categories. In this example, we are comparing the distances between the samples in the Control versus themselves, and in another color, pairwise distances between communities of fasting mice and control mice.
.. image:: ../images/ hist.png
:align: center
.. _jackbd:
Jackknifed Beta Diversity and Hierarchical Clustering
------------------------------------------------------
This workflow uses jackknife replicates to estimate the uncertainty in PCoA plots and hierarchical clustering of microbial communities. Many of the same concepts relevant to beta diversity and PCoA are used here. For this analysis we use the script `jackknifed_beta_diversity.py`, which performs the following steps:
1) Compute the beta diversity distance matrix from the full OTU table (and tree, if applicable) (for more information, refer to `beta_diversity.py <../scripts/beta_diversity.html>`_)
2) Build UPGMA tree from full distance matrix; (for more information, refer to `upgma_cluster.py <../scripts/upgma_cluster.html>`_)
3) Build rarefied OTU tables (for more information, refer to `multiple_rarefactions.py <../scripts/multiple_rarefactions.html>`_)
4) Compute distance matrices for rarefied OTU tables (for more information, refer to `beta_diversity.py <../scripts/beta_diversity.html>`_) <../scripts/beta_diversity.html>`_)
5) Build UPGMA trees from rarefied distance matrices (for more information, refer to `upgma_cluster.py <../scripts/upgma_cluster.html>`_)
6) Compare rarefied UPGMA trees and determine jackknife support for tree nodes. (for more information, refer to `tree_compare.py <../scripts/tree_compare.html>`_ and `consensus_tree.py <../scripts/consensus_tree.html>`_)
7) Compute principal coordinates on each rarefied distance matrix (for more information, refer to `principal_coordinates.py <../scripts/principal_coordinates.html>`_)
8) Compare rarefied principal coordinates plots from each rarefied distance matrix (for more information, refer to `make_3d_plots.py <../scripts/make_3d_plots.html>`_ and `make_2d_plots.py <../scripts/make_2d_plots.html>`_)
To run the analysis, type the following:
::
jackknifed_beta_diversity.py -i otus/otu_table.txt -t otus/rep_set.tre -m Fasting_Map.txt -o wf_jack -e 110
.. _hiarchclust:
Steps 1 and 2. UPGMA Clustering
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Unweighted Pair Group Method with Arithmetic mean (UPGMA) is type of hierarchical clustering method using average linkage and can be used to interpret the distance matrix produced by `beta_diversity.py <../scripts/beta_diversity.html>`_.
The output is a file that can be opened with tree viewing software, such as FigTree.
.. image:: ../images/ UPGMAbytreatment.png
:align: center
:width: 700px
This tree shows the relationship among the 9 samples, and reveals that the 4 samples from the guts of fasting mice cluster together (PC.6xx, fasting data is in :file:`Fasting_Map.txt`).
.. _jacksupport:
Steps 3, 4 and 5. Perform Jackknifing Support
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To measure the robustness of this result to sequencing effort, we perform a jackknifing analysis, wherein a smaller number of sequences are chosen at random from each sample, and the resulting UPGMA tree from this subset of data is compared with the tree representing the entire available data set. This process is repeated with many random subsets of data, and the tree nodes which prove more consistent across jackknifed datasets are deemed more robust.
First the jackknifed OTU tables must be generated, by subsampling the full available data set. In this tutorial, each sample contains between 146 and 150 sequences, as shown with `per_library_stats.py`__:
__ perlibrarystats_
.. note::
| Num samples: 9
|
| Seqs/sample summary:
| Min: 146
| Max: 150
| ...
To ensure that a random subset of sequences is selected from each sample, we chose to select 110 sequences from each sample (75% of the smallest sample, though this value is only a guideline), which is designated by the "-e" option when running the workflow script (see above).
More jackknife replicates provide a better estimate of the variability expected in beta diversity results, but at the cost of longer computational time. By default, QIIME generates 10 jackknife replicates of the available data. Each replicate is a simulation of a smaller sequencing effort (110 sequences in each sample, as defined below).
The workflow then calculates the distance matrix for each jackknifed dataset, but now in batch mode, which results in two sets of 10 distance matrix files written to the :file:`wf_jack/unweighted_unifrac/rare_dm/` and :file:`wf_jack/weighted_unifrac/rare_dm/` directories. Each of those is then used as the basis for hierarchical clustering with UPGMA, written to the :file:`wf_jack/unweighted_unifrac/rare_upgma/` and :file:`wf_jack/weighted_unifrac/rare_upgma/` directories.
.. _compjackclustertree:
Step 6. Compare Jackknifed Trees
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UPGMA clustering of the 10 distance matrix files results in 10 hierarchical clusters of the 9 mouse microbial communities, each hierarchical cluster based on a random sub-sample of the available sequence data.
This compares the UPGMA clustering based on all available data with the jackknifed UPGMA results. Three files are written to :file:`wf_jack/unweighted_unifrac/upgma_cmp/` and :file:`wf_jack/weighted_unifrac/upgma_cmp/`:
* :file:`master_tree.tre`, which is virtually identical to :file:`jackknife_named_nodes.tre` but each internal node of the UPGMA clustering is assigned a unique name
* :file:`jackknife_named_nodes.tre`
* :file:`jackknife_support.txt` explains how frequently a given internal node had the same set of descendant samples in the jackknifed UPGMA clusters as it does in the UPGMA cluster using the full available data. A value of 0.5 indicates that half of the jackknifed data sets support that node, while 1.0 indicates perfect support.
.. _comppcoa:
Steps 7 and 8. Compare Principal Coordinates plots
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The jackknifed replicate PCoA plots can be compared to assess the degree of variation from one replicate to the next. QIIME displays this variation by displaying confidence ellipsoids around the samples represented in a PCoA plot. The resulting plots are present in :file:`wf_jack/unweighted_unifrac/3d_plots`, as well as the corresponding :file:`weighted_unifrac/` and :file:`2d_plots/` locations. An example is shown below:
.. image:: ../images/ jackpcoa.png
:align: center
:width: 700px
.. _genboottree:
Generate Bootstrapped Tree
^^^^^^^^^^^^^^^^^^^^^^^^^^
:file:`jackknife_named_nodes.tre` can be viewed with FigTree or another tree-viewing program. However, as an example, we can visualize the bootstrapped tree using QIIME's `make_bootstrapped_tree.py <../scripts/make_bootstrapped_tree.html>`_, as follows::
make_bootstrapped_tree.py -m wf_jack/unweighted_unifrac/upgma_cmp/master_tree.tre -s wf_jack/unweighted_unifrac/upgma_cmp/jackknife_support.txt -o wf_jack/unweighted_unifrac/upgma_cmp/jackknife_named_nodes.pdf
The resulting pdf shows the tree with internal nodes colored, red for 75-100% support, yellow for 50-75%, green for 25-50%, and blue for < 25% support. Although UPGMA shows that PC.354 and PC.593 cluster together and PC.481 with PC.6xx cluster together, we can not have high confidence in that result. However, there is excellent jackknife support for all fasted samples (PC.6xx) which are clustering together, separate from the non-fasted (PC.35x) samples.
.. image:: ../images/ boottree.png
:align: center
Generate 3D Bi-Plots
^^^^^^^^^^^^^^^^^^^^
One can add taxa from the taxon summary files in the folder :file:`wf_taxa_summary/` to a 3D principal coordinates plot using QIIME's `make_3d_plots.py <../scripts/make_3d_plots.html>`_. The coordinates of a given taxon are plotted as a weighted average of the coordinates of all samples, where the weights are the relative abundances of the taxon in the samples. The size of the sphere representing a taxon is proportional to the mean relative abundance of the taxon across all samples. The following example creates a biplot displaying the 5 most abundant phylum-level taxa::
make_3d_plots.py -i wf_bdiv_even146/unweighted_unifrac_pc.txt -m Fasting_Map.txt -t wf_taxa_summary/otu_table_L3.txt --n_taxa_keep 5 -o 3d_biplot
The resulting html file :file:`3d_biplot/unweighted_unifrac_pc_3D_PCoA_plots.html` shows a biplot like this:
.. image:: ../images/ biplot.png
:align: center
Running Workflow Scripts in Parallel
-----------------------------------------------
To run the workflow scripts in parallel, pass the "-a" option to each of the scripts, and optionally the "-O" option to specify the number of parallel jobs to start. If running on a quad-core computer, you can set the number of jobs to start as 4 for one of the workflow scripts as follows:
::
pick_otus_through_otu_table.py -i split_library_output/seqs.fna -o otus -a -O 4
Running the QIIME Tutorial Shell Scripts
-----------------------------------------------
The commands in this tutorial are present as a shell script along with the other tutorial files, which can be run via the terminal. To run the shell scripts, you may need to allow all users to execute them, using the following commands::
chmod a+x ./qiime_tutorial_commands_serial.sh
chmod a+x ./qiime_tutorial_commands_parallel.sh
To run the QIIME tutorial in serial::
./qiime_tutorial_commands_serial.sh
To run the QIIME tutorial in parallel::
./qiime_tutorial_commands_parallel.sh
References
------------
Crawford, P. A., Crowley, J. R., Sambandam, N., Muegge, B. D., Costello, E. K., Hamady, M., et al. (2009). Regulation of myocardial ketone body metabolism by the gut microbiota during nutrient deprivation. Proc Natl Acad Sci U S A, 106(27), 11276-11281.
.. _Cytoscape: http://www.cytoscape.org/
.. _PyNAST: http://pynast.sourceforge.net/
.. _Unifrac: http://bmf2.colorado.edu/unifrac/index.psp
|