.. _tutorial:

==========================
QIIME Overview Tutorial
==========================

Introduction
-------------
This tutorial explains how to use the **QIIME** (Quantitative Insights Into Microbial Ecology) Pipeline to process data from high-throughput 16S rRNA sequencing studies. If you have not already installed QIIME, please see the section `Installing Qiime <../install/index.html>`_ first. The purpose of this pipeline is to provide a start-to-finish workflow, beginning with multiplexed sequence reads and finishing with taxonomic and phylogenetic profiles and comparisons of the samples in the study. With this information in hand, it is possible to determine biological and environmental factors that alter microbial community ecology in your experiment.

As an example, we will use data from a study of the response of mouse gut microbial communities to fasting (Crawford et al., 2009). To make this tutorial run quickly on a personal computer, we will use a subset of the data generated from 5 animals kept on the control ad libitum fed diet, and 4 animals fasted for 24 hours before sacrifice. At the end of our tutorial, we will be able to compare the community structure of control vs. fasted animals. In particular, we will be able to compare taxonomic profiles for each sample type, differences in diversity metrics within the samples and between the groups, and perform comparative clustering analysis to look for overall differences in the samples.

In this walkthrough, text like the following: ::

    print_qiime_config.py

denotes the command-line invocation of scripts. You can find full usage information for each script by passing the -h option (help) and/or by reading the full description in the `Documentation <../documentation/index.html>`_. Execute all tutorial commands from within the :file:`qiime_tutorial` directory, which can be downloaded from here: `QIIME Tutorial files <http://bmf.colorado.edu/QIIME/qiime_tutorial-v1.4.0.zip>`_.

To process our data, we will perform the following analyses, each of which is described in more detail below:

* Filter the DNA sequence reads for quality and assign multiplexed reads to starting samples by nucleotide barcode.
* Pick Operational Taxonomic Units (OTUs) based on sequence similarity within the reads, and pick a representative sequence from each OTU.
* Assign each OTU to a taxonomic identity using reference databases.
* Align the OTU sequences and create a phylogenetic tree.
* Calculate diversity metrics for each sample and compare the types of communities, using the taxonomic and phylogenetic assignments.
* Generate UPGMA and PCoA plots to visually depict the differences between the samples, and dynamically work with these graphs to generate publication quality figures.


Essential Files
----------------
All the files you will need for this tutorial are here (http://bmf.colorado.edu/QIIME/qiime_tutorial-v1.4.0.zip). Descriptions of these files are below.

Sequences (.fna)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is the 454-machine generated FASTA file. Using the Amplicon processing software on the 454 FLX standard, each region of the PTP plate will yield a fasta file of form :file:`1.TCA.454Reads.fna`, where "1" is replaced with the appropriate region number. For the purposes of this tutorial, we will use the fasta file :file:`Fasting_Example.fna`.

Quality Scores (.qual)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is the 454-machine generated quality score file, which contains a score for each base in each sequence included in the FASTA file. Like the fasta file mentioned above, the Amplicon processing software will generate one of these files for each region of the PTP plate, named :file:`1.TCA.454Reads.qual`, etc. For the purposes of this tutorial, we will use the quality scores file :file:`Fasting_Example.qual`.

Mapping File (Tab-delimited .txt)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The mapping file is generated by the user. This file contains all of the information about the samples necessary to perform the data analysis. At a minimum, the mapping file should contain the name of each sample, the barcode sequence used for each sample, the linker/primer sequence used to amplify the sample, and a Description column. In general, you should also include in the mapping file any metadata that relates to the samples (for instance, health status or sampling site) and any additional information relating to specific samples that may be useful to have at hand when considering outliers (for example, what medications a patient was taking at time of sampling). Of note: the sample names may only contain alphanumeric characters (A-z) and the dot (.). Full format specifications can be found in the Documentation (`File Formats <../documentation/file_formats.html>`_).

For the purposes of this tutorial, we will use the mapping file :file:`Fasting_Map.txt`. The contents of the mapping file are shown here - as you can see, a nucleotide barcode sequence is provided for each of the 9 samples, as well as metadata related to treatment group and date of birth, and general run descriptions about the project. Fasting_Map.txt file contents:

.. note::

   * #SampleID  BarcodeSequence LinkerPrimerSequence    Treatment DOB   Description
   * #Example mapping file for the QIIME analysis package. These 9 samples are from a study of the effects of
   * #exercise and diet on mouse cardiac physiology (Crawford, et al, PNAS, 2009).
   * PC.354 AGCACGAGCCTA    YATGCTGCCTCCCGTAGGAGT   Control 20061218    Control_mouse__I.D._354
   * PC.355 AACTCGTCGATG    YATGCTGCCTCCCGTAGGAGT   Control 20061218    Control_mouse__I.D._355
   * PC.356 ACAGACCACTCA    YATGCTGCCTCCCGTAGGAGT   Control 20061126    Control_mouse__I.D._356
   * PC.481 ACCAGCGACTAG    YATGCTGCCTCCCGTAGGAGT   Control 20070314    Control_mouse__I.D._481
   * PC.593 AGCAGCACTTGT    YATGCTGCCTCCCGTAGGAGT   Control 20071210    Control_mouse__I.D._593
   * PC.607 AACTGTGCGTAC    YATGCTGCCTCCCGTAGGAGT   Fast    20071112    Fasting_mouse__I.D._607
   * PC.634 ACAGAGTCGGCT    YATGCTGCCTCCCGTAGGAGT   Fast    20080116    Fasting_mouse__I.D._634
   * PC.635 ACCGCAGAGTCA    YATGCTGCCTCCCGTAGGAGT   Fast    20080116    Fasting_mouse__I.D._635
   * PC.636 ACGGTGAGTGTC    YATGCTGCCTCCCGTAGGAGT   Fast    20080116    Fasting_mouse__I.D._636


.. _checkmapping:

Check Mapping File
--------------------------------------------------------------------
Before beginning with QIIME, you should ensure that your mapping file is formatted correctly with the `check_id_map.py <../scripts/check_id_map.html>`_ script. Type: ::

    check_id_map.py -m Fasting_Map.txt -o mapping_output -v

If verbose (-v) is enabled, this utility will display a message indicating whether or not problems were found in the mapping file. Errors and warnings will be written to a log file in the specified (-o) output directory. Errors will cause fatal problems with subsequent scripts and must be corrected before moving forward. Warnings will not cause fatal problems, but you are encouraged to fix them, as they often indicate typos in your mapping file, invalid characters, or other unintended errors that will impact downstream analysis. A :file:`corrected_mapping.txt` file will also be created in the output directory, which will have a copy of the mapping file with invalid characters replaced by underscores, or a message indicating that no invalid characters were found.
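
To make the kind of checks described above concrete, here is a minimal Python 3 sketch that validates sample IDs and barcodes in a tab-delimited mapping file. It is not the implementation of ``check_id_map.py``; the function name ``check_mapping_lines`` and the two checks it performs (valid SampleID characters, unique barcodes) are assumptions chosen for illustration.

.. code-block:: python

    import re

    # Sample IDs may contain only alphanumeric characters and the dot (.).
    VALID_SAMPLE_ID = re.compile(r"[A-Za-z0-9.]+")


    def check_mapping_lines(lines):
        """Return a list of problem descriptions for a tab-delimited mapping file."""
        problems = []
        seen_barcodes = {}  # barcode -> first SampleID that used it
        for line_number, line in enumerate(lines, start=1):
            line = line.rstrip("\n")
            # The header and comment lines start with '#'; skip them and blanks.
            if not line or line.startswith("#"):
                continue
            fields = line.split("\t")
            # SampleID, barcode, linker/primer and Description are the minimum.
            if len(fields) < 4:
                problems.append("line %d: expected at least 4 columns" % line_number)
                continue
            sample_id, barcode = fields[0], fields[1]
            if not VALID_SAMPLE_ID.fullmatch(sample_id):
                problems.append("line %d: invalid SampleID %r" % (line_number, sample_id))
            if barcode in seen_barcodes:
                problems.append("line %d: barcode %s already used by %s"
                                % (line_number, barcode, seen_barcodes[barcode]))
            else:
                seen_barcodes[barcode] = sample_id
        return problems


    # The third line is deliberately broken: underscore in the ID, repeated barcode.
    example = [
        "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tTreatment\tDOB\tDescription\n",
        "PC.354\tAGCACGAGCCTA\tYATGCTGCCTCCCGTAGGAGT\tControl\t20061218\tControl_mouse__I.D._354\n",
        "PC_355\tAGCACGAGCCTA\tYATGCTGCCTCCCGTAGGAGT\tControl\t20061218\tControl_mouse__I.D._355\n",
    ]
    for problem in check_mapping_lines(example):
        print(problem)

The real script performs many more checks and writes its findings to the log file; the sketch only shows why an underscore in a sample name or a repeated barcode would be flagged.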

.. _assignsamples:

Assign Samples to Multiplex Reads
--------------------------------------------------------------------
The next task is to assign the multiplexed reads to samples based on their nucleotide barcode. This step also performs quality filtering based on the characteristics of each sequence, removing any low quality or ambiguous reads. The script for this step is `split_libraries.py <../scripts/split_libraries.html>`_. The parameters for this script are described in full in the `Documentation <../documentation/index.html>`_. For this tutorial, we will use default parameters (minimum quality score = 25, minimum/maximum length = 200/1000, no ambiguous bases allowed and no mismatches allowed in the primer sequence). Type: ::

    split_libraries.py -m Fasting_Map.txt -f Fasting_Example.fna -q Fasting_Example.qual -o split_library_output

This invocation will create three files in the new directory :file:`split_library_output/`:

* :file:`split_library_log.txt` : This file contains the summary of splitting, including the number of reads detected for each sample and a brief summary of any reads that were removed due to quality considerations.
* :file:`histograms.txt` : This tab delimited file shows the number of reads at regular size intervals before and after splitting the library.
* :file:`seqs.fna` : This is a fasta formatted file where each sequence is renamed according to the sample it came from. The header line also contains the name of the read in the input fasta file and information on any barcode errors that were corrected.

A few lines from the :file:`seqs.fna` file are shown below:

.. note::

   * >PC.634_1 FLP3FBN01ELBSX orig_bc=ACAGAGTCGGCT new_bc=ACAGAGTCGGCT bc_diffs=0
   * CTGGGCCGTGTCTCAGTCCCAATGTGGCCGTTTACCCTCTCAGGCCGGCTACGCATCATCGCC....
   * >PC.634_2 FLP3FBN01EG8AX orig_bc=ACAGAGTCGGCT new_bc=ACAGAGTCGGCT bc_diffs=0
   * TTGGACCGTGTCTCAGTTCCAATGTGGGGGCCTTCCTCTCAGAACCCCTATCCATCGAAGGCTT....
   * >PC.354_3 FLP3FBN01EEWKD orig_bc=AGCACGAGCCTA new_bc=AGCACGAGCCTA bc_diffs=0
   * TTGGGCCGTGTCTCAGTCCCAATGTGGCCGATCAGTCTCTTAACTCGGCTATGCATCATTGCCTT....
   * >PC.481_4 FLP3FBN01DEHK3 orig_bc=ACCAGCGACTAG new_bc=ACCAGCGACTAG bc_diffs=0
   * CTGGGCCGTGTCTCAGTCCCAATGTGGCCGTTCAACCTCTCAGTCCGGCTACTGATCGTCGACT....
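
Because each header in :file:`seqs.fna` begins with the sample name followed by an underscore and a running read number, the file is easy to inspect with a few lines of code. The following Python 3 sketch counts reads per sample from such headers. The function name ``count_reads_per_sample`` is hypothetical, and the sequences in the example are shortened placeholders rather than real reads.

.. code-block:: python

    from collections import Counter


    def count_reads_per_sample(fasta_lines):
        """Count sequences per sample from split_libraries-style FASTA headers."""
        counts = Counter()
        for line in fasta_lines:
            if line.startswith(">"):
                # The first token looks like '<SampleID>_<read number>'.
                seq_label = line[1:].split()[0]
                # Sample IDs cannot contain underscores, so split on the last one.
                sample_id = seq_label.rsplit("_", 1)[0]
                counts[sample_id] += 1
        return counts


    example = [
        ">PC.634_1 FLP3FBN01ELBSX orig_bc=ACAGAGTCGGCT new_bc=ACAGAGTCGGCT bc_diffs=0",
        "CTGGGCCGTG",  # placeholder sequence
        ">PC.634_2 FLP3FBN01EG8AX orig_bc=ACAGAGTCGGCT new_bc=ACAGAGTCGGCT bc_diffs=0",
        "TTGGACCGTG",  # placeholder sequence
        ">PC.354_3 FLP3FBN01EEWKD orig_bc=AGCACGAGCCTA new_bc=AGCACGAGCCTA bc_diffs=0",
        "TTGGGCCGTG",  # placeholder sequence
        ">PC.481_4 FLP3FBN01DEHK3 orig_bc=ACCAGCGACTAG new_bc=ACCAGCGACTAG bc_diffs=0",
        "CTGGGCCGTG",  # placeholder sequence
    ]
    counts = count_reads_per_sample(example)
    for sample_id in sorted(counts):
        print(sample_id, counts[sample_id])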

.. _pickotusandrepseqs:

Picking Operational Taxonomic Units (OTUs) through making OTU table
--------------------------------------------------------------------

Here we will be running the `pick_otus_through_otu_table.py <../scripts/pick_otus_through_otu_table.html>`_ workflow, which performs a series of small steps by calling a series of other scripts automatically. This workflow consists of the following steps:

1. Picking OTUs (for more information, refer to `pick_otus.py <../scripts/pick_otus.html>`_)
2. Picking a representative sequence set, one sequence from each OTU (for more information, refer to `pick_rep_set.py <../scripts/pick_rep_set.html>`_)
3. Aligning the representative sequence set (for more information, refer to `align_seqs.py <../scripts/align_seqs.html>`_)
4. Assigning taxonomy to the representative sequence set (for more information, refer to `assign_taxonomy.py <../scripts/assign_taxonomy.html>`_)
5. Filtering the alignment prior to tree building - removing positions which are all gaps, or not useful for phylogenetic inference (for more information, refer to `filter_alignment.py <../scripts/filter_alignment.html>`_)
6. Building a phylogenetic tree  (for more information, refer to `make_phylogeny.py <../scripts/make_phylogeny.html>`_)
7. Building an OTU table (for more information, refer to `make_otu_table.py <../scripts/make_otu_table.html>`_)


Using the output from split_libraries.py (the seqs.fna file), run the following command: ::

    pick_otus_through_otu_table.py -i split_library_output/seqs.fna -o otus

Optionally, we could denoise the sequences based on clustering the flowgram sequences. For a single library/sff file we can simply use the workflow script `pick_otus_through_otu_table.py <../scripts/pick_otus_through_otu_table.html>`_, by providing the script with the sff file and the metadata mapping file. For multiple sff files refer to the special purpose tutorial `Denoising of 454 Data Sets <denoising_454_data.html>`_.


The results of `pick_otus_through_otu_table.py` are in :file:`otus/`, and a description of the steps performed and the results follow:

.. _pickotusseqsim:

Step 1. Pick OTUs based on Sequence Similarity within the Reads
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

At this step, all of the sequences from all of the samples will be clustered into Operational Taxonomic Units (OTUs) based on their sequence similarity. OTUs in QIIME are clusters of sequences, frequently intended to represent some degree of taxonomic relatedness. For example, when sequences are clustered at 97% sequence similarity with uclust, each resulting cluster is typically thought of as representing a species. This model and the current techniques for picking OTUs are known to be flawed, however, in that 97% OTUs do not match what humans have called species for many microbes. Determining exactly how OTUs should be defined, and what they represent, is an active area of research. 

`pick_otus_through_otu_table.py` assigns sequences to OTUs at 97% similarity by default. Further information on how to view and change default behavior will be discussed later.
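
The idea of clustering at a similarity threshold can be illustrated with a toy greedy procedure: each sequence joins the first existing cluster whose seed it matches at 97% or better, and otherwise starts a new cluster. The Python 3 sketch below is only an illustration of that idea and is not how uclust works internally. In particular, ``prefix_identity`` is a crude, ungapped stand-in for a real alignment-based identity, and both function names are hypothetical.

.. code-block:: python

    def prefix_identity(a, b):
        """Fraction of matching positions over the shorter sequence (no gaps)."""
        length = min(len(a), len(b))
        if length == 0:
            return 0.0
        matches = sum(1 for x, y in zip(a, b) if x == y)
        return matches / length


    def greedy_cluster(seqs, threshold=0.97):
        """Assign each (seq_id, sequence) pair to the first seed it matches."""
        seeds = []    # list of (otu_id, seed sequence), in order of creation
        otu_map = {}  # otu_id -> list of member sequence ids
        for seq_id, seq in seqs:
            for otu_id, seed in seeds:
                if prefix_identity(seq, seed) >= threshold:
                    otu_map[otu_id].append(seq_id)
                    break
            else:
                # No seed was similar enough: this sequence starts a new OTU.
                otu_id = len(seeds)
                seeds.append((otu_id, seq))
                otu_map[otu_id] = [seq_id]
        return otu_map


    base = "ACGT" * 25               # 100 bp toy sequence
    one_off = "T" + base[1:]         # 1 mismatch  -> 99% identity
    far = "TTTTT" + base[5:]         # 4 mismatches -> 96% identity
    seqs = [("PC.634_1", base), ("PC.634_2", one_off), ("PC.354_3", far)]
    otu_map = greedy_cluster(seqs)
    print(otu_map)

Note that the result of a greedy procedure like this depends on the order in which sequences are processed, which is one reason OTU definitions are not unique.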


.. _pickrepseqsforotu:

Step 2. Pick Representative Sequences for each OTU
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Since each OTU may be made up of many related sequences, we will pick a representative sequence from each OTU for downstream analysis. This representative sequence will be used for taxonomic identification of the OTU and phylogenetic alignment. QIIME uses the OTU file created above and extracts a representative sequence from the fasta file by one of several methods.

In the :file:`otus/rep_set/` directory, QIIME has created two new files - the log file :file:`seqs_rep_set.log` and the fasta file :file:`seqs_rep_set.fasta` containing one representative sequence for each OTU. In this fasta file, the sequence has been renamed by the OTU, and the additional information on the header line reflects the sequence used as the representative:

.. note::

   * >0 PC.636_424
   * CTGGGCCGTATCTCAGTCCCAATGTGGCCGGTCGACCTCTC....
   * >1 PC.481_321
   * TTGGGCCGTGTCTCAGTCCCAATGTGGCCGTCCGCCCTCTC....
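
To illustrate what picking a representative means, the Python 3 sketch below takes an OTU map (OTU identifier to member sequence identifiers) and selects one member per OTU, then formats the result with headers of the form shown above. The two strategies shown, ``first`` and ``longest``, and the function names are assumptions for illustration; consult the `pick_rep_set.py <../scripts/pick_rep_set.html>`_ documentation for the methods QIIME actually offers. The read ``PC.634_9`` and the short sequences are made up.

.. code-block:: python

    def pick_representatives(otu_map, sequences, method="first"):
        """Return {otu_id: representative sequence id} for each OTU."""
        reps = {}
        for otu_id, seq_ids in otu_map.items():
            if method == "first":
                rep_id = seq_ids[0]
            elif method == "longest":
                rep_id = max(seq_ids, key=lambda seq_id: len(sequences[seq_id]))
            else:
                raise ValueError("unknown method: %s" % method)
            reps[otu_id] = rep_id
        return reps


    def format_rep_set(reps, sequences):
        """Format representatives as FASTA lines: '>OTU_ID SEQ_ID' then sequence."""
        lines = []
        for otu_id in sorted(reps):
            lines.append(">%s %s" % (otu_id, reps[otu_id]))
            lines.append(sequences[reps[otu_id]])
        return lines


    # Toy input; sequences are shortened placeholders.
    sequences = {
        "PC.636_424": "CTGGGCC",
        "PC.634_9": "CTGGGCCGT",
        "PC.481_321": "TTGGGCC",
    }
    otu_map = {"0": ["PC.636_424", "PC.634_9"], "1": ["PC.481_321"]}

    reps = pick_representatives(otu_map, sequences)
    print("\n".join(format_rep_set(reps, sequences)))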

.. _assigntax:

Step 3. Assign Taxonomy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
A primary goal of the QIIME pipeline is to assign high-throughput sequencing reads to taxonomic identities using established databases. This provides information on the microbial lineages found in microbial samples. By default, QIIME uses the RDP classifier to assign taxonomic data to each representative sequence from step 2, above.

In the directory :file:`otus/rdp_assigned_taxonomy/`, there will be a log file and a text file. The text file contains a line for each OTU considered, with the RDP taxonomy assignment and a numerical confidence of that assignment (1 is the highest possible confidence). For some OTUs, the assignment will be as specific as a bacterial species, while others may be assignable to nothing more specific than the bacterial domain. The first few lines of the text file are shown below. Note that the taxonomic assignments and confidence values from your own run may differ from the output shown here, because of the way the RDP classification algorithm works:

.. note::

    * 41    PC.356_347  Root;Bacteria                                                                   0.980
    * 63    PC.635_130  Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae"           0.960
    * 353   PC.634_150  Root;Bacteria;Proteobacteria;Deltaproteobacteria                                0.880
    * 18    PC.355_1011 Root;Bacteria;Bacteroidetes;Bacteroidetes;Bacteroidales;Rikenellaceae;Alistipes 0.990

.. _alignotuseq:

Step 4. Align OTU Sequences
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Alignment of the sequences and phylogeny inference is necessary only if phylogenetic tools such as UniFrac_ will be subsequently invoked. Alignments can either be generated de novo using programs such as MUSCLE, or through assignment to an existing alignment with tools like PyNAST_. For small studies such as this tutorial, either method is possible. However, for studies involving many sequences (roughly, more than 1000), the de novo aligners are very slow and assignment with PyNAST_ is preferred. Since this is one of the most computationally intensive bottlenecks in the pipeline, large studies benefit greatly from parallelization of this task (described in detail in the `Documentation <../documentation/index.html>`_). When using PyNAST_ as an aligner (the default), QIIME must know the location of a template alignment. Most QIIME installations use the greengenes file 'core_set_aligned.fasta.imputed' by default.


After aligning the sequences, a log file and an alignment file are created in the directory :file:`otus/pynast_aligned_seqs/`.

.. _filteraln:

Step 5. Filter Alignment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Before inferring a phylogenetic tree relating the sequences, it is beneficial to filter the sequence alignment to remove columns consisting only of gaps, and locations known to be excessively variable. Most QIIME installations use a lanemask file named either lanemask_in_1s_and_0s.txt or lanemask_in_1s_and_0s by default. After filtering, a filtered alignment file is created in the directory :file:`otus/pynast_aligned_seqs/`.
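
The filtering idea can be sketched in a few lines of Python 3: a column is dropped if the lanemask marks it with ``0``, or if every sequence has a gap at that position. This is an illustrative sketch under those two assumptions only, not the implementation of ``filter_alignment.py``; the function name and the toy alignment are hypothetical.

.. code-block:: python

    def filter_alignment(aligned, lanemask=None, gap_chars="-."):
        """Drop masked and all-gap columns from {seq_id: aligned sequence}."""
        if not aligned:
            return {}
        length = len(next(iter(aligned.values())))
        keep = []
        for col in range(length):
            # A '0' in the lanemask marks a position to exclude.
            if lanemask is not None and lanemask[col] == "0":
                continue
            # Columns in which every sequence has a gap carry no information.
            if all(seq[col] in gap_chars for seq in aligned.values()):
                continue
            keep.append(col)
        return {seq_id: "".join(seq[col] for col in keep)
                for seq_id, seq in aligned.items()}


    aligned = {"0": "AC-GT-", "1": "A--GTT"}
    print(filter_alignment(aligned))                     # third column is all gaps
    print(filter_alignment(aligned, lanemask="101111"))  # second column is masked too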

.. _maketree:

Step 6. Make Phylogenetic Tree
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The filtered alignment file produced in the directory :file:`otus/pynast_aligned_seqs/` is then used to build a phylogenetic tree using a tree-building program. 

The Newick format tree file is written to :file:`rep_set.tre`, which is located in the :file:`otus/` directory. This file can be viewed in tree visualization software, and is necessary for UniFrac_ diversity measurements and other phylogenetically aware analyses (described below). The tree can be visualized with programs such as FigTree, which was used to produce the image below from :file:`rep_set.tre`.

.. image:: ../images/tree.png
   :align: center


.. _makeotutable:

Step 7. Make OTU Table
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Using taxonomic assignments (step 3) and the OTU map (step 1) QIIME assembles a readable matrix of OTU abundance in each sample with meaningful taxonomic identifiers for each OTU.

The result of this step is :file:`otu_table.txt`, which is located in the :file:`otus/` directory. The first few lines of :file:`otu_table.txt` are shown below (OTUs 1-9), where the first column contains the OTU number, the last column contains the taxonomic assignment for the OTU, and 9 columns between are for each of our 9 samples. The value of each *i,j* entry in the matrix is the number of times OTU *i* was found in the sequences for sample *j*.

.. note ::

   | #Full OTU Counts
   | #OTU ID    PC.354  PC.355  PC.356  PC.481  PC.593  PC.607  PC.634  PC.635  PC.636  Consensus Lineage
   | 0  0   0   0   0   0   0   0   1   0   Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae"
   | 1  0   0   0   0   0   1   0   0   0   Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae"
   | 2  0   0   0   0   0   0   0   0   1   Root;Bacteria;Bacteroidetes;Bacteroidetes;Bacteroidales;Porphyromonadaceae;Parabacteroides
   | 3  2   1   0   0   0   0   0   0   0   Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae";"Lachnospiraceae Incertae Sedis"
   | 4  1   0   0   0   0   0   0   0   0   Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae"
   | 5  0   0   0   0   0   0   0   0   1   Root;Bacteria;Firmicutes;"Clostridia";Clostridiales
   | 6  0   0   0   0   0   0   0   1   0   Root;Bacteria;Actinobacteria;Actinobacteria
   | 7  0   0   2   0   0   0   0   0   1   Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Ruminococcaceae"
   | 8  1   1   0   2   4   0   0   0   0   Root;Bacteria;Firmicutes;"Bacilli";"Lactobacillales";Lactobacillaceae;Lactobacillus
   | 9  0   0   2   0   0   0   0   0   0   Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae"
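
The counting behind such a table can be sketched as follows: for every OTU, tally how many of its member reads came from each sample. The Python 3 sketch below assumes read identifiers of the form ``SampleID_number`` (as produced by split_libraries.py); the function name and the read numbers in the example are made up, chosen so that the counts reproduce the rows for OTUs 3 and 4 above.

.. code-block:: python

    def build_otu_table(otu_map, sample_ids):
        """Return {otu_id: {sample_id: count}} from an OTU map."""
        table = {}
        for otu_id, seq_ids in otu_map.items():
            row = dict.fromkeys(sample_ids, 0)
            for seq_id in seq_ids:
                # 'PC.354_10' -> 'PC.354'; an unknown sample raises KeyError.
                sample_id = seq_id.rsplit("_", 1)[0]
                row[sample_id] += 1
            table[otu_id] = row
        return table


    sample_ids = ["PC.354", "PC.355", "PC.356"]
    # Hypothetical read identifiers.
    otu_map = {
        "3": ["PC.354_10", "PC.354_57", "PC.355_201"],
        "4": ["PC.354_88"],
    }
    table = build_otu_table(otu_map, sample_ids)

    print("#OTU ID\t" + "\t".join(sample_ids))
    for otu_id in sorted(table):
        counts = [str(table[otu_id][sample_id]) for sample_id in sample_ids]
        print(otu_id + "\t" + "\t".join(counts))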


.. _perlibrarystats:

View statistics of the OTU table
--------------------------------------------------------------------
To view the number of sequence reads which were assigned to the OTU table (otus/otu_table.txt), type::

    per_library_stats.py -i otus/otu_table.txt

The output shows that there are relatively few sequences in this tutorial example, but the sequences present are fairly evenly distributed among the 9 microbial communities.

.. note ::

    | Num samples: 9
    | 
    | Seqs/sample summary:
    |  Min: 146
    |  Max: 150
    |  Median: 148.0
    |  Mean: 148.111111111
    |  Std. dev.: 1.4487116456
    |  Median Absolute Deviation: 1.0
    |  Default even sampling depth in
    |   core_qiime_analyses.py (just a suggestion): 146
    | 
    | Seqs/sample detail:
    |  PC.355: 146
    |  PC.481: 146
    |  PC.636: 147
    |  PC.354: 148
    |  PC.635: 148
    |  PC.593: 149
    |  PC.607: 149
    |  PC.356: 150
    |  PC.634: 150
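
The summary values above can be reproduced from the per-sample counts with Python's standard ``statistics`` module. This sketch is independent of ``per_library_stats.py``; the use of the population standard deviation (``pstdev``) is an assumption, chosen because it agrees with the value reported above.

.. code-block:: python

    import statistics

    # Per-sample sequence counts, copied from the output above.
    seqs_per_sample = {
        "PC.355": 146, "PC.481": 146, "PC.636": 147,
        "PC.354": 148, "PC.635": 148, "PC.593": 149,
        "PC.607": 149, "PC.356": 150, "PC.634": 150,
    }
    counts = sorted(seqs_per_sample.values())

    minimum = min(counts)
    maximum = max(counts)
    median = statistics.median(counts)
    mean = statistics.mean(counts)
    std_dev = statistics.pstdev(counts)  # population standard deviation
    # Median absolute deviation: median distance of each count from the median.
    mad = statistics.median([abs(count - median) for count in counts])

    print("Min:", minimum)
    print("Max:", maximum)
    print("Median:", median)
    print("Mean:", mean)
    print("Std. dev.:", std_dev)
    print("Median Absolute Deviation:", mad)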


.. _makeheatmap:

Make OTU Heatmap
--------------------------------------------------------------------
The QIIME pipeline includes a very useful utility to generate images of the OTU table. The script is `make_otu_heatmap_html.py <../scripts/make_otu_heatmap_html.html>`_. Type::

    make_otu_heatmap_html.py -i otus/otu_table.txt -o otus/OTU_Heatmap/

An html file is created in the directory :file:`otus/OTU_Heatmap/`. You can open this file with any web browser, and will be prompted to enter a value for "Filter by Counts per OTU". Only OTUs with total counts at or above this threshold will be displayed. The OTU heatmap displays raw OTU counts per sample, where the counts are colored based on the contribution of each OTU to the total OTU count present in that sample (blue: contributes low percentage of OTUs to sample; red: contributes high percentage of OTUs). Leave the filter value unchanged, and click the "Sample ID" button, and a graphic will be generated like the figure below. For each sample, you will see in a heatmap the number of times each OTU was found in that sample. You can mouse over any individual count to get more information on the OTU (including taxonomic assignment). Within the mouseover, there is a link for the terminal lineage assignment, so you can easily search Google for more information about that assignment.

.. image:: ../images/heatmap.png
   :align: center

Alternatively, you can click on one of the counts in the heatmap and a new pop-up window will appear. The pop-up window uses a Google Visualization API called Magic-Table. Depending on which table count you clicked on, the pop-up window will put the clicked-on count in the middle of the pop-up heatmap as shown below. For the following example, the table count with the red arrow mouseover is the same one being focused on using the Magic-Table.

.. image:: ../images/fisheyeheatmap.png
   :align: center

On the original heatmap webpage, select the "Taxonomy" button instead: you will generate a heatmap keyed by taxon assignment, which allows you to conveniently look for organisms and lineages of interest in your study. Again, mousing over an individual count will show additional information for that OTU and sample.

.. image:: ../images/taxheatmap.png
   :align: center

.. _makeotunetwork:

Make OTU Network
----------------------------------------------
An alternative to viewing the OTU table as a heatmap is to create an OTU network, using the following command::

    make_otu_network.py -m Fasting_Map.txt -i otus/otu_table.txt -o otus/OTU_Network

To visualize the network, we use the Cytoscape_ program (which you can run by calling cytoscape from the command line -- you may need to call this beginning either with a capital or lowercase 'C' depending on your version of Cytoscape), where each red circle represents a sample and each white square represents an OTU. The lines represent the OTUs present in a particular sample (blue for controls and green for fasting). For more information about opening the files in Cytoscape_ please refer to the `Cytoscape Usage <../scripts/cytoscape_usage.html>`_.

.. image:: ../images/network.png
   :align: center

.. _summarizetaxa:

Summarize Communities by Taxonomic Composition
----------------------------------------------------------------------------
You can group OTUs by samples or categories (when the "-c" option is passed) by different taxonomic levels (division, class, family, etc.) with the workflow script `summarize_taxa_through_plots.py <../scripts/summarize_taxa_through_plots.html>`_. Note that this process depends directly on the method used to assign taxonomic information to OTUs (see `Assigning Taxonomy`__ above). Type: 

__ assigntax_

::

    summarize_taxa_through_plots.py -i otus/otu_table.txt -o wf_taxa_summary -m Fasting_Map.txt

The script will generate a new table grouping sequences by taxonomic assignment at various levels, for example the phylum level table at: :file:`wf_taxa_summary/otu_table_L3.txt`. The value of each *i,j* entry in the matrix is the count of the number of times all OTUs belonging to the taxon *i* (for example, Phylum Actinobacteria) were found in the sequences for sample *j*.

.. note::

   | #Full OTU Counts
   | Taxon              PC.354 PC.355   PC.356  PC.481  PC.593  PC.607  PC.634  PC.635  PC.636
   | Root;Bacteria;Actinobacteria   0.0 0.0 0.0 1.0 0.0 2.0 3.0 1.0     1.0
   | Root;Bacteria;Bacteroidetes    7.0 38.0    15.0    19.0    30.0    40.0    86.0    54.0    90.0
   | Root;Bacteria;Deferribacteres  0.0 0.0 0.0 0.0 0.0 3.0 5.0 2.0 7.0
   | Root;Bacteria;Firmicutes   136.0   102.0   115.0   117.0   65.0    66.0    37.0    63.0    34.0
   | Root;Bacteria;Other        5.0 6.0 18.0    9.0 49.0    35.0    14.0    27.0    14.0
   | Root;Bacteria;Proteobacteria   0.0 0.0 0.0 0.0 5.0 3.0 2.0 0.0 1.0
   | Root;Bacteria;TM7      0.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0
   | Root;Bacteria;Verrucomicrobia  0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
   | Root;Other         0.0 0.0 2.0 0.0 0.0 0.0 0.0 1.0 0.0
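
The grouping itself amounts to truncating each OTU's lineage at the chosen level and summing the counts of all OTUs that share the truncated lineage. The Python 3 sketch below illustrates this. The function name is hypothetical, and appending a single ``Other`` to lineages that are shorter than the requested level is an assumption inferred from the ``Root;Bacteria;Other`` and ``Root;Other`` rows above. The first three example rows are OTUs 0, 6 and 8 from the OTU table; the fourth is made up.

.. code-block:: python

    def summarize_taxa(otu_rows, level):
        """Sum per-sample counts of OTUs sharing a lineage truncated at `level`."""
        summary = {}
        for counts, lineage in otu_rows:
            ranks = lineage.split(";")
            if len(ranks) >= level:
                taxon = ";".join(ranks[:level])
            else:
                # Lineage is not resolved to this level.
                taxon = ";".join(ranks + ["Other"])
            totals = summary.setdefault(taxon, [0] * len(counts))
            for index, count in enumerate(counts):
                totals[index] += count
        return summary


    otu_rows = [
        ([0, 0, 0, 0, 0, 0, 0, 1, 0],
         'Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae"'),
        ([0, 0, 0, 0, 0, 0, 0, 1, 0],
         "Root;Bacteria;Actinobacteria;Actinobacteria"),
        ([1, 1, 0, 2, 4, 0, 0, 0, 0],
         'Root;Bacteria;Firmicutes;"Bacilli";"Lactobacillales";Lactobacillaceae;Lactobacillus'),
        ([0, 0, 1, 0, 0, 0, 0, 0, 0], "Root;Bacteria"),  # made-up row
    ]
    summary = summarize_taxa(otu_rows, level=3)
    for taxon in sorted(summary):
        print(taxon, summary[taxon])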

.. _maketaxacharts:

To view the resulting charts, open the area or bar chart html file located in the  :file:`wf_taxa_summary/taxa_summary_plots` folder. The following chart shows the taxa assignments for each sample as an area chart. You can mouseover the plot to see which taxa are contributing to the percentage shown.

.. image:: ../images/areachart1.png
   :align: center

The following chart shows the taxa assignments for each sample as a bar chart.

.. image:: ../images/barchart1.png
   :align: center

.. _compalphadivrarecurves:

Compute Alpha Diversity within the Samples and Generate Rarefaction Curves
---------------------------------------------------------------------------
Community ecologists typically describe the microbial diversity within their study. This diversity can be assessed within a sample (alpha diversity) or between a collection of samples (beta diversity). Here, we will determine the level of alpha diversity in our samples using a series of scripts from the QIIME pipeline.  To perform this analysis, we will use the :file:`alpha_rarefaction.py` workflow script. This script performs the following steps:

1. Generate rarefied OTU tables (for more information, refer to `multiple_rarefactions.py <../scripts/multiple_rarefactions.html>`_)
2. Compute measures of alpha diversity for each rarefied OTU table (for more information, refer to `alpha_diversity.py <../scripts/alpha_diversity.html>`_)
3. Collate alpha diversity results (for more information, refer to `collate_alpha.py <../scripts/collate_alpha.html>`_)
4. Generate alpha rarefaction plots (for more information, refer to `make_rarefaction_plots.py <../scripts/make_rarefaction_plots.html>`_)

Although we could run this workflow with the (sensible) default parameters, this provides an excellent opportunity to illustrate the use of custom parameters. To see what measures of alpha diversity will be computed by default, type: ::

    alpha_diversity.py -h

You should see, among other information:

.. note ::

  | -m METRICS, --metrics=METRICS
  |      Alpha-diversity metric(s) to use. A comma-separated
  |      list should be provided when multiple metrics are
  |      specified. [default:
  |      PD_whole_tree,chao1,observed_species]

To also use the Shannon index, create a custom parameters file by typing: ::

    echo "alpha_diversity:metrics shannon,PD_whole_tree,chao1,observed_species" > alpha_params.txt

Then run the workflow, which requires the OTU table (-i) and phylogenetic tree (-t) from `above`__, and the custom parameters file we just created: 

__ pickotusandrepseqs_

::

    alpha_rarefaction.py -i otus/otu_table.txt -m Fasting_Map.txt -o wf_arare/ -p alpha_params.txt -t otus/rep_set.tre

Descriptions of the steps involved in alpha_rarefaction.py follow:

.. _rareotutable:

Step 1. Rarefy OTU Table
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The directory :file:`wf_arare/rarefaction/` will contain many text files named :file:`rarefaction_##_#.txt`; the first set of numbers represents the number of sequences sampled, and the last number represents the iteration number. If you opened one of these files, you would find an OTU table where for each sample the sum of the counts equals the number of sequences sampled.
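
Rarefaction of a single sample can be sketched as drawing a fixed number of sequences at random, without replacement, from that sample's reads and re-counting them per OTU. The Python 3 sketch below illustrates the idea with the standard ``random`` module. It is not the implementation of ``multiple_rarefactions.py``; the function name, the toy counts, and the choice to raise an error when a sample has too few sequences are all assumptions.

.. code-block:: python

    import random
    from collections import Counter


    def rarefy_sample(otu_counts, depth, rng):
        """Subsample `depth` sequences without replacement from one sample.

        otu_counts maps OTU id -> number of sequences observed in the sample.
        """
        # Expand the counts into one entry per sequence, labelled by its OTU.
        pool = [otu_id for otu_id, count in otu_counts.items() for _ in range(count)]
        if depth > len(pool):
            raise ValueError("sample has fewer than %d sequences" % depth)
        return Counter(rng.sample(pool, depth))


    sample = {"0": 60, "1": 30, "2": 10}  # toy sample with 100 sequences
    rng = random.Random(0)                # fixed seed so repeated runs agree
    rarefied = rarefy_sample(sample, 50, rng)
    print(dict(rarefied))
    print(sum(rarefied.values()))         # equals the sampling depth

Repeating the draw with different random states at the same depth corresponds to the iterations mentioned above.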

.. _computealphadiv:

Step 2. Compute Alpha Diversity
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The rarefied OTU tables are the basis for calculating alpha diversity metrics, which describe the diversity within a sample based on the abundance of the taxa it contains. QIIME can calculate more than two dozen different diversity metrics; the full list is available here: `alpha-diversity metrics <../scripts/alpha_diversity_metrics.html>`_. Every metric has different strengths and limitations. Technical discussion of each metric is readily available online and in ecology textbooks, but is beyond the scope of this document. By default, QIIME calculates three metrics:

#. The Chao1 metric estimates species richness.
#. The Observed Species metric is simply the count of unique OTUs found in the sample.
#. Phylogenetic Diversity (PD_whole_tree) is the only phylogenetic metric used, and requires a phylogenetic tree.

In addition, :file:`alpha_params.txt` specified above adds the Shannon index to the list of alpha diversity measures calculated by QIIME.
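The non-phylogenetic metrics can be sketched in a few lines. The functions below are a simplified illustration, not QIIME's implementation: the bias-corrected form of Chao1 and the base-2 logarithm for the Shannon index are assumptions made for this example, and the count vectors are invented.

```python
import math

def observed_species(counts):
    """Number of OTUs with at least one sequence in the sample."""
    return sum(1 for c in counts if c > 0)

def chao1(counts):
    """Bias-corrected Chao1 richness estimate (assumed form for this sketch)."""
    s_obs = observed_species(counts)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return s_obs + singletons * (singletons - 1) / (2.0 * (doubletons + 1))

def shannon(counts, base=2):
    """Shannon index; the logarithm base is an assumption of this sketch."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)

counts = [10, 5, 2, 2, 1, 1, 1, 0]   # invented OTU counts for one sample
print(observed_species(counts))      # 7
print(chao1(counts))                 # 8.0
print(shannon([4, 4, 4, 4]))         # four equally abundant OTUs give 2 bits
```

Observed species counts only what was seen, whereas Chao1 uses the number of singletons and doubletons to estimate how many OTUs were missed.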

This step produces several text files with the results of the alpha diversity computations performed on the rarefied OTU tables. The results are located in the :file:`wf_arare/alpha_div/` directory.

.. _collateotutable:

Step 3. Collate Alpha Diversity Results
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The output directory :file:`wf_arare/alpha_div/` will contain one text file :file:`alpha_rarefaction_##_#` for every file input from :file:`wf_arare/rarefaction/`, where the numbers represent the number of sequences sampled and the iteration, as before. The content of this tab-delimited file is the calculated metrics for each sample. To collapse the individual files into a single combined table, the workflow uses the script `collate_alpha.py <../scripts/collate_alpha.html>`_.

In the newly created directory :file:`wf_arare/alpha_div_collated/`, there will be one matrix for every alpha diversity metric used. This matrix will contain the metric for every sample, arranged in ascending order from lowest number of sequences per sample to highest. A portion of the :file:`observed_species.txt` file is shown below:

.. note::

   * Sequences per sample   iteration   PC.354  PC.355  PC.356  PC.481  PC.593   
   * alpha_rarefaction_21_0.txt 21          0       14.0    16.0    18.0    18.0    13.0
   * alpha_rarefaction_21_1.txt 21          1       15.0    17.0    18.0    20.0    12.0
   * alpha_rarefaction_21_2.txt 21          2       15.0    16.0    21.0    19.0    13.0
   * alpha_rarefaction_21_3.txt 21          3       10.0    19.0    18.0    21.0    13.0
   * alpha_rarefaction_21_4.txt 21          4       14.0    18.0    16.0    15.0    12.0
   * ...

.. _generaterarecurves:

Step 4. Generate Rarefaction Curves
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
QIIME creates plots of alpha diversity vs. simulated sequencing effort, known as rarefaction plots, using the script `make_rarefaction_plots.py <../scripts/make_rarefaction_plots.html>`_. This script takes a mapping file and any number of rarefaction files generated by `collate_alpha.py <../scripts/collate_alpha.html>`_ and creates rarefaction curves. Each curve represents a sample and can be colored by the sample metadata supplied in the mapping file.

This step generates a :file:`wf_arare/alpha_rarefaction_plots/rarefaction_plots.html` that can be opened with a web browser, in addition to other files. The :file:`wf_arare/alpha_rarefaction_plots/average_tables/` folder contains the rarefaction averages for each diversity metric, so the user can optionally plot the rarefaction curves in another application, like MS Excel. The :file:`wf_arare/alpha_rarefaction_plots/average_plots/` folder contains the average plots for each metric and category, and the :file:`wf_arare/alpha_rarefaction_plots/html_plots/` folder contains all the images used in the generated HTML page.
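The averaging behind a rarefaction curve can be sketched as follows: for each sampling depth, take the mean of a sample's metric over all iterations. This is an illustration rather than the code in `make_rarefaction_plots.py`. The embedded text reuses values from the :file:`observed_species.txt` excerpt shown earlier for two samples; the exact column layout (an empty first header cell, then depth, iteration, and one column per sample) is an assumption, as is the helper name ``average_by_depth``.

```python
import csv
import io
from collections import defaultdict

# Assumed layout: file name, sequences per sample, iteration, one column per sample.
collated = (
    "\tsequences per sample\titeration\tPC.354\tPC.355\n"
    "alpha_rarefaction_21_0.txt\t21\t0\t14.0\t16.0\n"
    "alpha_rarefaction_21_1.txt\t21\t1\t15.0\t17.0\n"
    "alpha_rarefaction_21_2.txt\t21\t2\t15.0\t16.0\n"
    "alpha_rarefaction_21_3.txt\t21\t3\t10.0\t19.0\n"
    "alpha_rarefaction_21_4.txt\t21\t4\t14.0\t18.0\n"
)

def average_by_depth(text):
    """Return {depth: {sample_id: mean metric value over iterations}}."""
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(rows)
    sample_ids = header[3:]
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        depth = int(row[1])
        for sample_id, value in zip(sample_ids, row[3:]):
            grouped[depth][sample_id].append(float(value))
    return {
        depth: {s: sum(values) / len(values) for s, values in per_sample.items()}
        for depth, per_sample in grouped.items()
    }

averages = average_by_depth(collated)
print(averages[21])  # mean observed species per sample at 21 sequences
```

Plotting these means against depth, one line per sample or per metadata category, gives a rarefaction curve. Real collated files may contain non-numeric placeholders for samples with too few sequences, which this sketch does not handle.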



Viewing Alpha Diversity Results
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To view the rarefaction plots, open the file :file:`wf_arare/alpha_rarefaction_plots/rarefaction_plots.html` in a web browser, typically by double-clicking on it. Once the browser window is open, select the metric `PD_whole_tree` and the category `Treatment`, to reveal a plot like the figure below. You can also turn on/off lines in the plot by (un)checking the box next to each label in the legend, or click on the triangle next to each label in the legend to see all the samples that contribute to that category. Below each plot is a table displaying average values for each measure of alpha diversity for each group of samples in the specified category.

.. image:: ../images/ rarecurve.png
   :align: center


.. _compbetadivgenpcoa:

Compute Beta Diversity and Generate Beta Diversity Plots
--------------------------------------------------------
Beta diversity represents the explicit comparison of microbial (or other) communities based on their composition. Beta-diversity metrics thus assess the differences between microbial communities. The fundamental output of these comparisons is a square matrix in which a "distance" or dissimilarity is calculated between every pair of community samples. The data in this distance matrix can be visualized with analyses such as Principal Coordinate Analysis (PCoA) and hierarchical clustering. As with alpha diversity, many metrics can be calculated with the QIIME pipeline; the full list of options can be found here: `beta diversity metrics <../scripts/beta_diversity_metrics.html>`_. Here, we will calculate beta diversity between our 9 microbial communities using the default beta diversity metrics of weighted and unweighted UniFrac, which are phylogenetic measures used extensively in recent microbial community sequencing projects. To perform this analysis, we will use the `beta_diversity_through_plots.py <../scripts/beta_diversity_through_plots.html>`_ workflow script. This script performs the following steps:

1. Rarefy OTU table (for more information, refer to `single_rarefaction.py <../scripts/single_rarefaction.html>`_)
2. Make preferences file (for more information, refer to `make_prefs_file.py <../scripts/make_prefs_file.html>`_)
3. Compute Beta Diversity (for more information, refer to `beta_diversity.py <../scripts/beta_diversity.html>`_)
4. Generate Principal Coordinates (for more information, refer to `principal_coordinates.py <../scripts/principal_coordinates.html>`_)
5. Generate 3D PCoA plots (for more information, refer to `make_3d_plots.py <../scripts/make_3d_plots.html>`_)
6. Generate 2D PCoA plots (for more information, refer to `make_2d_plots.py <../scripts/make_2d_plots.html>`_)
7. Make Distance Histograms (for more information, refer to `make_distance_histograms.py <../scripts/make_distance_histograms.html>`_)

To run the workflow, type the following command, which defines the input OTU table "-i" and tree file "-t" (from `pick_otus_through_otu_table.py <../scripts/pick_otus_through_otu_table.html>`_), the user-defined mapping file "-m", the output directory "-o", and the even sampling depth "-e" (the number of sequences per sample), set here to 146: ::

    beta_diversity_through_plots.py -i otus/otu_table.txt -m Fasting_Map.txt -o wf_bdiv_even146/ -t otus/rep_set.tre -e 146

Descriptions of the steps involved in `beta_diversity_through_plots.py` follow:

.. _compbetadiv:

Step 1. Rarefy OTU Table to Remove Sample Heterogeneity
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To remove sample heterogeneity, we can perform rarefaction on our OTU table. Rarefaction is an ecological approach that allows users to standardize the data obtained from samples with different sequencing efforts, and to compare the OTU richness of the samples on this standardized basis. For instance, if one of your samples yielded 10,000 sequences and another yielded only 1,000, the species diversity observed within those samples may be influenced much more by sequencing effort than by the underlying biology. The approach of rarefaction is to randomly sample the same number of sequences from each sample, and to use these data to compare the communities at a given level of sampling effort.

The 9 communities in the tutorial data contain the following numbers of sequences per sample (see perlibrarystats_):

.. note ::

    | Num samples: 9
    | 
    | Seqs/sample summary:
    |  Min: 146
    |  Max: 150
    |  Median: 148.0
    |  Mean: 148.111111111
    |  Std. dev.: 1.4487116456
    |  Median Absolute Deviation: 1.0
    |  Default even sampling depth in
    |   core_qiime_analyses.py (just a suggestion): 146
    | 
    | Seqs/sample detail:
    |  PC.355: 146
    |  PC.481: 146
    |  PC.636: 147
    |  PC.354: 148
    |  PC.635: 148
    |  PC.593: 149
    |  PC.607: 149
    |  PC.356: 150
    |  PC.634: 150

Because all samples have at least 146 sequences, a rarefaction level of 146 (specified by `-e 146` above) allows us to compare all 9 samples at equal sequencing depth. Any samples containing fewer than 146 sequences would have been removed from these beta diversity analyses.
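The effect of the chosen depth on which samples survive can be checked with a short sketch. The per-sample counts are copied from the summary above; the helper name ``samples_retained`` is hypothetical and this is not part of QIIME.

```python
# Sequences per sample, as reported in the summary above.
seqs_per_sample = {
    "PC.355": 146, "PC.481": 146, "PC.636": 147,
    "PC.354": 148, "PC.635": 148, "PC.593": 149,
    "PC.607": 149, "PC.356": 150, "PC.634": 150,
}

def samples_retained(seqs_per_sample, depth):
    """Samples with at least `depth` sequences; the rest would be dropped."""
    return sorted(s for s, n in seqs_per_sample.items() if n >= depth)

print(min(seqs_per_sample.values()))                 # 146
print(len(samples_retained(seqs_per_sample, 146)))   # 9, every sample is kept
print(samples_retained(seqs_per_sample, 149))        # a deeper cutoff keeps only 4
```

Choosing the minimum count keeps every sample, while any higher depth trades samples for more sequences per sample.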

Step 2. Make Preferences File
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In order to generate the PCoA plots, we want to generate a preferences file, which defines the colors for each of the samples or for a particular category within a mapping column.  For more information on making a preferences file, please refer to `make_prefs_file.py <../scripts/make_prefs_file.html>`_. The prefs file allows, among other things, different PCoA plots to share the same color scheme.

Step 3. Compute Beta Diversity
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Beta-diversity metrics assess the differences between microbial communities. By default, QIIME calculates both weighted and unweighted UniFrac, which are phylogenetically aware measures of beta diversity.

The resulting distance matrices (:file:`wf_bdiv_even146/unweighted_unifrac_dm.txt` and :file:`wf_bdiv_even146/weighted_unifrac_dm.txt`) are the basis for later analysis steps (principal coordinate analysis, hierarchical clustering, and distance histograms).
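As a rough illustration of what a phylogenetic distance measures, the sketch below computes unweighted UniFrac on a toy tree: the fraction of observed branch length that leads to OTUs found in only one of the two samples. The tree, the samples, and the branch-list representation are invented for this example, and the code is not QIIME's implementation.

```python
def unweighted_unifrac(branches, sample_a, sample_b):
    """Fraction of observed branch length that is unique to one sample.

    `branches` is a list of (length, set of descendant OTUs) pairs and each
    sample is the set of OTUs present in it. Both samples must not be empty.
    """
    unique = 0.0
    observed = 0.0
    for length, descendants in branches:
        in_a = bool(descendants & sample_a)
        in_b = bool(descendants & sample_b)
        if in_a or in_b:
            observed += length
        if in_a != in_b:
            unique += length
    return unique / observed

# Toy tree ((A:1,B:1):1,(C:1,D:1):1); written as one entry per branch.
branches = [
    (1.0, {"A"}), (1.0, {"B"}), (1.0, {"A", "B"}),
    (1.0, {"C"}), (1.0, {"D"}), (1.0, {"C", "D"}),
]
samples = {"s1": {"A", "B"}, "s2": {"B", "C"}, "s3": {"C", "D"}}

# Square, symmetric distance matrix over all pairs of samples.
ids = sorted(samples)
dm = [[unweighted_unifrac(branches, samples[a], samples[b]) for b in ids]
      for a in ids]
for sample_id, row in zip(ids, dm):
    print(sample_id, row)
```

Samples that share no branches (``s1`` and ``s3``) are at distance 1.0, identical samples are at distance 0.0, and partially overlapping samples fall in between.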

Step 4. Generate Principal Coordinates
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Principal Coordinate Analysis (PCoA) is a technique that helps to extract and visualize a few highly informative components of variation from complex, multidimensional data. This is a transformation that maps the samples present in the distance matrix to a new set of orthogonal axes such that a maximum amount of variation is explained by the first principal coordinate, the second largest amount of variation is explained by the second principal coordinate, etc. The principal coordinates can be plotted in two or three dimensions to provide an intuitive visualization of the data structure, to examine differences between the samples, and to look for similarities by sample category.

The files :file:`wf_bdiv_even146/unweighted_unifrac_pc.txt` and :file:`wf_bdiv_even146/weighted_unifrac_pc.txt` list every sample in the first column, and the subsequent columns contain the value for the sample against the noted principal coordinate. At the bottom of each Principal Coordinate column, you will find the eigenvalue and percent of variation explained by the coordinate.
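The transformation can be sketched in plain Python: square the distances, double-center the matrix, and extract the leading eigenvectors. The version below uses power iteration with deflation so that it needs no external libraries; it is a teaching sketch under that assumption, not the algorithm used by `principal_coordinates.py`, and it is only suitable for small matrices whose leading eigenvalues are positive and well separated. The three-sample distance matrix is invented (points at positions 0, 1 and 3 on a line).

```python
import math

def pcoa(dm, n_axes=2, n_iter=500):
    """Classical PCoA of a symmetric distance matrix given as a list of lists."""
    n = len(dm)
    # Gower-center the matrix of -0.5 * squared distances.
    a = [[-0.5 * dm[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]
    grand_mean = sum(row_mean) / n
    b = [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
         for i in range(n)]
    total = sum(b[i][i] for i in range(n))  # trace equals the sum of eigenvalues

    coords = [[] for _ in range(n)]
    percent = []
    for _axis in range(n_axes):
        # Power iteration from a fixed, non-constant start vector.
        v = [float(i + 1) for i in range(n)]
        for _step in range(n_iter):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # nothing left to extract
            v = [x / norm for x in w]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i].append(v[i] * scale)
        percent.append(100.0 * eigval / total if total else 0.0)
        # Deflate so the next pass finds the next coordinate.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords, percent

dm = [[0.0, 1.0, 3.0],
      [1.0, 0.0, 2.0],
      [3.0, 2.0, 0.0]]
coords, percent = pcoa(dm)
print(coords)   # one row per sample, one column per principal coordinate
print(percent)  # percent of variation explained by each coordinate
```

Because the three toy samples lie on a line, the first coordinate reproduces their spacing and explains essentially all of the variation, which mirrors the eigenvalue and percent rows at the bottom of the files described above.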


Step 5. Generate 3D PCoA Plots
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
QIIME allows for the inspection of PCoA plots in three dimensions. HTML files are created in the :file:`wf_bdiv_even146/unweighted_unifrac_3d...` and :file:`wf_bdiv_even146/weighted_unifrac_3d...` directories. For the "Treatment" column, all samples with the same "Treatment" will get the same color. For our tutorial, the five control samples are all blue and the four fasting samples are all green. This lets you easily visualize "clustering" by metadata category. The 3D visualization software allows you to rotate the axes to see the data from different perspectives. By default, the script will plot the first three dimensions in your file. Other combinations can be viewed using the "Views:Choose viewing axes" option in the KiNG viewer (may require the installation of kinemage software). The first 10 components can be viewed using the "Views:Parallel coordinates" option or by typing "/".

.. image:: ../images/ pcoa2.png
   :align: center


Step 6. Generate 2D PCoA Plots
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The two-dimensional plot will be rendered as an HTML file which can be opened with a standard web browser. The HTML file created in directories :file:`wf_bdiv_even146/unweighted_unifrac_2d...` shows a plot for each combination of the first three principal coordinates. You can view the name for each sample by holding your mouse cursor over the data point.

.. image:: ../images/ pcoa1.png
   :align: center
   :width: 900px


.. _gendisthist:

Step 7. Generate Distance Histograms
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Distance histograms are a way to compare samples from different categories and see which categories tend to have larger or smaller beta diversity than others.

The pairwise distances are grouped according to whether the two samples belong to the same category or to different categories, and a histogram is made for each of these groups of distances. The output is an HTML file named after the beta-diversity metric used (e.g., :file:`wf_bdiv_even146/unweighted_unifrac_histograms/unweighted_unifrac_dm_distance_histograms.html`). Within the HTML you can look at all the distance histograms individually, and compare them with each other. Within the webpage, the user can mouse over and/or select the checkboxes in the right panel to turn on/off the different distances within/between categories. In this example, we are comparing the distances among the Control samples with, in another color, the pairwise distances between communities of fasting mice and control mice.
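The grouping of distances into within-category and between-category sets can be sketched as below. This is an illustration only, not the code in `make_distance_histograms.py`; the distance values are invented, and the category labels ``Control`` and ``Fast`` are used here simply as example values of the Treatment column.

```python
from itertools import combinations

def group_distances(dm, ids, category):
    """Split pairwise distances into within- and between-category groups."""
    within = {}
    between = {}
    for (i, a), (j, b) in combinations(enumerate(ids), 2):
        cat_a, cat_b = category[a], category[b]
        if cat_a == cat_b:
            within.setdefault(cat_a, []).append(dm[i][j])
        else:
            key = tuple(sorted((cat_a, cat_b)))
            between.setdefault(key, []).append(dm[i][j])
    return within, between

ids = ["PC.354", "PC.355", "PC.634", "PC.635"]
category = {"PC.354": "Control", "PC.355": "Control",
            "PC.634": "Fast", "PC.635": "Fast"}
# Invented distance matrix, in the same order as `ids`.
dm = [[0.0, 0.3, 0.7, 0.8],
      [0.3, 0.0, 0.6, 0.7],
      [0.7, 0.6, 0.0, 0.2],
      [0.8, 0.7, 0.2, 0.0]]

within, between = group_distances(dm, ids, category)
print(within)   # distances between samples that share a category
print(between)  # distances between samples from different categories
```

Each list would then be binned into one histogram; when between-category distances are larger than within-category ones, as in this toy matrix, the histograms separate visibly.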

.. image:: ../images/ hist.png
   :align: center

.. _jackbd:

Jackknifed Beta Diversity and Hierarchical Clustering
------------------------------------------------------
This workflow uses jackknife replicates to estimate the uncertainty in PCoA plots and hierarchical clustering of microbial communities. Many of the same concepts relevant to beta diversity and PCoA are used here. For this analysis we use the script `jackknifed_beta_diversity.py`, which performs the following steps:

  1) Compute the beta diversity distance matrix from the full OTU table (and tree, if applicable) (for more information, refer to `beta_diversity.py <../scripts/beta_diversity.html>`_)
  2) Build UPGMA tree from full distance matrix (for more information, refer to `upgma_cluster.py <../scripts/upgma_cluster.html>`_)
  3) Build rarefied OTU tables (for more information, refer to `multiple_rarefactions.py <../scripts/multiple_rarefactions.html>`_)
  4) Compute distance matrices for rarefied OTU tables (for more information, refer to `beta_diversity.py <../scripts/beta_diversity.html>`_)
  5) Build UPGMA trees from rarefied distance matrices (for more information, refer to `upgma_cluster.py <../scripts/upgma_cluster.html>`_)
  6) Compare rarefied UPGMA trees and determine jackknife support for tree nodes (for more information, refer to `tree_compare.py <../scripts/tree_compare.html>`_ and `consensus_tree.py <../scripts/consensus_tree.html>`_)
  7) Compute principal coordinates on each rarefied distance matrix (for more information, refer to `principal_coordinates.py <../scripts/principal_coordinates.html>`_)
  8) Compare rarefied principal coordinates plots from each rarefied distance matrix (for more information, refer to `make_3d_plots.py <../scripts/make_3d_plots.html>`_ and `make_2d_plots.py <../scripts/make_2d_plots.html>`_)


To run the analysis, type the following:

::

    jackknifed_beta_diversity.py -i otus/otu_table.txt -t otus/rep_set.tre -m Fasting_Map.txt -o wf_jack -e 110

.. _hiarchclust:

Steps 1 and 2. UPGMA Clustering
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Unweighted Pair Group Method with Arithmetic mean (UPGMA) is a type of hierarchical clustering method that uses average linkage and can be used to interpret the distance matrix produced by `beta_diversity.py <../scripts/beta_diversity.html>`_.

The output is a file that can be opened with tree viewing software, such as FigTree.
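For readers unfamiliar with average linkage, here is a compact sketch of UPGMA: repeatedly merge the two closest clusters, and define the distance from the merged cluster to every other cluster as the size-weighted mean of the two old distances. The function is illustrative, not the code in `upgma_cluster.py`; it returns a nested tuple without branch lengths rather than a tree file, and the distance matrix is invented.

```python
def upgma(dm, ids):
    """Cluster samples by average linkage; returns a nested tuple of ids."""
    clusters = {i: (name, 1) for i, name in enumerate(ids)}  # index -> (tree, size)
    dist = {frozenset((i, j)): dm[i][j]
            for i in clusters for j in clusters if i < j}
    next_id = len(ids)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)       # closest pair of clusters
        i, j = sorted(pair)
        tree_i, n_i = clusters.pop(i)
        tree_j, n_j = clusters.pop(j)
        del dist[pair]
        for k in clusters:
            d_ik = dist.pop(frozenset((i, k)))
            d_jk = dist.pop(frozenset((j, k)))
            # Average linkage: size-weighted mean of the old distances.
            dist[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _size = next(iter(clusters.values()))
    return tree

ids = ["PC.354", "PC.355", "PC.634", "PC.635"]
# Invented distance matrix, in the same order as `ids`.
dm = [[0.0, 0.3, 0.7, 0.8],
      [0.3, 0.0, 0.6, 0.7],
      [0.7, 0.6, 0.0, 0.2],
      [0.8, 0.7, 0.2, 0.0]]
tree = upgma(dm, ids)
print(tree)
```

In this toy matrix the two ``PC.63x`` samples merge first, then the two ``PC.35x`` samples, and the two pairs are joined last.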

.. image:: ../images/ UPGMAbytreatment.png
   :align: center
   :width: 700px

This tree shows the relationship among the 9 samples, and reveals that the 4 samples from the guts of fasting mice cluster together (PC.6xx; fasting status is recorded in :file:`Fasting_Map.txt`).

.. _jacksupport:

Steps 3, 4 and 5. Perform Jackknifing Support
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To measure the robustness of this result to sequencing effort, we perform a jackknifing analysis, wherein a smaller number of sequences are chosen at random from each sample, and the resulting UPGMA tree from this subset of data is compared with the tree representing the entire available data set. This process is repeated with many random subsets of data, and the tree nodes which prove more consistent across jackknifed datasets are deemed more robust.

First the jackknifed OTU tables must be generated, by subsampling the full available data set. In this tutorial, each sample contains between 146 and 150 sequences, as shown with `per_library_stats.py`__:

__ perlibrarystats_

.. note::

    | Num samples: 9
    | 
    | Seqs/sample summary:
    |  Min: 146
    |  Max: 150
    |  ...

To ensure that a random subset of sequences is selected from each sample, we chose to select 110 sequences from each sample (roughly 75% of the smallest sample, though this value is only a guideline), which is designated by the "-e" option when running the workflow script (see above).

More jackknife replicates provide a better estimate of the variability expected in beta diversity results, but at the cost of longer computational time. By default, QIIME generates 10 jackknife replicates of the available data. Each replicate is a simulation of a smaller sequencing effort (110 sequences in each sample, as defined above).
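The arithmetic behind the value 110 is simple enough to check directly. Rounding up is an assumption made here to reproduce the number used in this tutorial; the 75% figure itself is only a guideline.

```python
import math

smallest_sample = 146   # minimum sequences per sample in the tutorial data
fraction = 0.75         # guideline fraction, not a fixed rule

# 0.75 * 146 = 109.5, which rounds up to 110 sequences per sample.
depth = math.ceil(fraction * smallest_sample)
print(depth)
```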

The workflow then calculates the distance matrix for each jackknifed dataset, but now in batch mode, which results in two sets of 10 distance matrix files written to the :file:`wf_jack/unweighted_unifrac/rare_dm/` and :file:`wf_jack/weighted_unifrac/rare_dm/` directories. Each of those is then used as the basis for hierarchical clustering with UPGMA, written to the :file:`wf_jack/unweighted_unifrac/rare_upgma/` and :file:`wf_jack/weighted_unifrac/rare_upgma/` directories.

.. _compjackclustertree:

Step 6. Compare Jackknifed Trees
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UPGMA clustering of the 10 distance matrix files results in 10 hierarchical clusterings of the 9 mouse microbial communities, each based on a random subsample of the available sequence data.

This step compares the UPGMA clustering based on all available data with the jackknifed UPGMA results. Three files are written to :file:`wf_jack/unweighted_unifrac/upgma_cmp/` and :file:`wf_jack/weighted_unifrac/upgma_cmp/`:

    * :file:`master_tree.tre`, which is virtually identical to :file:`jackknife_named_nodes.tre` but each internal node of the UPGMA clustering is assigned a unique name
    * :file:`jackknife_named_nodes.tre`
    * :file:`jackknife_support.txt` explains how frequently a given internal node had the same set of descendant samples in the jackknifed UPGMA clusters as it does in the UPGMA cluster using the full available data.  A value of 0.5 indicates that half of the jackknifed data sets support that node, while 1.0 indicates perfect support.
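The support calculation described for :file:`jackknife_support.txt` can be sketched as follows: for every internal node of the full-data tree, count the fraction of jackknifed trees that contain a node with exactly the same set of descendant samples. Trees are written as nested tuples and the sample names are placeholders; this is an illustration, not the code in `tree_compare.py`.

```python
def clades(tree):
    """Return the set of descendant-sample sets, one per internal node."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found

def jackknife_support(master, replicates):
    """Fraction of replicate trees that contain each clade of the master tree."""
    replicate_clades = [clades(t) for t in replicates]
    return {
        clade: sum(clade in rc for rc in replicate_clades) / len(replicates)
        for clade in clades(master)
    }

master = (("A", "B"), ("C", "D"))
replicates = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    ((("A", "B"), "C"), "D"),
    (("A", "C"), ("B", "D")),
]
support = jackknife_support(master, replicates)
for clade, value in sorted(support.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(clade), value)
```

Here the node grouping ``A`` with ``B`` appears in three of the four replicates (support 0.75), while the node grouping ``C`` with ``D`` appears in only two (support 0.5).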

.. _comppcoa:

Steps 7 and 8. Compare Principal Coordinates plots
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The jackknifed replicate PCoA plots can be compared to assess the degree of variation from one replicate to the next. QIIME displays this variation by displaying confidence ellipsoids around the samples represented in a PCoA plot. The resulting plots are present in :file:`wf_jack/unweighted_unifrac/3d_plots`, as well as the corresponding :file:`weighted_unifrac/` and :file:`2d_plots/` locations. An example is shown below:

.. image:: ../images/ jackpcoa.png
   :align: center
   :width: 700px
   
.. _genboottree:

Generate Bootstrapped Tree
^^^^^^^^^^^^^^^^^^^^^^^^^^
:file:`jackknife_named_nodes.tre` can be viewed with FigTree or another tree-viewing program. However, as an example, we can visualize the bootstrapped tree using QIIME's `make_bootstrapped_tree.py <../scripts/make_bootstrapped_tree.html>`_, as follows::

    make_bootstrapped_tree.py -m wf_jack/unweighted_unifrac/upgma_cmp/master_tree.tre -s wf_jack/unweighted_unifrac/upgma_cmp/jackknife_support.txt -o wf_jack/unweighted_unifrac/upgma_cmp/jackknife_named_nodes.pdf

The resulting PDF shows the tree with internal nodes colored by support: red for 75-100% support, yellow for 50-75%, green for 25-50%, and blue for < 25% support. Although UPGMA shows that PC.354 and PC.593 cluster together and that PC.481 clusters with PC.6xx, we cannot have high confidence in that result. However, there is excellent jackknife support for all fasted samples (PC.6xx) clustering together, separate from the non-fasted (PC.35x) samples.

.. image:: ../images/ boottree.png
   :align: center

Generate 3D Bi-Plots
^^^^^^^^^^^^^^^^^^^^
One can add taxa from the taxon summary files in the folder :file:`wf_taxa_summary/` to a 3D principal coordinates plot using QIIME's `make_3d_plots.py <../scripts/make_3d_plots.html>`_. The coordinates of a given taxon are plotted as a weighted average of the coordinates of all samples, where the weights are the relative abundances of the taxon in the samples. The size of the sphere representing a taxon is proportional to the mean relative abundance of the taxon across all samples. The following example creates a biplot displaying the 5 most abundant phylum-level taxa::

    make_3d_plots.py -i wf_bdiv_even146/unweighted_unifrac_pc.txt -m Fasting_Map.txt -t wf_taxa_summary/otu_table_L3.txt --n_taxa_keep 5 -o 3d_biplot

The resulting html file :file:`3d_biplot/unweighted_unifrac_pc_3D_PCoA_plots.html` shows a biplot like this:

.. image:: ../images/ biplot.png
   :align: center
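The placement of a taxon in the biplot follows directly from the description above: its coordinates are the average of the sample coordinates, weighted by the taxon's relative abundance in each sample. The sketch below illustrates that formula with invented coordinates and abundances; the function name ``taxon_coordinates`` is hypothetical and this is not the code in `make_3d_plots.py`.

```python
def taxon_coordinates(sample_coords, rel_abundance):
    """Abundance-weighted average of the sample coordinates for one taxon."""
    total = sum(rel_abundance.values())
    n_axes = len(next(iter(sample_coords.values())))
    return tuple(
        sum(rel_abundance[s] * sample_coords[s][k] for s in sample_coords) / total
        for k in range(n_axes)
    )

# Invented principal coordinates (PC1, PC2, PC3) for three samples.
sample_coords = {
    "s1": (0.0, 0.0, 0.0),
    "s2": (1.0, 0.0, 0.0),
    "s3": (0.0, 2.0, 0.0),
}
# Invented relative abundances of one taxon in each sample.
rel_abundance = {"s1": 0.1, "s2": 0.3, "s3": 0.0}

position = taxon_coordinates(sample_coords, rel_abundance)
mean_abundance = sum(rel_abundance.values()) / len(rel_abundance)
print(position)         # pulled toward s2, where the taxon is most abundant
print(mean_abundance)   # the quantity the sphere size is proportional to
```

A taxon that is abundant in only a few samples therefore sits close to those samples in the plot.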

Running Workflow Scripts in Parallel
-----------------------------------------------
To run the workflow scripts in parallel, pass the "-a" option to each of the scripts, and optionally the "-O" option to specify the number of parallel jobs to start. If running on a quad-core computer, you can set the number of jobs to start as 4 for one of the workflow scripts as follows:

::

    pick_otus_through_otu_table.py -i split_library_output/seqs.fna -o otus -a -O 4


Running the QIIME Tutorial Shell Scripts
-----------------------------------------------
The commands in this tutorial are also provided as shell scripts along with the other tutorial files, and can be run from the terminal. To run the shell scripts, you may need to make them executable for all users, using the following commands::

    chmod a+x ./qiime_tutorial_commands_serial.sh
    chmod a+x ./qiime_tutorial_commands_parallel.sh

To run the QIIME tutorial in serial::

    ./qiime_tutorial_commands_serial.sh

To run the QIIME tutorial in parallel::

    ./qiime_tutorial_commands_parallel.sh

References
------------
Crawford, P. A., Crowley, J. R., Sambandam, N., Muegge, B. D., Costello, E. K., Hamady, M., et al. (2009). Regulation of myocardial ketone body metabolism by the gut microbiota during nutrient deprivation. Proc Natl Acad Sci U S A, 106(27), 11276-11281.

.. _Cytoscape: http://www.cytoscape.org/
.. _PyNAST: http://pynast.sourceforge.net/
.. _Unifrac: http://bmf2.colorado.edu/unifrac/index.psp