QIIME 1.9.1

Bug fixes

* **Critical**: Updated minimum required version of the [qiime-default-reference]( package to 0.1.2. **This release includes an important bug fix described in more detail in [this QIIME blog post]( and in [biocore/qiime-default-reference#14](**
* **Critical**: Fixed bug in ```` fitZIG algorithm ([#1960]( **This was a serious bug that was encountered when users would call `` -a metagenomeSeq_fitZIG``. Any results previously generated with that command should be re-run.**
* **Critical**: Fixed bug in ````, described in [#2009]( **All previous output generated with ```` was incorrect, and analyses using those results should be re-run.** This most commonly would have resulted in massive Type 2 error (false negatives), where observations whose abundance is correlated with metadata are not reported, though Type 1 errors (false positives) are also possible.
* ```` no longer fails on empty files. [#1991](
* Updated minimum required version of [biom-format]( package to 2.1.4. This is a bug fix release. Details are available in the [biom-format ChangeLog](
* Updated minimum required version of [Emperor]( package to 0.9.51.
* Forced BIOM table type to "OTU table" for all tables written with QIIME. This fixes [#1928](
* The ``--similarity`` option in ```` now only accepts sequence similarity thresholds between 0.0 and 1.0 (inclusive). Previous behavior would allow values outside this range, which would cause uninformative error messages to be raised by the external tools that ```` wraps ([#1979](
* ```` now explicitly disallows ``-p 0``. This could lead to empty sequences being written to the resulting output file ([#1984](
* Fixed issue where ```` could only filter the mapping file when ``--valid_states`` was passed as the filtering method ([#2003](
* Fixed bug where distance matrix files generated by QIIME (e.g., using ````) could have diagonals with values that were close to zero in rare cases (depending on input data, machine architecture, installed dependencies, etc.). These files could not be loaded by QIIME scripts that accepted distance matrix files as input (e.g., ````) and would result in an error message stating that the distance matrix was not hollow. Values on the diagonal that are close to zero are now set to 0.0 ([#1933](
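
The distance matrix fix in the last entry above can be illustrated with a minimal sketch. This is not QIIME's implementation: the function names and the `tolerance` cutoff are hypothetical, and the matrix is a plain list of lists rather than a QIIME distance matrix object.

```python
def zero_near_zero_diagonal(matrix, tolerance=1e-12):
    """Return a copy of a square matrix with near-zero diagonal entries set to 0.0.

    `tolerance` is a hypothetical cutoff chosen for this sketch.
    """
    cleaned = [list(row) for row in matrix]
    for i in range(len(cleaned)):
        if abs(cleaned[i][i]) <= tolerance:
            cleaned[i][i] = 0.0
    return cleaned


def is_hollow(matrix):
    """A matrix is hollow when every diagonal entry is exactly zero."""
    return all(matrix[i][i] == 0.0 for i in range(len(matrix)))


# Diagonal values that are tiny but non-zero, as floating-point noise can produce.
dm = [
    [1e-17, 0.25, 0.40],
    [0.25, 0.0, 0.35],
    [0.40, 0.35, -3e-16],
]
fixed = zero_near_zero_diagonal(dm)
print(is_hollow(dm), is_hollow(fixed))
```

Off-diagonal values are left untouched, so only the hollowness check is affected.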

Usability enhancements

* Removed parallel PyNAST ``formatdb`` step ([#1989]( The formatted database wasn't actually being used; this step was just left over from when BLAST was required by PyNAST.
* ```` can now count records in fastq files that have the ``.fq`` extension. This previously was only possible for fastq files that have the ``.fastq`` extension.
* If ``temp_dir`` is not defined in the QIIME config file, QIIME will use the system's default temporary directory instead of assuming that ``/tmp`` is present and writeable. Note that the location of this default temporary directory [can be changed with environment variables]( ([#1995](
* Improve error reporting from ````, ````, and ```` when all OTUs/samples are filtered out resulting in an empty table ([#1963](, and generally when attempting to write an empty BIOM table from QIIME.
* Added ability to pass user-defined runtime limit for jobs to ````. This can be achieved by setting the ``slurm_time`` variable in ``qiime_config``, or by passing ``--time`` to ````.
* Distance matrices and UPGMA trees generated from the full (unrarefied) OTU table are now stored under ``unrarefied_bdiv`` in the output directory from ````. That UPGMA tree is optionally used (if the user passes ``--master_tree full``). This change makes their content more explicit so they're less likely to be used by accident ([#2024](
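
The ``temp_dir`` fallback described above can be sketched as follows. This is an illustration only: `resolve_temp_dir` is a hypothetical helper, and representing the QIIME config as a plain dictionary is an assumption. Python's standard `tempfile.gettempdir()` is what consults the `TMPDIR`, `TEMP`, and `TMP` environment variables.

```python
import tempfile


def resolve_temp_dir(qiime_config):
    """Pick the temp directory: the configured value if set, else the system default.

    `qiime_config` is assumed here to be a dict of config settings.
    """
    configured = qiime_config.get("temp_dir")
    if configured:
        return configured
    # tempfile.gettempdir() checks TMPDIR, TEMP, and TMP before falling
    # back to platform-specific locations, so /tmp is never assumed.
    return tempfile.gettempdir()


print(resolve_temp_dir({"temp_dir": "/scratch/qiime"}))
print(resolve_temp_dir({}))
```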

QIIME 1.9.0

New scripts

* ````: Allows the calculation of correlations between feature abundances and continuous-valued metadata. This script replaces the continuous-valued correlation functionality that was in ```` in QIIME 1.7.0 and earlier.
* ````: Allows analysis of volatility using different algorithms.
* ````: Implements the microbial dysbiosis index (MD-index) from [Gevers et al 2014](
* ````: Allows collapsing groups of samples in BIOM tables and mapping files based on their metadata (see [#1678]( This can be used, for example, to collapse samples belonging to a replicate group. This also has replaced ```` (see discussion on [#1798](
* ````, ````, and ````: Facilitate initial QIIME processing of already-demultiplexed fastq files, as these are commonly being provided by sequencing centers.
* ````: Supplements ```` to support metagenomeSeq's fitZIG algorithm and DESeq2's negative binomial algorithm.  The input for this is an unnormalized, raw BIOM table.
* ````: Adds support for BIOM table normalization algorithms in addition to rarefaction. Supported methods are metagenomeSeq's CSS and DESeq's variance stabilizing transformation.
* ````: Allows for parallel job submission using [slurm](
* ````: Allows for demultiplexing of sequences using the LEA-Seq protocol, described in [Faith et al. (2013)]( This script should be considered to be in **beta testing status**.
* ````: Splits an interleaved FASTQ file (like the ones produced by JGI) into forward and reverse reads. See [this section]( of the Illumina data preparation tutorial for more details.
* ````: Perform parallel OTU picking with SortMeRNA ([Kopylova et al. (2012)](


* ```` now allows multiple fields to be passed to split a biom table, and optionally a mapping file. Check out the new documentation for the naming conventions (which have changed slightly) and an example.
* Added new options to ````:
 * ``--color_scheme``, which allows users to choose from different color schemes [here](
 * ``--observation_metadata_category``, which allows users to select a column other than taxonomy to use when labeling the rows
 * ``--observation_metadata_level``, which allows the user to specify which level in the hierarchical metadata category to use in creating the row labels.
 * ``-g``/``--imagetype``, ``--dpi``, ``--width``, and ``--height``, which offer more control over the generation of heatmap figures.
* ``-m/--mapping_fps`` is no longer required for The mapping file is not required when running with ``--barcode_type 'not-barcoded'``, but the mapping file would fail to validate when passing multiple sequence files and sample ids but a mapping file without barcodes (see [#1400](
* Added alphabetical sorting option (based on boxplot labels) to ````. Sorting by boxplot median can now be performed by passing ``--sort median`` (this was previously invoked by passing ``--sort``). Sorting alphabetically can be performed by passing ``--sort alphabetical``.
* Scripts that write an OTU table will now write BIOM files in HDF5 format if HDF5 is installed. This improves performance for very large OTU tables.
* ```` can now take an argument to convert the header names to upper case, so it will merge for example a category named `treatment` and another one named `TREATMENT` from two different mapping files.
* The script ```` has been removed. This functionality should be accessed through ````.
* Beta support has been added for performing OTU picking with open source software:
 * subsampled open reference OTU picking using SortMeRNA ([Kopylova et al. (2012)]( (for the closed-reference steps) and [SumaClust]( (for the open reference steps). This can be accessed with `` -m sortmerna_sumaclust``.
 * closed-reference OTU picking using SortMeRNA ([Kopylova et al. (2012)]( This can be accessed with `` -p params.txt`` where params.txt includes the line ``pick_otus:otu_picking_method sortmerna``.
 * de novo OTU picking using [SumaClust]( or swarm ([Mahe et al. (2014)]( This can be accessed with `` -p params.txt`` where params.txt includes the line ``pick_otus:otu_picking_method sumaclust`` or ``pick_otus:otu_picking_method swarm``.
 * sumaclust v1.0.00, swarm 1.2.19, and sortmerna 2.0 are now optional dependencies (see the [QIIME install docs]( for details).
* Renamed ```` to ````, which now supports splitting FASTQ files, as well. Added a parameter, ``--file_type``, which is used to specify the type of the input file.
* Added ``--assign_taxonomy`` option to ```` to allow taxonomy assignment using a classifier, rather than the default of using the taxonomic assignment of the cluster centroid.
* Added ``--suppress_taxonomy_assignment`` option to ````.
* Updated output of ```` to include more information in the pseudo-mapping file that it generates. This includes the "pre" and "post" values for all of the analysis categories on a per-subject basis. This is useful for plotting with other tools, or for generating legends for the plots that are currently generated by the script (see [issue #1707](
* Added ``pick_otus_reference_seqs_fp`` to the QIIME config file. This is a filepath to reference sequences to use with QIIME's OTU picking scripts/workflows. See the [QIIME config docs]( and [#1696]( for more details.
* The QIIME config settings ``assign_taxonomy_id_to_taxonomy_fp``, ``assign_taxonomy_reference_seqs_fp``, ``pick_otus_reference_seqs_fp``, and ``pynast_template_alignment_fp`` now default to reference data files in the [qiime-default-reference project](
* Installing QIIME via ``pip install qiime`` now works out-of-the-box by providing a functioning QIIME minimal (base) install (see [#1696](
* ``cluster_jobs_fp`` in the QIIME config file now defaults to ````. ``seconds_to_sleep`` now defaults to 1.
* Added ``--negate_sample_id_fp`` option to ```` (see [#1117](
* Added ``--percent_variation_below_one`` flag to ```` for when the percent variation is actually below 1 and not a relative measure.
* The default confidence threshold for the Naive Bayes taxonomy assigners (RDP Classifier and mothur) is now ``0.50``, as [recommended by the RDP Classifier developers]( for partial sequences.
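
Several entries above select an OTU picking method through a parameters file line such as ``pick_otus:otu_picking_method sortmerna``. The sketch below shows how lines of that shape could be read into a nested dictionary. It is not QIIME's own parser: `parse_parameter_lines` is a hypothetical helper, and skipping blank lines and `#` comments is an assumption.

```python
def parse_parameter_lines(lines):
    """Parse 'script:option value' lines into {script: {option: value}}."""
    params = {}
    for raw in lines:
        line = raw.strip()
        # Assumption: blank lines and '#' comments carry no settings.
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 1)
        script, sep, option = fields[0].partition(":")
        if not sep:
            raise ValueError("expected 'script:option value', got %r" % line)
        # An option given without a value is stored as None in this sketch.
        value = fields[1].strip() if len(fields) > 1 else None
        params.setdefault(script, {})[option] = value
    return params


example = [
    "# choose the OTU picking method",
    "pick_otus:otu_picking_method sortmerna",
]
print(parse_parameter_lines(example))
```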

Usability enhancements

* Simplified and improved QIIME install documentation.
* Errors raised by scripts are easier to read and include a supplementary message on how to get help (see [#1794](
* QIIME is now easier to install! Removed ``qiime_scripts_dir``, ``python_exe_fp``, ``working_dir``, ``cloud_environment``, and ``template_alignment_lanemask_fp`` from the QIIME config file. If these values are present in your QIIME config file, they will be flagged as unrecognized by `` -t`` and will be ignored by QIIME. QIIME will now use the ``python`` executable and QIIME scripts that are found in your ``PATH`` environment variable, and ``temp_dir`` will be used in place of ``working_dir`` (this value was used by some parts of parallel QIIME previously). ```` will now use the 16S alignment Lane mask (Lane, D.J. 1991) by default if one is not provided via ``--lane_mask_fp``.
* ``--tail_type`` option in ```` now accepts "two-sided" instead of "two sided" for specifying a two-sided alternative hypothesis. The new name is easier to specify via the command-line (quotes aren't needed because it is a single word).
* `` -t`` now tests a QIIME minimal (base) install instead of a QIIME full install. `` -tf`` tests a QIIME full install.
* Standardized use of underscores in option longnames. Affected scripts and options:
 * ``scripts/``
   * `start-numbering-at` is now `start_numbering_at`
 * ``scripts/``
   * `low_cut-off` is now `low_cut_off`
   * `high_cut-off` is now `high_cut_off`
 * ``scripts/``
   * `num-reps` is now `num_reps`
 * ``scripts/``
   * `num-reps` is now `num_reps`
 * ``scripts/``
   * `num-reps` is now `num_reps`
 * ``scripts/``
   * `no-legend` is now `no_legend`
 * ``scripts/``
   * `min-seq-length` is now `min_seq_length`
   * `max-seq-length` is now `max_seq_length`
   * `trim-seq-length` is now `trim_seq_length`
   * `min-qual-score` is now `min_qual_score`
   * `keep-primer` is now `keep_primer`
   * `keep-barcode` is now `keep_barcode`
   * `max-ambig` is now `max_ambig`
   * `max-homopolymer` is now `max_homopolymer`
   * `max-primer-mismatch` is now `max_primer_mismatch`
   * `barcode-type` is now `barcode_type`
   * `dir-prefix` is now `dir_prefix`
   * `max-barcode-errors` is now `max_barcode_errors`
   * `start-numbering-at` is now `start_numbering_at`
* Removed ``--output_dir`` optional option from ```` and replaced it with the required option ``--output_fp``.
* The parameters ``--uclust_min_consensus_fraction`` and ``--uclust_similarity`` in ``*_assign_taxonomy_*`` scripts have been changed to ``--min_consensus_fraction`` and ``--similarity`` since both of these parameters apply to the SortMeRNA taxon assigner as well.
* Several changes were made to ```` metric names:
  * ``ACE`` is now ``ace``
  * ``chao1_confidence`` is now ``chao1_ci``
  * Added ``observed_otus``, which is equivalent to ``observed_species`` but is generally a more accurate name. ``observed_species`` is retained for backward-compatibility.
* SortMeRNA 2.0, SUMACLUST 1.0.00, and swarm 1.2.19 are now installed automatically when QIIME is installed (e.g., via `pip install qiime`).
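
For anyone updating saved command lines after the option longname changes listed above, the rename is mechanical: hyphens inside a long option name become underscores. A minimal sketch, assuming options are written as ``--name`` or ``--name=value`` (`underscore_longname` is a hypothetical helper, not part of QIIME):

```python
def underscore_longname(argument):
    """Rewrite '--min-seq-length' or '--num-reps=100' to use underscores."""
    if not argument.startswith("--"):
        # Short options and plain values are left alone.
        return argument
    name, sep, value = argument.partition("=")
    # Only the option name is rewritten; any value after '=' is preserved.
    return "--" + name[2:].replace("-", "_") + sep + value


old_command = ["--min-seq-length", "200", "--low_cut-off", "0.1", "-b", "8"]
print([underscore_longname(arg) for arg in old_command])
```

Values that themselves begin with ``--`` would be rewritten too, so this is only suitable for simple command lines.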

Bug fixes

* Relaxed sanity tests for `` --method adonis`` so that unique values are only checked for categories that are non-numeric (see [issue #1316](
* ```` now requires ``--tree_fp`` unless ``--nonphylogenetic_diversity`` is passed (see [#1671](
* Fixed bug in `` -m blast`` and ```` that prevented multiple instances of either to run at the same time (see [#1768](
* Fixed bug where ``--phred_offset`` in ```` was ignored (see [#1656](
* Spaces in taxa will not cause an error when using ``--assignment_method=mothur`` in ````.
* Fixed bug where long axis labels were cut off in heatmaps generated by ```` (see [#1571](
* Fixed bug where ``-S``/``--suppress_submit_jobs`` was being ignored by several of the parallel scripts (e.g. ````) (see [#1665](
* Fixed bug where ```` would create empty groups (see [#1627](
* ``qiime/workflow/`` no longer copies the permission bits of the reference file which caused a file permission failure in some cases.
* Fixed bug in ```` where ``--generate_per_sample_plots`` wasn't working (see [#1475](
* Fixed bug that resulted in samples being mislabeled in ```` when one of the following options was passed: ``--category``, ``--map_fname``, ``--sample_tree``, or ``--suppress_column_clustering``. This is discussed in [#1790](

Removal of outdated and unsupported functionality

* Removed ``-Y``/``--python_exe_fp`` and ``-N`` options from ```` script as these are not available in any of the other parallel QIIME scripts and we do not have good reason to support them (see QIIME 1.6.0 release notes below for more details).
* Removed ````. This code needs additional testing and documentation, and was not widely used. We plan to add this support back in the future, and progress on that can be followed on [#1499](
* ```` has been replaced with ````.
* Removed options ``-c``/``--ci_type``, ``-a``/``--alpha``, and ``-f``/``--f_ratio`` from ```` as these weren't being used by the script (i.e., supplying different values didn't change the computed CIs because the default were always used).
* Removed ``tax2tree`` as a method in ````.
* Fasttree v1.x is no longer supported by ```` (see [issue #1516](
* Removed ```` script (see [#1780](
* Removed ```` in favor of ```` (see discussion on [#1724](
* Removed ``-m``/``--include_html_counts`` option from the ```` script as the behavior was no longer useful or accurate.

Performance enhancements

* Changed default parameters for uclust-based OTU picking: ``max_accepts`` is now 1 (was 8), ``max_rejects`` is now 8 (was 500), ``stepwords`` is now 8 (was 20), and ``word_length`` is now 8 (was 12). These changes greatly reduce runtime, with minimal effect on the results. See Rideout et al., 2014 ([PeerJ pre-print]( for more details.
* Disabled the prefilter by default in ````. This change greatly reduces runtime, with minimal effect on the results. See Rideout et al., 2014 ([PeerJ pre-print]( for more details.
* The alpha diversity measures available in QIIME (e.g., ````) are now powered by [scikit-bio](, and several of these methods are now considerably faster! See the scikit-bio docs on [alpha diversity]( for more details on the methods.
* ANOSIM and PERMANOVA (available in ````) are now powered by [scikit-bio]( and are approximately 1000 times faster than previous implementations. These additionally now provide more useful information in the output file. See the scikit-bio docs for [ANOSIM]( and [PERMANOVA]( for more detail.
* Renamed ````'s BEST method to BIO-ENV to match the name used in R's vegan package (``vegan::bioenv``) and the name of the program in the original paper. Use `` --method bioenv`` instead of `` --method best``. The underlying implementation has also been rewritten and is considerably faster than before, and the output more closely matches the vegan package, as environmental variables are now scaled before computing Euclidean distances. See the scikit-bio docs for [BIO-ENV]( for more detail.
* The Mantel test (``--method mantel``) and Mantel correlogram (``--method mantel_corr``) in ```` are considerably faster than previous implementations. See the scikit-bio docs for [Mantel]( for more detail.
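
The BIO-ENV entry above notes that environmental variables are now scaled before Euclidean distances are computed. The sketch below illustrates why that matters. It is not the scikit-bio implementation: it assumes "scaled" means centering each variable and dividing by its sample standard deviation (as R's ``scale`` does), and the data values are made up for illustration.

```python
import math


def scale_columns(rows):
    """Center each column and divide by its sample standard deviation (n - 1).

    Assumes every column varies; a constant column would divide by zero.
    """
    n = len(rows)
    scaled_columns = []
    for col in zip(*rows):
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        scaled_columns.append([(x - mean) / sd for x in col])
    return [list(row) for row in zip(*scaled_columns)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical samples: (pH, elevation in metres).
env = [[7.0, 100.0], [7.5, 300.0], [8.0, 200.0]]
scaled = scale_columns(env)

# Unscaled, the distance is dominated by elevation; scaled, both variables count.
print(euclidean(env[0], env[1]), euclidean(scaled[0], scaled[1]))
```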

QIIME 1.8.0 (11 Dec 2013)
* New script,, and associated tutorial added to support alternative illumina barcoding schemes.
* Added script, which supports joining of overlapping paired-end reads in fastq files. This wraps fastq-join and SeqPrep.
* script added-this script is intended to help process fastq data that is not in a compatible format with
* has been removed in favor of a new script called ```` which has significantly more functionality.
* has a new parameter, ``--genetic_code``, which can be used to specify which genetic code should be used when doing translated searches (from nucleotide sequences against a protein database). Genetic codes are specified numerically, corresponding to the genetic codes detailed on the [NCBI page here](
* has a new parameter, ``--recover_from_failure``, that allows the user to re-run on an existing output directory and will only re-run analyses that haven't already been run. This additionally allows the user to add additional categories to a previous run, which is very common and previously required a full re-run.
* Added new script, ````, which implements some of the interpolation and extrapolation richness estimators in Colwell et al. (2012), Journal of Plant Ecology. IMPORTANT: This script should be considered beta software; it is currently an experimental feature in QIIME.
* QIIME now depends on [qcli 0.1.0](, a stand-alone package which performs command line interface parsing and testing.
* has been removed in favor of qcli_make_rst.
* can now take more than two input coordinate matrices. When used this way, the first coordinate matrix will be treated as the reference, and the 2nd through nth will be compared against that reference. The output file names, which were all previously hard-coded, are now generated on the fly for clarity of the results.
* can now handle per-sample, non-barcoded fastq files. Some sequencing centers are now providing data in this way - if this becomes more common, we'll want to make this more convenient, but for now it's possible.
* Added a parallel merge OTUs method that will combine OTU tables in parallel where possible.
* Added to support paired difference (i.e., Pre/Post) testing as discussed in issue #1040.
* Added new taxonomic assignment method, ``qiime.assign_taxonomy.UclustConsensusTaxonAssigner``. This is accessible through `` -m uclust``, ````, ```` and ````. This is being tested as an alternative to QIIME's existing taxonomic assignment methods.
* Refactored,, and workflows to generate emperor PCoA plots instead of KiNG PCoA plots. QIIME now depends on Emperor 0.9.3. One interface change that will be noticeable to users is that the output PCoA plots from these workflows are no longer separated into "continuous" and "discrete" directories. Users can make these color choices from within emperor, so only one PCoA plot is necessary. This refactoring also involved script interface changes to, which no longer generates 2d plots (interested users can call directly - these won't be needed as often, since we no longer have a Java dependency) or distance histograms (these data are better accessed through, which is better written and tested, though users can still call directly). As a result, no longer takes the --suppress_2d_plots, --suppress_3d_plots, or --histogram_categories parameters, and now takes a new --suppress_emperor_plots parameter which can be used to disable PCoA plotting.
* Modified to generate box plots in addition to statistics, and added the ability to pass multiple categories (instead of just a single category) on the command line. Also fixed issue where options contain ``dest`` parameter, and therefore could have a different name than their longform parameter name. This involves several script interface changes: the --category option is now called --categories; script now takes --output_dir instead of --output_fp (because multiple files can be created, instead of just a single file); --alpha_diversity_filepath is now --alpha_diversity_fp; and --mapping_filepath is now --mapping_fp.
* Refactored to add options --generate_per_sample_plots  and --generate_average_tables. These are now suppressed by default to reduce run time and size of output.
* Refactored to add option --retain_intermediate_files. Rarefied BIOM tables and alpha diversity results for each rarefied BIOM table are now removed by default to reduce size of output.
* Update to rtax 0.984.
* Required PyNAST version is now 1.2.2.
* Updated default taxonomy assigner to be the new uclust-based consensus taxonomy assigner. This was shown to be more accurate and faster than the existing methods in Bokulich, Rideout et al. (submitted).
* Renamed to for clarity
* Change short option names in to be consistent with other scripts.
* Increased default rdp_max_memory from 1500M to 4000M as this was almost always needing to be increased when re-training on modern reference databases.
* Required biom-format version is now 1.3.1.
* and have been moved to the FastUnifrac repo (
* Required matplotlib version is now >= 1.1.0, <= 1.3.1.
* Required numpy version is now >= 1.5.1, <= 1.7.1.
* QIIME has been added to [PyPi]( and can be installed using ``pip``.
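
The version requirements listed above are inclusive ranges (for example, numpy >= 1.5.1, <= 1.7.1). A minimal sketch of checking an installed version string against such a range, assuming purely numeric dotted versions (`parse_version` and `in_range` are hypothetical helpers, not QIIME functions):

```python
def parse_version(text):
    """Turn '1.7.1' into (1, 7, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def in_range(version, minimum, maximum):
    """True when minimum <= version <= maximum (both bounds inclusive)."""
    return parse_version(minimum) <= parse_version(version) <= parse_version(maximum)


print(in_range("1.3.0", "1.1.0", "1.3.1"))   # within the matplotlib range
print(in_range("1.10.0", "1.5.1", "1.7.1"))  # newer than the numpy range
```

Comparing tuples of integers rather than raw strings matters here: as strings, ``"1.10.0"`` sorts before ``"1.7.1"``.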

QIIME 1.7.0 (14 May 2013)
* Required biom-format version is now 1.1.2.
* has been replaced with This follows a re-factoring to support only "downstream" analyses (i.e., starting with a BIOM table). This makes the script more widely applicable as it's now general to any BIOM data and/or different OTU picking strategies.
* Added support for usearch v6.1 OTU picking and chimera checking. This is in addition to existing support for usearch v5.2.236.
* Added section on using usearch 6.1 chimera checking with ```` to "Chimera checking sequences with QIIME" tutorial.
* ```` output now includes average alpha diversity values as well as the comparison p and t vals.
* ```` has a new option ``--variable_size_distance_classes`` for running Mantel correlogram over distance classes that vary in size (i.e. width) but contain the same number of pairwise distances in each class.
* ``qiime.filter.sample_ids_from_category_state_coverage`` now supports splitting on a category.
* Modified script to use standard metadata mapping file with a column specified for fasta file names to make more consistent with other scripts.
* now makes better use of the BIOM Table API, addressing a performance issue when using CSMat as the sparse backend.
* Added, which is useful for plotting distances between "adjacent" sample ids in a list provided by the user. This is useful, for example, in plotting distances between adjacent temporal samples in a time series.
* Fixed a bug in related to biplot calculations. This bug would change the placement of taxonomic groups based on how many taxa were included in the biplot analysis. Examples and additional details can be found here: [#677](
* Major refactoring of workflow tests and organization of workflow code. The workflow library code and tests have now been split apart into separate files. This makes it a lot more manageable, which will support a more general refactoring of the workflow code in the future to make it easier to develop new workflows. The workflow tests have also been updated to use the new test data described in [#582](, which is now accessible through ``qiime.test.get_test_data()`` and ``qiime.test.get_test_data_fps()``. This provides improved testing of boundary cases in each workflow, as well as more consistent tests across the workflows.
* now supports an input directory of BIOM tables, and can write out either a single collated results file or an individual file for every input table in the directory. The -o output_fp is now a required parameter rather than an optional parameter.
* now has a -m/--mapping_fp option and writes output to a directory instead of a single file. -n/--num and -d/--dissim now accept a single number or comma-separated list of values.
* can now handle input directories of otu tables, can write a single collated results file if the input directory is of rarefied otu tables, and the -o output fp option is now a required parameter.
* The qiime_test_data repository has been merged into the main qiime repository, which will facilitate development by not requiring users to time pull requests against two repositories. Users will no longer have to specify qiime_test_data_dir in their qiime_config files to include the script usage tests in runs of will now know how to find qiime_test_data, and will run all of the script usage tests by default.
* now outputs otu_table.biom in top-level output directory rather than nested in the otu picking output directory.
* has been renamed (issue [#708](
* has been renamed (issue [#708](
* has been renamed (issue [#708](
* now supports auto-sizing of distribution plots via --distribution_width (which is the new default) and better handles numeric label types with very large or small ranges (e.g. elevation) by scaling x-axis units to [1, (number of data points)]. --group_spacing has been removed in favor of the new auto-sizing feature.
* removed in favor of biom-format's
* Add SourceTracker tutorial, and changed QIIME to depend on SourceTracker 0.9.5 (which is modified to facilitate use with QIIME).
* Moran's I (in now supports identical samples (i.e. zeros in the distance matrix that aren't on the diagonal).
* now outputs taxa summary tables in both classic (TSV) and BIOM formats by default. This will allow taxa summary tables to be used with other QIIME scripts that expect BIOM files as input. This change is the first step towards adding full support for BIOM taxon tables in QIIME. also has two new options: --suppress_classic_table_output and  --supress_biom_table_output.
* and now explicitly state the alternative hypothesis used in the t-tests.
* now has a different option for providing a blast db (--blast_db). This implies that the current --refseqs_path should be used only for providing a fasta file of reference sequences. The --suppress_format_blastdb option has been removed since it is no longer needed.
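
The ``--variable_size_distance_classes`` entry above describes distance classes that differ in width but hold the same number of pairwise distances. The idea can be sketched as equal-count binning of the sorted distances. This is an illustration of the concept only, not the code used by the script, and `equal_count_classes` is a hypothetical name.

```python
def equal_count_classes(distances, num_classes):
    """Split distances into classes holding (nearly) the same number of values."""
    ordered = sorted(distances)
    n = len(ordered)
    classes = []
    for i in range(num_classes):
        # Integer arithmetic spreads any remainder across the classes.
        start = i * n // num_classes
        stop = (i + 1) * n // num_classes
        classes.append(ordered[start:stop])
    return classes


pairwise = [0.9, 0.1, 0.2, 0.12, 0.6, 0.15]
for members in equal_count_classes(pairwise, 3):
    # Each class has two distances, but the classes span very different widths.
    print(members, members[-1] - members[0])
```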

QIIME 1.6.0 (18 Dec 2012)
* Added ```` to support filtering OTUs with (or without) specific taxonomy assignments from an OTU table.
* Added parameters to ```` to suppress taxonomy assignment (``--suppress_taxonomy_assignment``), and alignment and tree building steps (``--suppress_align_and_tree``). These are useful for cases where a taxonomy may not exist for the reference collection (not too common) or when the region doesn't work well for phylogenetic reconstruction (e.g., fungal ITS). Additionally fixed a bug where alternate ``assign_taxonomy`` parameters provided in the parameters file would be ignored when running in parallel.
* Detrending of quadratic curvature in ordination coordinates is now a feature of QIIME. This approach was used in [Harris JK, et al. "Phylogenetic stratigraphy in the Guerrero Negro hypersaline microbial mat."](
* Supervised learning mislabeling output now includes binary "mislabeled" columns at 5%, 10%, ..., 95%, 99%.
* Added tutorial on Fungal ITS analysis.
* Added tutorial on predicting mislabeled samples.
* Modified the parameters (de novo chimera detection, reference chimera detection, and size filtering) for USEARCH options with ```` to ``suppress_X`` and ``False`` by default, rather than ``True`` and turned off by calling, to make them more intuitive to use and work better with the workflow scripts.
* Added a ``simpson_reciprocal`` measure of alpha diversity, which is ``1/D``, following the [definition here]( among other places. Note the measure ``reciprocal_simpson`` is ``1/simpson``, not ``1/D``. It was removed for clarity.
* Added new script, ````, which identifies the core OTUs (i.e., those defined in some user-defined percentage of the samples).
* Major refactoring of parallel QIIME. Repetitive code was consolidated into the ParallelWrapper class, which may ultimately move to PyCogent. The only script interface changes are that the ``-Y/--python_exe_fp``, ``-N (serial script filepath)``, and ``-P/--poller_fp`` parameters are no longer available to the user. These were very infrequently (if ever) modified from defaults, so it doesn't make sense to continue to support these. These changes will allow for easier development of new parallel wrappers and facilitate changes to the underlying parallel functionality.
* Added new script, ````, and supporting library and test code (``qiime/`` and ``tests/``) to allow for the comparison of taxa summary files, including sorting and filling, expected, and paired comparisons using pearson or spearman correlation. Added accompanying tutorial (``doc/tutorials/taxa_summary_comparison.rst``).
* New script for parallel trie otu picker.
* Made ``loaddata.r`` more robust when making mapping files, distance matrices, etc. compatible with each other. There were rare cases that caused some R functions (e.g. ``betadisper``) to fail if empty levels were left in the parsed mapping file.
* Fixed issue in ``ParallelWrapper`` class that could have caused a deadlock if run from within a subprocess with pipes.
* ```` and ```` can now perform Student's two-sample t-tests to determine whether a pair of boxplots/distributions are significantly different (using both parametric and nonparametric Monte Carlo-based tests of significance). These changes include three new options to the two scripts (``--tail_type``, ``--num_permutations``, and ``--suppress_significance_tests``), as well as a new function ``all_pairs_t_test`` in ``qiime.stats``. The accompanying tutorial has also been updated to cover the new statistical tests.
* Checks are now in place to prevent asymmetric and non-hollow distance matrices from being used in ````, ````, ````, ````, and ````. The relevant script help and underlying library code has been documented to warn against their use, and the symmetry checks can be easily disabled if performance becomes an issue in the future.
* ``qiime.util.DistanceMatrix`` has new method ``is_symmetric_and_hollow``.
* Added the new Illumina Overview Tutorial which was developed for the ISME 14 Bioinformatics Workshop and added the IPython notebook files that were used in the ISME 14 workshop under the new ``examples/ipynb`` directory. These can be used by changing to the ``ipynb`` directory and running ``ipython notebook`` on a system with IPython and the IPython Notebook dependencies installed. Also moved the ``qiime_tutorial`` directory to the new ``examples`` folder.
* Added support for translated database mapping through ```` and ```` and related library code, parallel code, etc. This is analogous to closed-reference OTU picking, but can translate queries so is useful for mapping metagenomic or metatranscriptomic data against databases of functional genes (e.g., KEGG). Currently BLAT and usearch are supported for translated searching.
* ``qiime.util.qiime_system_call`` now has an optional shell parameter that is passed through to ``subprocess.Popen``.
* Changed ```` script interface such that ``--method rda`` is no longer supported and must now be ``--method dbrda`` as the method we provide is db-RDA (capscale), not traditional RDA; added the ability to pass the number of permutations (``-n``) for PERMDISP and db-RDA (these were previously not supported); updated script documentation, statistical method descriptions, and accompanying tutorial to be of overall better quality and clarity; output filename when method is PERMDISP is now ``permdisp_results.txt`` instead of ``betadisper_results.txt``, which is consistent with the rest of the methods; significant refactor of underlying code to be better tested and easier to maintain; added better error checking and handling for the types of categories that are accepted by the statistical methods (e.g. checking that categories are numeric if they need to be, making sure categories do not contain all unique values, or a single value); fixed output format for BEST method to be easier to read and consistent with the other methods; ``qiime.util.MetadataMap`` class has a few new utility methods to support some of these changes.
* ```` now supports both parametric and nonparametric two sample t-tests (nonparametric is the default) with the new options ``-t/--test_type`` and ``-n/--num_permutations``. Also fixed a bug that used the wrong degrees of freedom in the t-tests, yielding incorrect t statistics and p-values, and added correction for multiple comparisons.
* Removed tree method ``raxml`` from ````'s choices for ``-t/--tree_method``. Tree method ``raxml_v730`` should now be used instead. RAxML v703 is no longer supported.
* Minimum PyNAST version requirement upgraded to PyNAST 1.2.
* ````, ````, and ```` now correctly output TSV data files with ``.txt`` extension instead of ``.xls`` (this allows them to be opened more easily in programs such as Excel).
* ```` has a new option ``--color_individual_within_by_field`` that allows the "individual within" boxplots to be optionally colored to indicate their membership in another mapping file field. A legend is also included.
* Added ``sample_ids_from_category_state_coverage`` function to ``qiime/`` to support filtering of samples based on a subject's category coverage. For example, this function is useful for filtering individuals out of a time series study that do not meet some sort of timepoint coverage criteria.
* ```` now supports assignment with tax2tree version 1.0 and mothur version 1.25.0.
* Added new script ```` and accompanying tutorial to allow exporting and downloading of mapping files stored as Google Spreadsheets.
* Fixed bug in ```` which would cause the script to hang if a relative path was passed for ``-o``.
* Added the [``qiime_test_data``]( repository which contains example input and output for most QIIME scripts. The individual script documentation was completely refactored so that usage examples correspond to the example input and output files. The *basic script testing* functionality was removed from ```` and replaced with more detailed testing of the scripts based on their usage examples.
* ```` was removed in favor of ```` (a ``biom-format`` project script). See the new [tutorial on adding metadata to BIOM files](
* Updated ``qiime.util.get_qiime_library_version`` to return git commit hash rather than svn revision number (as we're using git for revision control now).
* Added java version in output of ```` to assist with debugging.
* Changed ```` so ``-o`` specifies the filename of the figure, not the output directory anymore.
* Added new script ```` which adds alpha diversity data to a mapping file for incorporation in plots, etc.
* Moved the QIIME website files from ``Qiime/web`` to their own GitHub repository: [](
* Fixed bug in installation of QIIME Denoiser with
* ```` now produces mislabeling.txt and cv_probabilities.txt that look like QIIME mapping files, allowing them to be used for coloring points in PCoA plots, etc.
* Updated RDP Classifier training code to allow any number of ranks in training files, as long as number of ranks is uniform. This removes the need for special RDP training files in reference OTU collections.
* Added table density and metadata listings to ````.
* Updates to several dependencies. New dependencies (for those that changed in this release) are: Python 2.7.3; PyCogent 1.5.3; biom-format 1.1.1; PyNAST 1.2; usearch 5.2.236; rtax 0.983; AmpliconNoise 1.27; Greengenes OTUs 12_10; and RDP Classifier 2.2.
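
The symmetry and hollowness checks on distance matrices mentioned in this release can be illustrated with a minimal sketch. This is not the QIIME implementation of ``is_symmetric_and_hollow``; the function name ``check_symmetric_and_hollow``, the list-of-lists input, and the ``tol`` parameter are assumptions made for illustration only.

```python
def check_symmetric_and_hollow(matrix, tol=0.0):
    """Return True if a square matrix is symmetric with a zero diagonal.

    Hypothetical helper for illustration; `matrix` is a list of rows.
    """
    n = len(matrix)
    # Every row must have n entries for the matrix to be square.
    if any(len(row) != n for row in matrix):
        return False
    for i in range(n):
        # Hollow: each sample is at distance zero from itself.
        if abs(matrix[i][i]) > tol:
            return False
        for j in range(i + 1, n):
            # Symmetric: d(i, j) must equal d(j, i).
            if abs(matrix[i][j] - matrix[j][i]) > tol:
                return False
    return True


good = [[0.0, 0.3, 0.5],
        [0.3, 0.0, 0.2],
        [0.5, 0.2, 0.0]]
asymmetric = [[0.0, 0.3],
              [0.4, 0.0]]
non_hollow = [[0.1, 0.3],
              [0.3, 0.0]]

print(check_symmetric_and_hollow(good))
print(check_symmetric_and_hollow(asymmetric))
print(check_symmetric_and_hollow(non_hollow))
```

The check visits each pair once, so its cost grows quadratically with the number of samples, which is presumably why the changelog notes that it can be disabled if performance becomes an issue.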

QIIME 1.5.0 (8 May 2012)
* OTU tables are now stored on disk in the BIOM file format (see The BIOM format webpage describes the motivation for the switch, but briefly it will support interoperability of related tools (e.g., QIIME/MG-RAST/mothur/VAMPS), and is a more efficient representation of data/metadata. The biom-format project's DenseTable and SparseTable objects are now used to represent OTU tables in memory. See the script in the biom-format project for converting between 'classic' and BIOM formatted OTU tables.
* Added a script, add_qiime_labels, that allows users to specify a directory of fasta files, along with a mapping file of SampleID<tab>fasta file name, and combines the fasta files into a single combined fasta file with QIIME compatible labels.  This is to handle situations where sequencing centers perform their own proprietary demultiplexing into separate fasta files per sample, instead of supplying raw data, but users would like to use QIIME to analyze their data.
* Added new script to perform significance testing of categories/sample grouping. Added accompanying tutorial and new RExecutor class to Methods supported by are Adonis, Anosim, BEST, Moran's I, MRPP, PERMANOVA, PERMDISP, and RDA. See doc/tutorials/category_comparison.rst for details.
* can now perform partial Mantel and Mantel correlogram tests in addition to the traditional Mantel test. Additionally, the script has several new options. Added new supporting tutorial and generic statistical method library code (doc/tutorials/distance_matrix_comparison.rst, qiime/, qiime/, and two new classes (DistanceMatrix and MetadataMap) to
* added a new option "-s" which by default only outputs the unscaled points, whereas user can choose to show scaled, unscaled or both.
* default parameters updated based on evaluation of parameter settings on real and mock community data sets. A manuscript describing these results is currently in preparation. Briefly, the -p/--min_per_read_length parameter was modified to take a fraction of the full read length that is acceptable as the minimum, rather than an absolute (integer) length. Additionally the --max_bad_run_length default was changed from 1 to 3.
* code was completely refactored to increase readability and ease of modification.  Now also creates html output to display locations of errors and warnings in the mapping file.
* Altered default value of min_length in and This was previously set to 150 based on 454 FLX data, but it is now computed as 75% of the median input sequence length. This will scale better across platforms and read length, and allow for more consistent handling in of data from different sources. The user can still pass --min_length with a specific value to override the default.
* Altered the way handles errors/warnings from the mapping file, and fixed a bug where suppression of warnings about variable length barcodes was not being properly passed.  Now warnings will not cause to halt execution, although more serious problems (errors) will.  These include problems with headers, SampleIDs, and invalid characters in DNA sequence fields.
* Increased allowed ambiguous bases in default values from 0 to 6.  This is to accommodate the FLX+ long read technology which will often make ambiguous base calls but still have quality sequences following the ambiguous bases.  Also added an option (-x) to truncate at the first "N" character, allowing users to retain these sequences but remove ambiguous bases if desired.
* Updated to support merging of mapping files with overlapping sample ids.
* Added support for CASAVA 1.8.0 quality scores in This involved deprecating the --last_bad_quality_char parameter in favor of --phred_quality_threshold. The latter is now computed from the former on the basis of detecting which version of CASAVA is being used from the fastq headers (unfortunately they don't include this information in the file, but it is possible to detect).
* Added the possibility of printing the function of the curve that was fit to the points in
* Replaced with The interface was redesigned, and the script was renamed for clarity.
* Replaced with The interface was redesigned, and the script was renamed for clarity.
* Add new script to compute the coverage of a  sample (or its inverse - the conditional uncovered probability) in the script Current estimators include lladser_pe, lladser_ci, esty_ci and robbins.
* Updated usearch application wrapper, unit test, and documentation to handle usearch v5.2.32 as earlier version supported has bugs regarding consensus sequence generation (--consout parameter).
* Added support for the RTAX taxonomy assignment. RTAX is designed for assigning taxonomy to paired-end reads, but additionally works for single end reads. QIIME currently supports RTAX 0.981.
* Added the, a more efficient open reference OTU picking workflow script for processing very large Illumina (or other) data sets. This is being used to process the Earth Microbiome Project data, so is designed to scale to tens of HiSeq runs. A new tutorial has been added that describes this process (doc/tutorials/open_reference_illumina_processing.rst).
* Added new script to convert fasta/qual files to fastq.
* Added ability to output demultiplexed fastq from
* Added a new sort option to which is very useful for web-interface. By default, sorting is turned off.
* Added ability to output OTUs per sample instead of sequences per sample to
* Updates and expansions to existing tutorials, including the using AWS and procrustes analysis tutorials.
* Added to insert reads into an existing tree. This script wraps RAxML, ParsInsert, and PPlacer.
* Updated to look only at the first n bases of the barcode reads, where n is automatically determined as the length of the barcodes in the mapping file. This feature is only used if all of the barcodes are the same length. It allows QIIME to easily ignore a 13th base call in the barcode files, a technical artifact that sometimes arises.
* Added new module that provides an API for running biogeographical statistical methods, as well as a framework for creating new method implementations in the future (this code was moved over from qiimeutils/microbiogeo). Also added two new classes to the util module (DistanceMatrix and MetadataMap) that are used by the stats module.
* Updated Mothur OTU picker support from 1.6.0 to the latest (1.25.0) version.
* Added to support parallel jobs on SGE queueing systems.
* Modified and to show SampleIDs with zero sequence count and to show the total sum of sequences written in the log file.
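
The new default for ``min_length`` described above (75% of the median input sequence length instead of a fixed 150) can be sketched as follows. The function name ``default_min_length`` and the truncation to an integer are assumptions of this sketch; the changelog does not say how QIIME rounds the value.

```python
from statistics import median


def default_min_length(seq_lengths, fraction=0.75):
    """Return a minimum length as a fraction of the median sequence length.

    Hypothetical helper: truncating to an integer is an assumption,
    not documented QIIME behavior.
    """
    if not seq_lengths:
        raise ValueError("at least one sequence length is required")
    return int(fraction * median(seq_lengths))


lengths = [100, 200, 240, 260, 400]
min_length = default_min_length(lengths)
# Keep only the reads that meet the computed threshold.
kept = [n for n in lengths if n >= min_length]
print(min_length, kept)
```

Because the threshold follows the median, the same default adapts to short Illumina reads and longer 454 reads without the user changing anything; passing an explicit value still overrides it.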

QIIME 1.4.0 (13 Dec 2011)
* Implemented usearch (i.e., OTUPIPE) as chimera detection/quality filtering/OTU picking in the module.
* All workflows now log the md5 sums of all input files (trac #92).
* Testing of QIIME with new dependency versions, updating of warnings and test failures (in No code changes were required to support new versions.
* can now handle gzipped input files.
* Addition of code and tutorial to support plotting of raw distance data in QIIME (scripts/, scripts/, qiime/, doc/tutorials/creating_distance_comparison_plots.rst).
* Updates to many scripts to support PyCogent custom option types (new_filepath, new_dirpath, etc.).
* Fixes to workflows to fail immediately on certain types of bad inputs (e.g., missing tree when building UniFrac plots) rather than failing only when the script reaches the relevant step in the workflow.
* Added ability to merge otu tables with overlapping sample ids (in Values are summed when an OTU shows up in the same sample in different OTU tables.
* Added a new script ( to filter samples directly from distance matrices.
* Added script Non-Metric Multidimensional Scaling (NMDS).
* Added calculation of standard error in rarefaction plots; previously only standard deviation was calculated. Also added an option for this.
* Support for to allow for uclust_ref to be run in parallel with creation of new clusters.
* Added script which allows to create a distance matrix from a metadata column.
* assign_taxonomy_reference_seqs_fp and assign_taxonomy_id_to_taxonomy_fp were added to qiime_config, which allows users to set defaults for the dataset they'd like to perform taxonomy assignment against. This works for the serial and parallel versions of assign_taxonomy for both BLAST and RDP.
* Added in the possibility of calculating RMS vectors, using two methods: avg and trajectory, to assess power (movement) of the trajectories. Additionally this feature will return the significance of the difference of the trajectories using ANOVA.
* Added in the possibility of adding vectors or traces of individuals in space; this can be helpful in time series analysis.
* Added additional allowed characters to data fields in mapping files.  These include space and /:,; characters.  All characters allowed now are: alphanumeric, underscore, space, and +-%./:,; characters.
* now can keep duplicated rows in the resulting mapping files and can rename sample names (SampleID), both in the resulting mapping files and the otu tables, using another column of the mapping file; this can be helpful for Procrustes analysis.
* now lets you control colors and axes of the resulting plots, and ignore missing samples; this can be useful when samples are missing after rarefying.
* default num_dimensions for changed from all dimensions to 3 (trac ticket #119). This more closely corresponds with how we use this test (e.g., to determine if we would draw the same biological conclusions from two different methods of generating a PCoA plot). This was in response to our noticing that monte carlo p-values were lower than we would expect in controls.
* Removed the --suppress_distance_histograms option from in favor of using the -c/--histogram_categories option to determine whether these will be generated. If the user passes -c, distance histograms are generated. If they do not, these are not generated.
* Added support for fastq files in
* Several new tutorials including retraining of the RDP classifier, working with Amazon Web Services, basic unix/linux commands, and others.
* Fixed bug in that would result in only a single input file per lane having its data stored in the fastq.
* Fixed bug in filter_otu_table where sampleIDs would remain despite all OTU counts being zero.
* Fixed bug in serial that was causing uclust to be used rather than uclust_ref as the default method for otu picking.
* Added option to support reverse complements of golay barcodes in the mapping file.
* Modified so distance histograms are only generated if the user specifies --histogram_categories on the command line. These are very slow to generate for all mapping categories, so it makes more sense for the user to turn on histogram plotting for the specific categories they're interested in.
* Added option, --reverse_primer_mismatches to to allow setting of distinct mismatches from forward primer.
* Added option (-e/--max_rare_depth) to the command line of This allows for a convenient way for users to specify the maximum rarefaction depth on the command line, and is useful for when it needs to be set to something other than the median rarefaction depth. Also added option to control minimum rarefaction depth from the command line.
* Added support for 5- and 10-fold and leave-one-out cross-validation to supervised learning.
* Added state string handling to for metadata-based fasta filtering.
* Added subsample_fasta module for randomly subsampling fasta files.
* Added script to split a post-split-lib fasta file into per-sample fasta files. This is useful for sharing Illumina data with collaborators or creating per-sample files for DB submission.
* Fixed bug where multiple_rarefactions_even_depth didn't work with --lineages_included.
* Modified so can be applied when the method is other than PyNAST. This previously wasn't possible because we only filtered with the lanemask, but we now allow entropy filtering, so this is relevant.
* Fixed two serious bugs in related to p-value calculations (both Monte Carlo and parametric p-values were affected).
* Removed several obsolete scripts ( and several denoiser-related scripts).
* Added muscle_max_memory option to align_seqs script.
* Changed default num_dimensions to 3 in This more closely corresponds with how we use this test (e.g., to determine if we would draw the same biological conclusions from two different methods of generating a PCoA plot). This was in response to our noticing that monte carlo p-values were lower than we would expect in controls.
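
The merging behavior described in this release (values are summed when an OTU shows up in the same sample in different OTU tables) can be sketched with plain dictionaries. The nested-dict representation and the function name ``merge_otu_tables`` are assumptions for illustration and do not reflect QIIME's internal table objects.

```python
from collections import defaultdict


def merge_otu_tables(*tables):
    """Merge OTU tables given as {otu_id: {sample_id: count}} dicts.

    Hypothetical sketch: counts for the same OTU and sample are summed.
    """
    merged = defaultdict(lambda: defaultdict(int))
    for table in tables:
        for otu_id, sample_counts in table.items():
            for sample_id, count in sample_counts.items():
                merged[otu_id][sample_id] += count
    # Convert back to plain dicts for easier inspection.
    return {otu: dict(samples) for otu, samples in merged.items()}


table_a = {"OTU1": {"S1": 3, "S2": 0}, "OTU2": {"S1": 1, "S2": 4}}
table_b = {"OTU1": {"S1": 2, "S3": 7}, "OTU3": {"S3": 5}}

merged = merge_otu_tables(table_a, table_b)
print(merged["OTU1"])
```

Here sample ``S1`` appears in both tables, so its ``OTU1`` count is the sum of the two inputs, while samples and OTUs unique to one table are carried over unchanged.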

QIIME 1.3.0 (29 June 2011)
* uclust and uclust_ref OTU pickers now incorporate a pre-filtering step where identical sequences are collapsed before calling uclust and then expanded after calling uclust. This gives a big speed improvement (5-20x) on reasonably sized input sets (>200k sequences) with no effect on the resulting OTUs. This is now the default behavior for, and can be disabled by passing --suppress_uclust_prefilter_exact_match to
* Added ability to pass a file to that contains a sorted list of sample ids, and use that information rather than the mapping file for sorting the OTU table. This allows users to, e.g., pass sorted mapping files as input.
* Added script and workflow function. This plugs together many components of QIIME (split libraries,,, into a single command and parameters file.
* Added script ( which will create taxon-specific OTU tables from a master OTU table for taxon-specific analyses of alpha/beta diversity, etc.
* Changed default behavior of Now lineage information is included by default, but can be turned off with --suppress_include_lineages
* Added script ( for computing mantel correlations between a set of distance matrices.
* Interface changes to This allows the user to pass the output file name, rather than a directory where the output file should be written.
* Parameter -r reassignment in Now -r is used for reference_seqs_fp; previously it was used for rdp_classifier_fp.
* Added script to expand clusters to fasta representing all sequences. This allows denoiser results to be passed directly to the OTU pickers (and OTU picking workflows) which should greatly reduce the complexity of denoiser runs. The "Denoising 454 Data" tutorial has been updated to reflect how the pipeline should now be run. The denoising functionality was removed from the workflow script as that could only be used in very special circumstances - this allows us to focus our attention on supporting the new pipeline described in the updated tutorial.
* Reorganized output from to get rid of the confusing output directory structure.
* Added script to plot semivariograms using two distance matrices. This script also plots a fitting curve of the data values.
* Changed beta diversity scripts to do unweighted_unifrac,weighted_unifrac by default.
* Changed output of to a directory instead of filepath. This allows for multiple levels to be processed simultaneously.
* The now contains some additional functionality -- 2d plots and distance histograms. It has therefore been renamed Any of the plots can be disabled by passing the options  --suppress_distance_histograms, --suppress_2d_plots, and --suppress_3d_plots.
* Updated required version of FastTree to 2.1.3 as this version contains some bug fixes over version 2.1.0.
* Modified so default is to include lineages (previously did not include these by default).
* Added script which splits a single OTU table into several OTU tables based on the values in a specified column of the mapping file. This is useful, for example, when a single OTU table is generated that covers multiple studies.
* Fixed bug in mouseovers in taxa area and bar charts. These were misaligned when a lot of samples were included.
* Added support for RDP classifier 2.2. Versions 2.0 and 2.2 are both supported.
* Added support for AmpliconNoise with the script.
* Added new page to the documentation to cover upgrades between versions of QIIME.
* Updated the output filepaths and HTML layout to be more consistent with other plotting scripts.
* Added a new taxonomy summary workflow (
* Modified workflow scripts so stdout and stderr are written to the log file. This is very useful for debugging.
* Added new script ( to simulate samples using a phylogenetic tree.
* Complete overhaul of Illumina data processing code. QIIME now treats fastq format as the default for Illumina data, and various other formats can be converted to fastq using and The "Processing Illumina Data tutorial" has also been completely overhauled and describes these changes. The primary script for demultiplexing Illumina data is now
* Dropped support for PyroNoise in favor of AmpliconNoise (the successor to PyroNoise) and the QIIME denoiser.
* Added script to simplify the integration of denoiser results into the QIIME pipeline. See the "Denoising 454 Data" tutorial, which has been overhauled in this release. To reduce the possible pathways through QIIME with denoising, support for denoising was removed from in favor of working with the pipeline presented in the tutorial.
* Changed default behavior of so unassigned reads are not stored by default. There is now a --retain_unassigned_reads option to achieve the previous behavior.
* Many clean-ups to the script documentation throughout QIIME.
* Adding scripts to plot semivariograms.
* Modified all workflow scripts so parameter files are now optional. This will simplify working with 'default' analyses in these scripts.
* Added more thorough support for floating point values in OTU tables. This was previously supported only in specific cases.
* Added support for users to pass jobs_to_start on the command line for all of the workflow scripts. This overrides this value in the parameters file and qiime_config, and is a more convenient way of controlling this.
* Added entropy filtering option to This can be useful for position-filtering de novo alignments, or other alignments where no lanemask is available.
* Added new script ( which will count the number of sequences in one or more fasta file, as well as the mean/stddev sequence lengths, and print the results to stdout or file.
* Added the workflow script, which includes summarizing the OTU table by category.
* Overhauled the QIIME overview tutorial.
* Added new script ( which can be used for running parallel QIIME on clusters using torque for the queueing system. A new qiime_config value, torque_queue, can be specified to define the default queue.
* Integrated the QIIME Denoiser (Reeder and Knight, 2011) into Qiime.
* Added script ( for comparing rarefied alpha diversities across different mapping file categories.
* Fixed bug in where reverse strand matching did not work for uclust/uclust_ref.
* Modified location where temp files are written for more consistency throughout QIIME. Temp files are now written to the temp_dir (from qiime_config) or /tmp/ if temp_dir is not defined. There may still be a few temp files being written to other locations, but the goal is that all will write to the same user-defined (or default) directory.
* Added script ( that makes TopiaryExplorer project file (.tep) from an otu table, sample metadata table and tree file.
* Removed the rdp_classifier_fp from qiime_config. This was used inconsistently throughout QIIME, so was somewhat buggy, and with the switch to RDP 2.2 in QIIME 1.3.0 I think it will save a lot of support headaches to just get rid of it.
* Added tutorial for processing 18S data, along with a small 3 domain sample sequence file in the qiime_tutorial/18S_tutorial_files/ folder.
* Added script, which functions similarly to Moved some functions from to that were generally useful.
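
The exact-match pre-filter described for the uclust and uclust_ref OTU pickers in this release (collapse identical sequences, cluster, then expand) can be sketched as below. This is only an outline of the idea: the function names are hypothetical, and the clustering step is replaced by a hand-written stand-in result rather than a call to uclust.

```python
def collapse_exact_matches(seqs):
    """Collapse identical sequences.

    `seqs` is a list of (seq_id, sequence) pairs. Returns the unique
    sequences (labeled by the first id seen) and a map from that id to
    all ids sharing the sequence. Hypothetical helper for illustration.
    """
    members = {}  # sequence -> ids, in order of first appearance
    for seq_id, seq in seqs:
        members.setdefault(seq, []).append(seq_id)
    unique = [(ids[0], seq) for seq, ids in members.items()]
    groups = {ids[0]: ids for ids in members.values()}
    return unique, groups


def expand_clusters(clusters, groups):
    """Replace each representative id with every id it stands for."""
    return {otu: [m for rep in reps for m in groups[rep]]
            for otu, reps in clusters.items()}


seqs = [("a", "ACGT"), ("b", "ACGT"), ("c", "TTGA"), ("d", "ACGT")]
unique, groups = collapse_exact_matches(seqs)

# Stand-in for the real clustering step, which would only see `unique`.
clusters = {"otu0": ["a"], "otu1": ["c"]}

expanded = expand_clusters(clusters, groups)
print(unique)
print(expanded)
```

Only two of the four input sequences reach the clustering step in this example, which is where the speed-up comes from, and the expansion restores every original identifier so the resulting OTUs are unaffected.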

QIIME 1.2.1 (22 Feb 2011)
* Added script which takes a post-split-libraries fasta file and submits it to the MG-RAST database.
* Added script which allows for sorting samples in an OTU table based on their associated values in a mapping file.
* Remove DOTUR OTU picker. This was requested by Pat Schloss as Mothur has replaced DOTUR.
* Removed support of SRA submission and processing scripts along with related documentation and tutorial. This included the following scripts: make_sra_submission, sra_spreadsheets_to_map_files, process_sra_submission (starting revision 1786).
* Added script.
* Added OTU gain as a new beta diversity metric to compute non-phylogenetic gain (G).
* Added features to split_libraries to allow truncation or removal of sequences with quality score windows, and increased information deposited in log file about sliding window quality score tests.  Added unit test for quality score truncation/removal.
* Added reference-based OTU picking workflow script. This can be applied for database OTU picking, as well as for applying Shotgun UniFrac (Caporaso et al. 2011, PLoS One, accepted).
* Added a new list of distinct colors to the module
* Added Area and Bar taxa summary plots to a new script  This script allows for writing of Pie Charts as well, thereby deprecating the script.
* Added support for output of biplot coords to make_3d_plots script (SF feature req. 3124713).
* --stable_sort option enabled by default for uclust OTU pickers.
* Changed defaults for uclust and uclust_ref OTU pickers. The new parameters make both OTU pickers about 2-3x slower, but the resulting clusters are significantly better in terms of making the best choice of OTU for a given sequence, and ensuring that cluster seeds are less than 97% identical to one another. The default rep seq picking method was also changed to "first" from "most_abundant" which ensures that the seed sequence is chosen as the representative for a cluster. Abundance is instead taken into account at the otu picking stage (as it has been for a while) by pre-sorting the sequences by abundance so most abundant sequences are more likely to be seeds. In practice, with presorting by abundance, the same sequence is usually chosen as the representative when passing first or most_abundant as the OTU picking method.
* Added support for generating inVUE plots in
* Changed tree type default for upgma comparisons, to consensus tree rather than the upgma tree based on the full otu table.
* Disabled the check that jobs_to_start > 1 in a user's qiime_config before allowing them to start parallel jobs. This is inconvenient in several places (e.g., EC2 images when used with n3phele), and after some discussion we decided that it should be up to the user to have understood how parallel qiime should be configured before using it.
* Added ability to pool primers for mapping files passed to check_id_map and  Primers are separated by commas, and autodetected.
* Added sort_otu_table.txt for sorting the sample IDs in an OTU table based on their value in a mapping file.
* Changed the method for p-value calculation in Procrustes analysis Monte Carlo in response to SF bug # 3189200.
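
Sorting the samples of an OTU table by a mapping file value, as described in this release, amounts to reordering the table's columns. The sketch below assumes a simple in-memory layout (a list of sample IDs, a list of count rows, and a dict-based mapping); these structures and function names are illustrative and are not QIIME's own.

```python
def sort_sample_ids(sample_ids, mapping, field):
    """Order sample ids by the value of `field` in the mapping data."""
    return sorted(sample_ids, key=lambda sid: mapping[sid][field])


def reorder_columns(sample_ids, rows, new_order):
    """Rearrange each row of counts to follow `new_order`."""
    index = {sid: i for i, sid in enumerate(sample_ids)}
    return [[row[index[sid]] for sid in new_order] for row in rows]


sample_ids = ["S1", "S2", "S3"]
mapping = {"S1": {"Day": 3}, "S2": {"Day": 1}, "S3": {"Day": 2}}
rows = [[10, 20, 30],   # counts for one OTU across S1, S2, S3
        [1, 2, 3]]      # counts for another OTU

order = sort_sample_ids(sample_ids, mapping, "Day")
reordered = reorder_columns(sample_ids, rows, order)
print(order)
print(reordered)
```

Python's ``sorted`` is stable, so samples with equal mapping values keep their original relative order.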

QIIME 1.2.0 (10 Nov 2010)
* When computing jackknife support for sample clustering (e.g.: UPGMA sample trees), Qiime can now compute a consensus tree from the jackknife replicates, in addition to the existing functionality of using the full dataset as the master tree, and annotating that tree with jackknife support values. See --master_tree and .
* Added the ability to write out the flowgram file in, ability to define an output directory and convert Titanium reads to FLX length.
* SRA submission protocol updated to perform human screening with uclust_ref against 16S reference sequences, rather than cdhit/blast against reference sequences. This can be a lot faster, and reduces the complexity of the code by requiring users to have uclust installed for the human screen rather than cdhit and blast.
* Updated SRA protocol to allow users to skip the human screening step as this takes about 2/3 or more of the total analysis time, and is not relevant for non-human-derived samples (e.g., soil samples).
* Added ability to pass --max_accepts, --max_rejects, and --stable_sort through the uclust otu pickers.
* Added a -r parameter to to allow users to pass "preferred" representative sequences in a fasta file. This is useful, for example, if users have picked OTUs with uclust_ref, and would like to use the reference sequences as their representatives, rather than sequences from their sequencing run.
* Renamed Qiime/scripts/ to Qiime/scripts/ to reflect the addition of generating jackknifed 2d and 3d plots to this workflow script.
* Updated,, and to use the jobs_to_start value for better control over the number of parallel runs.
* uclust_ref otu picker now outputs an additional failures file listing the sequences which failed to cluster if the user passed --suppress_new_clusters. This is done for ease of parsing in downstream applications which want to do something special with these sequences. The failures list is no longer written to the log file (although the failures count is still written to the log file).
* Added the script which allows users to build a fasta file from an existing fasta where specified sequences are either included or excluded from the new file. The sequences to keep or exclude can be specified by a variety of different inputs, for example as a list of sequence identifiers in a text file.
* Added parallel version of uclust_ref OTU picker.
* Added negative screen option to -- this allows users to screen by discarding all sequences that match a reference set, while the (default) positive screen allows users to screen by retaining only sequences that match a reference set.
* Added options to to enable the detection and removal of reverse primers from input sequences, and an option to record a filtered quality score output file that matches the bases found in the output seqs.fna file.
* Added the script that allows users to create an OTU table simile from a Terminal restriction fragment length polymorphism (T-RFLP) text file.
* Added min_aligned_length parameter to the BLAST OTU picker. By default, BLAST alignments now must cover at least 50% of the input sequence for OTU assignment to occur.
* Changed default randomization strategy in Procrustes monte carlo from shuffling within coordinate vectors to shuffling the labels on the vectors themselves. This doesn't appear to affect clearly significant cases at all, but is more conservative and therefore favors non-significance of results in borderline cases.
* Added ability to run beta diversity calculations in parallel at the single OTU table level to improve performance when computing diversity on very large collections of samples. This functionality is now hooked up to the workflow script, and includes the new -r parameter to which allows users to specify samples to compute diversity vectors for (rather than requiring that the full all-against-all diversity matrix is created).
* uclust-based analyses now retain the .uc files as these contain a lot of useful information that was previously being discarded.
* Improved handling of blank lines in parse_otu_table -- these are now ignored. Other improvements were made to the parse_otu_table format to better support these files coming from sources other than QIIME (such as MG-RAST).
* Allow the -R option to be passed to ChimeraSlayer. Closes feature request 3007445.
* Added capability for pairwise sample/sample, monte carlo significance tests. These are frequently done via the unifrac web interface. Users hitting max size limitations on the web can now thrash their own hardware.
* Fixed a bug in make_rarefaction_plots where the table below the plots had column labels sorted by natsort, while the values in the table were sorted arbitrarily by dict keys. The plots themselves were fine.
* Added a Procrustes analysis/plotting tutorial.
* Added code to exclude OTU ids from an OTU table when building the OTU table. This allows users to discard OTUs that were identified as chimeric. Accessible by passing --exclude_otus_fp to
* Modified to no longer require the ref db in unaligned format when using chimeraSlayer.
* Added a tutorial document on applying chimera checking in QIIME.
* Added ability to pass -F T/F to parallel_blast to allow disabling of the low-complexity filtering in BLAST.
* Added new script ( for computing shared OTUs between pairs of samples. Batch mode can be used in combination with to calculate stats for a set of rarefied OTU tables.
* Added min_aligned_percent parameter to BLAST OTU picker workflow, with default set at 50%. An alignment must now cover at least 50% of a sequence for OTU assignment to occur.
* Add script to draw rank abundance graphs (
* Modified interface of make_distance_histograms so --html_output is now the default. A new parameter, --suppress_html_output, was added to produce the old behavior.
* Added script ( to plot quality score by position given a .qual file. This is useful with another new script ( to truncate fasta/qual files at the point where quality begins to decrease, and has been useful in controlling for quality issues on 454 Ti runs.
* Added binary SFF parsing module from PyCogent, removed sfftools dependency from workflow test, process_sff, and other areas of QIIME.
* Added ACE calculation to
* Updated documentation on file formats used by Qiime.
* Added more extensive error checking in parse_mapping_file to handle some cryptic error messages that were arising from scripts that were passed bad mapping files.
* Added capability to perform supervised classification of metadata categories using the Random Forests classifier. Outputs include a ranking of OTUs by discriminatory power, and the estimated probability of each metadata category for each sample. The latter may be useful for detecting potentially mislabeled samples.
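
The alignment coverage rule introduced for the BLAST OTU picker in this release (min_aligned_percent, default 50%) can be illustrated with a minimal sketch. The function and argument names below are hypothetical and are not taken from the QIIME code base; the sketch only shows the threshold test described in the entry above.

```python
def passes_min_aligned_percent(query_length, aligned_length,
                               min_aligned_percent=0.50):
    """Return True if the alignment covers enough of the query sequence.

    Hypothetical helper: query_length is the length of the input sequence,
    aligned_length is the number of query positions covered by the BLAST
    alignment, and min_aligned_percent is a fraction between 0.0 and 1.0.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    coverage = aligned_length / float(query_length)
    return coverage >= min_aligned_percent


# A 200-base query with a 120-base alignment has 60% coverage and is kept;
# a 90-base alignment has 45% coverage and would not be assigned to an OTU.
print(passes_min_aligned_percent(200, 120))
print(passes_min_aligned_percent(200, 90))
```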

QIIME 1.1.0 (14 May 2010)
* Additional field added to BLAST assign taxonomy output to indicate the best BLAST hit of the query sequence -- this is in response to Sourceforge feature request 2988407.
* Added presorting by abundance to uclust OTU picker. The idea here is that sequences which are more abundant are better representatives when clustering, so they should come first in the file. Also added ability to pass the optimal flag to uclust, which should also improve uclust-picked OTUs but comes with a performance hit.
* Added Confidence interval display (jackknifed pcoa) in make_2d_plots and make_3d_plots. After performing multiple_rarefactions, beta_diversity and principal_coordinates on an OTU table, the user can supply the resulting directory to both of these scripts.  Currently the user has the option of performing InterQuartile Range (IQR) or standard-deviation (sdev) on the principal coordinate files and ellipses are drawn around each point to represent the confidence interval in each P.C.  Along with this option, the user can manipulate the opacity of the ellipses as well.
* Updated the display for rarefaction plots, so the legend does not overlap with the plots and fixed the display of the rarefaction average table in the webpage.  Now the user can switch between plots with different metrics and categories by using the drop down menus.  The user can also display the samples that contribute to the average for that group.  Below the plots, a table is displayed to show the rarefaction average data with all the distance metric values.
* Merged the make_rarefaction_averages into the make_rarefaction_plots script.  Also removed the inputs (--rarefaction_ave and --ymax) options, since they are determined by the script.  Also, restructured the output directory format and combined all metric data into one html.
* Added the uclust_ref OTU picker, which uses uclust to pick OTUs against a reference collection. Sequences which are within the similarity threshold to a reference sequence will cluster to an OTU defined by that reference sequence, and sequences which are outside of the similarity threshold to any reference sequence will form new OTUs.
* The interface for has changed.  -M and -W options are now lowercase to avoid conflicts with parallel scripts.  Users can avoid formatting the database by passing --no_format_db.  By default the files created by formatdb are now cleaned up. Users can choose not to clean up these files using the --no_clean option.  Output file extensions have changed from ".excluded" to ".matching" and from ".screened" to ".non-matching" to be clear regardless of whether the sequences matching the database, or not matching the database, are to be excluded. A check was added for user-supplied BLAST databases in when run with --no_format_db: if the required files do not exist, a parser error is thrown.
* Added ability to chimera check sequences with ChimeraSlayer. See for details.
* Added workflow script for second-stage SRA submission, The SRA submission tutorial has been extensively updated to reflect the use of this new script.
* Added the ability to supply a tree and sort the heatmap based on the supplied tree.
* Added the ability to handle variable length barcodes, variable length primers, and no primers with Error-correction is not supported for barcode types other than golay_12 and hamming_8. also now throws an error if the barcode length passed on the command line does not match the barcode length in the mapping file.
* Updated the script to print useful debugging information about the QIIME environment.
* Added high-level logging functionality to the workflow scripts.
* Added RUN_ALIAS field to SRA experiment.txt spreadsheet in make_sra_submission.xml.
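
The two clustering changes in this release (presorting by abundance, and reference-based picking where unmatched sequences form new OTUs) can be sketched together as follows. This is an illustrative toy, not the uclust algorithm: `percent_identity` is a simple position-by-position comparison standing in for uclust's alignment-based similarity, and all function names and the `New0`, `New1`, ... OTU identifiers are assumptions made for the example.

```python
from collections import Counter


def percent_identity(a, b):
    """Toy similarity: fraction of matching positions over the longer length.

    Stand-in for a real alignment-based score; NOT what uclust computes.
    """
    if not a and not b:
        return 1.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / float(max(len(a), len(b)))


def pick_otus_against_reference(seqs, refs, similarity=0.97):
    """Cluster sequences against a reference set, creating new OTUs as needed.

    seqs: list of (seq_id, sequence) tuples.
    refs: dict mapping reference id -> reference sequence.
    Returns a dict mapping OTU id -> list of member seq_ids.
    """
    # Presort so the most abundant sequences are processed first and
    # therefore become the seeds of any new OTUs (sorted() is stable).
    abundance = Counter(seq for _, seq in seqs)
    ordered = sorted(seqs, key=lambda item: -abundance[item[1]])

    otus = {}
    new_seeds = []  # (otu_id, seed sequence) for OTUs not in the reference
    for seq_id, seq in ordered:
        best_id, best_score = None, 0.0
        # First choice: the best reference within the similarity threshold.
        for ref_id in sorted(refs):
            score = percent_identity(seq, refs[ref_id])
            if score >= similarity and score > best_score:
                best_id, best_score = ref_id, score
        # Otherwise join an existing new OTU, or start another one.
        if best_id is None:
            for otu_id, seed in new_seeds:
                if percent_identity(seq, seed) >= similarity:
                    best_id = otu_id
                    break
        if best_id is None:
            best_id = "New%d" % len(new_seeds)
            new_seeds.append((best_id, seq))
        otus.setdefault(best_id, []).append(seq_id)
    return otus
```

With a single reference `{"ref1": "AAAAAAAAAA"}` and a threshold of 0.9, a read differing from the reference at one of ten positions clusters to `ref1`, while reads with no similar reference are grouped into `New0`, seeded by the most abundant of them.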

QIIME 1.0.0 - (8 Apr 2010)
* uclust made default OTU picker (instead of cdhit).
* uclust made default pairwise aligner for PyNAST (instead of BLAST).
* Minimum PyNAST version requirement upgraded to PyNAST 1.1.
* Minimum PyCogent version requirement upgraded to PyCogent 1.4.1.
* tree_compare now can compare trees where some tips aren't present in all trees.
* --small_included option removed from rarefaction scripts.
* Added "remove outliers" functionality to  After removing lanemasked columns and gap columns, -r will remove outlying sequences, preventing odd spikes in phylo trees when some seqs are poorly aligned.
* Absent samples are now included in the output of unifrac like metrics - 0 dist between two samples that aren't there, 1 dist between an absent and a present sample.
* make phylogeny now does good midpoint rooting (still off by default).
* Consolidated parsing functionality to qiime.parse.
* Removed dependence on several qiime_config values - users should run Qiime/scripts/ -t to get information on parameter settings which are outdated.
* Added an example 'cluster_jobs' -- -- script which will give users in multi-core or multi-proc environments very easy access to parallel QIIME. This also adds parallel support to the QIIME virtual box.
* Modified the default value of jobs_to_start to be 1 -- because of the addition of the example cluster_jobs script, the default value of 24 no longer makes sense (if it ever really did...). Because the new script is built for multi-core/multi-proc environments, 24 is too high for most cases. Users will need to modify this value from 1 (corresponding to no parallelization) to a value that makes sense for their environment (e.g., 2 for dual core, or 24 to get the previous default).
* Added colors module and tests to consolidate and standardize coloring code in QIIME - also updated the graphics scripts to use the colors module.
* Added ability for user to specify the background colors of plots in prefs files or on the command line.
* Tweaked SRA submission routines in accordance with accepted format from JCVI's survey of multiple body sites.
* Fixed SF bug #2971581, which was an issue with the path to qiime's scripts directory not being determined correctly when qiime was installed using qiime_config now contains a key (empty by default) for the qiime_scripts_dir. If this is not specified by the user, it is determined from the qiime project dir.
* Renamed scripts/ as scripts/ to reflect that the prefs files are now used by other scripts.
* Changed behavior of color-by option to make_3d_plots, make_2d_plots, and make_rarefaction_plots, so if no -b option or prefs files is provided, scripts default to coloring by all values. Consequently, mapping files are also now required for these scripts.
* Added a script to handle processing of Illumina GAIIx data.
* Added an additional rarefaction script for clarity. There are now 3 scripts to handle rarefaction: single_rarefaction takes one input OTU table and writes one output table, allowing manual naming; multiple_rarefactions makes auto-named rarefied OTU tables at a range of depths; and makes auto-named tables all at the same depth.
* Added workflow unit tests (with timeout functionality).
* Added default alpha and beta diversity metrics to qiime_parameters.txt.
* Integrated Denoiser (Jens Reeder's 454 denoiser) wrappers, and tied this into the workflow scripts.
* Added biplot functionality.  make_3d_plots now takes the -t option (off by default) to include taxa on the pcoa plot.
* Updated the QIIME tutorial to use the workflow scripts where possible. Additionally added the tutorial data set in the svn repository.
* Reorganization and expansion of the documentation throughout.
* Added sanity checks to This will now allow users to evaluate their environment, and should help with debugging.
* Added new field to qiime_config (temp_dir) which will be used to specify where temp files should be written. Currently this is only used by the workflow tests, and is intended to allow users to specify something other than /tmp for cases when /tmp is not shared between all nodes that might be working on a job. This will eventually be used for all temp dir creation.
* Added ability to make summary plots for a directory of coordinate files in make_3d_plots and make_2d_plots. The summary plot adds ellipsoidal confidence intervals around each point in the plot.
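
The convention for absent samples in UniFrac-like metrics noted in this release (distance 0 between two absent samples, distance 1 between an absent and a present sample) can be sketched as a thin wrapper around any pairwise metric. The sketch assumes that "absent" means a sample with no observations in the table; the function names are hypothetical, and binary Jaccard is used only as a simple stand-in metric, not as UniFrac.

```python
def binary_jaccard(counts_a, counts_b):
    """Stand-in metric: 1 - (shared observed OTUs / OTUs observed in either)."""
    shared = sum(1 for x, y in zip(counts_a, counts_b) if x > 0 and y > 0)
    union = sum(1 for x, y in zip(counts_a, counts_b) if x > 0 or y > 0)
    return 1.0 - shared / float(union)


def distance_with_absent_samples(counts_a, counts_b, metric=binary_jaccard):
    """Apply the absent-sample convention before calling the real metric.

    counts_a and counts_b are per-OTU counts for two samples, in the same
    OTU order. A sample whose counts are all zero is treated as absent.
    """
    a_present = sum(counts_a) > 0
    b_present = sum(counts_b) > 0
    if not a_present and not b_present:
        return 0.0  # two absent samples are considered identical
    if a_present != b_present:
        return 1.0  # absent vs. present is maximally distant
    return metric(counts_a, counts_b)
```

Handling the two special cases first also keeps the stand-in metric from dividing by zero when neither sample has any observations.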

QIIME 0.92 - (3 Mar 2010)
* Removed outdated documentation PDFs, along with references to those PDFs in the README and INSTALL documents.

QIIME 0.91 - (3 Mar 2010)

* Addition of a uclust-based OTU picker.
* Transfer of all command line interfaces from Qiime/qiime to Qiime/scripts -- this was an important change as it allowed us to get away from the previously one-to-one relationship between files in our library code (in Qiime/qiime) and the command line interfaces.
* Standardized command line interfaces for all code in Qiime/scripts by using a new function, Qiime.qiime.util.parse_command_line_parameters to handle the command line interfaces.
* Moved to Sphinx for documentation, and developed a framework for extracting script documentation directly from the scripts to populate the web documentation.
* Bug fixes throughout the code base, including but not limited to fixes for Sourceforge tickets: 2957503, 2953765, 2945548, 2942443, 2941925, 2941926, 2941717, 2941396, 2939588, 2939575, 2935939.
* Updated the script to perform a minimal test of the scripts (getting help text works as expected), and to alert users if unit tests may be failing due to missing external applications, in which case they may not be critical.
* Created a directory for pycogent_backports, where we can temporarily store new code that has been added to PyCogent, but which has not been added to a PyCogent release yet. This will allow us to keep QIIME's dependencies on the latest PyCogent version despite rapid and frequently related changes in both packages.
* Added code for performing Procrustes analyses of coordinate matrices, and graphing the results of those analyses in 3d plots (see and
* Performance enhancements related to golay barcode decoding.
* Added to help with installation of QIIME - this will put the library code in site-packages, and the scripts in /usr/local/bin (both locations can be changed via command line options to
* Created a support_files directory to hold jar, js, png, and other required files.
* Added Pearson correlation to list of options in
* Workflow scripts added for running large repetitive processes with a single command rather than multiple commands -- in scripts, see,,,

QIIME 0.9 - (25 Jan 2010)

* Initial release