\documentclass[a4paper,11pt]{article}
%\documentclass{article}

  \usepackage{graphicx}
  \usepackage{color}
  \usepackage[round]{natbib}
  %\usepackage{url}
  \usepackage{hyperref}

  %\newcommand{\xxx}{\rule{10mm}{1ex}}
  %\hyphenation{IN-FILE-NAME PUZZLE}

  %\sloppy

\hoffset        -1in %% Initialization of documents is with a horizontal and  
\voffset        -1in %% a vertical offset of one inch. %%
%\raggedright %% Prevents horizontal block format. %%
\setlength{\parindent}{0cm} %% Paragraphs are not indented. %%
\setlength{\parskip}{0.3cm} %% Vertical space between paragraphs. %%
\setlength{\oddsidemargin}{1.1in} %% Left margin on odd-numbered pages. %%
\setlength{\evensidemargin}{1.1in} %% Left margin on even-numbered pages. %%
\setlength{\topmargin}{1mm} %% Space between top of page and header. %%
\setlength{\headheight}{30mm} %% Height of header. %%
\setlength{\textwidth}{154mm} %% Width of text. %%
\setlength{\textheight}{215mm} %% Height of text. %%

\newcommand{\iqtree}{$\mathcal{IQ-TREE}$}

\begin{document}
\begin{titlepage}

\noindent
\hfill
\begin{center}
\begin{LARGE}\textbf{IQ-TREE version 1.0 (July 2014)\\[2ex]Fast phylogenetic inference and ultrafast bootstrap analysis by maximum likelihood.}
\end{LARGE}
\end{center}
%\hfill~
\vfill

\begin{center}
\begin{LARGE}User Manual and Tutorial
\end{LARGE}


\vfill

\begin{tabular}{ll}
%\small Copyright (C) 2012-2013 by & \small Bui Quang Minh, Lam-Tung Nguyen, Heiko A. Schmidt, \\ 
%								& and \small Arndt von Haeseler \\
\end{tabular}
\end{center}

\begin{LARGE}Please read this manual carefully before using IQ-TREE for the first time!
\end{LARGE}

\vfill

\begin{description}
\item[Project managers:] ~\\
Bui Quang Minh - \texttt{minh.bui(at)mfpl.ac.at}

Arndt von Haeseler - \texttt{arndt.von.haeseler(at)mfpl.ac.at}

\item [Core developers:] ~\\
Lam-Tung Nguyen - \texttt{tung.nguyen(at)mfpl.ac.at}

Olga Chernomor - \texttt{olga.chernomor(at)mfpl.ac.at}

Diep Thi Hoang - \texttt{diep.thi.hoang(at)gmail.com}

\item [Support:] ~\\
Heiko A. Schmidt - \texttt{heiko.schmidt(at)mfpl.ac.at}

\item [Contact address:] ~\\
Center for Integrative Bioinformatics Vienna (CIBIV)\\
Max F. Perutz Laboratories, University of Vienna, Medical University of Vienna\\
   Dr. Bohr-Gasse 9, A-1030 Vienna, Austria\\

\end{description}



\vfill
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\textbf{License Agreement}
\label{Legal Stuff}

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or   
(at your option) any later version. 

This program is distributed in the hope that it will be useful, but    
WITHOUT ANY WARRANTY; without even the implied warranty of             
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU       
General Public License for more details.                               


\vfill

\end{titlepage}


\tableofcontents
\clearpage

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}
\label{introduction}


IQ-TREE is an efficient program for reconstructing large maximum likelihood trees and
 assessing branch supports with the ultrafast bootstrap approximation.
 IQ-TREE extends the IQPNNI algorithm with many enhancements.
IQ-TREE is open-source and available free of charge from

\url{http://www.cibiv.at/software/iqtree/}

IQ-TREE has been tested on Unix, Mac OS X, and Windows.
The code of IQ-TREE is written in standard C/C++, so it may also
compile on other platforms.
Please see the \emph{Installation} section (Section~\ref{Installation}) for more
details.
We recommend reading this documentation before using IQ-TREE
for the first time!

For impatient users we established a very user-friendly web server:

\url{http://iqtree.cibiv.univie.ac.at}

Its intuitive web interface allows users to perform online tree reconstruction within a few clicks.
Note that this online service limits each job to 12 CPU hours and 1 GB of memory.
If your job exceeds these limits, you can copy and paste the displayed command line to
run the analysis on your local machine.

To cite IQ-TREE please use the following paper:

\textbf{Bui Quang Minh, Minh Anh Thi Nguyen, and Arndt von Haeseler} (2013) Ultrafast approximation for phylogenetic bootstrap. \emph{Mol. Biol. Evol.}, 30:1188-1195.


%A manuscript was submitted:

%\textbf{Lam-Tung Nguyen, Heiko A. Schmidt, Arndt von Haeseler, and Bui Quang Minh} (2014) IQ-TREE: A fast and
%effective stochastic algorithm for estimating maximum likelihood phylogenies.
    
Further readings on the methods developed:

\begin{itemize}
\item \textbf{Heiko A. Schmidt and Arndt von Haeseler} (2009) Phylogenetic Inference Using Maximum Likelihood Methods. In P. Lemey, M. Salemi, A.M. Vandamme (eds.) \emph{The Phylogenetic Handbook: a Practical Approach to Phylogenetic Analysis and Hypothesis Testing}, 2nd Edition, 181-209, Cambridge University Press, Cambridge.

\item \textbf{Bui Quang Minh, Le Sy Vinh, Arndt von Haeseler and Heiko A. Schmidt} (2005) pIQPNNI: Parallel reconstruction of large maximum likelihood phylogenies. \emph{Bioinformatics}, 21(19):3794-6. 

\item \textbf{Le Sy Vinh and Arndt von Haeseler} (2004) IQPNNI: Moving fast through tree space and stopping in time. \emph{Mol. Biol. Evol.}, 21(8):1565-1571.

\end{itemize}    
    
%\item Tung-Lam Nguyen, Heiko A. Schmidt, Bui Quang Minh, and Arndt von Haeseler (2012) IQ-TREE: Efficient algorithm
%for phylogenetic inference by maximum likelihood and important quartet puzzling. \emph{In prep.}

If you encounter bugs please send the \texttt{.log} file of the run and possibly the alignment to: \texttt{tung.nguyen(AT)univie.ac.at}  and \texttt{minh.bui(AT)univie.ac.at}.

%============================================%
\subsection{What's new in version 1.0?}
\label{whatnews}

Version 1.0 is a major release of the IQ-TREE software. We are happy to announce the following new features:
\begin{itemize}
\item Integration of the phylogenetic likelihood library \citep[PLL; ][]{tomas2014} for fast likelihood computation. This is enabled via the \texttt{-pll} option and gives a speedup of 2X to 8X.
\item A novel fast and effective stochastic algorithm for estimating maximum likelihood phylogenies. It outperforms RAxML and PhyML in terms of log-likelihoods while requiring a similar amount of computing time. A manuscript describing the new method has been submitted. See Section \ref{sec.new-tree-search} for more details.
\item Codon models: GY (Goldman \& Yang 1994), MG (Muse \& Gaut 1994), and ECM (Kosiol et al. 2007).
\item Morphological models: MK and ORDERED (Lewis 2001).
\item Ascertainment bias correction model (+ASC) for, e.g., morphological or SNP data (Lewis 2001).
\item Nearest neighbor interchange with optimization of five branches (instead of one; \texttt{-nni5}) is now the default
because of its higher accuracy.
\item The SH-aLRT branch test now also works for partition models.
\end{itemize}

%============================================%
\subsection{Key features}
\label{features}

IQ-TREE provides many options for phylogenetic reconstruction. The main features include:

\begin{itemize}
\item Reconstruction of the maximum likelihood tree from sequence alignments \citep{vinh2004,minh2005a}.
\item Ultrafast bootstrap approximation for assessing branch supports \citep{minh2013}.
\item Various substitution models for binary, nucleotide, and amino-acid data, with or without rate heterogeneity.
\item Partition models for phylogenomic data.
\item Automatic selection of best-fit models, similar to ModelTest \citep{posada1998}.
\item Standard non-parametric bootstrap \citep{felsenstein1985}.
\item Single branch tests \citep[LBP, SH-aLRT; ][]{adachi1996b,guindon2010}.
\item Test of model homogeneity assumption along the tree \citep{weiss2003}.
\item Site-specific rate model \citep{meyer2003}.
\item Fast consensus tree reconstruction, Robinson-Foulds distance computation.
\end{itemize}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Installation}
\label{Installation}

  See below for information on how to install or build the different
  versions of the IQ-TREE software. Executables of the sequential
  (that is, non-parallel) program are provided for a number of operating
  systems.

\subsection{Binary release}

\begin{enumerate}
\item Download the executable version of IQ-TREE
       for your operating system if it is available (\texttt{iqtree-XXX-OS.tar.gz}
       or \texttt{iqtree-XXX-OS.zip}, where \texttt{XXX} is the current version number and 
       OS the operating system) from\\
       \url{http://www.cibiv.at/software/iqtree}
\item Extract the files (e.g., with \texttt{tar xvzf iqtree-XXX-OS.tar.gz} under Unix).
       This should create a directory \texttt{iqtree-XXX-OS}.
\item You will find the executable in \texttt{iqtree-XXX-OS/}.
       Rename this executable to \texttt{iqtree} (or \texttt{iqtree.exe}
       on Windows systems) and copy it to a directory in your system search path
       so that it can be found by your system.
\end{enumerate}

\textbf{Note on the multi-core (OpenMP) version:} The executable is named \texttt{iqtree-omp}
(or \texttt{iqtree-omp.exe} on Windows). Please also copy the other files needed for OpenMP (e.g., \texttt{*.dll} on Windows)
to the same folder that you copied \texttt{iqtree-omp} to. Finally, on Mac OS X
you have to install MacPorts and the associated gcc47 to run \texttt{iqtree-omp} properly (see the how-to in Section~\ref{sec:build-openmp}).

    If you encounter problems, please ask your local administrator for help.

\subsection{Building source package}

    To build IQ-TREE from the sources you need a C++ compiler (e.g., gcc) and the CMake tool
    installed. This is usually the case on UNIX/Linux systems; on
    Windows you might want to obtain Cygwin, MinGW, or MS Visual C++, and on Mac OS X, Xcode.
    Then you can follow the procedure below:

\begin{enumerate}
\item Download the current version of the software (\texttt{iqtree-XXX-Source.tar.gz} or\\
       \texttt{iqtree-XXX-Source.zip}, where \texttt{XXX} is the current version number) from\\ 
       \url{http://www.cibiv.at/software/iqtree}
\item  Extract the files (e.g., with \texttt{tar xvzf iqtree-XXX-Source.tar.gz} under Unix).
       This should create a directory \texttt{iqtree-XXX-Source}.
\item  Change into this directory.
\item  Create a sub-directory \texttt{build} and go into this sub-directory by entering:
\begin{verbatim}
         mkdir build
         cd build
\end{verbatim}

\item  Configure the source code using CMake:

\begin{verbatim}
         cmake ..
\end{verbatim}

\item Compile and build the source code:
\begin{verbatim}
         make
\end{verbatim}

       This creates an executable \texttt{iqtree}
       (or \texttt{iqtree.exe} on Windows systems). This executable can be copied to a directory in your system search path
       so that it is found by your system.
\end{enumerate}

    If you encounter problems, please ask your local administrator for help.

\subsection{Building multi-core parallel version (\textcolor{red}{Update!})}
\label{sec:build-openmp}

To build the multi-core version you need a compiler that supports the OpenMP standard (e.g., gcc).
On Linux and Windows the gcc and MinGW compilers work fine.
However, in our tests on Mac OS X, IQ-TREE compiled successfully with the default Xcode gcc
but the example run crashed for an unknown reason.
Therefore, we used MacPorts (with gcc47 or later) and successfully ran IQ-TREE compiled with the MacPorts gcc. To do so, please first install
MacPorts, then install gcc via MacPorts (\texttt{sudo port install gcc47}) and configure gcc to point to the MacPorts gcc
version (\texttt{sudo port select -{}-set gcc mp-gcc47}).

The compilation then follows the same route with a slightly changed command line for cmake:

\begin{verbatim}
         cmake .. -DIQTREE_FLAGS="omp"
\end{verbatim}

All other commands remain the same. It is recommended to copy the executable file \textcolor{red}{\texttt{iqtree-omp}}
(or \texttt{iqtree-omp.exe} on Windows)
 to the system search path such that one can simply run \texttt{iqtree-omp} from the command-line.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Tutorial}
\label{tutorial}

This section gives users a quick starting guide. You can download either the binary
for your platform or the source code from the IQ-TREE website. In the latter case,
you will need to compile the source code (see \emph{Installation}, Section~\ref{Installation}). For the next steps, the \texttt{iqtree}
executable should then be copied into the \texttt{bin} folder such that
IQ-TREE can be invoked by simply entering \texttt{'iqtree'} at the command-line.
You can run \texttt{'iqtree -h'} to see a list of options available in IQ-TREE.

%============================================%
\subsection{First running example (\textcolor{red}{Update!})}

The download includes an example alignment called \texttt{example.phy}
 in PHYLIP format (IQ-TREE also supports FASTA and NEXUS files). You can now start to reconstruct a maximum-likelihood tree
from this alignment by typing (assuming that you are now in the same folder as \texttt{example.phy}):
\begin{verbatim}
  iqtree -s example.phy
\end{verbatim}
\texttt{'-s'} is the option to specify the name of the alignment file that is always required by
IQ-TREE to work. At the end of the run IQ-TREE will write several output files:

\begin{itemize}
\item \texttt{example.phy.iqtree}: the main report file, which is human-readable. You
should look at this file to see the results.
\item \texttt{example.phy.treefile}: the ML tree in NEWICK format, which can be visualized
with tree viewer tools such as FigTree or iTOL. Note that this NEWICK tree is also embedded in
\texttt{example.phy.iqtree}.
%\item \texttt{example.phy.bionj}: the BIONJ tree in NEWICK format, which is used internally
%by IQ-TREE as a starting tree for the tree search procedure.
%\item \texttt{example.phy.jcdist}: the Juke-Cantor corrected distance matrix.
%\item \texttt{example.phy.mldist}: the ML distance matrix (based on the given substitution model).
\item \texttt{example.phy.log}: log file of the entire run (also printed on the screen). To report
bugs, please send this log file and the original alignment file to the authors.
\end{itemize}

Note that all output files use the alignment file name as the default prefix. You can always
change the prefix using the \texttt{'-pre'} option, e.g.:
\begin{verbatim}
  iqtree -s example.phy -pre myprefix
\end{verbatim}
Then IQ-TREE will write output files \texttt{myprefix.iqtree, myprefix.treefile}, etc. This is
 helpful when you do several runs for the same input.

\textcolor{red}{******** NEW IN VERSION 1.0 ********}

Since version 1.0 IQ-TREE by default offers a more accurate tree search and bootstrap by
optimizing five branches around the nearest neighbor interchanges (NNIs). This comes at the cost
of an approximately 2X longer running time compared with version 0.9.X. To switch back to the
old behaviour of optimizing one branch around NNIs, simply use the \texttt{-nni1} option:

\begin{verbatim}
  iqtree -s example.phy -nni1
\end{verbatim}


%============================================%
\subsection{Choosing the substitution model}

IQ-TREE supports numerous substitution models for binary, DNA, and protein data, as well as the Gamma rate
heterogeneity model. If you do not specify a model, IQ-TREE will use the default HKY, WAG, and JC models for DNA, protein,
and binary alignments, respectively. For most data sets these models are too simple.
If you do not know which model is appropriate for your data, let IQ-TREE automatically determine the best-fit model
for your alignment with:
\begin{verbatim}
  iqtree -s example.phy -m TEST
\end{verbatim}
\texttt{'-m'} is the option to specify the model name to use during the analysis. \texttt{'TEST'}
is a key word telling IQ-TREE to first select the best-fit model. The remaining analysis
will be done using the selected model. More specifically, IQ-TREE computes the log-likelihoods
of the initial BIONJ tree under many different models, together with the Akaike information criterion (AIC),
the corrected Akaike information criterion (AICc), and the Bayesian information criterion (BIC).
IQ-TREE then chooses the model that minimizes the BIC score (you can switch to AIC or AICc by
adding the option \texttt{-AIC} or \texttt{-AICc}, respectively).
Moreover, IQ-TREE will write an additional file:

\begin{itemize}
 \item \texttt{example.phy.model}: log-likelihoods for all models tested.
\end{itemize}
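
For reference, the three information criteria mentioned above are commonly defined as follows. These are the standard textbook definitions, not formulas taken from the IQ-TREE source code. The notation is ours: $\ln L$ is the maximized log-likelihood of a model, $k$ its number of free parameters, and $n$ the sample size. We assume here that $n$ is the alignment length; how IQ-TREE counts $k$ and $n$ internally may differ.

\begin{eqnarray*}
\mathrm{AIC}  & = & -2 \ln L + 2k, \\
\mathrm{AICc} & = & \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}, \\
\mathrm{BIC}  & = & -2 \ln L + k \ln n .
\end{eqnarray*}

Smaller scores indicate a better trade-off between model fit and model complexity. Because $\ln n > 2$ whenever $n \geq 8$, BIC penalizes each additional parameter more strongly than AIC for all but the very shortest alignments.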

If you now look at \texttt{example.phy.iqtree} you will see that IQ-TREE selected the model \texttt{'TIM'}
with \texttt{'Invar+Gamma'} rate heterogeneity. So \texttt{'TIM+I+G'} is the best-fit model
for this example data. From now on you can run with e.g.:
\begin{verbatim}
  iqtree -s example.phy -m TIM+I+G
\end{verbatim}
If you only want to find the best-fit model without doing tree reconstruction, run:
\begin{verbatim}
  iqtree -s example.phy -m TESTONLY
\end{verbatim}
Here, IQ-TREE will stop after finishing the model selection. The name of the best-fit model will be printed on the screen.
Finally, note that IQ-TREE will check if the file \texttt{'*.model'} exists and is correct.
If so, it will automatically reuse the log-likelihoods computed to speed up the model selection procedure.

\textcolor{red}{********NEW********}

Since version 0.9.6 IQ-TREE offers the partition model selection for multi-gene analysis. 
See Section \ref{sec.partition-model-selection} for more details.


%============================================%
\subsection{Support for phylogenetic likelihood library (\textcolor{red}{New in version 1.0!})}

In the major release 1.0, we added support for PLL \citep{tomas2014}, which speeds
up IQ-TREE by a factor of 2X to 8X. To test the new feature, simply use the option \texttt{'-pll'}, for example:

\begin{verbatim}
  iqtree -s example.phy -pll -m GTR+G
\end{verbatim}

Here, we also specify the model GTR+G because it is currently supported by PLL. Note that this
option does not yet work with all other options (such as \texttt{'-m TEST'}). In such cases, an error message will be displayed.

%============================================%
\subsection{Novel tree search algorithm (\textcolor{red}{New in version 1.0!})}
\label{sec.new-tree-search}

IQ-TREE 1.0 implements a new tree search algorithm, which explores the tree space much more
efficiently than version 0.9.X. Here, IQ-TREE combines parsimony analysis to provide better
starting trees, a new stochastic algorithm to escape local optima, and a new stopping rule. The new search strategies
come with a few parameters whose default values were tested to work well for many different data sets.
You can change the default parameters with the options:

\begin{verbatim}
  -numpars <number>    Number of initial parsimony trees (default: 100)
  -toppars <number>    Number of top initial parsimony trees (default: 20)
  -numcand <number>    Number of candidate trees during search (default: 5)
  -pers <perturbation> Perturbation strength for stochastic NNI (default: 0.5)
  -numstop <number>    Number of unsuccessful iterations to stop (default: 100)
\end{verbatim}

Finally, you can still switch back to the old algorithm of version 0.9.X with the options:

\begin{verbatim}
  -iqp                 Use IQP tree perturbation (default: sNNI)
  -iqpnni              Switch entirely to old IQPNNI algorithm
\end{verbatim}

%============================================%
\subsection{Codon models (\textcolor{red}{New in version 1.0!})}

IQ-TREE 1.0 supports basic codon models (GY, MG, and ECM). You need to input a protein-coding DNA alignment and specify codon data with the option \texttt{'-st CODON'} (otherwise, IQ-TREE applies a DNA model because it detects that your alignment contains DNA sequences):

\begin{verbatim}
  iqtree -s coding_gene.phy -st CODON 
\end{verbatim}

If your alignment length is not divisible by 3, an error message will be displayed. IQ-TREE will group sites 1,2,3 into codon site 1; sites 4,5,6 into codon site 2; etc. Moreover, any codon that has at least one gap/unknown/ambiguous nucleotide will be treated as an unknown codon character.
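
The grouping rule can be written compactly as follows (the symbols $L$ and $j$ are our notation). For an alignment of $L$ nucleotide sites, with $L$ divisible by 3, codon site $j$ consists of the nucleotide sites
\[
  3j-2,\; 3j-1,\; 3j, \qquad j = 1,\dots,L/3 .
\]
For example, a hypothetical alignment of 384 nucleotide sites yields 128 codon sites, the last of which comprises sites 382, 383, and 384.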

If you are not sure which model to use, simply add \texttt{'-m TEST'}, which also works for codon alignments: 

\begin{verbatim}
  iqtree -s coding_gene.phy -st CODON -m TEST
\end{verbatim}

By default IQ-TREE uses the standard genetic code.
You can change to another genetic code (see \url{http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi}) with the following options:

\begin{tabular}{ll}
\hline
Option & Genetic code\\
\hline
\texttt{-st CODON1} & The Standard Code (same as \texttt{-st CODON})\\
\texttt{-st CODON2} & The Vertebrate Mitochondrial Code\\
\texttt{-st CODON3} & The Yeast Mitochondrial Code\\
\texttt{-st CODON4} & The Mold, Protozoan, and Coelenterate Mitochondrial Code and \\
   & the Mycoplasma/Spiroplasma Code\\
\texttt{-st CODON5} & The Invertebrate Mitochondrial Code\\
\texttt{-st CODON6} & The Ciliate, Dasycladacean and Hexamita Nuclear Code\\
\texttt{-st CODON9} & The Echinoderm and Flatworm Mitochondrial Code\\
\texttt{-st CODON10} & The Euplotid Nuclear Code\\
\texttt{-st CODON11} & The Bacterial, Archaeal and Plant Plastid Code\\
\texttt{-st CODON12} & The Alternative Yeast Nuclear Code\\
\texttt{-st CODON13} & The Ascidian Mitochondrial Code\\
\texttt{-st CODON14} & The Alternative Flatworm Mitochondrial Code\\
\texttt{-st CODON16} & Chlorophycean Mitochondrial Code\\
\texttt{-st CODON21} & Trematode Mitochondrial Code\\
\texttt{-st CODON22} & Scenedesmus obliquus Mitochondrial Code\\
\texttt{-st CODON23} & Thraustochytrium Mitochondrial Code\\
\texttt{-st CODON24} & Pterobranchia Mitochondrial Code\\
\texttt{-st CODON25} & Candidate Division SR1 and Gracilibacteria Code\\
\hline
\end{tabular}

%============================================%
\subsection{Morphological and SNP data (\textcolor{red}{New in version 1.0!})}

IQ-TREE 1.0 supports discrete morphological alignments via the \texttt{'-st MORPH'} option:

\begin{verbatim}
  iqtree -s morphology.phy -st MORPH
\end{verbatim}

IQ-TREE implements two morphological ML models (MK and ORDERED; see Lewis 2001), where MK is the default model.
MK is a Jukes-Cantor-like model. The ORDERED model considers only transitions between states $i\rightarrow i-1$, $i\rightarrow i$, and $i \rightarrow i+1$. Morphological data typically do not have constant (uninformative) sites.
In such cases, you should apply the ascertainment bias correction model, e.g.:
 
\begin{verbatim}
  iqtree -s morphology.phy -st MORPH -m MK+ASC
\end{verbatim}
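
The idea behind the correction can be sketched as follows. This follows the conditional likelihood described by Lewis (2001); the notation is ours, and the exact implementation in IQ-TREE may differ. Let $L_i$ be the likelihood of the observed pattern at site $i$, let $n$ be the number of sites, and let $L_{\mathrm{const}}$ be the total likelihood of all constant patterns, i.e., patterns in which every sequence shows the same state. Because constant sites could not have been observed, each site likelihood is conditioned on the site being variable:
\[
  \ln L_{\mathrm{ASC}}
  \;=\; \sum_{i=1}^{n} \ln \frac{L_i}{1 - L_{\mathrm{const}}}
  \;=\; \sum_{i=1}^{n} \ln L_i \;-\; n \ln\left(1 - L_{\mathrm{const}}\right) .
\]
Without this conditioning, the absence of constant sites tends to be explained by overly long branches.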

You can again select the best-fit model with \texttt{'-m TEST'} (which also considers +G):

\begin{verbatim}
  iqtree -s morphology.phy -st MORPH -m TEST
\end{verbatim}

For SNP data (DNA), which typically do not contain constant sites, you can explicitly include
ascertainment bias correction in the model:

\begin{verbatim}
  iqtree -s SNP_data.phy -m GTR+ASC
\end{verbatim}

You can explicitly tell model testing to include only \texttt{'+ASC'} models with:
\begin{verbatim}
  iqtree -s SNP_data.phy -m TEST+ASC
\end{verbatim}

%============================================%
\subsection{Assessing branch supports with ultrafast bootstrap approximation}

The ultrafast bootstrap approximation is the most value-added feature available in IQ-TREE. Simply run:
\begin{verbatim}
  iqtree -s example.phy -m TIM+I+G -bb 1000
\end{verbatim}
\texttt{'-bb'} specifies the number of bootstrap replicates, where $1000$
is the minimum number recommended. If you now look at the section \texttt{'MAXIMUM LIKELIHOOD TREE'}
in \texttt{example.phy.iqtree}, you will see that every internal node of the tree figure
is annotated with a support value in percent. Branch supports are also assigned to the ML tree printed in
\texttt{example.phy.treefile}, which can again be viewed in FigTree.
In addition, IQ-TREE writes the following files:
\begin{itemize}
\item \texttt{example.phy.contree}: the consensus tree with assigned branch supports where branch lengths 
are optimized  on the original alignment.
\item \texttt{example.phy.splits}: support values in percent for all splits (bipartitions),
computed as the occurrence frequencies in the bootstrap trees. This file is in ``star-dot'' format.
\item \texttt{example.phy.splits.nex}: has the same information as \texttt{example.phy.splits}
but in NEXUS format, which can be viewed with the SplitsTree program.
\end{itemize}
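
To make the meaning of these split supports explicit (the notation is ours): if $T_1,\dots,T_B$ denote the $B$ bootstrap trees, with $B = 1000$ in the command above, and $s$ is a split, then the support of $s$ is its relative frequency among the bootstrap trees, expressed in percent:
\[
  \mathrm{supp}(s) \;=\; \frac{100}{B}\,\bigl|\{\, b : s \in T_b \,\}\bigr| .
\]
For example, a split that occurs in 953 of 1000 bootstrap trees has a support of about 95\%.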

%============================================%
\subsubsection{Assessing branch supports with standard nonparametric bootstrap}

The standard nonparametric bootstrap can be invoked by:
\begin{verbatim}
  iqtree -s example.phy -m TIM+I+G -b 100
\end{verbatim}
\texttt{'-b'} specifies the number of bootstrap replicates, where $100$
is the minimum number recommended. IQ-TREE additionally writes the following files:

\begin{itemize}
 \item \texttt{example.phy.boottrees}: the set of bootstrap trees reconstructed.
\item \texttt{example.phy.contree}: the bootstrap consensus tree with assigned branch supports where branch lengths 
are optimized  on the original alignment.
\end{itemize}

%============================================%
\subsubsection{Assessing branch supports with single branch tests}

IQ-TREE provides an implementation of the SH-like approximate likelihood ratio test \citep[SH-aLRT; ][]{guindon2010}.
To perform this test, simply run:
\begin{verbatim}
  iqtree -s example.phy -m TIM+I+G -alrt 1000
\end{verbatim}
\texttt{'-alrt'} specifies the number of bootstrap replicates for SH-aLRT where $1000$ is
the minimal number recommended. IQ-TREE will perform SH-aLRT at the end of the tree reconstruction process
and assign support values onto the ML tree. The support values will be reflected in the tree file \texttt{example.phy.treefile}.

IQ-TREE also provides a fast implementation of the local bootstrap probabilities method \citep{adachi1996b}, 
which we call Fast-LBP. Fast-LBP computes the branch support by comparing the tree log-likelihood
with the log-likelihoods of the two alternative nearest-neighbor-interchange (NNI) trees around the branch of interest.
However, Fast-LBP differs from LBP in that we compute the log-likelihoods of the two alternative NNI trees
by reoptimizing only the five branches around the branch of interest (a similar idea is used in the SH-aLRT test).
To perform Fast-LBP, simply run:
\begin{verbatim}
  iqtree -s example.phy -m TIM+I+G -lbp 1000
\end{verbatim}

You can also perform both tests:
\begin{verbatim}
  iqtree -s example.phy -m TIM+I+G -alrt 1000 -lbp 1000
\end{verbatim}
The branches of the resulting ML tree will be annotated with both SH-aLRT and Fast-LBP support values.
Finally, you can also combine the ultrafast bootstrap approximation with single branch tests in a single run:
\begin{verbatim}
  iqtree -s example.phy -m TIM+I+G -bb 1000 -alrt 1000 -lbp 1000
\end{verbatim}

%============================================%
\subsection{Partitioned analysis for multi-gene alignments}
\label{sec.partition-model}

In the partition model, you can specify a substitution model for each gene/character set individually. 
IQ-TREE will then estimate the model parameters and branch lengths separately for every partition.
To this end, you have to first prepare a NEXUS file including a \texttt{SETS} block with
\texttt{CharSet} and \texttt{CharPartition} commands to specify individual genes and the partition, respectively.
For example:
\begin{verbatim}
#nexus
begin sets;
        charset part1 = 1-100;
        charset part2 = 101-384;
        charpartition mine = HKY+G:part1, GTR+I+G:part2;
end;
\end{verbatim}

Now save this into a file \texttt{example.nex} and run:
\begin{verbatim}
  iqtree -s example.phy -sp example.nex
\end{verbatim}
IQ-TREE will then partition the alignment \texttt{example.phy} into 2 subsets named \texttt{part1} and \texttt{part2}
containing sites (columns) 1-100 and 101-384, respectively. Moreover, IQ-TREE applies the
substitution models \texttt{HKY+G} and \texttt{GTR+I+G} to \texttt{part1} and \texttt{part2}, respectively.
After the run has finished, the \texttt{example.nex.iqtree} file will contain the substitution model
parameters and the trees with branch lengths for all subsets in the partition.


Moreover, the \texttt{CharSet} command allows you to specify non-consecutive sites using a space-separated list of ranges, e.g.:
\begin{verbatim}
        charset part1 = 1-100 200-384;
\end{verbatim}
That means, \texttt{part1} contains sites 1-100 and 200-384 of the alignment. Another example is:
\begin{verbatim}
        charset part1 = 1-100\3;
\end{verbatim}
for extracting sites 1,4,7,...,100 from the alignment. This is useful for extracting codon positions from a protein-coding alignment.
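
For instance, the three codon positions of a protein-coding alignment could be placed into separate subsets as sketched below. The subset names \texttt{pos1} to \texttt{pos3}, the partition name \texttt{codonpos}, the alignment length of 384 sites, and the chosen models are purely illustrative assumptions.
\begin{verbatim}
#nexus
begin sets;
        charset pos1 = 1-384\3;
        charset pos2 = 2-384\3;
        charset pos3 = 3-384\3;
        charpartition codonpos = HKY+G:pos1, HKY+G:pos2, GTR+G:pos3;
end;
\end{verbatim}
Here, \texttt{pos1} contains sites 1,4,...,382, \texttt{pos2} contains sites 2,5,...,383, and \texttt{pos3} contains sites 3,6,...,384.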

Moreover, IQ-TREE offers a more advanced feature than other programs:
different subsets may come from different alignment files.
For example:
\begin{verbatim}
#nexus
begin sets;
        charset part1 = part1.phy: 1-100\3 201-300\3;
        charset part2 = part2.phy: 101-300;
        charpartition mine = HKY:part1, GTR+G:part2;
end;
\end{verbatim}
Here, \texttt{part1} and \texttt{part2} are read from alignment files \texttt{part1.phy} and \texttt{part2.phy}, respectively
(a ':' is needed to separate the alignment file name and site specification). Because the alignment file names
were embedded in this NEXUS file, you can simply run:
\begin{verbatim}
  iqtree -sp example.nex
\end{verbatim}

Note that
\texttt{part1.phy} and \texttt{part2.phy} need not contain the same set of sequence names. That is, if a sequence occurs
in \texttt{part1.phy} but not in \texttt{part2.phy}, IQ-TREE will treat the corresponding part of the sequence
in \texttt{part2.phy} as missing data. For your convenience IQ-TREE writes the concatenated alignment
into the file \texttt{example.nex.conaln}.

\textcolor{red}{********NEW EXPERIMENTAL FEATURE********}

Since version 0.9.6 IQ-TREE supports partition models with joint and proportional branch lengths between genes. This is
to reduce the number of parameters in case of model overfitting for the full partition model. For example:

\begin{verbatim}
  iqtree -spp example.nex
\end{verbatim}

applies a proportional partition model. That is, there is only one set of branch lengths for the species tree,
but each gene is allowed to evolve at its own rate (scaling factor), with the rates normalized to an average of 1.

A partition model with joint branch lengths is specified by:

\begin{verbatim}
  iqtree -spj example.nex
\end{verbatim}
 
(i.e., all gene-specific rates are equal to 1). 
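
The three partition variants differ only in how the gene-specific branch lengths are tied together. As a sketch (the notation is ours and is not taken from the IQ-TREE source code), let $b_e^{(g)}$ denote the length of branch $e$ used for gene $g$, with $g = 1,\dots,m$, let $b_e$ be a branch length shared by all genes, let $r_g$ be the rate of gene $g$, and let $E$ be the number of branches of the tree:

\begin{center}
\begin{tabular}{lll}
\hline
Option & Branch lengths & Branch-length parameters\\
\hline
\texttt{-sp}  & $b_e^{(g)}$ estimated separately for each gene & $mE$\\
\texttt{-spp} & $b_e^{(g)} = r_g\, b_e$ & $E + m - 1$\\
\texttt{-spj} & $b_e^{(g)} = b_e$ (all $r_g = 1$) & $E$\\
\hline
\end{tabular}
\end{center}

These counts assume that every gene contains all $N$ sequences, so that $E = 2N - 3$ for an unrooted binary tree, and that normalizing the rates to an average of 1 removes one free parameter in the proportional model. For example, with $m = 10$ genes and $N = 50$ sequences ($E = 97$), the three variants have 970, 106, and 97 branch-length parameters, respectively.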
 
 
%============================================%
\subsubsection{Choosing the right partitioning scheme}
\label{sec.partition-model-selection}

Since version 0.9.6 IQ-TREE implements a greedy strategy \citep{lanfear2012} that starts with the full partition model and sequentially
merges two genes until the model fit does not increase any further:

\begin{verbatim}
  iqtree -sp example.nex -m TESTLINK
\end{verbatim}

After the best partition is found IQ-TREE will immediately start the tree reconstruction under the best-fit partition model.
If you only want to find the best-fit partition model without doing tree reconstruction, run:


\begin{verbatim}
  iqtree -sp example.nex -m TESTONLYLINK
\end{verbatim}


%============================================%
\subsubsection{Bootstrapping with partition model}

IQ-TREE can perform the ultrafast bootstrap with partition models by e.g.,
\begin{verbatim}
  iqtree -sp example.nex -bb 1000
\end{verbatim}
Here, IQ-TREE will resample the sites \emph{within} subsets of the partitions (i.e., 
the bootstrap replicates are generated per subset separately and then concatenated together).
The same holds true if you do the standard nonparametric bootstrap. 

\textcolor{red}{********NEW********}

Since version 0.9.6 IQ-TREE also supports a gene-resampling strategy, which resamples genes instead of sites:

\begin{verbatim}
  iqtree -sp example.nex -bb 1000 -bspec GENE
\end{verbatim}

Moreover, IQ-TREE allows an even more complex
strategy that resamples genes and then sites within the resampled genes:

\begin{verbatim}
  iqtree -sp example.nex -bb 1000 -bspec GENESITE
\end{verbatim}
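
The three resampling strategies can be summarized as follows. The notation is introduced only for this sketch: the alignment has $k$ genes and gene $g$ has $m_g$ sites. The sketch assumes that, as in the usual bootstrap, a replicate draws as many units as the original data contain.
\begin{itemize}
\item Default (sites within genes): for each gene $g$, draw $m_g$ sites with replacement from gene $g$.
\item \texttt{GENE}: draw $k$ genes with replacement from the $k$ genes; every drawn gene keeps its original sites.
\item \texttt{GENESITE}: draw $k$ genes with replacement, then for each drawn gene $g$ draw $m_g$ sites with replacement from that gene.
\end{itemize}
Under gene resampling a given gene is missing from a replicate with probability
\[ \left(1 - \frac{1}{k}\right)^{k} \approx e^{-1} \approx 0.37 \]
for large $k$, so each replicate is expected to omit roughly a third of the genes and to contain others more than once.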


%============================================%
\subsection{Utilizing multi-core CPUs}

A specialized version of IQ-TREE uses multiple CPU cores during the analysis (made possible by the OpenMP library).
You can download the binary from the software website or compile the source code
yourself (see Section~\ref{Installation}). For the following examples, please
copy the binary \texttt{iqtree-omp} and the other files in the package \texttt{bin} folder into the system \texttt{bin} folder so that it can be
invoked from the command line by simply running \texttt{iqtree-omp}.

If you now run, e.g.:
\begin{verbatim}
  iqtree-omp -s example.phy
\end{verbatim}
then IQ-TREE will use all available cores of your CPU.
This is not always good practice, because our parallelization technique only works well on long alignments.
For a very short alignment, we do not recommend using this IQ-TREE version.
Because the speedup depends on the alignment length,
it is better to run this version with an increasing number of cores, e.g.:
\begin{verbatim}
  iqtree-omp -s example.phy -omp 2
\end{verbatim}
Here, \texttt{-omp} is the option to specify the number of cores that IQ-TREE will use.
If you see that the wall-clock time reduction is substantial compared with the sequential IQ-TREE version,
then you can try:
\begin{verbatim}
  iqtree-omp -s example.phy -omp 3
\end{verbatim}
and so on, until no substantial reduction of running time is observed. The remaining
analysis can then be carried out with that number of cores.

For example, on my computer (Linux, Intel Core i5-2500K, 3.3 GHz, quad-core) I observed the following
wall-clock running times for this example alignment:
\begin{center}
\begin{tabular}{cc}
\hline
No. cores & Wall-clock time\\
\hline
1 & 21.465 sec.\\
2 & 13.627 sec.\\
3 & 11.119 sec.\\
4 & 10.807 sec.\\
\hline
\end{tabular}
\end{center}
Therefore, I would use only 2 cores for this specific alignment (option \texttt{-omp 2}), because a third or fourth core reduces the running time only marginally.
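
One way to quantify this decision is to compute the speedup $S(p) = T_1 / T_p$ and the parallel efficiency $E(p) = S(p)/p$ from the measured times, where $T_p$ is the wall-clock time on $p$ cores (these symbols are introduced here only for illustration). For the timings above this gives:
\begin{center}
\begin{tabular}{cccc}
\hline
No. cores $p$ & $T_p$ (sec.) & Speedup $S(p)$ & Efficiency $E(p)$\\
\hline
1 & 21.465 & 1.00 & 1.00\\
2 & 13.627 & 1.58 & 0.79\\
3 & 11.119 & 1.93 & 0.64\\
4 & 10.807 & 1.99 & 0.50\\
\hline
\end{tabular}
\end{center}
The second core saves about 7.8 seconds, the third about 2.5 seconds, and the fourth only about 0.3 seconds. Where to stop is a judgment call. A simple rule of thumb is to stop adding cores once the efficiency falls clearly below the level you consider acceptable.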

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Advanced tutorial}
\label{sec.advanced-tutorial}

This section gives an advanced tutorial for more experienced users. It covers several advanced features
such as tree topology tests and user-defined substitution models.


%============================================%
\subsection{Tree topology tests}

IQ-TREE can compute the log-likelihoods of user-defined trees passed via the \texttt{-z} option:

\begin{verbatim}
  iqtree -s example.phy -z example.treels
\end{verbatim}

assuming that \texttt{example.treels} contains the trees in NEWICK format. 
At the end of the usual run, IQ-TREE additionally evaluates all trees in this file using the estimated model parameters.
The file \texttt{example.phy.iqtree}
will then contain a section \texttt{USER TREES} that lists the tree IDs and the corresponding log-likelihoods.
Moreover, IQ-TREE writes an additional file:
\begin{itemize}
\item \texttt{example.phy.treels.trees}: the trees with optimized branch lengths.
\end{itemize}

If you only want to evaluate the trees without reconstructing the ML tree, you can run:
\begin{verbatim}
  iqtree -s example.phy -z example.treels -n 1
\end{verbatim}

Here, IQ-TREE will only reconstruct the BIONJ+NNI tree and use that tree to estimate the model parameters,
which are normally accurate enough for our purpose.

IQ-TREE also supports several tree topology tests using the RELL approximation \citep{kishino1990} 
including: bootstrap proportion (BP), Kishino-Hasegawa test \citep[KH; ][]{kishino1989}, Shimodaira-Hasegawa test \citep[SH; ][]{shimodaira1999}, expected likelihood weights \citep[ELW; ][]{strimmer2002}, weighted-KH (WKH), and weighted-SH (WSH) tests.
The trees are again passed via the \texttt{-z} option, so you can run:

\begin{verbatim}
  iqtree -s example.phy -z example.treels -n 1 -zb 1000
\end{verbatim}

Here, \texttt{-zb} specifies the number of RELL replicates; we recommend at least 1000.
The \texttt{USER TREES} section of \texttt{example.phy.iqtree} will list the results of the BP, KH, SH, and ELW methods. To
also perform the WKH and WSH tests, simply add the \texttt{-zw} option:

\begin{verbatim}
  iqtree -s example.phy -z example.treels -n 1 -zb 1000 -zw
\end{verbatim}

Finally, note that IQ-TREE will automatically detect duplicated tree topologies and omit them during the evaluation.
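
The RELL approximation behind these tests can be sketched as follows (the notation is introduced here and does not appear in the IQ-TREE output). Let $\ell_{ij}$ be the log-likelihood of tree $T_i$ at alignment site $j$, for $j = 1, \ldots, m$. Instead of re-optimizing each tree on every bootstrap alignment, replicate $b$ only draws site counts $c^{(b)}_1, \ldots, c^{(b)}_m$ (multinomial, summing to $m$) and reuses the site log-likelihoods:
\[ \ell^{(b)}_i = \sum_{j=1}^{m} c^{(b)}_j \, \ell_{ij} . \]
With $B$ replicates (the value given to \texttt{-zb}), the bootstrap proportion of $T_i$ is the fraction of replicates in which $T_i$ has the highest $\ell^{(b)}_i$. Assuming the ELW is computed from the same replicates, as in the usual RELL formulation, it averages the normalized likelihoods:
\[ w_i = \frac{1}{B} \sum_{b=1}^{B} \frac{\exp(\ell^{(b)}_i)}{\sum_{k} \exp(\ell^{(b)}_k)} . \]
Both the BP values (up to ties) and the ELW values sum to 1 over the evaluated trees.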


%============================================%
\subsection{User-defined substitution models}

Users can specify an arbitrary DNA model using a 6-letter code that defines which substitution rates are constrained to be equal.
For example, \texttt{010010} corresponds to the HKY model and \texttt{012345} to the GTR model.
In fact, the IQ-TREE source code uses this specification internally to simplify the coding. The 6-letter code is specified
via the \texttt{-m} option, e.g.:

\begin{verbatim}
  iqtree -s example.phy -m 010010+G
\end{verbatim}
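
As a worked example of how to read such a code, assume that the six positions refer to the rates A-C, A-G, A-T, C-G, C-T, G-T in this order, and that equal digits mean equal rates. The code \texttt{010010} then assigns one rate $a$ (digit 0) to the four transversions and another rate $b$ (digit 1) to the two transitions A-G and C-T. The symbols $a$ and $b$ are introduced here only for illustration. With rows and columns ordered A, C, G, T, the symmetric matrix of relative rates is
\[ R = \left( \begin{array}{cccc}
\cdot & a & b & a \\
a & \cdot & a & b \\
b & a & \cdot & a \\
a & b & a & \cdot
\end{array} \right) , \]
which is the HKY model with $\kappa = b/a$. Because only relative rates matter, the number of free rate parameters is the number of distinct digits minus one: 1 for \texttt{010010} (HKY) and 5 for \texttt{012345} (GTR).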

Moreover, the \texttt{-m} option also accepts the name of a file that contains the 6 rates (A-C, A-G, A-T, C-G, C-T, G-T)
and 4 base frequencies (A, C, G, T), e.g.:

\begin{verbatim}
  iqtree -s example.phy -m mymodel+G
\end{verbatim}

where \texttt{mymodel} is a file containing the 10 entries described above. One can even specify the parameter values directly within the \texttt{-m} option, e.g.:

\begin{verbatim}
  iqtree -s example.phy -m 'TN{2.0,3.0}+G8{0.5}+I{0.15}'
\end{verbatim}

That is, we use the Tamura-Nei model with a fixed transition-transversion rate ratio of 2.0 and a purine/pyrimidine rate ratio of 3.0. Moreover, we
use 8-category Gamma-distributed site rates with a shape parameter (alpha) of 0.5 and a proportion of invariable sites p-inv=0.15.
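
In the standard formulation of such a \texttt{+I+G} model (a sketch that assumes equally weighted rate categories, as in the usual discrete Gamma approximation; the symbols are introduced here), the likelihood of alignment site $j$ is the mixture
\[ L_j = p_{\mathrm{inv}} \, L_j^{(0)} + (1 - p_{\mathrm{inv}}) \, \frac{1}{8} \sum_{c=1}^{8} L_j(r_c) , \]
where $p_{\mathrm{inv}} = 0.15$, $r_1, \ldots, r_8$ are the category rates derived from a Gamma distribution with shape 0.5, $L_j(r)$ is the likelihood of site $j$ when all branch lengths are multiplied by $r$, and $L_j^{(0)}$ is the likelihood at rate zero, which is nonzero only if the site shows no change. Values given in curly braces are held fixed, so this example estimates neither the shape parameter nor the proportion of invariable sites.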

Note that by default IQ-TREE computes empirical state frequencies from the alignment, but one can also optimize the frequencies by maximum likelihood
by adding \texttt{+Fo} to the model name:

\begin{verbatim}
  iqtree -s example.phy -m GTR+G+Fo
\end{verbatim}

For amino-acid alignments, if one wants to use the frequencies of the empirical protein model, then use \texttt{+Fu}, for example:

\begin{verbatim}
  iqtree -s myprotein_alignment -m WAG+G+Fu
\end{verbatim}

Finally, note that all model specifications above can be used in the partition model NEXUS file.

%============================================%
\subsection{Consensus construction and bootstrap value assignment}

IQ-TREE can construct an extended majority-rule consensus tree from a set of trees written in NEWICK or NEXUS format (e.g., produced
by MrBayes):

\begin{verbatim}
  iqtree -con mytrees
\end{verbatim}

To build a majority-rule consensus tree, simply set the minimum support threshold to 0.5:

\begin{verbatim}
  iqtree -con mytrees -t 0.5
\end{verbatim}

If you want to specify a burn-in (the number of trees to ignore at the beginning of the trees file), use the \texttt{-bi} option:

\begin{verbatim}
  iqtree -con mytrees -t 0.5 -bi 100
\end{verbatim}

to skip the first 100 trees in the file.

IQ-TREE can also compute a consensus network and print it into a NEXUS file by:

\begin{verbatim}
  iqtree -net mytrees
\end{verbatim}

Finally, a useful feature is to read in an input tree and a set of trees; IQ-TREE then assigns support values
to the branches of the input tree (the number of times each branch of the input tree occurs in the set of trees) by:

\begin{verbatim}
  iqtree -sup input_tree set_of_trees
\end{verbatim}
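
Formally, if each branch $e$ is identified with the split of the sequences that it induces, and the set contains the trees $T_1, \ldots, T_N$ (notation introduced here), the support value of $e$ is
\[ s(e) = \left| \{\, i : e \in T_i \,\} \right| . \]
The same counts underlie the consensus methods above: a consensus tree with threshold $t$ keeps the splits with $s(e)/N > t$. For $t = 0.5$ any two retained splits occur together in at least one tree and are therefore compatible, which is why the majority-rule consensus is always a tree. In the usual definition, the extended majority-rule consensus then adds further splits in order of decreasing frequency as long as they are compatible with those already included. With a burn-in, $N$ is presumably the number of trees that remain after the skipped ones are removed.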


%============================================%
\subsection{Computing Robinson-Foulds distance between trees}

IQ-TREE implements a very fast Robinson-Foulds (RF) distance computation using a hash table, which is much faster than the implementation in the PHYLIP package. For example, you can run:

\begin{verbatim}
  iqtree -rf tree_set1 tree_set2
\end{verbatim}

to compute the pairwise RF distances between 2 sets of trees. If you want to compute the all-to-all RF distances
of a set of trees, use:

\begin{verbatim}
  iqtree -rf_all tree_set
\end{verbatim}
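
As a reminder of what is being computed, let $S(T)$ denote the set of nontrivial splits (internal branches) of an unrooted tree $T$; this notation is introduced here. The RF distance counts the splits that occur in exactly one of the two trees:
\[ d_{RF}(T_1, T_2) = | S(T_1) \setminus S(T_2) | + | S(T_2) \setminus S(T_1) | . \]
For example, $T_1 = ((A,B),C,(D,E))$ has the splits $AB|CDE$ and $DE|ABC$, whereas $T_2 = ((A,C),B,(D,E))$ has $AC|BDE$ and $DE|ABC$. Only $DE|ABC$ is shared, so $d_{RF}(T_1,T_2) = 1 + 1 = 2$. For two binary unrooted trees on $n$ sequences the distance is at most $2(n-3)$, i.e., 4 in this example. Some programs report half of this value or a normalized value, so check the convention before comparing numbers across programs. For a set of $N$ trees, the all-to-all computation involves $N(N-1)/2$ distinct pairs.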

%============================================%
\subsection{Generating random trees}

IQ-TREE provides several random tree generation models. For example,

\begin{verbatim}
    iqtree -r 100 100.tree 
\end{verbatim}

generates a 100-taxon random tree under the Yule-Harding model and writes it to the file \texttt{100.tree};
the branch lengths follow an exponential distribution with a mean of 0.1.
To change the branch length distribution, run e.g.:

\begin{verbatim}
    iqtree -r 100 -rlen 0.05 0.2 0.3 100.tree 
\end{verbatim}

to set the minimum, mean, and maximum branch lengths as 0.05, 0.2, and 0.3, respectively.
If you want to generate trees under the uniform model instead, use the \texttt{-ru} option:

\begin{verbatim}
    iqtree -ru 100 100.tree 
\end{verbatim}
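
For orientation, an exponential distribution with mean $\mu$ has the density
\[ f(t) = \frac{1}{\mu} \, e^{-t/\mu} , \qquad t \ge 0 , \]
so with the default $\mu = 0.1$ a branch is longer than 0.3 with probability $e^{-3} \approx 0.05$. Assuming that the uniform model means that every unrooted binary topology is equally likely, each topology on $n$ sequences is drawn with probability $1/N(n)$, where
\[ N(n) = (2n-5)!! = 3 \cdot 5 \cdots (2n-5) , \]
e.g., $N(4) = 3$, $N(5) = 15$, and $N(10) = 2027025$. The Yule-Harding model, in contrast, does not weight topologies equally and tends to produce more balanced trees.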

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Frequently asked questions (FAQ)}

\subsection{How does IQ-TREE treat gap/missing characters?}

Gaps (-) and missing characters (? or N for DNA alignments) are treated in the same way as \emph{unknown} characters,
which carry no information. Many other ML programs (RAxML, PhyML, etc.) treat them in the same way. Technically,
in Felsenstein's pruning algorithm the partial likelihood vector of such a character is set to 1 for all character states. This is equivalent to the following.
For a site (column) of an alignment containing AC-AG-A (i.e., A for sequence 1, C for sequence 2, - for sequence 3, \ldots), the site-likelihood
of a tree T is equal to the site-likelihood of the subtree of T restricted to those sequences containing non-gap characters:

\[ \ell(T \mid \texttt{AC-AG-A}) = \ell(T_{\mathrm{sub}} \mid \texttt{ACAGA}) \]
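
To see why this holds, let $P_{xy}(t)$ denote the probability of changing from state $x$ to state $y$ along a branch of length $t$ (notation introduced here). A leaf with a gap has the partial likelihood $L(y) = 1$ for every state $y$, so its contribution to a parent node in state $x$ is
\[ \sum_{y} P_{xy}(t) \, L(y) = \sum_{y} P_{xy}(t) = 1 , \]
because each row of the transition probability matrix sums to 1. The gapped leaf thus multiplies the parent's partial likelihood by 1 for every $x$, which is exactly what happens if that leaf is removed from the tree.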

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Version History}
\label{Version History}

\begin{description}
\item \textbf{Version 0.9.6:} October 2013
\begin{itemize}
\item Ultrafast model selection and partitioning for phylogenomic alignments.
\item Introduction of nearest neighbor interchange (NNI) with five branch optimization to evaluate candidate NNIs. 
This yields higher accuracy for tree reconstruction and bootstrap at the cost of roughly twice the running time.
\item Introduction of joint and proportional partition models to reduce the number of parameters in case of model overfitting (experimental).
\item Introduction of gene-resampling and gene-and-site resampling for the bootstrap on multi-gene alignments.
\end{itemize}


\item \textbf{Version 0.9.5:} May 2013
\begin{itemize}
\item Introduction of bootstrap epsilon to select equally good bootstrap trees at random to deal with polytomies.
\end{itemize}

\item \textbf{Version 0.9.4:} Easter 2013
\begin{itemize}
\item Tree topology tests
\end{itemize}
\item \textbf{Version 0.9.3:} March 2013
\begin{itemize}
\item New implementation of model selection that works on all data types.
\item A tutorial about using partition models.
\item Parallel OpenMP support to utilize multi-core CPUs.
\end{itemize}
\item \textbf{Version 0.9.0:} September 2012 - 
First beta release.
\end{description} 


\section*{Credits and Acknowledgement}

Some parts of the code were taken from the following packages/libraries: Phylogenetic likelihood library \citep{tomas2014}, TREE-PUZZLE \citep{schmidt2002},
BIONJ \citep{gascuel1997}, Nexus Class Library \citep{lewis2003}, Eigen library \citep{guennebaud2010},
SPRNG library \citep{mascagni2000}, Zlib library (\url{http://www.zlib.net}).

Financial support from the Austrian Science Fund (FWF), the Vienna Science and Technology Fund (WWTF), and the University of Vienna is greatly appreciated.

\bibliographystyle{bioinformatics}
\bibliography{genephylo,heiko} %%%%%%%%%%%


\end{document}