\documentclass[a4paper,11pt]{article}
%\documentclass{article}
\usepackage{graphicx}
\usepackage{color}
\usepackage[round]{natbib}
%\usepackage{url}
\usepackage{hyperref}
%\newcommand{\xxx}{\rule{10mm}{1ex}}
%\hyphenation{IN-FILE-NAME PUZZLE}
%\sloppy
\hoffset -1in %% Initialization of documents is with a horizontal and
\voffset -1in %% a vertical offset of one inch. %%
%\raggedright %% Prevents horizontal block format. %%
\setlength{\parindent}{0cm} %% Paragraphs are not indented. %%
\setlength{\parskip}{0.3cm} %% Vertical space of 0.3 cm between paragraphs. %%
\setlength{\oddsidemargin}{1.1in} %% Left margin on odd-numbered pages. %%
\setlength{\evensidemargin}{1.1in} %% Left margin on even-numbered pages. %%
\setlength{\topmargin}{1mm} %% Space between top of page and header. %%
\setlength{\headheight}{30mm} %% Height of header. %%
\setlength{\textwidth}{154mm} %% Width of text. %%
\setlength{\textheight}{215mm} %% Height of text. %%
\newcommand{\iqtree}{$\mathcal{IQ-TREE}$}
\begin{document}
\begin{titlepage}
\noindent
\hfill
\begin{center}
\begin{LARGE}\textbf{IQ-TREE version 1.0 (July 2014)\\[2ex]Fast phylogenetic inference and ultrafast bootstrap analysis by maximum likelihood.}
\end{LARGE}
\end{center}
%\hfill~
\vfill
\begin{center}
\begin{LARGE}User Manual and Tutorial
\end{LARGE}
\vfill
\begin{tabular}{ll}
%\small Copyright (C) 2012-2013 by & \small Bui Quang Minh, Lam-Tung Nguyen, Heiko A. Schmidt, \\
% & and \small Arndt von Haeseler \\
\end{tabular}
\end{center}
\begin{LARGE}Please read carefully before using IQ-TREE for the first time!
\end{LARGE}
\vfill
\begin{description}
\item[Project managers:] ~\\
Bui Quang Minh - \texttt{minh.bui(at)mfpl.ac.at}\\
Arndt von Haeseler - \texttt{arndt.von.haeseler(at)mfpl.ac.at}
\item [Core developers:] ~\\
Lam-Tung Nguyen - \texttt{tung.nguyen(at)mfpl.ac.at}\\
Olga Chernomor - \texttt{olga.chernomor(at)mfpl.ac.at}\\
Diep Thi Hoang - \texttt{diep.thi.hoang(at)gmail.com}
\item [Support:] ~\\
Heiko A. Schmidt - \texttt{heiko.schmidt(at)mfpl.ac.at}
\item [Contact address:] ~\\
Center for Integrative Bioinformatics Vienna (CIBIV)\\
Max F. Perutz Laboratories, University of Vienna, Medical University of Vienna\\
Dr. Bohr-Gasse 9, A-1030 Vienna, Austria\\
\end{description}
\vfill
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\textbf{License Agreement}
\label{Legal Stuff}
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
\vfill
\end{titlepage}
\tableofcontents
\clearpage
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}
\label{introduction}
IQ-TREE is an efficient program for reconstructing large maximum likelihood trees and
assessing branch supports with the ultrafast bootstrap approximation.
IQ-TREE extends the IQPNNI algorithm with many enhancements.
IQ-TREE is open-source and available free of charge from
\url{http://www.cibiv.at/software/iqtree/}.
IQ-TREE has been tested on Unix, Mac OS X, and Windows.
IQ-TREE is written in standard C/C++ and may therefore also compile
on other platforms.
Please read Section~\ref{Installation} (\emph{Installation}) for more
details.
We recommend reading this documentation before using IQ-TREE
for the first time!
For impatient users, we provide a user-friendly web server at
\url{http://iqtree.cibiv.univie.ac.at}.
Its intuitive web interface allows users to perform online tree reconstruction within a few clicks.
Note that this online service allows at most 12 CPU hours and 1 GB of memory per job.
If your job exceeds these limits, you can copy the displayed command line and
run the analysis on your local machine.
To cite IQ-TREE, please use the following paper:
\textbf{Bui Quang Minh, Minh Anh Thi Nguyen, and Arndt von Haeseler} (2013) Ultrafast approximation for phylogenetic bootstrap. \emph{Mol. Biol. Evol.}, 30:1188-1195.
%A manuscript was submitted:
%\textbf{Lam-Tung Nguyen, Heiko A. Schmidt, Arndt von Haeseler, and Bui Quang Minh} (2014) IQ-TREE: A fast and
%effective stochastic algorithm for estimating maximum likelihood phylogenies.
Further reading on the underlying methods:
\begin{itemize}
\item \textbf{Heiko A. Schmidt and Arndt von Haeseler} (2009) Phylogenetic Inference Using Maximum Likelihood Methods. In P. Lemey, M. Salemi, A.M. Vandamme (eds.) \emph{The Phylogenetic Handbook: A Practical Approach to Phylogenetic Analysis and Hypothesis Testing}, 2nd edition, pp. 181-209. Cambridge University Press, Cambridge.
\item \textbf{Bui Quang Minh, Le Sy Vinh, Arndt von Haeseler and Heiko A. Schmidt} (2005) pIQPNNI: Parallel reconstruction of large maximum likelihood phylogenies. \emph{Bioinformatics}, 21(19):3794-3796.
\item \textbf{Le Sy Vinh and Arndt von Haeseler} (2004) IQPNNI: Moving fast through tree space and stopping in time. \emph{Mol. Biol. Evol.}, 21(8):1565-1571.
\end{itemize}
%\item Tung-Lam Nguyen, Heiko A. Schmidt, Bui Quang Minh, and Arndt von Haeseler (2012) IQ-TREE: Efficient algorithm
%for phylogenetic inference by maximum likelihood and important quartet puzzling. \emph{In prep.}
If you encounter bugs, please send the \texttt{.log} file of the run and, if possible, the alignment to \texttt{tung.nguyen(AT)univie.ac.at} and \texttt{minh.bui(AT)univie.ac.at}.
%============================================%
\subsection{What's new in version 1.0?}
\label{whatnews}
Version 1.0 is a major release of the IQ-TREE software. We are happy to announce the following new features:
\begin{itemize}
\item Integration of the phylogenetic likelihood library \citep[PLL; ][]{tomas2014} for fast likelihood computation. This is enabled via the \texttt{-pll} option and gives a speedup of 2X to 8X.
\item A novel fast and effective stochastic algorithm for estimating maximum likelihood phylogenies. It outperforms RAxML and PhyML in terms of log-likelihood while requiring a similar amount of computing time. A manuscript describing the new method has been submitted. See Section \ref{sec.new-tree-search} for more details.
\item Codon models: GY (Goldman \& Yang 1994), MG (Muse \& Gaut 1994), and ECM (Kosiol et al. 2007).
\item Morphological models: MK and ORDERED (Lewis 2001).
\item Ascertainment bias correction model (+ASC) for, e.g., morphological or SNP data (Lewis 2001).
\item Nearest neighbor interchange (NNI) with optimization of five branches (instead of one) around each NNI (\texttt{-nni5}) is now the default option because of its higher
accuracy.
\item The SH-aLRT branch test now also works for partition models.
\end{itemize}
%============================================%
\subsection{Key features}
\label{features}
IQ-TREE provides many options for phylogenetic reconstruction. The main features include:
\begin{itemize}
\item Reconstruction of the maximum likelihood tree from sequence alignments \citep{vinh2004,minh2005a}.
\item Ultrafast bootstrap approximation for assessing branch supports \citep{minh2013}.
\item Various substitution models for binary, nucleotide, and amino-acid data, with or without rate heterogeneity.
\item Partition models for phylogenomic data.
\item Automatic selection of best-fit models, similar to ModelTest \citep{posada1998}.
\item Standard non-parametric bootstrap \citep{felsenstein1985}.
\item Single branch tests \citep[LBP, SH-aLRT; ][]{adachi1996b,guindon2010}.
\item Test of model homogeneity assumption along the tree \citep{weiss2003}.
\item Site-specific rate model \citep{meyer2003}.
\item Fast consensus tree reconstruction, Robinson-Foulds distance computation.
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Installation}
\label{Installation}
This section describes how to install or build the different
versions of the IQ-TREE software. Executables of the sequential
(that is, non-parallel) program are provided for several operating
systems.
\subsection{Binary release}
\begin{enumerate}
\item Download the executable version of IQ-TREE
for your operating system if it is available (\texttt{iqtree-XXX-OS.tar.gz}
or \texttt{iqtree-XXX-OS.zip}, where \texttt{XXX} is the current version number and
OS the operating system) from\\
\url{http://www.cibiv.at/software/iqtree}
\item Extract the files (e.g., with \texttt{tar xvzf iqtree-XXX-OS.tar.gz} under Unix).
This should create a directory \texttt{iqtree-XXX-OS}.
\item The executable is located in \texttt{iqtree-XXX-OS/}.
Rename it to \texttt{iqtree} (or \texttt{iqtree.exe}
on Windows systems) and copy it to a folder in your system search path
so that your system can find it.
\end{enumerate}
\textbf{Note on the multi-core version:} The executable is named \texttt{iqtree-omp}
(or \texttt{iqtree-omp.exe} on Windows). Please also copy the other files needed for OpenMP (e.g., \texttt{*.dll} on Windows)
to the folder to which you copied \texttt{iqtree-omp}. Finally, on Mac OS X
you have to install MacPorts and the associated gcc47 to run \texttt{iqtree-omp} properly (see the instructions in Section~\ref{sec:build-openmp}).
If you encounter problems, please ask your local administrator for help.
\subsection{Building source package}
To build IQ-TREE from the sources, you need a C++ compiler (e.g., gcc) and the CMake tool.
Both are usually installed on UNIX/Linux systems. On
Windows you might want to obtain Cygwin, MinGW, or MS Visual C++, and on Mac OS X, Xcode.
Then you can follow the procedure below:
\begin{enumerate}
\item Download the current version of the software (\texttt{iqtree-XXX-Source.tar.gz} or\\
\texttt{iqtree-XXX-Source.zip}, where \texttt{XXX} is the current version number) from\\
\url{http://www.cibiv.at/software/iqtree}
\item Extract the files (e.g., with \texttt{tar xvzf iqtree-XXX-Source.tar.gz} under Unix).
This should create a directory \texttt{iqtree-XXX-Source}.
\item Change into this directory.
\item Create a sub-directory \texttt{build} and go into this sub-directory by entering:
\begin{verbatim}
mkdir build
cd build
\end{verbatim}
\item Configure the source code using CMake:
\begin{verbatim}
cmake ..
\end{verbatim}
\item Compile and build the source code:
\begin{verbatim}
make
\end{verbatim}
This creates an executable \texttt{iqtree}
(or \texttt{iqtree.exe} on Windows systems), which can be copied to your system search path
so that your system can find it.
\end{enumerate}
If you encounter problems, please ask your local administrator for help.
\subsection{Building multi-core parallel version (\textcolor{red}{Update!})}
\label{sec:build-openmp}
To build the multi-core version, you need a compiler that supports the OpenMP standard (e.g., gcc).
On Linux and Windows, the gcc and MinGW compilers work just fine.
However, in our test on Mac OS X, IQ-TREE compiled successfully with the default Xcode gcc,
but the example run crashed for an unknown reason.
We therefore used MacPorts (with gcc47 or later) and successfully ran IQ-TREE compiled with the MacPorts gcc. To this end, first install
MacPorts, then install gcc in MacPorts (\texttt{sudo port install gcc47}) and configure gcc to point to the MacPorts
version (\texttt{sudo port select --set gcc mp-gcc47}).
The compilation then follows the same steps, with a slightly changed command line for CMake:
\begin{verbatim}
cmake .. -DIQTREE_FLAGS="omp"
\end{verbatim}
All other commands remain the same. It is recommended to copy the executable file \textcolor{red}{\texttt{iqtree-omp}}
(or \texttt{iqtree-omp.exe} on Windows)
to the system search path such that one can simply run \texttt{iqtree-omp} from the command-line.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Tutorial}
\label{tutorial}
This section gives users a quick-start guide. You can either download the binary
for your platform from the IQ-TREE website or the source code. In the latter case,
you will need to compile the source code (see Section~\ref{Installation}, \emph{Installation}). For the next steps, the \texttt{iqtree}
executable should be copied into the \texttt{bin} folder (or another folder in your system search path) so that
IQ-TREE can be invoked by simply entering \texttt{'iqtree'} at the command line.
You can run \texttt{'iqtree -h'} to see a list of options available in IQ-TREE.
%============================================%
\subsection{First running example (\textcolor{red}{Update!})}
The download includes an example alignment called \texttt{example.phy}
in PHYLIP format (IQ-TREE also supports FASTA and NEXUS files). You can now reconstruct a maximum-likelihood tree
from this alignment by typing (assuming that you are in the same folder as \texttt{example.phy}):
\begin{verbatim}
iqtree -s example.phy
\end{verbatim}
The \texttt{'-s'} option specifies the name of the alignment file and is always required by
IQ-TREE. At the end of the run, IQ-TREE writes several output files:
\begin{itemize}
\item \texttt{example.phy.iqtree}: the main report file, which is human-readable. You
should look at this file to see the results.
\item \texttt{example.phy.treefile}: the ML tree in NEWICK format, which can be visualized
with tree viewers such as FigTree or iTOL. Note that this NEWICK tree is also embedded in
\texttt{example.phy.iqtree}.
%\item \texttt{example.phy.bionj}: the BIONJ tree in NEWICK format, which is used internally
%by IQ-TREE as a starting tree for the tree search procedure.
%\item \texttt{example.phy.jcdist}: the Juke-Cantor corrected distance matrix.
%\item \texttt{example.phy.mldist}: the ML distance matrix (based on the given substitution model).
\item \texttt{example.phy.log}: log file of the entire run (also printed on the screen). To report
bugs, please send this log file and the original alignment file to the authors.
\end{itemize}
Note that, by default, all output files use the alignment file name as prefix. You can always
change the prefix using the \texttt{'-pre'} option, e.g.:
\begin{verbatim}
iqtree -s example.phy -pre myprefix
\end{verbatim}
Then IQ-TREE will write output files \texttt{myprefix.iqtree, myprefix.treefile}, etc. This is
helpful when you do several runs for the same input.
\textcolor{red}{******** NEW IN VERSION 1.0 ********}
Since version 1.0, IQ-TREE by default performs a more accurate tree search and bootstrap by
optimizing five branches around each nearest neighbor interchange (NNI). The trade-off
is a running time approximately twice as long as with version 0.9.X. To switch back to the
old behaviour of optimizing one branch around each NNI, simply use the \texttt{-nni1} option:
\begin{verbatim}
iqtree -s example.phy -nni1
\end{verbatim}
%============================================%
\subsection{Choosing the substitution model}
IQ-TREE supports numerous substitution models for binary, DNA, and protein data, as well as the Gamma model of rate
heterogeneity. If no model is specified, IQ-TREE uses HKY, WAG, and JC as default models for DNA, protein,
and binary alignments, respectively. For most data sets, these models are too simple.
If you do not know which model is appropriate for your data, let IQ-TREE automatically determine the best-fit model
for your alignment with:
\begin{verbatim}
iqtree -s example.phy -m TEST
\end{verbatim}
\texttt{'-m'} is the option to specify the model name to use during the analysis. \texttt{'TEST'}
is a keyword telling IQ-TREE to first select the best-fit model; the remaining analysis
is then done using the selected model. More specifically, IQ-TREE computes the log-likelihoods
of the initial BIONJ tree under many different models, together with the Akaike information criterion (AIC),
the corrected Akaike information criterion (AICc), and the Bayesian information criterion (BIC).
IQ-TREE then chooses the model that minimizes the BIC score (to use AIC or AICc instead, add
the option \texttt{-AIC} or \texttt{-AICc}, respectively).
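For reference, with $\ln L$ the maximized log-likelihood, $k$ the number of free parameters and $n$ the number of alignment sites, the standard definitions are $\mathrm{AIC} = -2\ln L + 2k$, $\mathrm{AICc} = \mathrm{AIC} + 2k(k+1)/(n-k-1)$, and $\mathrm{BIC} = -2\ln L + k\ln n$. The following Python sketch applies these textbook formulas to choose a model from precomputed log-likelihoods. It is not IQ-TREE's code: the function names, log-likelihoods and parameter counts are hypothetical, and we assume that $n$ is the number of sites.

\begin{verbatim}
import math


def information_criteria(lnl, k, n):
    """Return (AIC, AICc, BIC) for log-likelihood lnl, k free parameters, n sites."""
    aic = -2.0 * lnl + 2.0 * k
    # Small-sample correction of AIC; only defined for n > k + 1.
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)
    bic = -2.0 * lnl + k * math.log(n)
    return aic, aicc, bic


def select_model(results, n, criterion="BIC"):
    """results maps a model name to (log-likelihood, number of free parameters).

    Returns the model name with the smallest value of the chosen criterion."""
    index = {"AIC": 0, "AICc": 1, "BIC": 2}[criterion]
    return min(results, key=lambda m: information_criteria(*results[m], n)[index])


# Hypothetical log-likelihoods and parameter counts (illustration only).
results = {
    "JC": (-21500.0, 33),
    "HKY+G": (-21100.0, 38),
    "TIM+I+G": (-21090.0, 41),
}
print(select_model(results, n=2000))                   # HKY+G
print(select_model(results, n=2000, criterion="AIC"))  # TIM+I+G
\end{verbatim}

With these made-up numbers the criteria disagree: BIC, whose penalty $k\ln n$ grows with the alignment length, prefers the simpler HKY+G, whereas AIC prefers TIM+I+G.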
Moreover, IQ-TREE will write an additional file:
\begin{itemize}
\item \texttt{example.phy.model}: log-likelihoods for all models tested.
\end{itemize}
If you now look at \texttt{example.phy.iqtree}, you will see that IQ-TREE selected the model \texttt{'TIM'}
with \texttt{'Invar+Gamma'} rate heterogeneity. Thus, \texttt{'TIM+I+G'} is the best-fit model
for this example data set. From now on, you can run, e.g.:
\begin{verbatim}
iqtree -s example.phy -m TIM+I+G
\end{verbatim}
If you only want to find the best-fit model without reconstructing a tree, run:
\begin{verbatim}
iqtree -s example.phy -m TESTONLY
\end{verbatim}
Here, IQ-TREE will stop after finishing the model selection. The name of the best-fit model will be printed on the screen.
Finally, note that IQ-TREE checks whether the file \texttt{'*.model'} exists and is correct.
If so, it will automatically reuse the log-likelihoods computed to speed up the model selection procedure.
\textcolor{red}{********NEW********}
Since version 0.9.6, IQ-TREE offers partition model selection for multi-gene analyses.
See Section \ref{sec.partition-model-selection} for more details.
%============================================%
\subsection{Support for phylogenetic likelihood library (\textcolor{red}{New in version 1.0!})}
In release 1.0, we added support for PLL \citep{tomas2014}, which speeds
up IQ-TREE by a factor of 2 to 8. To test this new feature, simply use the option \texttt{'-pll'}, for example:
\begin{verbatim}
iqtree -s example.phy -pll -m GTR+G
\end{verbatim}
Here, we also specify the model GTR+G, as it is currently supported by PLL. Note that this
option does not yet work with all other options (such as \texttt{'-m TEST'}). In such cases, an error message will be displayed.
%============================================%
\subsection{Novel tree search algorithm (\textcolor{red}{New in version 1.0!})}
\label{sec.new-tree-search}
IQ-TREE 1.0 implements a new tree search algorithm, which explores the tree space much more
efficiently than version 0.9.X. It combines parsimony analysis to provide better
starting trees, a new stochastic algorithm to escape local optima, and a new stopping rule. The new search strategy
comes with a few parameters whose default values were tested and work well for many different data sets.
You can change these default parameters with the following options:
\begin{verbatim}
-numpars <number>     Number of initial parsimony trees (default: 100)
-toppars <number>     Number of top initial parsimony trees (default: 20)
-numcand <number>     Number of candidate trees during search (default: 5)
-pers <perturbation>  Perturbation strength for stochastic NNI (default: 0.5)
-numstop <number>     Number of unsuccessful iterations to stop (default: 100)
\end{verbatim}
Finally, you can still switch back to the old algorithm of version 0.9.X with the options:
\begin{verbatim}
-iqp Use IQP tree perturbation (default: sNNI)
-iqpnni Switch entirely to old IQPNNI algorithm
\end{verbatim}
%============================================%
\subsection{Codon models (\textcolor{red}{New in version 1.0!})}
IQ-TREE 1.0 supports basic codon models (GY, MG, and ECM). You need to input a protein-coding DNA alignment and specify codon data with the option \texttt{'-st CODON'} (otherwise, IQ-TREE applies a DNA model because it detects that your alignment contains DNA sequences):
\begin{verbatim}
iqtree -s coding_gene.phy -st CODON
\end{verbatim}
If your alignment length is not divisible by 3, IQ-TREE prints an error message. IQ-TREE groups sites 1, 2, 3 into codon site 1; sites 4, 5, 6 into codon site 2; and so on. Moreover, any codon with at least one gap, unknown, or ambiguous nucleotide is treated as an unknown codon character.
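The grouping rule just described can be sketched in Python as follows. This is not IQ-TREE's parser: the function name and the use of \texttt{???} as the unknown-codon placeholder are our own assumptions, and only the four unambiguous nucleotides A, C, G, T are treated as valid.

\begin{verbatim}
NUCLEOTIDES = set("ACGT")


def to_codons(sequence):
    """Split an aligned DNA sequence into codon characters.

    Sites 1-3 form codon 1, sites 4-6 form codon 2, and so on. A codon that
    contains any gap, unknown, or ambiguous nucleotide becomes '???'."""
    if len(sequence) % 3 != 0:
        raise ValueError("alignment length %d is not divisible by 3" % len(sequence))
    codons = []
    for start in range(0, len(sequence), 3):
        codon = sequence[start:start + 3].upper()
        codons.append(codon if set(codon) <= NUCLEOTIDES else "???")
    return codons


print(to_codons("ATGGC-TTA"))  # ['ATG', '???', 'TTA']
\end{verbatim}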
If you are not sure which model to use, simply add \texttt{'-m TEST'}, which also works for codon alignments:
\begin{verbatim}
iqtree -s coding_gene.phy -st CODON -m TEST
\end{verbatim}
By default, IQ-TREE uses the standard genetic code.
You can switch to another genetic code (see \url{http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi}) with the following options:
\begin{tabular}{ll}
\hline
Option & Genetic code\\
\hline
\texttt{-st CODON1} & The Standard Code (same as \texttt{-st CODON})\\
\texttt{-st CODON2} & The Vertebrate Mitochondrial Code\\
\texttt{-st CODON3} & The Yeast Mitochondrial Code\\
\texttt{-st CODON4} & The Mold, Protozoan, and Coelenterate Mitochondrial Code and \\
& the Mycoplasma/Spiroplasma Code\\
\texttt{-st CODON5} & The Invertebrate Mitochondrial Code\\
\texttt{-st CODON6} & The Ciliate, Dasycladacean and Hexamita Nuclear Code\\
\texttt{-st CODON9} & The Echinoderm and Flatworm Mitochondrial Code\\
\texttt{-st CODON10} & The Euplotid Nuclear Code\\
\texttt{-st CODON11} & The Bacterial, Archaeal and Plant Plastid Code\\
\texttt{-st CODON12} & The Alternative Yeast Nuclear Code\\
\texttt{-st CODON13} & The Ascidian Mitochondrial Code\\
\texttt{-st CODON14} & The Alternative Flatworm Mitochondrial Code\\
\texttt{-st CODON16} & Chlorophycean Mitochondrial Code\\
\texttt{-st CODON21} & Trematode Mitochondrial Code\\
\texttt{-st CODON22} & Scenedesmus obliquus Mitochondrial Code\\
\texttt{-st CODON23} & Thraustochytrium Mitochondrial Code\\
\texttt{-st CODON24} & Pterobranchia Mitochondrial Code\\
\texttt{-st CODON25} & Candidate Division SR1 and Gracilibacteria Code\\
\hline
\end{tabular}
%============================================%
\subsection{Morphological and SNP data (\textcolor{red}{New in version 1.0!})}
IQ-TREE 1.0 supports discrete morphological alignments via the \texttt{'-st MORPH'} option:
\begin{verbatim}
iqtree -s morphology.phy -st MORPH
\end{verbatim}
IQ-TREE implements two morphological ML models (MK and ORDERED; see Lewis 2001), with MK being the default model.
MK is a Jukes-Cantor-like model. The ORDERED model allows only transitions between neighbouring states, $i\rightarrow i-1$ and $i \rightarrow i+1$ (besides remaining in state $i$). Morphological data typically do not contain constant (uninformative) sites.
In such cases, you should apply the ascertainment bias correction model, e.g.:
\begin{verbatim}
iqtree -s morphology.phy -st MORPH -m MK+ASC
\end{verbatim}
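The difference between MK and ORDERED lies in their instantaneous rate matrices. The Python sketch below builds both for $k \geq 2$ states under our own simplifying assumptions, namely equal state frequencies and a normalization to a mean rate of 1. It only illustrates the structure of the two models and is not IQ-TREE's implementation; the function names are hypothetical.

\begin{verbatim}
def normalize(q):
    """Fill the diagonal so that each row sums to 0, then scale the matrix so
    that the mean substitution rate is 1 (assuming equal state frequencies)."""
    k = len(q)
    for i in range(k):
        q[i][i] = -sum(q[i][j] for j in range(k) if j != i)
    mean_rate = -sum(q[i][i] for i in range(k)) / k
    return [[x / mean_rate for x in row] for row in q]


def mk_rate_matrix(k):
    """MK: Jukes-Cantor-like, every state changes to every other at the same rate."""
    return normalize([[0.0 if i == j else 1.0 for j in range(k)] for i in range(k)])


def ordered_rate_matrix(k):
    """ORDERED: state i can only change to its neighbours i-1 and i+1."""
    return normalize([[1.0 if abs(i - j) == 1 else 0.0 for j in range(k)]
                      for i in range(k)])


for row in ordered_rate_matrix(4):
    print(row)
\end{verbatim}

In the ORDERED matrix every entry more than one step away from the diagonal is zero, so a change from state 0 to state 2 must pass through state 1.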
You can again select the best-fit model with \texttt{'-m TEST'} (which also considers +G):
\begin{verbatim}
iqtree -s morphology.phy -st MORPH -m TEST
\end{verbatim}
For SNP data (DNA), which typically do not contain constant sites, you can explicitly include the
ascertainment bias correction in the model:
\begin{verbatim}
iqtree -s SNP_data.phy -m GTR+ASC
\end{verbatim}
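Conceptually, the correction of Lewis (2001) conditions the likelihood on the fact that no constant site was sampled: each site likelihood $L_i$ is divided by $1 - P_{\mathrm{const}}$, where $P_{\mathrm{const}}$ is the total probability of a constant site under the current model and tree, giving $\ln L_{\mathrm{ASC}} = \sum_{i=1}^{n} \ln L_i - n \ln(1 - P_{\mathrm{const}})$. The Python sketch below performs only this final arithmetic step. It assumes that the per-site log-likelihoods and $P_{\mathrm{const}}$ have already been computed (the values shown are made up), and it does not describe how IQ-TREE organizes the computation.

\begin{verbatim}
import math


def asc_log_likelihood(site_log_likelihoods, p_constant):
    """Log-likelihood conditioned on observing only variable sites."""
    if not 0.0 <= p_constant < 1.0:
        raise ValueError("p_constant must lie in [0, 1)")
    n = len(site_log_likelihoods)
    # log1p(-p) computes log(1 - p) accurately for small p.
    return sum(site_log_likelihoods) - n * math.log1p(-p_constant)


# Hypothetical values: three variable sites, constant-site probability 0.4.
print(asc_log_likelihood([-5.2, -6.1, -4.8], p_constant=0.4))
\end{verbatim}

Because $1 - P_{\mathrm{const}} < 1$, the corrected log-likelihood is always at least as large as the uncorrected one.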
You can restrict model testing to models that include \texttt{'+ASC'} with:
\begin{verbatim}
iqtree -s SNP_data.phy -m TEST+ASC
\end{verbatim}
%============================================%
\subsection{Assessing branch supports with ultrafast bootstrap approximation}
The ultrafast bootstrap approximation is the key value-added feature of IQ-TREE. Simply run:
\begin{verbatim}
iqtree -s example.phy -m TIM+I+G -bb 1000
\end{verbatim}
\texttt{'-bb'} specifies the number of bootstrap replicates, where $1000$
is the minimum number recommended. If you now look at the section \texttt{'MAXIMUM LIKELIHOOD TREE'}
in \texttt{example.phy.iqtree}, you will see that every internal node of the tree figure
is labelled with a support value in percent. The branch supports are also assigned to the ML tree in
\texttt{example.phy.treefile}, which can again be viewed in FigTree.
In addition, IQ-TREE writes the following files:
\begin{itemize}
\item \texttt{example.phy.contree}: the consensus tree with assigned branch supports where branch lengths
are optimized on the original alignment.
\item \texttt{example.phy.splits}: support values in percent for all splits (bipartitions),
computed as the occurrence frequencies in the bootstrap trees. This file is in ``star-dot'' format.
\item \texttt{example.phy.splits.nex}: contains the same information as \texttt{example.phy.splits}
but in NEXUS format, which can be viewed with the SplitsTree program.
\end{itemize}
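The support of a split is its occurrence frequency among the bootstrap trees. The Python sketch below illustrates this counting step together with a star-dot rendering; it is not IQ-TREE's implementation. Trees are given directly as lists of splits (extracting splits from NEWICK trees is omitted), the helper names are hypothetical, and we assume a star-dot layout of one character per taxon, \texttt{*} for taxa on one side of the split and \texttt{.} for the others; the exact layout of \texttt{example.phy.splits} may differ.

\begin{verbatim}
from collections import Counter


def canonical(side, taxa):
    """Represent a bipartition by the side that does not contain taxa[0],
    so that both sides of the same split map to one key."""
    side = frozenset(side)
    return frozenset(taxa) - side if taxa[0] in side else side


def split_supports(bootstrap_trees, taxa):
    """Percentage of bootstrap trees containing each split.
    Each tree is a list of splits; each split is a set of taxa on one side."""
    counts = Counter()
    for tree in bootstrap_trees:
        counts.update({canonical(s, taxa) for s in tree})
    return {s: 100.0 * c / len(bootstrap_trees) for s, c in counts.items()}


def star_dot(split, taxa):
    """One character per taxon: '*' if the taxon is in split, '.' otherwise."""
    return "".join("*" if t in split else "." for t in taxa)


taxa = ["A", "B", "C", "D", "E"]
# Three hypothetical bootstrap trees, listed by their non-trivial splits.
trees = [
    [{"A", "B"}, {"D", "E"}],
    [{"C", "D", "E"}, {"D", "E"}],  # {C,D,E} is the same split as {A,B}
    [{"A", "C"}, {"D", "E"}],
]
for split, pct in sorted(split_supports(trees, taxa).items(), key=lambda x: -x[1]):
    print(star_dot(split, taxa), "%.1f" % pct)
\end{verbatim}

Normalizing each split to one canonical side matters: without it, \{A,B\} and \{C,D,E\} would be counted as two different splits although they describe the same bipartition.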
%============================================%
\subsubsection{Assessing branch supports with standard nonparametric bootstrap}
The standard nonparametric bootstrap can be invoked by:
\begin{verbatim}
iqtree -s example.phy -m TIM+I+G -b 100
\end{verbatim}
\texttt{'-b'} specifies the number of bootstrap replicates, where $100$
is the minimum number recommended. IQ-TREE additionally writes the following files:
\begin{itemize}
\item \texttt{example.phy.boottrees}: the set of bootstrap trees reconstructed.
\item \texttt{example.phy.contree}: the bootstrap consensus tree with assigned branch supports where branch lengths
are optimized on the original alignment.
\end{itemize}
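Each bootstrap replicate is obtained by resampling the alignment columns with replacement \citep{felsenstein1985}; a tree is then reconstructed from every replicate, and the resulting trees yield the consensus and branch supports. The Python sketch below shows only the resampling step. The dictionary representation of the alignment and the function name are assumptions for illustration, not IQ-TREE's internals.

\begin{verbatim}
import random


def bootstrap_alignment(alignment, rng):
    """One nonparametric bootstrap replicate: draw as many columns as the
    alignment has, uniformly at random with replacement."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    # The same column indices are used for every sequence, so columns stay intact.
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


alignment = {"A": "ACGTAC", "B": "ACGTTC", "C": "AGGTTA"}
rng = random.Random(1)  # fixed seed for reproducibility
replicate = bootstrap_alignment(alignment, rng)
print(replicate)
\end{verbatim}

Some columns typically appear several times in a replicate and others not at all; this variation between replicates is what the bootstrap supports measure.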
%============================================%
\subsubsection{Assessing branch supports with single branch tests}
IQ-TREE provides an implementation of the SH-like approximate likelihood ratio test \citep[SH-aLRT; ][]{guindon2010}.
To perform this test, simply run:
\begin{verbatim}
iqtree -s example.phy -m TIM+I+G -alrt 1000
\end{verbatim}
\texttt{'-alrt'} specifies the number of bootstrap replicates for SH-aLRT, where $1000$ is
the minimum number recommended. IQ-TREE performs SH-aLRT at the end of the tree reconstruction
and assigns the support values to the ML tree in the tree file \texttt{example.phy.treefile}.
IQ-TREE also provides a fast implementation of the local bootstrap probabilities method \citep{adachi1996b},
which we call Fast-LBP. Fast-LBP computes the branch support by comparing the tree log-likelihood
with the log-likelihoods of the two alternative nearest-neighbor-interchange (NNI) trees around the branch of interest.
Unlike the original LBP, Fast-LBP computes the log-likelihoods of the two alternative NNI trees
by reoptimizing only the five branches around the branch of interest (a similar idea is used in the SH-aLRT test).
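Once the site log-likelihoods of the three trees are available, the support can be estimated by resampling. We assume here the RELL approximation, as in the original LBP method of \citet{adachi1996b}: sites are resampled with replacement and the precomputed site log-likelihoods are summed, so no tree is re-optimized per replicate. The support is the percentage of replicates in which the ML tree has the highest total log-likelihood. This Python sketch is not IQ-TREE's Fast-LBP code, and the site log-likelihoods are made-up numbers.

\begin{verbatim}
import random


def local_bootstrap_probability(site_lnl, replicates, rng):
    """Percentage of RELL replicates in which tree 0 has the highest total
    log-likelihood among the three trees (ties count in favour of tree 0).

    site_lnl holds three equally long lists of per-site log-likelihoods:
    the ML tree (index 0) and its two NNI alternatives around one branch."""
    n_sites = len(site_lnl[0])
    wins = 0
    for _ in range(replicates):
        # RELL: resample site indices instead of re-optimizing each tree.
        sites = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = [sum(tree[i] for i in sites) for tree in site_lnl]
        if totals[0] >= max(totals[1:]):
            wins += 1
    return 100.0 * wins / replicates


# Made-up per-site log-likelihoods for the ML tree and its two NNI neighbours.
ml = [-2.0, -3.1, -1.5, -2.8, -2.2]
nni_a = [-2.3, -3.0, -1.9, -2.9, -2.6]
nni_b = [-2.1, -3.5, -1.6, -2.7, -2.5]
print(local_bootstrap_probability([ml, nni_a, nni_b], 1000, random.Random(42)))
\end{verbatim}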
To perform Fast-LBP, simply run:
\begin{verbatim}
iqtree -s example.phy -m TIM+I+G -lbp 1000
\end{verbatim}
You can also perform both tests:
\begin{verbatim}
iqtree -s example.phy -m TIM+I+G -alrt 1000 -lbp 1000
\end{verbatim}
The branches of the resulting ML tree will be assigned with both SH-aLRT and Fast-LBP support values.
Finally, you can also combine the ultrafast bootstrap approximation with single branch tests within a single run:
\begin{verbatim}
iqtree -s example.phy -m TIM+I+G -bb 1000 -alrt 1000 -lbp 1000
\end{verbatim}
%============================================%
\subsection{Partitioned analysis for multi-gene alignments}
\label{sec.partition-model}
In the partition model, you can specify a substitution model for each gene/character set individually.
IQ-TREE then estimates the model parameters and branch lengths separately for every partition.
To this end, you first have to prepare a NEXUS file containing a \texttt{SETS} block with
\texttt{CharSet} and \texttt{CharPartition} commands to specify the individual genes and the partition, respectively.
For example:
\begin{verbatim}
#nexus
begin sets;
charset part1 = 1-100;
charset part2 = 101-384;
charpartition mine = HKY+G:part1, GTR+I+G:part2;
end;
\end{verbatim}
Save this into a file \texttt{example.nex} and run:
\begin{verbatim}
iqtree -s example.phy -sp example.nex
\end{verbatim}
IQ-TREE will then partition the alignment \texttt{example.phy} into two subsets named \texttt{part1} and \texttt{part2},
containing sites (columns) 1-100 and 101-384, respectively. Moreover, IQ-TREE applies the
substitution models \texttt{HKY+G} and \texttt{GTR+I+G} to \texttt{part1} and \texttt{part2}, respectively.
After the run has finished, the \texttt{example.nex.iqtree} file will contain the substitution model
parameters and the trees with branch lengths for all subsets of the partition.
The \texttt{CharSet} command also allows you to specify non-consecutive sites using a space-separated list of ranges, e.g.:
\begin{verbatim}
charset part1 = 1-100 200-384;
\end{verbatim}
Here, \texttt{part1} contains sites 1-100 and 200-384 of the alignment. Another example is:
\begin{verbatim}
charset part1 = 1-100\3;
\end{verbatim}
for extracting sites 1, 4, 7, \ldots, 100 (every third site) from the alignment. This is useful for extracting codon positions from a protein-coding alignment.
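To make the range syntax concrete, the following minimal Python sketch (our own illustration, not IQ-TREE's parser) expands a site specification into 1-based site numbers. The function name \texttt{charset\_sites} is hypothetical; only single sites, \texttt{start-end} ranges, and the \texttt{\textbackslash}\emph{step} suffix are handled, not an alignment file prefix.
\begin{verbatim}
import re

# One range token: "start", "start-end", or "start-end\step" (1-based, inclusive).
RANGE = re.compile(r"(\d+)(?:-(\d+))?(?:\\(\d+))?")


def charset_sites(spec):
    """Expand a space-separated site specification into a list of 1-based sites."""
    sites = []
    for token in spec.split():
        m = RANGE.fullmatch(token)
        if m is None:
            raise ValueError(f"unsupported range token: {token!r}")
        start = int(m.group(1))
        end = int(m.group(2) or start)  # a single site has no "-end" part
        step = int(m.group(3) or 1)     # no step suffix means every site
        sites.extend(range(start, end + 1, step))
    return sites


print(charset_sites(r"1-100\3")[:4])       # -> [1, 4, 7, 10]
print(len(charset_sites("1-100 200-384")))  # -> 285
\end{verbatim}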
Moreover, IQ-TREE offers a more advanced feature than other programs:
the subsets may come from different alignment files.
For example:
\begin{verbatim}
#nexus
begin sets;
charset part1 = part1.phy: 1-100\3 201-300\3;
charset part2 = part2.phy: 101-300;
charpartition mine = HKY:part1, GTR+G:part2;
end;
\end{verbatim}
Here, \texttt{part1} and \texttt{part2} are read from alignment files \texttt{part1.phy} and \texttt{part2.phy}, respectively
(a colon ':' is needed to separate the alignment file name from the site specification). Because the alignment file names
are embedded in this NEXUS file, you can simply run:
\begin{verbatim}
iqtree -sp example.nex
\end{verbatim}
Note that
\texttt{part1.phy} and \texttt{part2.phy} need not contain the same set of sequence names. That is, if a sequence occurs
in \texttt{part1.phy} but not in \texttt{part2.phy}, IQ-TREE will treat the part of that sequence corresponding
to \texttt{part2.phy} as missing data. For convenience, IQ-TREE writes the concatenated alignment
into the file \texttt{example.nex.conaln}.
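The padding of absent sequences with missing data can be sketched in a few lines of Python. This assumes each alignment has already been read into a dictionary from sequence name to aligned sequence; the function name \texttt{concatenate} and the use of \texttt{?} as the missing-data symbol are our own choices and need not match what IQ-TREE writes to the \texttt{.conaln} file.
\begin{verbatim}
def concatenate(alignments, missing="?"):
    """Concatenate alignments given as dicts {sequence name: aligned sequence}.

    A sequence absent from one alignment is padded with `missing` characters
    over that alignment's length, i.e. it is treated as missing data there.
    """
    names = []
    for aln in alignments:
        for name in aln:
            if name not in names:
                names.append(name)  # keep first-seen order
    lengths = []
    for aln in alignments:
        sizes = {len(seq) for seq in aln.values()}
        if len(sizes) != 1:
            raise ValueError("all sequences in one alignment must have equal length")
        lengths.append(sizes.pop())
    return {
        name: "".join(aln.get(name, missing * n) for aln, n in zip(alignments, lengths))
        for name in names
    }


part1 = {"s1": "ACGT", "s2": "ACGA"}
part2 = {"s1": "TT", "s3": "GG"}
print(concatenate([part1, part2]))
# -> {'s1': 'ACGTTT', 's2': 'ACGA??', 's3': '????GG'}
\end{verbatim}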
\textcolor{red}{********NEW EXPERIMENTAL FEATURE********}
Since version 0.9.6, IQ-TREE supports partition models with joint and proportional branch lengths between genes. These models
reduce the number of parameters when the full partition model overfits the data. For example:
\begin{verbatim}
iqtree -spp example.nex
\end{verbatim}
applies a proportional partition model. That is, there is only one set of branch lengths for the species tree,
but each gene evolves at its own rate (scaling factor), with the rates normalized to an average of 1.
A partition model with joint branch lengths is specified by:
\begin{verbatim}
iqtree -spj example.nex
\end{verbatim}
(i.e., all gene-specific rates are equal to 1).
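The relation between the shared branch lengths and the gene-specific rates can be sketched as follows. This is a minimal Python illustration that assumes the average of 1 is weighted by the number of sites per gene (the weighting is not specified above); \texttt{proportional\_branch\_lengths} is a hypothetical name, and IQ-TREE estimates the rates by maximum likelihood rather than taking them as input.
\begin{verbatim}
def proportional_branch_lengths(shared, rates, n_sites):
    """Scale one shared set of branch lengths by gene-specific rates.

    Rates are normalized so that their mean, weighted by the number of sites
    per gene, equals 1 (the weighting is an assumption of this sketch).
    """
    total = sum(n_sites)
    mean = sum(r * n for r, n in zip(rates, n_sites)) / total
    norm = [r / mean for r in rates]
    per_gene = [[r * b for b in shared] for r in norm]
    return norm, per_gene


# Proportional model: two genes of equal length, the second evolving 3x faster.
norm, per_gene = proportional_branch_lengths([0.1, 0.2], [1.0, 3.0], [100, 100])
print(norm)  # -> [0.5, 1.5]
# Joint model: all rates fixed to 1, so every gene uses the shared lengths.
print(proportional_branch_lengths([0.1, 0.2], [1.0, 1.0], [100, 284])[0])  # -> [1.0, 1.0]
\end{verbatim}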
%============================================%
\subsubsection{Choosing the right partitioning scheme}
\label{sec.partition-model-selection}
Since version 0.9.6, IQ-TREE implements a greedy strategy \citep{lanfear2012} that starts with the full partition model and sequentially
merges pairs of genes until the model fit does not improve any further:
\begin{verbatim}
iqtree -sp example.nex -m TESTLINK
\end{verbatim}
After the best partition scheme is found, IQ-TREE will immediately start the tree reconstruction under the best-fit partition model.
If you only want to find the best-fit partition model without reconstructing a tree, run:
\begin{verbatim}
iqtree -sp example.nex -m TESTONLYLINK
\end{verbatim}
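The greedy merging strategy can be outlined in Python as below. This is a simplified sketch, not IQ-TREE's implementation: \texttt{subset\_score} is a hypothetical placeholder for fitting the best substitution model to a (merged) subset and returning an additive criterion such as BIC (lower is better), and \texttt{toy\_score} supplies made-up scores for three genes.
\begin{verbatim}
from itertools import combinations


def greedy_merge(genes, subset_score):
    """Greedily merge subsets while the total score (lower is better) improves."""
    scheme = [frozenset([g]) for g in genes]  # start from the full partition model
    total = sum(subset_score(s) for s in scheme)
    while len(scheme) > 1:
        best = None
        for a, b in combinations(scheme, 2):
            delta = subset_score(a | b) - subset_score(a) - subset_score(b)
            if best is None or delta < best[0]:
                best = (delta, a, b)
        delta, a, b = best
        if delta >= 0:  # no merge improves the fit any further
            break
        scheme = [s for s in scheme if s not in (a, b)] + [a | b]
        total += delta
    return scheme, total


def toy_score(subset):
    # Hypothetical scores: genes A and B fit well together, C does not.
    return 10 + (0 if subset <= {"A", "B"} else 20 * (len(subset) - 1))


print(greedy_merge(["A", "B", "C"], toy_score))
\end{verbatim}
In practice each call to the scoring function requires fitting models to the merged data, so a real implementation would cache the scores of subsets that have already been evaluated.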
%============================================%
\subsubsection{Bootstrapping with partition model}
IQ-TREE can perform the ultrafast bootstrap with partition models, e.g.:
\begin{verbatim}
iqtree -sp example.nex -bb 1000
\end{verbatim}
Here, IQ-TREE will resample the sites \emph{within} subsets of the partitions (i.e.,
the bootstrap replicates are generated per subset separately and then concatenated together).
The same holds true if you do the standard nonparametric bootstrap.
\textcolor{red}{********NEW********}
Since version 0.9.6, IQ-TREE also supports resampling genes instead of sites:
\begin{verbatim}
iqtree -sp example.nex -bb 1000 -bspec GENE
\end{verbatim}
Moreover, IQ-TREE allows an even more complex
strategy that resamples genes and then sites within the resampled genes:
\begin{verbatim}
iqtree -sp example.nex -bb 1000 -bspec GENESITE
\end{verbatim}
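The three resampling schemes differ only in what is drawn with replacement. The following minimal Python sketch represents each gene by the list of its site indices; the function names are hypothetical, and the sketch does not reproduce how IQ-TREE assembles or evaluates the replicate alignments.
\begin{verbatim}
import random


def resample_sites_within(partitions, rng):
    """Default: bootstrap sites separately within each subset, then concatenate."""
    return [[rng.choice(sites) for _ in sites] for sites in partitions]


def resample_genes(partitions, rng):
    """GENE: draw whole genes with replacement; sites inside a gene stay as they are."""
    return [list(rng.choice(partitions)) for _ in partitions]


def resample_genes_then_sites(partitions, rng):
    """GENESITE: resample genes first, then resample sites within each drawn gene."""
    return resample_sites_within(resample_genes(partitions, rng), rng)


rng = random.Random(42)
genes = [[0, 1, 2], [3, 4, 5, 6, 7]]  # site indices of two hypothetical genes
for scheme in (resample_sites_within, resample_genes, resample_genes_then_sites):
    print(scheme.__name__, scheme(genes, rng))
\end{verbatim}
Note that with \texttt{GENE} or \texttt{GENESITE}, the total number of sites in a replicate can differ from that of the original alignment when genes have different lengths.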
%============================================%
\subsection{Utilizing multi-core CPUs}
A specialized version of IQ-TREE can use multiple CPU cores during a run
(parallelized with the OpenMP library).
You can download the binary from the software website or compile the source code
yourself (see \emph{Installation} section \ref{Installation}). In the following, we assume that you have
copied the binary \texttt{iqtree-omp} and the other files from the package's \texttt{bin} folder into the system \texttt{bin} folder, so that it can be
invoked from the command line simply as \texttt{iqtree-omp}.
If you now run, e.g.:
\begin{verbatim}
iqtree-omp -s example.phy
\end{verbatim}
then IQ-TREE will use all available cores of your CPU.
This is not always good practice, because our parallelization technique only works well on long alignments;
for very short alignments, we do not recommend this version of IQ-TREE.
Because the speedup depends on the alignment length,
a good practice is to run this version with an increasing number of cores, e.g.:
\begin{verbatim}
iqtree-omp -s example.phy -omp 2
\end{verbatim}
Here, \texttt{-omp} specifies the number of cores that IQ-TREE will use.
If the wall-clock time is substantially reduced compared with the sequential IQ-TREE version,
then you can try:
\begin{verbatim}
iqtree-omp -s example.phy -omp 3
\end{verbatim}
and so on, until no substantial reduction of running time is observed. The remaining
analysis can then be carried out with that number of cores.
For example, on my computer (Linux, Intel Core i5-2500K, 3.3 GHz, quad-core) I observed the following
wall-clock running times for this example alignment:
\begin{center}
\begin{tabular}{cc}
\hline
Number of cores & Wall-clock time (s)\\
\hline
1 & 21.465\\
2 & 13.627\\
3 & 11.119\\
4 & 10.807\\
\hline
\end{tabular}
\end{center}
Therefore, I would only use 2 cores for this specific alignment (\texttt{"-omp 2"} option).
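The decision can be made explicit by computing the speedup (sequential time divided by parallel time) and the parallel efficiency (speedup divided by the number of cores) from the table. The stopping rule in \texttt{pick\_cores} is a hypothetical rule of thumb, not an IQ-TREE feature: keep adding cores while each additional core still reduces the wall-clock time by at least a given fraction.
\begin{verbatim}
# Wall-clock times (seconds) from the table above, keyed by number of cores.
times = {1: 21.465, 2: 13.627, 3: 11.119, 4: 10.807}

for cores, t in times.items():
    speedup = times[1] / t
    print(f"{cores} core(s): speedup {speedup:.2f}, efficiency {speedup / cores:.2f}")


def pick_cores(times, min_gain=0.2):
    """Add cores while each extra core still cuts wall-clock time by >= min_gain."""
    cores = sorted(times)
    chosen = cores[0]
    for prev, nxt in zip(cores, cores[1:]):
        gain = (times[prev] - times[nxt]) / times[prev]
        if gain < min_gain:
            break
        chosen = nxt
    return chosen


print(pick_cores(times))       # -> 2
print(pick_cores(times, 0.1))  # -> 3
\end{verbatim}
For the measured times this gives speedups of about 1.58, 1.93, and 1.99 with 2, 3, and 4 cores. A 20\% threshold reproduces the choice of 2 cores above, whereas a 10\% threshold would pick 3.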
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Advanced tutorial}
\label{sec.advanced-tutorial}
This section gives an advanced tutorial for more experienced users. It covers several advanced features,
such as tree topology tests and user-defined substitution models.
%============================================%
\subsection{Tree topology tests}
IQ-TREE can compute the log-likelihoods of user-defined trees passed via the \texttt{-z} option:
\begin{verbatim}
iqtree -s example.phy -z example.treels
\end{verbatim}
assuming that \texttt{example.treels} contains the trees in NEWICK format.
At the end of the usual run, IQ-TREE will additionally evaluate all trees in this file using the estimated model parameters.
When you look into \texttt{example.phy.iqtree},
there will be a section \texttt{USER TREES} that lists the tree IDs and the corresponding log-likelihoods.
Moreover, IQ-TREE will write an additional file:
\begin{itemize}
\item \texttt{example.phy.treels.trees}: the trees with optimized branch lengths.
\end{itemize}
If you only want to evaluate the trees without reconstructing the ML tree, you can run:
\begin{verbatim}
iqtree -s example.phy -z example.treels -n 1
\end{verbatim}
Here, IQ-TREE will only reconstruct the BIONJ+NNI tree and use that tree to estimate the model parameters,
which are normally accurate enough for our purpose.
IQ-TREE also supports several tree topology tests using the RELL approximation \citep{kishino1990},
including the bootstrap proportion (BP), Kishino-Hasegawa test \citep[KH; ][]{kishino1989}, Shimodaira-Hasegawa test \citep[SH; ][]{shimodaira1999}, expected likelihood weights \citep[ELW; ][]{strimmer2002}, weighted-KH (WKH), and weighted-SH (WSH) tests.
The trees are passed via the \texttt{-z} option, so you can run:
\begin{verbatim}
iqtree -s example.phy -z example.treels -n 1 -zb 1000
\end{verbatim}
Here, \texttt{-zb} specifies the number of RELL replicates; 1000 is the recommended minimum.
The \texttt{USER TREES} section of \texttt{example.phy.iqtree} will list the results of the BP, KH, SH, and ELW methods. If you
also want to perform the WKH and WSH tests, simply add the \texttt{-zw} option:
\begin{verbatim}
iqtree -s example.phy -z example.treels -n 1 -zb 1000 -zw
\end{verbatim}
Finally, note that IQ-TREE will automatically detect duplicated tree topologies and omit them during the evaluation.
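To illustrate how the RELL approximation avoids re-optimizing every tree in every replicate, the following Python sketch computes the bootstrap proportion (BP) only: the per-site log-likelihoods of each tree are computed once and then resampled. The per-site values and the function name are hypothetical, and the KH, SH, ELW, WKH, and WSH tests need additional statistics that are not shown.
\begin{verbatim}
import random


def rell_bootstrap_proportions(site_lnl, replicates=1000, seed=0):
    """RELL bootstrap proportion (BP) of each tree.

    site_lnl[t][i] is the log-likelihood of site i under tree t, computed once with
    fixed parameters; RELL resamples these values instead of re-optimizing each tree.
    """
    rng = random.Random(seed)
    n_trees, n_sites = len(site_lnl), len(site_lnl[0])
    wins = [0] * n_trees
    for _ in range(replicates):
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]  # bootstrap sites
        totals = [sum(row[i] for i in sample) for row in site_lnl]
        best = max(range(n_trees), key=totals.__getitem__)  # ties go to lower index
        wins[best] += 1
    return [w / replicates for w in wins]


# Hypothetical per-site log-likelihoods for 3 trees and 6 sites.
site_lnl = [
    [-2.1, -3.0, -1.2, -4.4, -2.2, -1.9],
    [-2.0, -3.3, -1.3, -4.1, -2.6, -2.0],
    [-2.9, -3.1, -1.8, -4.9, -2.7, -2.3],
]
print(rell_bootstrap_proportions(site_lnl))
\end{verbatim}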
%============================================%
\subsection{User-defined substitution models}
Users can specify arbitrary DNA models using a 6-digit code that constrains which substitution rates are equal.
For example, \texttt{010010} corresponds to the HKY model and \texttt{012345} to the GTR model.
In fact, the IQ-TREE source code uses this specification internally to simplify the coding. The 6-digit code is passed
via the \texttt{-m} option, e.g.:
\begin{verbatim}
iqtree -s example.phy -m 010010+G
\end{verbatim}
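The meaning of the 6-digit code can be checked with a small Python sketch that groups nucleotide pairs sharing a rate. It assumes that the digits follow the rate order A-C, A-G, A-T, C-G, C-T, G-T given in this section for model files; with that order, \texttt{010010} places the two transitions A-G and C-T in one class, as HKY requires. Counting one reference rate as fixed is a common convention that we assume here, and the function names are hypothetical.
\begin{verbatim}
# Assumed order of the six exchangeability rates in the code.
PAIRS = ("A-C", "A-G", "A-T", "C-G", "C-T", "G-T")


def rate_classes(code):
    """Group nucleotide pairs that share a rate according to a 6-digit model code."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError("expected six digits, e.g. '010010'")
    classes = {}
    for pair, digit in zip(PAIRS, code):
        classes.setdefault(digit, []).append(pair)
    return classes


def free_rate_parameters(code):
    # One rate class serves as the reference (rate fixed); the others are free.
    return len(rate_classes(code)) - 1


print(rate_classes("010010"))          # HKY: A-G and C-T share class '1'
print(free_rate_parameters("010010"))  # -> 1
print(free_rate_parameters("012345"))  # -> 5
\end{verbatim}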
Moreover, the \texttt{-m} option accepts the name of a file that contains the 6 rates (A-C, A-G, A-T, C-G, C-T, G-T)
and the 4 base frequencies (A, C, G, T), e.g.:
\begin{verbatim}
iqtree -s example.phy -m mymodel+G
\end{verbatim}
where \texttt{mymodel} is a file containing the 10 entries described above. You can even specify the parameter values directly within the \texttt{-m} option, e.g.:
\begin{verbatim}
iqtree -s example.phy -m 'TN{2.0,3.0}+G8{0.5}+I{0.15}'
\end{verbatim}
That is, we use the Tamura-Nei model with a fixed transition-transversion rate ratio of 2.0 and a purine/pyrimidine rate ratio of 3.0. Moreover, we
use 8-category Gamma-distributed site rates with shape parameter $\alpha = 0.5$ and a proportion of invariable sites $p_{inv} = 0.15$.
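The model string is a \texttt{+}-separated list of components, each optionally followed by fixed parameter values in braces. The following Python sketch, our own illustration rather than IQ-TREE's parser, splits such a string into its components; it does not check model names or interpret the values.
\begin{verbatim}
import re

# A component is a name (e.g. "TN", "G8", "I", "Fo") with an optional {v1,v2,...} list.
COMPONENT = re.compile(r"([A-Za-z0-9]+)(?:\{([^}]*)\})?")


def parse_model(spec):
    """Split a model string such as 'TN{2.0,3.0}+G8{0.5}+I{0.15}' into components."""
    parts = []
    for comp in spec.split("+"):
        m = COMPONENT.fullmatch(comp)
        if m is None:
            raise ValueError(f"cannot parse model component {comp!r}")
        name, values = m.group(1), m.group(2)
        parts.append((name, [float(v) for v in values.split(",")] if values else []))
    return parts


print(parse_model("TN{2.0,3.0}+G8{0.5}+I{0.15}"))
# -> [('TN', [2.0, 3.0]), ('G8', [0.5]), ('I', [0.15])]
print(parse_model("GTR+G+Fo"))
# -> [('GTR', []), ('G', []), ('Fo', [])]
\end{verbatim}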
Note that by default IQ-TREE computes empirical state frequencies from the alignment, but you can also optimize the frequencies by maximum likelihood
by adding \texttt{+Fo} to the model name:
\begin{verbatim}
iqtree -s example.phy -m GTR+G+Fo
\end{verbatim}
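For reference, empirical state frequencies are relative counts of each state in the alignment. A minimal Python sketch, assuming that gaps, \texttt{N}, and any other non-ACGT characters are simply skipped (IQ-TREE may treat ambiguous characters differently); the function name is hypothetical:
\begin{verbatim}
from collections import Counter


def empirical_frequencies(sequences, states="ACGT"):
    """Relative frequency of each state over all sequences, ignoring other characters."""
    counts = Counter(ch for seq in sequences for ch in seq.upper() if ch in states)
    total = sum(counts.values())
    return {s: counts[s] / total for s in states}


print(empirical_frequencies(["ACGT", "AA-N"]))
# -> {'A': 0.5, 'C': 0.16666666666666666, 'G': 0.16666666666666666, 'T': 0.16666666666666666}
\end{verbatim}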
For amino-acid alignments, if one wants to use the frequencies of the empirical protein model, then use \texttt{+Fu}, for example:
\begin{verbatim}
iqtree -s myprotein_alignment -m WAG+G+Fu
\end{verbatim}
Finally, note that all model specifications above can be used in the partition model NEXUS file.
%============================================%
\subsection{Consensus construction and bootstrap value assignment}
IQ-TREE can construct an extended majority-rule consensus tree from a set of trees written in NEWICK or NEXUS format (e.g., produced
by MrBayes):
\begin{verbatim}
iqtree -con mytrees
\end{verbatim}
To build a majority-rule consensus tree, simply set the minimum support threshold to 0.5:
\begin{verbatim}
iqtree -con mytrees -t 0.5
\end{verbatim}
If you want to specify a burn-in (the number of trees at the beginning of the file to ignore), use the \texttt{-bi} option:
\begin{verbatim}
iqtree -con mytrees -t 0.5 -bi 100
\end{verbatim}
to skip the first 100 trees in the file.
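The consensus construction can be sketched in Python once each tree has been reduced to its set of splits (bipartitions). In this hypothetical representation, a split is stored as the frozenset of taxa on the side that does not contain a fixed reference taxon (here A, with taxa A to E). Splits are accepted in order of decreasing frequency if they are compatible with all splits accepted so far and their frequency exceeds the threshold; treating the threshold as a strict lower bound is our assumption.
\begin{verbatim}
from collections import Counter


def compatible(a, b):
    # Both splits are stored as the side WITHOUT the reference taxon, so they are
    # compatible iff one is nested in the other or the two are disjoint.
    return a <= b or b <= a or a.isdisjoint(b)


def consensus_splits(trees, threshold=0.0, burnin=0):
    """Splits of the consensus tree, mapped to their frequencies.

    trees: one set of splits (frozensets of taxon names) per input tree.
    """
    kept = trees[burnin:]  # ignore the first `burnin` trees
    counts = Counter(split for tree in kept for split in tree)
    accepted = {}
    for split, count in counts.most_common():  # most frequent split first
        freq = count / len(kept)
        if freq <= threshold:
            break  # every remaining split is at most this frequent
        if all(compatible(split, other) for other in accepted):
            accepted[split] = freq
    return accepted


fs = frozenset  # taxa A-E; A is the reference taxon
trees = [
    {fs("BD"), fs("BDE")},  # treated as burn-in below
    {fs("CD"), fs("CDE")},
    {fs("CD"), fs("CDE")},
    {fs("DE"), fs("CDE")},
]
print(consensus_splits(trees, threshold=0.5, burnin=1))
\end{verbatim}
With \texttt{threshold=0.5} only splits found in more than half of the trees are kept, and these are always pairwise compatible; with \texttt{threshold=0.0} the greedy compatibility check yields an extended majority-rule consensus.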
IQ-TREE can also compute a consensus network and print it into a NEXUS file by:
\begin{verbatim}
iqtree -net mytrees
\end{verbatim}
Finally, given an input tree and a set of trees, IQ-TREE can assign support values to the branches of the input tree
(the number of times each branch of the input tree occurs in the set of trees):
\begin{verbatim}
iqtree -sup input_tree set_of_trees
\end{verbatim}
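The support counting can be sketched in Python as follows. The helper functions are hypothetical: \texttt{clusters\_from\_newick} is a minimal Newick reader that ignores branch lengths and internal labels but does not handle quoted labels or comments, and each split is stored as the side without the alphabetically first taxon, so that the same branch yields the same key in every tree.
\begin{verbatim}
import re


def clusters_from_newick(newick):
    """Return (taxa, clusters): all leaf names and the leaf set below each internal node."""
    tokens = re.findall(r"[(),;]|[^(),;]+", newick)
    stack, clusters, taxa, prev = [], [], set(), None
    for tok in tokens:
        if not tok.strip():
            continue  # skip pure whitespace
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            clade = frozenset(stack.pop())
            clusters.append(clade)
            if stack:
                stack[-1] |= clade  # the closed clade belongs to its parent
        elif tok not in (",", ";") and prev in ("(", ","):
            name = tok.split(":")[0].strip()  # drop an optional ":branch_length"
            taxa.add(name)
            stack[-1].add(name)
        # anything else (internal labels, lengths after ")") is ignored
        prev = tok
    return taxa, clusters


def splits_from_newick(newick):
    """Non-trivial splits, each stored as the side without the alphabetically first taxon."""
    taxa, clusters = clusters_from_newick(newick)
    ref = min(taxa)
    splits = set()
    for clade in clusters:
        side = frozenset(taxa) - clade if ref in clade else clade
        if 1 < len(side) < len(taxa) - 1:  # both sides need at least 2 taxa
            splits.add(side)
    return splits


def branch_support(input_tree, tree_set):
    """How many trees in tree_set contain each split (branch) of input_tree."""
    counts = dict.fromkeys(splits_from_newick(input_tree), 0)
    for newick in tree_set:
        present = splits_from_newick(newick)
        for split in counts:
            if split in present:
                counts[split] += 1
    return counts


trees = ["((A,B),(C,D),E);", "((A:0.1,B:0.2)0.9:0.05,C,(D,E));", "((A,C),(B,D),E);"]
print(branch_support("((A,B),(C,D),E);", trees))
\end{verbatim}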
%============================================%
\subsection{Computing Robinson-Foulds distance between trees}
IQ-TREE implements a very fast Robinson-Foulds (RF) distance computation based on a hash table, which is much faster than the PHYLIP package. For example, you can run:
\begin{verbatim}
iqtree -rf tree_set1 tree_set2
\end{verbatim}
to compute the pairwise RF distances between 2 sets of trees. To compute the all-to-all RF distances
within a single set of trees, use:
\begin{verbatim}
iqtree -rf_all tree_set
\end{verbatim}
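Representing each tree as a set of splits turns the RF distance into a set symmetric difference, which Python computes with hash-based sets. The minimal sketch below assumes that the trees have already been converted into split sets (each split stored as the side without taxon A); some programs divide this count by 2, and we do not claim which convention IQ-TREE reports.
\begin{verbatim}
from itertools import combinations


def rf_distance(splits1, splits2):
    """RF distance: number of splits found in exactly one of the two trees."""
    return len(splits1 ^ splits2)  # set symmetric difference, backed by hashing


def rf_all(trees):
    """All-to-all RF distance matrix for a list of split sets."""
    n = len(trees)
    dist = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = rf_distance(trees[i], trees[j])
    return dist


fs = frozenset
t1 = {fs("CD"), fs("CDE")}   # ((A,B),(C,D),E)
t2 = {fs("DE"), fs("CDE")}   # ((A,B),C,(D,E))
t3 = {fs("BD"), fs("BDE")}   # ((A,C),(B,D),E)
print(rf_all([t1, t2, t3]))  # -> [[0, 2, 4], [2, 0, 4], [4, 4, 0]]
\end{verbatim}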
%============================================%
\subsection{Generating random trees}
IQ-TREE provides several random tree generation models. For example,
\begin{verbatim}
iqtree -r 100 100.tree
\end{verbatim}
generates a 100-taxon random tree under the Yule-Harding model and writes it into the file \texttt{100.tree};
the branch lengths follow an exponential distribution with a mean of 0.1.
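A minimal Python sketch of the Yule-Harding process: starting from a cherry, a leaf chosen uniformly at random is repeatedly replaced by two new leaves until the requested number of taxa is reached, and each branch length is drawn from an exponential distribution with mean 0.1. The taxon labels \texttt{T1}, \texttt{T2}, \ldots, the rooted binary output, and the function name are our own choices; the sketch does not model the \texttt{-rlen} bounds.
\begin{verbatim}
import random


def yule_harding_tree(n_taxa, mean_length=0.1, seed=None):
    """Random binary tree in Newick format under the Yule-Harding process."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    rng = random.Random(seed)
    root = {"children": [{"children": []}, {"children": []}]}
    leaves = list(root["children"])
    while len(leaves) < n_taxa:
        leaf = leaves.pop(rng.randrange(len(leaves)))  # uniformly chosen leaf
        leaf["children"] = [{"children": []}, {"children": []}]
        leaves.extend(leaf["children"])
    for i, leaf in enumerate(leaves, start=1):
        leaf["name"] = f"T{i}"  # hypothetical taxon labels

    def newick(node):
        if not node["children"]:
            return node["name"]
        # exponential branch lengths with the given mean (rate = 1 / mean)
        parts = [f"{newick(c)}:{rng.expovariate(1.0 / mean_length):.5f}"
                 for c in node["children"]]
        return "(" + ",".join(parts) + ")"

    return newick(root) + ";"


print(yule_harding_tree(5, seed=1))
\end{verbatim}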
To change the branch length distribution, run, e.g.:
\begin{verbatim}
iqtree -r 100 -rlen 0.05 0.2 0.3 100.tree
\end{verbatim}
to set the minimum, mean, and maximum branch lengths as 0.05, 0.2, and 0.3, respectively.
To generate trees under the uniform model instead, use the \texttt{-ru} option:
\begin{verbatim}
iqtree -ru 100 100.tree
\end{verbatim}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Frequently asked questions (FAQ)}
\subsection{How does IQ-TREE treat gap/missing characters?}
Gaps (-) and missing characters (? or N for DNA alignments) are treated in the same way as \emph{unknown} characters,
which carry no information. Many other ML programs (RAxML, PhyML, etc.) treat them the same way. Technically,
in Felsenstein's pruning algorithm the partial likelihood vector of such a character is filled with 1's for all character states. This is equivalent to the following:
for a site (column) of an alignment containing AC-AG-A (i.e., A for sequence 1, C for sequence 2, - for sequence 3, \ldots), the site likelihood
of a tree $T$ is equal to the site likelihood of the subtree $T_{sub}$ of $T$ restricted to the sequences with non-gap characters:
\[ \ell(T \mid \texttt{AC-AG-A}) = \ell(T_{sub} \mid \texttt{ACAGA}) \]
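The equivalence can be checked numerically. The following Python sketch uses the Jukes-Cantor (JC69) model and a star tree purely for brevity (both are simplifying assumptions): the tip with a gap receives a vector of all 1's, and because each row of the transition matrix sums to 1, its factor drops out of the site likelihood.
\begin{verbatim}
import math

STATES = "ACGT"


def jc69(i, j, t):
    """Jukes-Cantor transition probability P(i -> j) along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def tip_vector(ch):
    """Partial likelihood vector at a tip; unknown characters get all 1's."""
    if ch in STATES:
        return [1.0 if s == ch else 0.0 for s in STATES]
    return [1.0] * len(STATES)  # '-', '?', 'N': no information


def star_site_likelihood(chars, lengths):
    """Site likelihood of a star tree (all tips joined at one centre node)."""
    total = 0.0
    for x in STATES:  # state at the centre node
        term = 0.25   # equal base frequencies under JC69
        for ch, t in zip(chars, lengths):
            vec = tip_vector(ch)
            term *= sum(jc69(x, y, t) * vec[k] for k, y in enumerate(STATES))
        total += term
    return total


with_gap = star_site_likelihood("AC-", [0.1, 0.2, 0.3])
pruned = star_site_likelihood("AC", [0.1, 0.2])  # same tree without the gapped tip
print(with_gap, pruned, math.isclose(with_gap, pruned))
\end{verbatim}
Both values also equal $\frac{1}{4} P_{AC}(0.3)$, the likelihood of the two remaining tips joined by a path of length $0.1 + 0.2$.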
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Version History}
\label{Version History}
\begin{description}
\item \textbf{Version 0.9.6:} October 2013
\begin{itemize}
\item Ultrafast model selection and partitioning for phylogenomic alignments.
\item Introduction of nearest neighbor interchange (NNI) with five-branch optimization to evaluate candidate NNIs.
This brings higher accuracy for tree reconstruction and bootstrap at the cost of ca. 2x longer running time.
\item Introduction of joint and proportional partition models to reduce the number of parameters in case of model overfitting (experimental).
\item Introduction of gene-resampling and gene-and-site resampling for the bootstrap on multi-gene alignments.
\end{itemize}
\item \textbf{Version 0.9.5:} May 2013
\begin{itemize}
\item Introduction of bootstrap epsilon to select equally good bootstrap trees at random to deal with polytomies
\end{itemize}
\item \textbf{Version 0.9.4:} Easter 2013
\begin{itemize}
\item Tree topology tests
\end{itemize}
\item \textbf{Version 0.9.3:} March 2013
\begin{itemize}
\item New implementation of model selection that works on all data types.
\item A tutorial about using partition models.
\item Parallel OpenMP support to utilize multi-core CPUs.
\end{itemize}
\item \textbf{Version 0.9.0:} September 2012 -
First beta release.
\end{description}
\section*{Credits and Acknowledgement}
Some parts of the code were taken from the following packages/libraries: Phylogenetic likelihood library \citep{tomas2014}, TREE-PUZZLE \citep{schmidt2002},
BIONJ \citep{gascuel1997}, Nexus Class Library \citep{lewis2003}, Eigen library \citep{guennebaud2010},
SPRNG library \citep{mascagni2000}, Zlib library (\url{http://www.zlib.net}).
Financial support from the Austrian Science Fund (FWF), the Vienna Science and Technology Fund (WWTF), and the University of Vienna is greatly appreciated.
\bibliographystyle{bioinformatics}
\bibliography{genephylo,heiko} %%%%%%%%%%%
\end{document}