.TH PHYLOBOOT "1" "May 2016" "phyloBoot 1.4" "User Commands"
.SH NAME
phyloBoot \- Generate simulated alignment data by parametric or nonparametric bootstrapping
.SH DESCRIPTION
Generate simulated alignment data by parametric or nonparametric
bootstrapping, and/or estimate errors in phylogenetic model parameters.
When estimating errors in parameters, the tree topology is not inferred;
estimated errors are conditional on the given topology.
.PP
If a model is given in the form of a .mod file (<model_fname>), then
parametric bootstrapping is performed; that is, synthetic data sets are
drawn from the distribution defined by the model. Otherwise, the input
file is assumed to be a multiple alignment, and non\-parametric
bootstrapping is performed; that is, sites are drawn (with replacement)
from the empirical distribution defined by the given alignment.
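.PP
The non\-parametric resampling step can be sketched in Python as
follows. This illustrates drawing alignment columns with replacement;
it is not phyloBoot's own implementation, and the dictionary
representation of the alignment and the function name
\fIresample_columns\fR are hypothetical.
.nf
```python
import random


def resample_columns(alignment, nsites, rng):
    """Draw nsites alignment columns with replacement.

    alignment maps each species name to an aligned sequence string;
    all sequences are assumed to have the same length.
    """
    length = len(next(iter(alignment.values())))
    # Choose column indices uniformly at random, with replacement.
    cols = [rng.randrange(length) for _ in range(nsites)]
    # Build each replicate row from the same sampled columns.
    return {name: "".join(seq[i] for i in cols)
            for name, seq in alignment.items()}


aln = {"human": "ACGTAC", "chimp": "ACGTTC", "mouse": "AGGTAC"}
rep = resample_columns(aln, 10, random.Random(1))
for name, seq in rep.items():
    print(name, seq)
```
.fi
.PP
Because whole columns are sampled, each replicate keeps the
per\-site patterns of the original alignment while their frequencies
vary from replicate to replicate.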
.PP
The default behavior is to produce simulated alignments, estimate model
parameters for each one, and then write a table to stdout with a row
for each parameter and columns for the mean, standard deviation
(approximate standard error), median, minimum, and maximum of estimated
values, plus the boundaries of 95% and 90% confidence intervals.
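.PP
A minimal Python sketch of how one row of this table could be
computed from the replicate estimates of a single parameter. It
assumes percentile confidence intervals with linear interpolation
between order statistics; this page does not state which interval
method phyloBoot uses, and the names \fIsummarize\fR and
\fIpercentile\fR are hypothetical.
.nf
```python
import statistics


def percentile(sorted_vals, q):
    """Linearly interpolated percentile of sorted data, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


def summarize(values):
    """Summary row for one parameter (needs at least two replicates)."""
    v = sorted(values)
    return {
        "mean": statistics.mean(v),
        "stdev": statistics.stdev(v),  # approximate standard error
        "median": statistics.median(v),
        "min": v[0],
        "max": v[-1],
        # Percentile interval boundaries (an assumed CI method).
        "95_lo": percentile(v, 0.025), "95_hi": percentile(v, 0.975),
        "90_lo": percentile(v, 0.05), "90_hi": percentile(v, 0.95),
    }


estimates = [0.91, 1.02, 0.97, 1.10, 0.88, 1.05, 0.99, 1.01]
for key, val in summarize(estimates).items():
    print(f"{key:8s} {val:.4f}")
```
.fi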
.PP
The \fB\-\-alignments\-only\fR option, however, allows the parameter estimation
step to be bypassed entirely, and the program to be used simply to
generate simulated data sets.
.PP
See the phyloFit usage message for additional details on tree\-building
options.
.SH EXAMPLE
.PP
(See below for more details on options)
.PP
1. Estimation of parameter errors by parametric bootstrapping.
.IP
phyloBoot \fB\-\-nreps\fR 500 \fB\-\-nsites\fR 10000 mymodel.mod > par_errors
.PP
2. Estimation of parameter errors by nonparametric bootstrapping.
.IP
phyloBoot \fB\-\-nreps\fR 500 \fB\-\-nsites\fR 10000
\fB\-\-tree\fR "((human,chimp),(mouse,rat))" myalignment.fa >
nonpar_errors
.PP
3. Parametric generation of simulated data.
.IP
phyloBoot mymodel.mod \fB\-\-alignments\-only\fR pardata
\fB\-\-nreps\fR 500 \fB\-\-nsites\fR 10000
.PP
4. Nonparametric generation of simulated data.
.IP
phyloBoot myalignment.fa \fB\-\-alignments\-only\fR nonpardata
\fB\-\-nreps\fR 500 \fB\-\-nsites\fR 10000
.SH OPTIONS
.SS bootstrapping options
.HP
\fB\-\-nsites\fR, \fB\-L\fR <number>
Number of sites in sampled alignments.
If an alignment is given (non\-parametric case), the default is the
number of sites in the alignment; otherwise the default is 1000.
.HP
\fB\-\-nreps\fR, \fB\-n\fR <number>
Number of replicates.
Default is 100.
.HP
\fB\-\-msa\-format\fR, \fB\-i\fR FASTA|PHYLIP|MPM|MAF|SS
(Non\-parametric case only.) Alignment format. Default is to guess
format from file contents.
.HP
\fB\-\-alignments\-only\fR, \fB\-a\fR <fname_root>
Generate alignments and write them to files with given filename
root, but do not estimate parameters.
.HP
\fB\-\-dump\-mods\fR, \fB\-d\fR <fname_root>
.IP
Dump .mod files for individual estimated models (one for each
replicate).
.HP
\fB\-\-dump\-samples\fR, \fB\-m\fR <fname_root>
.IP
Dump simulated alignments to files with given filename root.
Similar to \fB\-\-alignments\-only\fR but does not disable parameter
estimation.
.HP
\fB\-\-dump\-format\fR, \fB\-o\fR FASTA|PHYLIP|MPM|SS
.IP
(For use with \fB\-\-alignments\-only\fR or \fB\-\-dump\-samples\fR) File format to
use when dumping raw alignments. Default is FASTA.
.HP
\fB\-\-read\-mods\fR, \fB\-R\fR <fname_list>
Read estimated models from a list of filenames instead of generating
alignments and estimating parameters. <fname_list> can be a comma\-delimited
list of files or, if preceded by a '*', the name of a file containing
the file names (one per line). Can be used to compute
statistics for replicates that have been processed separately (see
\fB\-\-alignments\-only\fR). When this option is used, the primary argument
to the program (<model_fname>|<msa_fname>) will be ignored.
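.PP
The two accepted forms of <fname_list> could be expanded as in the
following Python sketch. The function name \fIparse_fname_list\fR is
hypothetical, and skipping blank lines or surrounding whitespace is an
assumption rather than documented behavior.
.nf
```python
def parse_fname_list(spec):
    """Expand a read-mods argument into a list of .mod file names.

    A leading "*" means spec names a file listing one name per line
    (blank lines are skipped here, an assumption); otherwise spec is
    a comma-delimited list of file names.
    """
    if spec.startswith("*"):
        with open(spec[1:]) as fh:
            return [line.strip() for line in fh if line.strip()]
    return [name.strip() for name in spec.split(",") if name.strip()]


print(parse_fname_list("rep1.mod,rep2.mod,rep3.mod"))
# ['rep1.mod', 'rep2.mod', 'rep3.mod']
```
.fi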
.HP
\fB\-\-output\-average\fR, \fB\-A\fR <fname>
Output a tree model representing the average of all input
models to the specified file.
.HP
\fB\-\-quiet\fR, \fB\-q\fR
.IP
Proceed quietly.
.HP
\fB\-\-help\fR, \fB\-h\fR
Print this help message.
.SS tree\-building options
.HP
\fB\-\-tree\fR, \fB\-t\fR <tree_fname>|<tree_string>
(Required in the non\-parametric case when there are more than two
species.) Name of file or literal string defining the tree topology.
.HP
\fB\-\-subst\-mod\fR, \fB\-s\fR JC69|F81|HKY85|REV|SSREV|UNREST|R2|R2S|U2|U2S|R3|R3S|U3|U3S
Nucleotide substitution model (default REV).
.HP
\fB\-\-nrates\fR, \fB\-k\fR <nratecats>
Number of rate categories to use (default 1). Specifying a value
greater than one causes the discrete gamma model for rate variation
to be used.
.HP
\fB\-\-EM\fR, \fB\-E\fR
Use EM rather than the BFGS quasi\-Newton algorithm for parameter
estimation.
.HP
\fB\-\-precision\fR, \fB\-p\fR HIGH|MED|LOW
.IP
(default HIGH) Level of precision to use in estimating model
parameters.
.HP
\fB\-\-init\-model\fR, \fB\-M\fR <mod_fname>
.IP
Initialize optimization procedure with specified tree model.
.HP
\fB\-\-init\-random\fR, \fB\-r\fR
.IP
Initialize parameters randomly.
.HP
\fB\-\-scale\fR, \fB\-P\fR <rho>
Scale input tree by factor rho before doing parametric simulations.
.HP
\fB\-\-subtree\fR, \fB\-S\fR <node>
For use with \fB\-\-subtree\-scale\fR and/or \fB\-\-subtree\-switch\fR.
Defines a subtree consisting of all children of the named node,
including the branch leading up to the node.
.HP
\fB\-\-subtree\-scale\fR, \fB\-l\fR <lambda>
Scale the subtree defined with the \fB\-\-subtree\fR option by factor lambda.
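.PP
The effect of \fB\-\-scale\fR, \fB\-\-subtree\fR and \fB\-\-subtree\-scale\fR
on branch lengths can be sketched in Python as below. The tree
encoding (each non\-root node mapped to its parent and to the length of
the branch above it), the function names, and the assumption that
rho and lambda simply multiply together on subtree branches are all
illustrative, not taken from phyloBoot.
.nf
```python
def subtree_nodes(parent, node):
    """Return node plus all of its descendants.

    parent maps each non-root node name to its parent's name.
    """
    members = {node}
    changed = True
    while changed:
        changed = False
        for child, par in parent.items():
            if par in members and child not in members:
                members.add(child)
                changed = True
    return members


def scale_tree(branch_len, parent, rho=1.0, node=None, lam=1.0):
    """Scale every branch by rho, and subtree branches also by lam.

    branch_len maps each non-root node to the length of the branch
    leading into it, so the subtree includes the branch above node.
    """
    inside = subtree_nodes(parent, node) if node is not None else set()
    return {n: length * rho * (lam if n in inside else 1.0)
            for n, length in branch_len.items()}


# Tree ((human,chimp)hc,(mouse,rat)mr)root with named internal nodes.
parent = {"human": "hc", "chimp": "hc", "hc": "root",
          "mouse": "mr", "rat": "mr", "mr": "root"}
lengths = {"human": 0.1, "chimp": 0.1, "hc": 0.3,
           "mouse": 0.2, "rat": 0.2, "mr": 0.3}
print(scale_tree(lengths, parent, rho=2.0, node="hc", lam=0.5))
```
.fi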
.HP
\fB\-\-subtree\-switch\fR, \fB\-w\fR <prob>
.IP
With the given probability, randomly switch branches of the tree from
the subtree to the supertree and vice versa. Randomization is performed
independently for each branch in every column of simulated data.
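.PP
The per\-column randomization described for \fB\-\-subtree\-switch\fR can
be sketched in Python as follows. Here each branch is identified by the
node below it, and the function name \fIswitched_membership\fR is
hypothetical; the sketch only decides which branches count as subtree
branches in each column and does not perform the simulation itself.
.nf
```python
import random


def switched_membership(branches, in_subtree, prob, ncols, rng):
    """Per-column subtree membership after random switching.

    For every column and every branch independently, membership is
    flipped (subtree to supertree, or back) with probability prob.
    """
    columns = []
    for _ in range(ncols):
        col = {}
        for b in branches:
            flip = rng.random() < prob
            # True means the branch is treated as a subtree branch.
            col[b] = (b in in_subtree) != flip
        columns.append(col)
    return columns


branches = ["human", "chimp", "hc", "mouse", "rat", "mr"]
subtree = {"human", "chimp", "hc"}
cols = switched_membership(branches, subtree, 0.1, 5, random.Random(42))
for i, col in enumerate(cols):
    print(i, sorted(b for b, inside in col.items() if inside))
```
.fi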
.HP
\fB\-\-scale\-file\fR, \fB\-F\fR <file>
(For use with \fB\-\-subtree\fR in parametric mode.) Instead of using
\fB\-\-subtree\-scale\fR or \fB\-\-scale\fR, read in a tab\-delimited file with
three columns: numSite, scale, subtree_scale. For each row in the
file, phyloBoot simulates the given number of sites with those
scaling factors and then moves on to the next row, so that the
total number of sites is the sum of the first column.
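.PP
A minimal Python sketch of how the rows of a \fB\-\-scale\-file\fR
expand into per\-site scaling factors. The function names are
hypothetical, and fields are split on any whitespace (which covers
tabs) rather than strictly on tab characters.
.nf
```python
def read_scale_file(lines):
    """Parse rows into (num_sites, scale, subtree_scale) tuples."""
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore empty lines
        n, scale, sub_scale = fields
        rows.append((int(n), float(scale), float(sub_scale)))
    return rows


def per_site_scales(rows):
    """Expand rows into one (scale, subtree_scale) pair per site."""
    sites = []
    for n, scale, sub_scale in rows:
        sites.extend([(scale, sub_scale)] * n)
    return sites


rows = read_scale_file(["100 1.0 1.0", "50 2.0 0.5"])
schedule = per_site_scales(rows)
print(len(schedule))  # 150, the sum of the first column
```
.fi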