LAGAN tools README (Authors: Michael Brudno, Michael F. Kim & Chuong Do)
lagan@cs.stanford.edu 04/02/2003
This document describes how to use the wrappers and tools associated with LAGAN.
Both mrun.pl and mrunpairs.pl are wrappers to mlagan. The only
difference is that mrunpairs.pl generates a set of pairwise
alignments, whereas mrun.pl does the standard multiple alignment.
Both of these tools use a helper script mextract.pl to parse out the
individual sequence files from a Multi-FASTA file.
Having run MLAGAN, we can visualize the output on a nucleotide level
in a "pretty" format using mpretty.pl. We can also project the
multiple sequence alignment into any number of its constituent
sequences, using mproject.pl. We provide a tool (mviz.pl) which will
take a multiple alignment in Multi-FASTA form and create a VISTA plot.
Using the parameter file, you can completely specify the parameters to
an mlagan job. We provide a sample file (sample.params) with more
information on how to use the various parameters.
Sequence names are always taken to be the first white-space terminated
string after the ">" in a FASTA or Multi-FASTA file, e.g.:
>sample1 This is the first sample sequence.
ACGT...
>sample2 This is the second sample sequence.
ACGT...
Here the sequence names would be sample1 and sample2.
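The naming rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this README, not the code the LAGAN scripts use; the function name sequence_names is made up for the example.

```python
def sequence_names(fasta_text):
    """Return the name of every record in FASTA or Multi-FASTA text."""
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # The name is the first whitespace-terminated string after ">".
            fields = line[1:].split()
            names.append(fields[0] if fields else "")
    return names


example = """>sample1 This is the first sample sequence.
ACGT
>sample2 This is the second sample sequence.
ACGT
"""
print(sequence_names(example))  # ['sample1', 'sample2']
```

A check like this is handy before a run, because the names found here are the ones the -tree argument and the pair arguments have to match.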
The scorealign tool scores an alignment (multiple or pairwise in MFA format). The rc script
reverse-complements a sequence, and the bin2mf, mf2bin.pl and bin2bl scripts convert between the
various output formats.
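For readers unfamiliar with the operation, reverse-complementing can be sketched as below. This is a minimal Python illustration, not the rc script itself. How rc treats ambiguity codes or other characters is not documented here, so the sketch simply leaves anything outside A, C, G, T and N unchanged (an assumption).

```python
# Complement table for upper- and lower-case bases; N maps to itself.
# Characters not in the table are passed through unchanged (assumption).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("AACGT"))  # ACGTT
```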
mrunfile.pl
-----------
Usage:
mrunfile.pl filename [-pairwise] [-vista]
Required Parameter:
filename : name of the parameter file (e.g. sample.params)
Optional parameters:
-pairwise : generates a set of pairwise alignments
-vista : creates a VISTA plot using the output
Example:
mrunfile.pl sample.params -vista
This would run MLAGAN using the parameters in sample.params and
generate a VISTA plot at the end.
Uses:
mrun.pl or mrunpairs.pl
mrun.pl
-------
Usage:
mrun.pl filename -tree "(tree...)"
Required parameters:
filename : name of the Multi-FASTA file with the sequences to align.
-tree "(tree)" : a fully parenthesized phylogenetic tree over the
sequence names.
Optional parameters:
[base sequence name [sequence pairs]] : For projection into pairs for
VISTA output, you may wish to specify a base sequence and specific
pairs of sequences to have projected. If you do not specify sequence
pairs, then all possible pairings to the base sequence will be
generated. If you do not specify a base sequence, the default base
sequence is the first sequence in the multi-FASTA input.
other MLAGAN parameters:
-nested : runs iterative improvement in a nested fashion
-postir : incorporates the final improvement phase
-lazy : uses lazy mode for anchor generation
-verbose : give verbose output
-translate : do translated comparisons
-out "filename": outputs to filename
-version : prints version info
other VISTA parameters:
(see VISTA plotfile definition for more info)
per sequence pair:
--regmin # (default: 75)
--regmax # (default: 100)
--min # (default: 50)
per plotfile:
--bases # (default: 10000)
--tickdist # (default: 2000)
--resolution # (default: 25)
--window # (default: 40)
--numwindows # (default: 4)
Example:
mrun.pl sample.fasta -tree "(sample1 (sample2 sample3))"
This will run mlagan on the sequences in sample.fasta with the
phylogenetic tree specified above.
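A quick way to catch a malformed tree before starting a long run is to parse it yourself. The Python sketch below assumes the tree consists only of parentheses and whitespace-separated sequence names, as in the example above. Whether mlagan accepts anything else (commas, branch lengths) is not stated here. The helper names parse_tree and leaf_names are made up for illustration.

```python
import re


def parse_tree(text):
    """Parse a fully parenthesized tree such as "(a (b c))" into nested lists."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of tree")
        token = tokens[pos]
        pos += 1
        if token == ")":
            raise ValueError("unexpected ')'")
        if token != "(":
            return token  # a leaf: one sequence name
        children = []
        while pos < len(tokens) and tokens[pos] != ")":
            children.append(parse_node())
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        pos += 1  # consume the closing ")"
        return children

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("trailing text after tree")
    return tree


def leaf_names(tree):
    """List the sequence names in the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaf_names(child)]


tree = parse_tree("(sample1 (sample2 sample3))")
print(tree)              # ['sample1', ['sample2', 'sample3']]
print(leaf_names(tree))  # ['sample1', 'sample2', 'sample3']
```

Comparing leaf_names(tree) with the names in the Multi-FASTA file shows at a glance whether every sequence appears in the tree.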
Uses:
mextract.pl to parse out the constituent sequences into individual
FASTA files for use by mlagan. Also uses mextract.pl with -masked
option for parsing out .masked multi-FASTA files.
mrunpairs.pl
------------
Usage:
mrunpairs.pl filename
Required parameter:
filename : multi-FASTA file.
Optional parameters:
(same as mrun.pl optional parameters, see above)
Example:
mrunpairs.pl sample.fasta sample1 sample1 sample2 sample1 sample3
This will generate the pairs (sample1 sample2), (sample1 sample3),
using sample1 as a base sequence (for VISTA plots).
Uses:
mextract.pl to parse out the constituent sequences into individual
FASTA files for use by mlagan. Also uses mextract.pl with -masked
option for parsing out .masked multi-FASTA files.
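The base-sequence and pair rules described under mrun.pl can be summarized in a short Python sketch. It assumes the pair names are read two at a time after the base sequence name, which is consistent with the example above. The function select_pairs is hypothetical and is not part of the LAGAN scripts.

```python
def select_pairs(names, base=None, pair_args=()):
    """Work out the base sequence and the pairs to project.

    names     : sequence names in the order they appear in the input file
    base      : optional base sequence name
    pair_args : optional flat list of names, read two at a time (assumption)
    """
    if base is None:
        base = names[0]  # default: first sequence in the input
    if len(pair_args) % 2 != 0:
        raise ValueError("pair names must come in twos")
    if pair_args:
        pairs = [(pair_args[i], pair_args[i + 1])
                 for i in range(0, len(pair_args), 2)]
    else:
        # default: every possible pairing with the base sequence
        pairs = [(base, name) for name in names if name != base]
    return base, pairs


names = ["sample1", "sample2", "sample3"]
print(select_pairs(names))
# ('sample1', [('sample1', 'sample2'), ('sample1', 'sample3')])
print(select_pairs(names, "sample2", ["sample2", "sample3"]))
# ('sample2', [('sample2', 'sample3')])
```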
mpretty.pl
----------
Usage:
mpretty.pl filename
Required parameter:
filename : Multi-FASTA file to view.
Optional parameters:
-linelen value : number of bases to display per line
(min: 10, default: 50)
-interval value : frequency of markers
(min: 10, default: 10, none: 0)
-labellen value : length of the sequence label
(min: 5, default: 5, none: 0)
-start value : position to start from (>=1)
-end value : position to end at (>=start position)
-base sequence_name : sequence name on which to base start/end positions.
-nocounts : turn off sequence position counts
Example:
mpretty.pl sample.fasta -nocounts -interval 0 -linelen 72
This will print out the contents of sample.fasta without sequence
position counters, without interval markers and at 72 bases per line,
with the sequence labels on each line at their default length.
Because of the way the labels are printed, this will cause each line
to have length 80 characters.
mpretty.pl sample.fasta -start 101 -end 150
This will print out the contents of sample.fasta from positions 101 to
positions 150 in the alignment, inclusive.
mpretty.pl sample.fasta -start 131 -end 140 -base sample1_aligned
This will print out the contents of sample.fasta from position 131 to
position 140 relative to the sequence sample1_aligned.
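The difference between alignment positions and positions relative to a base sequence can be illustrated as follows. This Python sketch assumes gaps are written as "-" and that base-relative positions count only the non-gap characters of the base sequence, starting from 1. It shows the idea only and is not how mpretty.pl is implemented. The function name columns_for_range is made up.

```python
def columns_for_range(aligned_seq, start, end, gap="-"):
    """Return the 1-based (first, last) alignment columns that hold
    residues start..end of aligned_seq, counting non-gap characters only."""
    first = last = None
    residue = 0
    for column, char in enumerate(aligned_seq, start=1):
        if char == gap:
            continue  # gaps do not advance the sequence position
        residue += 1
        if residue == start:
            first = column
        if residue == end:
            last = column
            break
    if first is None or last is None:
        raise ValueError("range lies outside the sequence")
    return first, last


print(columns_for_range("AC--GT-A", 3, 5))  # (5, 8)
```

In this toy alignment row, residues 3 to 5 of the sequence occupy alignment columns 5 to 8, so the displayed window is wider than the requested range whenever the base sequence contains gaps.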
mextract.pl
-----------
Usage:
mextract.pl filename [-masked]
Required parameter:
filename : Multi-FASTA file to extract sequences from.
Optional parameter:
-masked : For dealing with masked Multi-FASTA files.
Example:
mextract.pl sample.fasta
This will extract the contents of sample.fasta (e.g. sample1, sample2,
sample3) and put them into files:
sample_sample1.fa
sample_sample2.fa
sample_sample3.fa
Masked Example:
mextract.pl sample.fasta.masked -masked
This will extract the contents of sample.fasta.masked (e.g. sample1, sample2,
sample3) and put them into files:
sample_sample1.fa.masked
sample_sample2.fa.masked
sample_sample3.fa.masked
The masked files are for use with rechaos.pl in anchoring.
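The output file names in the two examples above follow a simple pattern, sketched here in Python. The rule used (take everything before the first "." of the input name as the prefix) is inferred from those examples only. How mextract.pl treats directory paths or names with additional dots is not documented here, so treat this as an assumption. The function extracted_filenames is hypothetical.

```python
def extracted_filenames(input_name, seq_names, masked=False):
    """Predict the per-sequence file names for a Multi-FASTA input."""
    # Assumption: the prefix is the part of the name before the first ".".
    prefix = input_name.split(".", 1)[0]
    suffix = ".fa.masked" if masked else ".fa"
    return [f"{prefix}_{name}{suffix}" for name in seq_names]


names = ["sample1", "sample2", "sample3"]
print(extracted_filenames("sample.fasta", names))
# ['sample_sample1.fa', 'sample_sample2.fa', 'sample_sample3.fa']
print(extracted_filenames("sample.fasta.masked", names, masked=True))
# ['sample_sample1.fa.masked', 'sample_sample2.fa.masked', 'sample_sample3.fa.masked']
```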
mproject.pl
-----------
Usage:
mproject.pl filename seqname1 [seqname2 ... ]
Required parameters:
filename : Multi-FASTA alignment file to project.
seqname1 ... : at least one sequence name to keep in the projection.
Example:
mproject.pl sample.out sample1 sample2
In this example, sample.out is the resulting alignment of a number of
sequences -- including sample1 and sample2. This script will project
the multiple alignment into the pair sample1 and sample2.
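To make the idea of projection concrete, here is a minimal Python sketch. It assumes that projecting means keeping only the named sequences and dropping every column in which all of the kept sequences have a gap ("-"). This is the usual meaning of the term, but the exact behavior of mproject.pl is not spelled out in this README. The function name project is made up.

```python
def project(alignment, keep, gap="-"):
    """Project a multiple alignment onto a subset of its sequences.

    alignment : dict mapping sequence name to aligned string (equal lengths)
    keep      : names of the sequences to keep, in output order
    """
    rows = [alignment[name] for name in keep]
    # Keep a column only if at least one kept sequence has a base there.
    kept_columns = [i for i in range(len(rows[0]))
                    if any(row[i] != gap for row in rows)]
    return {name: "".join(row[i] for i in kept_columns)
            for name, row in zip(keep, rows)}


alignment = {
    "sample1": "AC-GT-A",
    "sample2": "A--GTCA",
    "sample3": "ACTGT-A",
}
print(project(alignment, ["sample1", "sample2"]))
# {'sample1': 'ACGT-A', 'sample2': 'A-GTCA'}
```

The third column is dropped because only sample3 has a base there, so the resulting pair is still a valid alignment of sample1 and sample2.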
mviz.pl
-------
Usage:
mviz.pl data_file param_file [plotfile]
Required parameters:
data_file : Multi-FASTA file to visualize using VISTA
(this must be the first argument)
param_file : Parameter file (same format as used in other scripts)
(this must be the second argument)
Optional parameter:
plotfile : VISTA plotfile (if specified, must be specified third)
Script will use this plotfile instead of automatically
generated one.
Example:
mviz.pl sample.out sample.params sample.plotfile
This will generate a VISTA plot using the data in sample.out, the
settings in sample.params, but with sample.plotfile as the given
plotfile.
Uses:
RunVista
scorealign
----------
Usage:
scorealign mfa_alignment %cutoff [-regions]
Optional parameters:
-regions : print the high-scoring regions in the alignment.
Example:
scorealign alignment.mfa 80
This will return the score of the alignment in the file
"alignment.mfa" at an 80% cutoff.
mf2bin.pl
---------
Usage:
mf2bin.pl inputfile [-out outputfile]
Required parameter:
inputfile : Multi-FASTA file with two sequences to convert to bin.
Optional parameter:
-out outputfile : Write bin output to outputfile.
Example:
mf2bin.pl sample1_sample2.fa -out sample1_sample2.bin
This will take the file sample1_sample2.fa (which contains the
alignment or projection of a larger alignment of sample1 and sample2)
and pack it into VISTA binary format and output the result to
sample1_sample2.bin.
bin2mf
------
Usage:
bin2mf { - | alignment_file}
Example:
bin2mf align.bin > align.mfa
cat align.bin | bin2mf - > align.mfa
Either command will convert the binary file align.bin into Multi-FASTA
format and save it as align.mfa.
bin2bl
------
Usage:
bin2bl { - | alignment_file}
Example:
bin2bl align.bin > align.bl
cat align.bin | bin2bl - > align.bl
Either command will convert the binary file align.bin into BLAST-like
format and save it as align.bl.