<?xml version='1.0' encoding='ISO-8859-1'?>
<?xml-stylesheet type="text/xsl"
href="http://docbook.sourceforge.net/release/xsl/current/manpages/docbook.xsl"?>
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [
<!--
Process this file with an XSLT processor: `xsltproc \
-''-nonet /usr/share/sgml/docbook/stylesheet/xsl/nwalsh/\
manpages/docbook.xsl manpage.dbk'. A manual page
<package>.<section> will be generated. You may view the
manual page with: `nroff -man <package>.<section> | less'. A
typical entry in a Makefile or Makefile.am is:
DB2MAN=/usr/share/sgml/docbook/stylesheet/xsl/nwalsh/\
manpages/docbook.xsl
XP=xsltproc -''-nonet
manpage.1: manpage.dbk
$(XP) $(DB2MAN) $<
The xsltproc binary is found in the xsltproc package. The
XSL files are in docbook-xsl. Please remember that if you
create the nroff version in one of the debian/rules file
targets (such as build), you will need to include xsltproc
and docbook-xsl in your Build-Depends control field.
-->
<!-- Fill in your name for FIRSTNAME and SURNAME. -->
<!ENTITY dhfirstname "<firstname>David</firstname>">
<!ENTITY dhsurname "<surname>Paleino</surname>">
<!-- Please adjust the date whenever revising the manpage. -->
<!ENTITY dhdate "<date>April 25, 2007</date>">
<!-- SECTION should be 1-8, possibly with a subsection; other parameters are
allowed: see man(7), man(1). -->
<!ENTITY dhsection "<manvolnum>1</manvolnum>">
<!ENTITY dhemail "<email>d.paleino@gmail.com</email>">
<!ENTITY dhusername "David Paleino">
<!ENTITY dhucpackage "<refentrytitle>proda</refentrytitle>">
<!ENTITY dhpackage "proda">
<!ENTITY debian "<productname>Debian</productname>">
<!ENTITY gnu "<acronym>GNU</acronym>">
<!ENTITY gpl "&gnu; <acronym>GPL</acronym>">
]>
<refentry>
<refentryinfo>
<address>
&dhemail;
</address>
<copyright>
<year>2007</year>
<holder>&dhusername;</holder>
</copyright>
&dhdate;
</refentryinfo>
<refmeta>
&dhucpackage;
&dhsection;
</refmeta>
<refnamediv>
<refname>&dhpackage;</refname>
<refpurpose>multiple alignment of protein sequences with repeats and rearrangements</refpurpose>
</refnamediv>
<refsynopsisdiv>
<cmdsynopsis>
<command>&dhpackage;</command>
<arg><option><replaceable>option</replaceable></option></arg>
<arg><replaceable>mfafile</replaceable></arg>
<arg>> <replaceable>output</replaceable></arg>
</cmdsynopsis>
</refsynopsisdiv>
<refsect1>
<title>DESCRIPTION</title>
<para>This manual page briefly documents the <command>proda</command> command.</para>
<para><command>proda</command> (Protein Domain Aligner) is public domain software for generating multiple alignments of protein sequences with repeats and rearrangements, e.g. proteins with multiple domains.</para>
<para>Given a set of protein sequences as input, ProDA first finds local pairwise alignments between all pairs of sequences, then forms blocks of alignable sequence fragments, and finally generates multiple alignments of the blocks. ProDA relies on many techniques used in <command>probcons</command> (&lt;http://probcons.stanford.edu&gt;), a recent multiple aligner that shows high accuracy on a number of popular benchmarks.</para>
</refsect1>
<refsect1>
<title>INPUT FORMAT</title>
<para>ProDA accepts input files in MFA (multi-FASTA) format, which is specified below:</para>
<itemizedlist>
<listitem>
<para>the MFA format consists of multiple sequences;</para>
</listitem>
<listitem>
<para>each sequence in the MFA format begins with a single-line description, followed by lines of sequence data;</para>
</listitem>
<listitem>
<para>the description line is distinguished from the sequence data by a greater-than (">") symbol in the first column.</para>
</listitem>
</itemizedlist>
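<para>The MFA rules above can be illustrated with a minimal Python reader. This is a sketch, not code taken from ProDA: the function name parse_mfa and the example sequences are hypothetical, and it assumes that whitespace within sequence lines can be ignored and that any lines before the first description line are skipped.</para>
<programlisting role="python"><![CDATA[
```python
def parse_mfa(text):
    """Parse MFA (multi-FASTA) text into a list of (description, sequence) pairs.

    A line with '>' in the first column starts a new record; every following
    line up to the next '>' line is sequence data for that record.
    """
    records = []
    description = None
    chunks = []
    for line in text.splitlines():
        if line.startswith(">"):
            if description is not None:
                records.append((description, "".join(chunks)))
            description = line[1:].strip()
            chunks = []
        elif description is not None:
            # Sequence data may be wrapped over several lines; drop whitespace.
            chunks.append("".join(line.split()))
        # Lines before the first '>' are ignored (an assumption of this sketch).
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


if __name__ == "__main__":
    example = """>seq1 hypothetical example
MKVLAAGIVG
LLA
>seq2 another hypothetical sequence
MKVLSAGIVA
"""
    for name, seq in parse_mfa(example):
        print(name, seq)
```
]]></programlisting>
<para>Running the script prints each description followed by its sequence, with wrapped sequence lines joined together.</para>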
</refsect1>
<refsect1>
<title>OUTPUT FORMAT</title>
<para>For a set of input sequences, ProDA usually outputs several blocks in turn, each consisting of alignable sequence fragments. Each block is followed by its multiple alignment.</para>
<para>A block is specified by listing its sequence fragments. Each fragment is output as sequence_name(start-end), where sequence_name is the name of the original sequence, and start and end are the positions at which the fragment begins and ends, respectively.</para>
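<para>As an illustration of the fragment notation, the Python sketch below builds and parses labels of the form sequence_name(start-end). The helper names and the regular expression are hypothetical; the sketch assumes that start and end are plain decimal integers and that only the trailing parenthesised range belongs to the label syntax, so sequence names may themselves contain parentheses. It makes no assumption about whether ProDA counts positions from 0 or 1.</para>
<programlisting role="python"><![CDATA[
```python
import re

# Hypothetical pattern for labels of the form sequence_name(start-end).
# The greedy name group lets only the trailing "(start-end)" act as the range.
FRAGMENT_RE = re.compile(r"^(?P<name>.+)\((?P<start>\d+)-(?P<end>\d+)\)$")


def format_fragment(name, start, end):
    """Build a fragment label as sequence_name(start-end)."""
    return f"{name}({start}-{end})"


def parse_fragment(label):
    """Split a sequence_name(start-end) label into (name, start, end)."""
    match = FRAGMENT_RE.match(label.strip())
    if match is None:
        raise ValueError(f"not a fragment label: {label!r}")
    return match.group("name"), int(match.group("start")), int(match.group("end"))


if __name__ == "__main__":
    label = format_fragment("seq1", 12, 87)
    print(label)                  # seq1(12-87)
    print(parse_fragment(label))  # ('seq1', 12, 87)
```
]]></programlisting>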
<para>ProDA produces block alignments in the ClustalW (ALN) format, described below:</para>
<itemizedlist>
<listitem>
<para>the ClustalW format consists of a single header line followed by sequence data in blocks of 50 alignment positions;</para>
</listitem>
<listitem>
<para>each block consists of:</para>
<itemizedlist>
<listitem>
<para>one line for each sequence in the alignment, giving the name of the sequence followed by the corresponding 50 characters of the alignment;</para>
</listitem>
<listitem>
<para>one annotation line indicating fully conserved (*), strongly conserved (:), or weakly conserved (.) columns.</para>
</listitem>
</itemizedlist>
</listitem>
</itemizedlist>
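<para>The block layout described above can be sketched in Python as follows. This is a simplified, hypothetical illustration rather than ProDA's own writer: the header line is omitted because its exact text is not specified here, a blank line between blocks is assumed, the name column width is arbitrary, and only fully conserved columns (*) are annotated, since the strongly (:) and weakly (.) conserved classes depend on residue groupings that this page does not define.</para>
<programlisting role="python"><![CDATA[
```python
def clustal_blocks(alignment, width=50):
    """Lay out aligned rows in ClustalW-style blocks (header line omitted).

    alignment: list of (name, aligned_row) pairs, rows of equal length,
    '-' marking gaps. Each block has one line per sequence (name, then its
    slice of `width` columns) and an annotation line marking only fully
    conserved, gap-free columns with '*'.
    """
    rows = [row for _, row in alignment]
    if not rows or len({len(row) for row in rows}) != 1:
        raise ValueError("alignment must be non-empty with rows of equal length")
    total = len(rows[0])
    pad = max(len(name) for name, _ in alignment) + 2  # arbitrary name column
    lines = []
    for start in range(0, total, width):
        for name, row in alignment:
            lines.append(name.ljust(pad) + row[start:start + width])
        marks = []
        for col in range(start, min(start + width, total)):
            column = {row[col] for row in rows}
            marks.append("*" if len(column) == 1 and "-" not in column else " ")
        lines.append(" " * pad + "".join(marks))
        lines.append("")  # assumed blank line between blocks
    return lines


if __name__ == "__main__":
    toy = [("seq1", "MKV-LA"), ("seq2", "MKVQLS")]  # hypothetical toy alignment
    print("\n".join(clustal_blocks(toy, width=4)))
```
]]></programlisting>
<para>Passing a small width, such as 4, makes the block splitting easy to check on a short toy alignment.</para>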
<refsect2>
<title>FASTA format for output</title>
<para>If the <option>-fasta</option> option is specified, then, in addition to the regular output, ProDA produces a file containing the block alignments in FASTA format. The output file has the same name as the first input file and the extension ".fasta". Consecutive block alignments are separated by a line containing the character '#'.</para>
<para>The FASTA format is described below:</para>
<itemizedlist>
<listitem>
<para>the FASTA format consists of all the sequences given in the input files;</para>
</listitem>
<listitem>
<para>each sequence in the FASTA format begins with a single-line description, followed by lines of sequence data;</para>
</listitem>
<listitem>
<para>the description line is distinguished from the sequence data by a greater-than (">") symbol in the first column;</para>
</listitem>
<listitem>
<para>aligned residues are in upper case, unaligned residues are in lower case.</para>
</listitem>
</itemizedlist>
<para>Since a final alignment contains each sequence only once, this option should be used only if input sequences do not contain repeats.</para>
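<para>The case convention and the '#' separator can be sketched in Python as shown below. The function names are hypothetical; the sketch assumes 0-based residue positions, a separator line consisting of '#' alone, and no line wrapping of sequence data, none of which is specified by this page.</para>
<programlisting role="python"><![CDATA[
```python
def mark_case(sequence, aligned_positions):
    """Upper-case residues whose 0-based index is in aligned_positions
    and lower-case all other residues."""
    return "".join(
        res.upper() if i in aligned_positions else res.lower()
        for i, res in enumerate(sequence)
    )


def to_fasta(records):
    """Turn (description, sequence) pairs into FASTA text, one line each."""
    return "".join(f">{desc}\n{seq}\n" for desc, seq in records)


def join_block_alignments(blocks):
    """Separate consecutive FASTA block alignments with a line holding '#'."""
    return "#\n".join(to_fasta(block) for block in blocks)


if __name__ == "__main__":
    # Hypothetical sequences and aligned positions.
    block1 = [("seq1", mark_case("mkvlaag", {1, 2, 3}))]
    block2 = [("seq1", mark_case("mkvlaag", {4, 5}))]
    print(join_block_alignments([block1, block2]), end="")
```
]]></programlisting>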
</refsect2>
</refsect1>
<refsect1>
<title>OPTIONS</title>
<variablelist>
<varlistentry>
<term><option>-L</option> <replaceable>min_length</replaceable></term>
<listitem>
<para>Set the minimum alignment length to <replaceable>min_length</replaceable>.</para>
<para>ProDA finds alignments of length greater than or equal to a threshold L. By default, L = 30. This option sets the threshold to <replaceable>min_length</replaceable>.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-posterior</option></term>
<listitem>
<para>Use posterior decoding when computing local pairwise alignments.</para>
<para>ProDA computes local pairwise alignments between two sequences using a pair-HMM and either Viterbi decoding or posterior decoding. By default, it uses Viterbi decoding, which is faster than posterior decoding but may be less accurate. Turning on this option instructs the aligner to use posterior decoding instead.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-silent</option></term>
<listitem>
<para>Do not report progress while aligning.</para>
<para>Turning on this option instructs the aligner not to report its progress while aligning. By default, ProDA reports progress on pairwise alignment, block generation, and block alignment. Since some stages of the algorithm, especially pairwise alignment, may take a long time, progress reports show that the program is still running.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-tran</option></term>
<listitem>
<para>Use transitivity when forming blocks of alignable sequence fragments.</para>
<para>Two sequence fragments are directly alignable if they are parts of a local pairwise alignment. By default, two fragments are considered alignable if and only if they are directly alignable. Turning on this option instructs the aligner to consider two fragments alignable when they are directly alignable or when both of them are directly alignable to a third fragment.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-fasta</option></term>
<listitem>
<para>Use FASTA output format in addition to the ClustalW format.</para>
<para>When this option is turned on, the aligner also generates output in FASTA format and stores it in a file with the same name as the first input file and the extension ".fasta", in addition to the normal output to stdout. This option should be used only if the input sequences do not contain repeats.</para>
</listitem>
</varlistentry>
</variablelist>
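<para>The effect of <option>-L</option> and <option>-tran</option> on which fragments count as alignable can be sketched in Python. This is a hypothetical illustration of the rules stated above, not ProDA's block-forming code: each local pairwise alignment is reduced to a tuple of two fragment labels and an alignment length, and fragments are treated as opaque labels. Note that the <option>-tran</option> rule, as described, adds only pairs linked through a single third fragment; it is not a full transitive closure.</para>
<programlisting role="python"><![CDATA[
```python
from itertools import combinations


def alignable_pairs(local_alignments, min_length=30, transitive=False):
    """Return the set of unordered fragment pairs treated as alignable.

    local_alignments: iterable of (fragment_a, fragment_b, length) tuples,
    one per local pairwise alignment (a hypothetical representation).
    Alignments shorter than min_length are discarded (cf. -L, default 30).
    With transitive=True (cf. -tran), two fragments that are both directly
    alignable to a common third fragment are also alignable.
    """
    # Direct alignability graph built from alignments passing the threshold.
    neighbours = {}
    for a, b, length in local_alignments:
        if length < min_length:
            continue  # shorter than L: not reported, so not alignable
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Directly alignable pairs; frozenset makes each pair unordered.
    pairs = {frozenset((a, b)) for a, near in neighbours.items() for b in near}

    if transitive:
        # Each fragment acts as the "third fragment" for its neighbours.
        for near in neighbours.values():
            for a, b in combinations(near, 2):
                pairs.add(frozenset((a, b)))
    return pairs


if __name__ == "__main__":
    # Hypothetical (fragment, fragment, alignment length) tuples.
    alns = [("A", "B", 40), ("B", "C", 35), ("C", "D", 10)]
    print(sorted(sorted(p) for p in alignable_pairs(alns)))
    print(sorted(sorted(p) for p in alignable_pairs(alns, transitive=True)))
```
]]></programlisting>
<para>For example, with alignments A-B of length 40, B-C of length 35 and C-D of length 10, the default settings keep only the pairs {A, B} and {B, C}; enabling transitivity also adds {A, C}, while {C, D} is dropped because its length is below 30.</para>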
</refsect1>
<refsect1>
<title>SEE ALSO</title>
<para><citerefentry><refentrytitle>probcons</refentrytitle><manvolnum>1</manvolnum></citerefentry></para>
</refsect1>
<refsect1>
<title>AUTHOR</title>
<para>This manual page was written by &dhusername; &dhemail; for the &debian; system (but may be used by others). It is released under the same conditions as the software, that is, into the public domain.</para>
<para>ProDA was released into the public domain by Phuong T.M., Do C.B., Edgar R.C. and Batzoglou S., and is described in "Multiple alignment of protein sequences with repeats and rearrangements", Nucleic Acids Research 2006, 34(20):5932-5942.</para>
</refsect1>
</refentry>