=pod
=head1 NAME
score_conservation - score protein sequence conservation
=head1 SYNOPSIS
score_conservation [options] ALIGNFILE
=head1 DESCRIPTION
Score protein sequence conservation in B<ALIGNFILE>. B<ALIGNFILE> must be in FASTA, CLUSTAL or Stockholm format.
The following conservation scoring methods are implemented:
* sum of pairs
* weighted sum of pairs
* Shannon entropy
* Shannon entropy with property groupings (Mirny and Shakhnovich 1995,
Valdar and Thornton 2001)
* relative entropy with property groupings (Williamson 1995)
* von Neumann entropy (Caffrey et al 2004)
* relative entropy (Samudrala and Wang 2006)
* Jensen-Shannon divergence (Capra and Singh 2007)
A window-based extension that incorporates the estimated conservation of
sequentially adjacent residues into the score for each column is also given.
This window approach can be applied to any of the conservation scoring
methods.
With default parameters, score_conservation(1) scores the alignment using the
Jensen-Shannon divergence and a window (B<-w>) of I<3>.
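
As a rough illustration of the default method, the following Python sketch scores a single alignment column by its Jensen-Shannon divergence from a background distribution. It is a minimal sketch and not the code used by score_conservation(1): the function names are hypothetical, residue frequencies are unweighted and have no pseudocounts, and a uniform background stands in for the real background distribution.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(column):
    """Unweighted residue frequencies for one column, ignoring gap characters."""
    residues = [aa for aa in column if aa in AMINO_ACIDS]
    counts = Counter(residues)
    total = len(residues)
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two distributions."""
    score = 0.0
    for aa in AMINO_ACIDS:
        m = 0.5 * (p[aa] + q[aa])  # mixture of column and background
        if p[aa] > 0:
            score += 0.5 * p[aa] * math.log2(p[aa] / m)
        if q[aa] > 0:
            score += 0.5 * q[aa] * math.log2(q[aa] / m)
    return score


# Assumption: a uniform background, used here only to keep the sketch short.
UNIFORM_BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

conserved = js_divergence(column_frequencies("LLLLLLLL"), UNIFORM_BACKGROUND)
variable = js_divergence(column_frequencies("ACDEFGHI"), UNIFORM_BACKGROUND)
print(conserved > variable)
```

A column dominated by one residue diverges more from the background than a mixed column, so it receives the higher score.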
The sequence-specific output can be used as the conservation input for
concavity(1).
Conservation is highly predictive of catalytic sites and of
residues near bound ligands.
=head1 REFERENCES
=over
=item Capra JA and Singh M. Predicting functionally important residues from sequence conservation. Bioinformatics, 23(15):1875-82, 2007.
=back
=head1 OPTIONS
=over
=item -a [NAME]
Reference sequence. Print scores in reference to the named sequence (ignoring gaps). Default prints the entire column.
=item -b [0-1]
Lambda for window heuristic linear combination. Default=I<.5>.
Equation:
C<score = (1 - lambda) * average_score_over_window_around_middle + lambda * score_of_middle>
=item -d [FILE]
Background distribution file, e.g. F<distributions/swissprot.distribution>. Default=built-in BLOSUM62.
=item -g [0-1)
Gap cutoff. Do not score columns in which the fraction of gaps exceeds the gap cutoff. Default=I<.3>.
=item -h
Print help.
=item -l [true|false]
Use sequence weighting. Default=I<true>.
=item -m [FILE]
Similarity matrix file, e.g. F<matrix/blosum62.bla> or a F<.qij> file. Default=F<matrix/blosum62.bla>.
Some methods, e.g. I<js_divergence>, do not use this.
=item -n [true|false]
Normalize scores. Print the z-score (over the alignment) of each column's raw score. Default=I<false>.
=item -o FILE
Output file. Default: standard output stream.
=item -p [true|false]
Use gap penalty. Lower the score of columns that contain gaps, in proportion to the summed weight of the gapped sequences. Default=I<true>.
=item -s [METHOD]
Conservation estimation method, one of I<shannon_entropy>, I<property_entropy>, I<property_relative_entropy>, I<vn_entropy>, I<relative_entropy>, I<js_divergence> or I<sum_of_pairs>. Default=I<js_divergence>.
=item -w [0-INT]
Window size. Number of residues on either side included in the window. Default=I<3>.
=back
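
The window heuristic controlled by B<-b> and B<-w>, and the z-score normalization enabled by B<-n>, can be sketched in Python as follows. This is an illustration under stated assumptions, not the program's own code: the function names are hypothetical, the window average is assumed to exclude the middle column and to be truncated at the ends of the alignment, and the z-score is assumed to use the population standard deviation.

```python
import math


def window_score(scores, window_len=3, lam=0.5):
    """Apply score = (1 - lam) * window_average + lam * score_of_middle."""
    smoothed = list(scores)
    for i in range(len(scores)):
        lo = max(0, i - window_len)
        hi = min(len(scores), i + window_len + 1)
        # Assumption: the window average excludes the middle column itself.
        neighbours = [scores[j] for j in range(lo, hi) if j != i]
        if neighbours:
            average = sum(neighbours) / len(neighbours)
            smoothed[i] = (1 - lam) * average + lam * scores[i]
    return smoothed


def z_scores(scores):
    """Z-score of each raw score over the whole alignment."""
    mean = sum(scores) / len(scores)
    # Assumption: population (not sample) standard deviation.
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    if std == 0:
        return [0.0] * len(scores)
    return [(s - mean) / std for s in scores]


raw = [0.0, 0.0, 1.0, 0.0, 0.0]
print(window_score(raw, window_len=1, lam=0.5))
print(z_scores(raw))
```

With a lambda of 1 the window has no effect and each column keeps its own score; lowering lambda spreads the signal of a conserved column onto its neighbours.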
=head1 EXAMPLES
Note: you may have to copy and uncompress the example data files before running the following examples.
=over
=item Compute conservation scores for the alignment using the Jensen-Shannon divergence with default settings and print out the scores:
 score_conservation __docdir__/examples/2plc__hssp-filtered.aln
=item Score an alignment using Jensen-Shannon divergence, a window of size 3 (on either side of the residue), and the swissprot background distribution:
 score_conservation -s js_divergence -w 3 -d \
   __pkgdatadir__/distributions/swissprot.distribution \
   __docdir__/examples/2plc__hssp-filtered.aln
=back
=head1 FILES
=over
=item Distributions
F<__pkgdatadir__/distributions>
=item Matrices
F<__pkgdatadir__/matrix>
=back
=head1 SEE ALSO
=over
=item Homepage L<http://compbio.cs.princeton.edu/conservation/>
=item Publication L<http://bioinformatics.oxfordjournals.org/cgi/content/full/23/15/1875>
=item concavity(1)
=back
=cut