
RNAhybrid 2.1.2
RNAhybrid is a tool for finding the minimum free energy hybridisation
of a long (target) and a short (query) RNA. The hybridisation is
performed in a kind of domain mode, i.e. the short sequence is
hybridised to the best-fitting part of the long one. The tool is
primarily meant as a means for microRNA target prediction. In
addition to mfes, the program calculates p-values based on extreme
value distributions of length-normalised energies.
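To make the p-value step concrete, here is a minimal Python sketch. It assumes that the length normalisation divides the mfe by ln(m·n), where m and n are the target and query lengths, and that the p-value is the upper tail of a Gumbel (type I extreme value) distribution with a location and a scale parameter. Both are assumptions for illustration, and the function names are hypothetical; this is not RNAhybrid's source code.

```python
import math

def normalised_energy(mfe, target_len, query_len):
    # ASSUMPTION: "length normalised" is taken to mean division by ln(m * n).
    # The sign is flipped so that more stable duplexes give larger values.
    return -mfe / math.log(target_len * query_len)

def evd_pvalue(x, location, scale):
    # Upper tail of a Gumbel distribution:
    # P(X >= x) = 1 - exp(-exp(-(x - location) / scale)),
    # computed with expm1 for accuracy when the p-value is small.
    return -math.expm1(-math.exp(-(x - location) / scale))

def mfe_pvalue(mfe, target_len, query_len, location, scale):
    return evd_pvalue(normalised_energy(mfe, target_len, query_len), location, scale)

# Location 1.9 and scale 0.28 are the values from the calibration example
# further down; the mfe and the sequence lengths are made up.
p = mfe_pvalue(-25.0, 1000, 22, 1.9, 0.28)
print(f"p = {p:.3g}")
```

Under these assumptions, a more negative mfe yields a larger normalised energy and therefore a smaller p-value.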
RNAcalibrate is a tool for calibrating minimum free energy (mfe)
hybridisations performed with RNAhybrid. It searches a random database
that can be given on the command line, or otherwise generates random
sequences according to a given sample size, length distribution
parameters and dinucleotide frequencies. The parameters of an extreme
value distribution (evd) are then fitted to the empirical distribution
of length-normalised minimum free energies. The resulting location and
scale parameters of the evd can be given to RNAhybrid for the
calculation of mfe p-values.
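How RNAcalibrate fits the evd is not described here. As a stand-in, the sketch below fits the Gumbel location and scale by the method of moments, a simple swapped-in estimator (RNAcalibrate may well use a different one, such as maximum likelihood). It is checked on synthetic Gumbel samples rather than on real energies.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329

def fit_gumbel_moments(samples):
    """Method-of-moments fit of a Gumbel distribution.

    For Gumbel(location, scale): mean = location + EULER_GAMMA * scale
    and variance = (pi * scale) ** 2 / 6, so both equations can be solved
    directly from the sample mean and standard deviation.
    """
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples)
    scale = sd * math.sqrt(6.0) / math.pi
    location = mean - EULER_GAMMA * scale
    return location, scale

def sample_gumbel(location, scale, n, rng):
    # Inverse-CDF sampling from F(x) = exp(-exp(-(x - location) / scale));
    # u is kept strictly inside (0, 1) so both logarithms are defined.
    out = []
    for _ in range(n):
        u = rng.uniform(1e-12, 1.0 - 1e-12)
        out.append(location - scale * math.log(-math.log(u)))
    return out

# Synthetic stand-in for length-normalised energies of random sequences.
rng = random.Random(1)
data = sample_gumbel(1.9, 0.28, 20000, rng)
loc, sc = fit_gumbel_moments(data)
print(f"location = {loc:.2f}, scale = {sc:.2f}")
```

With a small sample the moment estimates fluctuate noticeably, which is the same reason the README recommends several thousand random sequences.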
RNAeffective is a tool for determining the effective number of
orthologous miRNA targets. This number can be used to calculate more
accurate joint p-values in multi-species analyses. RNAeffective
searches a set of target sequences with random miRNAs that can be
given on the command line, or otherwise generates random sequences
according to a given sample size, length distribution parameters and
dinucleotide frequencies. The empirical distribution of joint p-values
is compared to the p-values themselves, and the effective number of
independent targets is the one that minimises the deviation between
the two distributions.
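The README does not say how a joint p-value depends on the effective number, so the following Python sketch rests on a hypothetical model chosen only for illustration. Given n p-values from n orthologous targets, the observed log-product is rescaled by n_eff/n, and its tail probability is taken as if it came from n_eff independent uniform p-values, i.e. Q(n_eff, -(n_eff/n)·Σ ln p_i), where Q is the regularised upper incomplete gamma function. Under this model fully dependent p-values give n_eff = 1 and independent ones give n_eff = n. The grid search then picks the n_eff whose joint p-values look most uniform by the Kolmogorov-Smirnov distance. All function names are hypothetical.

```python
import math
import random

def gamma_q(a, x):
    """Regularised upper incomplete gamma function Q(a, x), a > 0."""
    if x <= 0.0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series for the lower function P(a, x); then Q = 1 - P.
        term = total = 1.0 / a
        ap = a
        for _ in range(1000):
            ap += 1.0
            term *= x / ap
            total += term
            if term < total * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = d if abs(d) >= tiny else tiny
        c = b + an / c
        c = c if abs(c) >= tiny else tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h

def joint_pvalue(pvalues, n_eff):
    # HYPOTHETICAL model: rescale -sum(ln p_i) by n_eff / n, then use the tail
    # probability of a product of n_eff independent uniform p-values.
    n = len(pvalues)
    s = -sum(math.log(p) for p in pvalues)
    return gamma_q(n_eff, s * n_eff / n)

def ks_to_uniform(values):
    # Distance between the empirical CDF of the joint p-values and the identity,
    # i.e. how far "fraction of joint p-values <= x" deviates from x.
    xs = sorted(values)
    m = len(xs)
    return max(max((i + 1) / m - x, x - i / m) for i, x in enumerate(xs))

def effective_number(pvalue_sets, step=0.05):
    # Grid search over n_eff in [1, n] for the most uniform joint p-values.
    n = len(pvalue_sets[0])
    grid = [1.0 + k * step for k in range(int(round((n - 1) / step)) + 1)]
    return min(grid, key=lambda a: ks_to_uniform([joint_pvalue(ps, a) for ps in pvalue_sets]))

# Synthetic p-values of random miRNAs against two orthologous targets.
rng = random.Random(7)
independent = [(1.0 - rng.random(), 1.0 - rng.random()) for _ in range(3000)]
dependent = [(p, p) for p in (1.0 - rng.random() for _ in range(3000))]
n_ind = effective_number(independent)
n_dep = effective_number(dependent)
print(f"independent: {n_ind:.2f}, dependent: {n_dep:.2f}")
```

Real data fall between these two synthetic extremes, as the hbl-1 example below, with an effective number of 1.3 for two sequences, illustrates.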
For installation, see the file INSTALL.
After installation, try the following examples (make sure that RNAhybrid,
RNAcalibrate and RNAeffective are in your PATH by then):
RNAhybrid -s 3utr_worm -t examples/cel-hbl1.fasta -q examples/cel-let7.fasta
This searches the C. elegans hbl-1 3'UTR with the C. elegans let-7
miRNA. The option -s tells RNAhybrid to quickly estimate statistical
parameters from "minimal duplex energies", under the assumption that
the target sequences are worm (C. elegans, to be precise) 3'UTR
sequences. You can also use 3utr_fly and 3utr_human.
To get a better estimate of statistical parameters, use RNAcalibrate:
RNAcalibrate -d examples/3UTR_worm.freq -k 50 -l 50,30 -q examples/cel-let7.fasta
This generates 50 random sequences with lengths distributed according
to a normal (Gaussian) distribution with mean 50 and standard
deviation 30, following the dinucleotide distribution defined in the
file 3UTR_worm.freq in the examples directory. The output consists of
the parameters of an extreme value distribution (location and
scale). With only 50 random sequences the estimate is not very
accurate, so you should use a sample of several thousand. The default
values for -k and -l are 5000 and 500,300, respectively, so you can
omit these options.
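The random-sequence generation described above can be sketched as a first-order Markov chain whose transition probabilities come from the dinucleotide frequencies. The format of the .freq files is not described in this README, so the frequencies are passed as a plain dict (hypothetical), and redrawing lengths below 1 is an assumption about how non-positive draws from the normal distribution are handled. The sketch also assumes every nucleotide that can occur has at least one outgoing dinucleotide with non-zero frequency.

```python
import random

NUCLEOTIDES = "ACGU"

def random_sequence(dinuc_freq, length, rng):
    """Sample an RNA sequence from a first-order Markov chain built from
    dinucleotide frequencies given as a dict such as {"AC": 0.06, ...}."""
    # First nucleotide: marginal frequency of each letter as the first of a pair.
    start_weights = [sum(dinuc_freq.get(a + b, 0.0) for b in NUCLEOTIDES) for a in NUCLEOTIDES]
    seq = rng.choices(NUCLEOTIDES, weights=start_weights)
    while len(seq) < length:
        # Next nucleotide: P(b | prev) is proportional to the frequency of "prev b".
        weights = [dinuc_freq.get(seq[-1] + b, 0.0) for b in NUCLEOTIDES]
        seq.append(rng.choices(NUCLEOTIDES, weights=weights)[0])
    return "".join(seq)

def random_database(dinuc_freq, k, mean_len, sd_len, rng, min_len=1):
    # Lengths follow a normal distribution; ASSUMPTION: draws below min_len are redrawn.
    seqs = []
    for _ in range(k):
        length = 0
        while length < min_len:
            length = round(rng.gauss(mean_len, sd_len))
        seqs.append(random_sequence(dinuc_freq, length, rng))
    return seqs

rng = random.Random(42)
# Hypothetical uniform frequencies; real values would come from a .freq file.
uniform = {a + b: 1 / 16 for a in NUCLEOTIDES for b in NUCLEOTIDES}
db = random_database(uniform, k=50, mean_len=50, sd_len=30, rng=rng)
print(len(db), min(len(s) for s in db), max(len(s) for s in db))
```

Redrawing truncates the length distribution, so with a standard deviation that is large relative to the mean (as in 50,30) the realised mean length ends up somewhat above the nominal one.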
The estimated parameters can be used with RNAhybrid for accurate
p-value calculation of length-normalised minimum free energies:
RNAhybrid -d 1.9,0.28 -t examples/cel-hbl1.fasta -q examples/cel-let7.fasta
Here, 1.9 is the location parameter and 0.28 the scale parameter of
the assumed extreme value distribution.
If you want to force miRNA/target duplexes to have a helix in a specified
part, for example at the 5' end of the miRNA, use the -f option:
RNAhybrid -f 2,7 -d 1.9,0.28 -t examples/cel-hbl1.fasta -q examples/cel-let7.fasta
-f 2,7 tells RNAhybrid to force the duplexes to have a helix (i.e. an
uninterrupted stretch of base pairs with no bulges and no internal
loops) from nucleotide 2 to nucleotide 7 of the miRNA.
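The helix condition can be stated precisely as a check on a finished duplex. In the Python sketch below, a duplex is represented as a list that maps each miRNA nucleotide to its paired target position (or None if unpaired); this representation and the function name are hypothetical, and the sketch says nothing about how RNAhybrid enforces the constraint internally. Because the duplex is antiparallel, a helix means consecutive miRNA nucleotides pair with target positions that decrease by exactly one.

```python
from typing import Optional, Sequence

def has_helix(pairs: Sequence[Optional[int]], start: int, end: int) -> bool:
    """Check that miRNA nucleotides start..end (1-based, inclusive) form one helix.

    pairs[i - 1] is the target position paired with miRNA nucleotide i,
    or None if nucleotide i is unpaired.
    """
    if start < 1 or start > end or end > len(pairs):
        return False
    window = pairs[start - 1:end]
    # An unpaired miRNA nucleotide means a bulge or an internal loop.
    if any(p is None for p in window):
        return False
    # A step other than -1 means unpaired target nucleotides in between.
    return all(b == a - 1 for a, b in zip(window, window[1:]))

helix = [None, 120, 119, 118, 117, 116, 115, None]
target_bulge = [None, 120, 119, 117, 116, 115, 114, None]  # target nt 118 unpaired
mirna_bulge = [None, 120, 119, None, 118, 117, 116, None]  # miRNA nt 4 unpaired
print(has_helix(helix, 2, 7), has_helix(target_bulge, 2, 7), has_helix(mirna_bulge, 2, 7))
```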
Since such a structural constraint affects the statistical
significance of matches, you should use RNAcalibrate with the same
constraint:
RNAcalibrate -f 2,7 -d examples/3UTR_worm.freq -k 50 -l 50,30 -q examples/cel-let7.fasta
Be aware that you might need a larger sample (larger -k value) to get
a good estimate of the statistical parameters, especially for shorter
sequences. This is because not all miRNA/target combinations can form
a helix at the specified positions.
To get a feeling for how stable the parameter estimates are, repeat the
calibration several times and compare the resulting values.
In a cross-species analysis, you would search C. briggsae sequences as
well:
RNAhybrid -s 3utr_worm -t examples/cbr-hbl1.fasta -q examples/cel-let7.fasta
To assess how much evidence the use of multiple species adds, you can
calculate the effective number of orthologous sequences:
RNAeffective -k 30 -s -t examples/hbl1.fasta -q examples/cel-let7.fasta
Here, hbl1.fasta contains both hbl-1 3'UTR sequences (C. elegans and
C. briggsae). The output reports an effective number of 1.3, which
means that the two sequences are statistically rather dependent:
finding a good hit in cbr-hbl1 is not very surprising once you have
found one in cel-hbl1. The closer the effective number is to the
actual number of sequences (2 in this example), the more statistically
independent the sequences are. As in the RNAcalibrate examples, the -k
option should take larger values in practice, although the
calculations are then very time-consuming.
In general, target files (-t option) and query (miRNA) files (-q
option) can be multi-FASTA files. The searches are performed for all
combinations (all queries against all targets).
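The all-against-all behaviour amounts to a Cartesian product of the query and target records. The Python sketch below uses a minimal FASTA parser written only for this illustration (not RNAhybrid's own) and made-up sequences and names.

```python
import itertools
from typing import List, Tuple

def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse multi-FASTA text into (name, sequence) records."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records

# Made-up records: two queries and three targets, sequences split across lines.
queries = read_fasta(">query1\nUGAGGUAG\n>query2\nACGUACGU\n")
targets = read_fasta(">target1\nAAAACCCC\nGGGGUUUU\n>target2\nACACAC\n>target3\nGUGUGU\n")

# Every query is searched against every target: 2 x 3 = 6 searches.
for (qname, _), (tname, _) in itertools.product(queries, targets):
    print(qname, "vs", tname)
```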
