File: RNA.pod

=head1 NAME

RNA - interface to the ViennaRNA C-library (libRNA.a)

=head1 SYNOPSIS

  use RNA;
  $seq = "CGCAGGGAUACCCGCG";
  ($struct, $mfe) = RNA::fold($seq);  #predict mfe structure of $seq
  RNA::PS_rna_plot($seq, $struct, "rna.ps");  # write PS plot to rna.ps
  $F = RNA::pf_fold($seq);   # compute partition function and pair probabilities
  RNA::PS_dot_plot($seq, "dot.ps");          # write dot plot to dot.ps
  ...

=head1 DESCRIPTION

The RNA.pm package gives access to almost all functions in the libRNA.a
library of the Vienna RNA Package. The Perl wrapper is generated using
SWIG http://www.swig.org/ with relatively little manual intervention.
For each C function in the library the Perl package provides a function
of the same name and calling convention (with few exceptions). For
detailed information you should therefore also consult the documentation
of the library (info RNAlib).

Note that in general C arrays are wrapped into opaque objects that can
only be accessed via helper functions. SWIG provides a couple of general
purpose helper functions, see the section at the end of this file. C
structures are wrapped into Perl objects using SWIG's shadow class
mechanism, resulting in a tied hash with keys named after the structure
members.

For the interested reader, the type of the corresponding C variable is
given in brackets after each global variable, and each group of functions
and variables names the header file containing its C declaration.

=head2 Folding Routines

Minimum free Energy Folding (from fold.h)

=over 4

=item fold SEQUENCE

=item fold SEQUENCE, CONSTRAINTS

computes the minimum free energy structure of the string SEQUENCE and returns
the predicted structure and energy, e.g.

  ($structure, $mfe) = RNA::fold("UGUGUCGAUGUGCUAU");

If a second argument is supplied and
L<$fold_constrained|/$fold_constrained>==1 the CONSTRAINTS string is
used to specify constraints on the predicted structure.  The
characters '|', 'x', '<', '>' mark bases that are paired, unpaired,
paired upstream, or downstream, respectively; matching brackets "( )"
denote base pairs, dots '.' are used for unconstrained bases.

In the two argument version the CONSTRAINTS string is modified and holds the
predicted structure upon return. This is done for backwards compatibility only,
and might change in future versions.

=item energy_of_struct SEQUENCE, STRUCTURE

returns the energy of SEQUENCE on STRUCTURE (in kcal/mol). STRUCTURE
must hold a valid secondary structure in bracket notation.

=item update_fold_params

recalculate the pair matrix and energy parameters after a change in folding
parameters. In many cases (such as changes to
L<$temperature|/$temperature>) the fold() routine will call
update_fold_params automatically when necessary.

=item free_arrays

frees memory allocated internally when calling L<fold|/fold>.


=item cofold SEQUENCE

=item cofold SEQUENCE, CONSTRAINTS

works as fold, but SEQUENCE may be the concatenation of two RNAs in order
to compute their hybridization structure. The position where the second
sequence starts is given by C<$RNA::cut_point>. E.g.:

  $seq1  ="CGCAGGGAUACCCGCG";
  $seq2  ="GCGCCCAUAGGGACGC";
  $RNA::cut_point = length($seq1)+1;
  ($costruct, $comfe) = RNA::cofold($seq1 . $seq2);

=item duplexfold SEQ1 SEQ2

compute the structure upon hybridization of SEQ1 and SEQ2. In contrast to
cofold only inter-molecular pairs are allowed. Thus, the algorithm runs in
O(n1*n2) time where n1 and n2 are the lengths of the sequences. The result
is returned in a C struct containing the innermost base pair (i,j), the
structure and the energy. E.g.:

  $seq1 ="CGCAGGGAUACCCGCG";
  $seq2 ="GCGCCCAUAGGGACGC";
  $dup  = RNA::duplexfold($seq1, $seq2);
  $l1   = index($dup->{structure}, "&");  # length of the seq1 part
  print "Region ", $dup->{i}+1-$l1, " to ",
        $dup->{i}, " of seq1 ",
        "pairs up with ", $dup->{j}, " to ",
        $dup->{j}+length($dup->{structure})-$l1-2,
        " of seq2\n";

=back

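RNA::energy_of_struct expects STRUCTURE to be a valid secondary structure
in bracket notation. The following plain-Perl sketch shows one way to check
this before calling it. The helper is_valid_dot_bracket is hypothetical and
not part of RNA.pm, and it accepts only the characters '(', ')' and '.':

```perl

  use strict;
  use warnings;

  # Hypothetical helper (not part of RNA.pm): returns 1 if STRUCT is a
  # balanced dot-bracket string as long as SEQ, 0 otherwise.
  sub is_valid_dot_bracket {
      my ($seq, $struct) = @_;
      return 0 unless length($seq) == length($struct);
      my $open = 0;                      # number of currently unmatched '('
      for my $c (split //, $struct) {
          if    ($c eq '(') { $open++ }
          elsif ($c eq ')') { return 0 if --$open < 0 }  # ')' without partner
          elsif ($c ne '.') { return 0 }                  # unexpected character
      }
      return $open == 0 ? 1 : 0;         # every '(' must be closed
  }

```

A caller could then guard C<RNA::energy_of_struct($seq, $struct)> with
C<is_valid_dot_bracket($seq, $struct)>.
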
Partition function Folding (from part_func.h)

=over 4

=item pf_fold SEQUENCE

=item pf_fold SEQUENCE, CONSTRAINTS

calculates the partition function over all possible secondary
structures and the matrix of pair probabilities for SEQUENCE and
returns a two element list consisting of a string summarizing possible
structures and the Gibbs free energy of the ensemble. See below on how
to access the pair probability matrix. As with L<fold|/fold> the second
argument can be used to specify folding constraints. Constraints are
implemented by excluding base pairings that contradict the constraint,
but without bonus energies. Constraints of type '|' (paired base) are
ignored. In the two argument version CONSTRAINTS is modified to contain
the structure string on return (obsolete feature, for backwards
compatibility only).

=item get_pr I, J

After calling C<pf_fold> the global C variable C<pr> points to the
computed pair probabilities. Perl access to this C array is provided by
the C<get_pr> helper function, which looks up and returns the
probability of the pair (I,J).

=item free_pf_arrays

frees memory allocated for pf_fold

=item update_pf_params LENGTH

recalculate energy parameters for pf_fold. In most cases (such as
simple changes to L<$temperature|/$temperature>) C<pf_fold>
will take appropriate action automatically.

=item pbacktrack SEQUENCE

return a random structure chosen according to its Boltzmann probability.
Use it to produce samples representing the thermodynamic ensemble of
structures, after calling C<pf_fold> on the same sequence:

  RNA::pf_fold($sequence);
  for (1..1000) {
     push @sample, RNA::pbacktrack($sequence);
  }

=item co_pf_fold SEQUENCE

=item co_pf_fold SEQUENCE, CONSTRAINTS

calculates the partition function over all possible secondary
structures and the matrix of pair probabilities for SEQUENCE.
SEQUENCE is a concatenation of two sequences (see cofold).
Returns a five element list. The first element is a string summarizing
possible structures, the second the Gibbs free energy of sequence 1 (as
also computed by pf_fold), and the third the Gibbs free energy of
sequence 2. The fourth element is the Gibbs free energy of all
structures that have INTERmolecular base pairs, and the fifth element
is the Gibbs free energy of the whole ensemble (dimers as well as
monomers). See above on how to access the pair probability matrix. As
with L<fold|/fold> the second argument can be used to specify folding
constraints. Constraints are implemented by excluding base pairings
that contradict the constraint, but without bonus energies. Constraints
of type '|' (paired base) are ignored. In the two argument version
CONSTRAINTS is modified to contain the structure string on return
(obsolete feature, for backwards compatibility only).

=item free_co_pf_arrays

frees memory allocated for co_pf_fold

=item update_pf_co_params LENGTH

recalculate energy parameters for co_pf_fold. In most cases (such as
simple changes to L<$temperature|/$temperature>) C<co_pf_fold>
will take appropriate action automatically.

=item get_concentrations FdAB, FdAA, FdBB, FA, FB, CONCA, CONCB

calculates the equilibrium concentrations of the three dimers AB, AA and
BB, and of the two monomers A and B. The inputs are the free energies of
the dimers (FdAB, FdAA, FdBB; each is the fourth element returned by
co_pf_fold for the sequences AB, AA and BB, respectively), the free
energies of the monomers (FA, FB; e.g. the second and third elements
returned by co_pf_fold for sequence AB), and the start concentrations
CONCA and CONCB of A and B. It returns the concentrations of the AB, AA
and BB dimers, followed by the A monomer concentration and, as fifth and
last element, the B monomer concentration.
So, to compute concentrations, you first have to run co_pf_fold three
times (with sequences AB, AA and BB).

=back

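The structures collected by the pbacktrack loop above are drawn according
to their Boltzmann probabilities, so their relative frequencies estimate
those probabilities. A plain-Perl sketch of that tally follows. The helper
sample_frequencies is hypothetical and not part of RNA.pm:

```perl

  use strict;
  use warnings;

  # Hypothetical helper: map each distinct structure in a sample
  # (e.g. the @sample array filled via RNA::pbacktrack) to its
  # relative frequency in that sample.
  sub sample_frequencies {
      my @sample = @_;
      my $n = scalar @sample;
      my %count;
      $count{$_}++ for @sample;
      # an empty sample has no keys, so no division by zero occurs
      my %freq = map { $_ => $count{$_} / $n } keys %count;
      return \%freq;
  }

```

The estimate only becomes reliable for structures that occur often in the
sample. Rare structures need many more pbacktrack calls.
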
Suboptimal Folding (from subopt.h)

=over 4

=item subopt SEQUENCE, CONSTRAINTS, DELTA

=item subopt SEQUENCE, CONSTRAINTS, DELTA, FILEHANDLE

compute all structures of SEQUENCE within DELTA*0.01 kcal/mol of the
optimum. If FILEHANDLE is specified, results are written to it and
nothing is returned. Otherwise, the C function returns a list of C
structs of type SOLUTION. The list is wrapped by SWIG as a Perl object
that can be accessed as follows:

  $solution = RNA::subopt($seq, undef, 500);
  for (0..$solution->size()-1) {
     printf "%s %6.2f\n",  $solution->get($_)->{structure},
			   $solution->get($_)->{energy};
  }

=back

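Because DELTA is given in units of 0.01 kcal/mol, a DELTA of 500 keeps
structures up to 5.0 kcal/mol above the mfe. The plain-Perl sketch below
only illustrates that unit conversion on hypothetical [structure, energy]
pairs. It does not enumerate structures, and within_delta is not part of
RNA.pm:

```perl

  use strict;
  use warnings;

  # Hypothetical helper: keep [structure, energy] pairs (energies in
  # kcal/mol) whose energy lies within DELTA*0.01 kcal/mol of $mfe.
  sub within_delta {
      my ($mfe, $delta, @solutions) = @_;
      my $cutoff = $mfe + $delta * 0.01;
      # small tolerance so values exactly at the cutoff survive rounding
      return grep { $_->[1] <= $cutoff + 1e-9 } @solutions;
  }

```
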
Alignment Folding (from alifold.h)

=over 4

=item alifold REF

=item alifold REF, CONSTRAINTS

similar to fold(), but computes the consensus structure for a set of
aligned sequences. E.g.:

  @align = ("GCCAUCCGAGGGAAAGGUU",
	    "GAUCGACAGCGUCU-AUCG",
	    "CCGUCUUUAUGAGUCCGGC");
  ($consens_struct, $consens_en) = RNA::alifold(\@align);

=item consensus REF

=item consens_mis REF

compute a simple consensus sequence or "most informative sequence" from an
alignment. The simple consensus returns the most frequent character for
each column, the MIS uses the IUPAC symbol that contains all characters
that are overrepresented in the column.

  $mis = RNA::consens_mis(\@align);


=back

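The simple consensus described above can be sketched in plain Perl as a
per-column majority vote. The helper simple_consensus is hypothetical. This
sketch assumes that gap characters count like any other symbol and that
ties are broken alphabetically. Both choices may differ from RNA::consensus:

```perl

  use strict;
  use warnings;

  # Hypothetical helper: per-column majority vote over equally long
  # aligned sequences. Gaps count like any other character, and ties
  # are broken alphabetically (both are assumptions of this sketch).
  sub simple_consensus {
      my @align = @_;
      my $cons = '';
      for my $col (0 .. length($align[0]) - 1) {
          my %n;
          $n{ substr($_, $col, 1) }++ for @align;
          my ($best) = sort { $n{$b} <=> $n{$a} || $a cmp $b } keys %n;
          $cons .= $best;
      }
      return $cons;
  }

```
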
Inverse Folding (from inverse.h)

=over 4

=item inverse_fold START, TARGET

find a sequence that folds into structure TARGET, by optimizing the
sequence until its mfe structure (as returned by L<fold|/fold>) is
TARGET. Startpoint of the optimization is the sequence START. Returns
a list containing the sequence found and the final value of the cost
function, i.e. 0 if the search was successful. A random start sequence
can be generated using L<random_string|/random_string>.

=item inverse_pf_fold START, TARGET

optimizes a sequence (beginning with START) by maximising the
frequency of the structure TARGET in the thermodynamic ensemble
of structures. Returns a list containing the optimized sequence and
the final value of the cost function. The cost function is given by
C<energy_of_struct(seq, TARGET) - pf_fold(seq)>, i.e. C<-RT*log(p(TARGET))>.

=item $final_cost [float]

holds the value of the cost function where the optimization in
C<inverse_pf_fold> should stop. For values <=0 the optimization will
only terminate at a local optimum (which might take very long to reach).

=item $symbolset [char *]

the string symbolset holds the allowed characters to be used by
C<inverse_fold> and C<inverse_pf_fold>, the default alphabet is "AUGC"


=item $give_up [int]

If non-zero, stop optimization when it is clear that no exact solution
can be found. Else continue and eventually return an approximate
solution. Default 0.

=back

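The cost function of inverse_pf_fold, C<energy_of_struct(seq, TARGET) -
pf_fold(seq)>, equals C<-RT*log(p(TARGET))>, so it can be turned back into
the probability of TARGET in the thermodynamic ensemble. The plain-Perl
sketch below does this with the hypothetical helper target_probability.
The gas constant 1.98717e-3 kcal/(mol K) and the 273.15 K offset are
standard physical values assumed here, not read from libRNA:

```perl

  use strict;
  use warnings;

  # Hypothetical helper: Boltzmann probability of a structure with
  # energy $e_target (kcal/mol) in an ensemble with free energy
  # $g_ensemble (kcal/mol) at $temp_c degrees Celsius:
  #   p = exp(-(E - G) / RT)
  sub target_probability {
      my ($e_target, $g_ensemble, $temp_c) = @_;
      my $rt = 1.98717e-3 * ($temp_c + 273.15);   # RT in kcal/mol
      return exp(-($e_target - $g_ensemble) / $rt);
  }

```

A cost of 0 corresponds to p = 1. Every positive cost lowers the
probability of TARGET.
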
Cofolding of two RNA molecules (from cofold.h)

=over 4


=back

Global Variables to Modify Folding (from fold_vars.h)

=over 4

=item $noGU [int]

Do not allow GU pairs to form, default 0.

=item $no_closingGU [int]

allow GU only inside stacks, default 0.

=item $tetra_loop [int]

Fold with specially stable 4-loops, default 1.

=item $energy_set [int]

0 = BP; 1 = any with GC; 2 = any with AU parameters; default 0.

=item $dangles [int]

How to compute dangling ends. 0: no dangling end energies, 1: "normal"
dangling ends (default), 2: simplified dangling ends, 3: "normal" +
co-axial stacking. Note that L<pf_fold|/pf_fold> treats cases 1 and 3
as 2. The same holds for the main computation in L<subopt|/subopt>,
however subopt will re-evaluate energies using
L<energy_of_struct|/energy_of_struct> for cases 1 and 3. See the more
detailed discussion in RNAlib.texinfo.

=item $nonstandards [char *]

contains allowed non standard bases, default empty string ""

=item $temperature [double]

temperature in degrees Celsius for rescaling parameters, default 37C.

=item $logML [int]

use logarithmic multiloop energy function in
L<energy_of_struct|/energy_of_struct>, default 0.

=item $noLonelyPairs [int]

consider only structures without isolated base pairs (helices of length 1).
For L<pf_fold|/pf_fold> only eliminates pairs
that can B<only> occur as isolated pairs. Default 0.

=item $base_pair [struct bond *]

list of base pairs from last call to L<fold|/fold>. Better use
the structure string returned by  L<fold|/fold>.

=item $pf_scale [double]

scaling factor used by L<pf_fold|/pf_fold> to avoid overflows. Should
be set to exp(-F/(RT*length)) where F is a guess for the ensemble free
energy (e.g. use the mfe).


=item $fold_constrained [int]

apply constraints in the folding algorithms, default 0.

=item $do_backtrack [int]

If 0 do not compute the pair probabilities in L<pf_fold|/pf_fold>
(only the partition function). Default 1.

=item $backtrack_type [char]

usually 'F'; 'C' requires (1,N) to be bonded; 'M' backtrack as if the
sequence was part of a multi loop. Used by L<inverse_fold|/inverse_fold>.

=item $pr [double *]

the base pairing prob. matrix computed by L<pf_fold|/pf_fold>.

=item $iindx [int *]

Array of indices for moving within the C<pr> array. Better use
L<get_pr|/get_pr>.

=back

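The $pf_scale guess exp(-F/(RT*length)) described above can be computed
with the plain-Perl sketch below. The helper guess_pf_scale is
hypothetical. RT uses the gas constant 1.98717e-3 kcal/(mol K), a value
assumed here so that the units match kcal/mol energies:

```perl

  use strict;
  use warnings;

  # Hypothetical helper: scaling factor exp(-F/(RT*length)) for a
  # free energy guess $f (kcal/mol, e.g. the mfe) of a sequence of
  # $length nucleotides at $temp_c degrees Celsius.
  sub guess_pf_scale {
      my ($f, $length, $temp_c) = @_;
      my $rt = 1.98717e-3 * ($temp_c + 273.15);   # RT in kcal/mol
      return exp(-$f / ($rt * $length));
  }

```

With C<($struct, $mfe) = RNA::fold($seq)>, one could then set
C<$RNA::pf_scale = guess_pf_scale($mfe, length($seq), $RNA::temperature)>
before calling C<RNA::pf_fold($seq)>.
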
from move_set.h

=over 4

=item move_standard SEQUENCE, STRUCTURE, MOVE_TYPE, VERBOSITY, SHIFTS, noLP

Walking method to find local minima. Three different kinds are available,
chosen via the MOVE_TYPE enum:

  0 - GRADIENT: take the neighbouring structure with the lowest energy
  1 - FIRST:    take the first neighbour with a lower energy
  2 - ADAPTIVE: randomly choose a neighbour

STRUCTURE is the start structure and is also used to return the target
structure. The remaining arguments (VERBOSITY, SHIFTS, noLP) are options
given as integers.

=back

=head2 Parsing and Comparing Structures

from RNAstruct.h: these functions convert between strings
representing secondary structures with various levels of coarse
graining. See the documentation of the C library for details.

=over 4

=item b2HIT STRUCTURE

Full -> HIT [incl. root]

=item b2C STRUCTURE

Full -> Coarse [incl. root]

=item b2Shapiro STRUCTURE

Full -> weighted Shapiro [i.r.]

=item add_root STRUCTURE

{Tree} -> ({Tree}R)

=item expand_Shapiro COARSE

add S for stacks to coarse struct

=item expand_Full STRUCTURE

Full -> FFull

=item unexpand_Full FSTRUCTURE

FFull -> Full

=item unweight WCOARSE

remove weights from coarse struct

=item unexpand_aligned_F ALIGN



=item parse_structure STRUCTURE

computes structure statistics, and fills the following global variables:

  $loops        [int]    number of loops (and stacks)
  $unpaired     [int]    number of unpaired positions
  $pairs        [int]    number of paired positions
  $loop_size    [int *]  holds all loop sizes
  $loop_degree  [int *]  holds all loop degrees
  $helix_size   [int *]  holds all helix lengths

=back

from treedist.h: routines for computing tree-edit distances between structures

=over 4

=item make_tree XSTRUCT

convert a structure string as produced by the expand_... functions to a
Tree, usable as input to tree_edit_distance.

=item tree_edit_distance T1, T2

compare two structures using tree editing. C<T1>, C<T2> must have been
created using C<make_tree>.

=item print_tree T

mainly for debugging

=item free_tree T

free space allocated by make_tree

=back

from stringdist.h routines to compute structure distances via string-editing

=over 4

=item Make_swString STRUCTURE

[ returns swString * ]
make input for string_edit_distance

=item string_edit_distance S1, S2

[ returns float  ]
compare two structures using string alignment. C<S1>, C<S2> should be
created using C<Make_swString>.

=back

from profiledist.h

=over

=item Make_bp_profile LENGTH

[ returns (float *) ]
condense pair probability matrix C<pr> into a vector containing
probabilities for unpaired, upstream paired and downstream paired.
The resulting probability profile is used as input for
profile_edit_distance.

=item profile_edit_distance T1, T2

[ returns float ]
align two probability profiles produced by C<Make_bp_profile>

=item print_bppm T

[ returns void ]
print string representation of probability profile

=item free_profile T

[ returns void ]
free space allocated in Make_bp_profile

=back

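The condensation performed by Make_bp_profile can be illustrated in plain
Perl. This sketch is hypothetical. It takes pair probabilities as a hash of
hashes $p->{i}{j} (1-based, i < j) instead of the C array C<pr>. It also
uses its own order for the three components (unpaired, paired with a
partner to the right, paired with a partner to the left), which is not
taken from the library:

```perl

  use strict;
  use warnings;

  # Hypothetical helper: $p->{$i}{$j} holds the probability of pair
  # (i,j), 1-based with i < j; $n is the sequence length. Returns a
  # reference to an array of [unpaired, paired-to-the-right,
  # paired-to-the-left] probabilities for positions 1..$n.
  sub bp_profile {
      my ($p, $n) = @_;
      my @prof = map { [0, 0, 0] } 0 .. $n;     # index 0 is unused
      for my $i (keys %$p) {
          for my $j (keys %{ $p->{$i} }) {
              $prof[$i][1] += $p->{$i}{$j};      # i pairs with j > i
              $prof[$j][2] += $p->{$i}{$j};      # j pairs with i < j
          }
      }
      # whatever is not paired is unpaired
      $_->[0] = 1 - $_->[1] - $_->[2] for @prof[1 .. $n];
      return \@prof;
  }

```
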
Global variables for computing structure distances

=over 4

=item $edit_backtrack [int]

set to 1 if you want backtracking

=item $aligned_line [(char *)[2]]

contains aligned structures after computing structure distance with
C<edit_backtrack==1>

=item $cost_matrix [int]

0 usual costs (default), 1 Shapiro's costs

=back

=head2 Utilities (from utils.h)

=over 4

=item space SIZE

allocate memory from C. Usually not needed in Perl

=item nrerror MESSAGE

die with error message. Better use Perl's C<die>

=item $xsubi [unsigned short[3]]

libRNA uses the rand48 48-bit random number generator if available; the
current random number is always stored in $xsubi.

=item init_rand

initialize the $xsubi random number from current time

=item urn

returns a random number between 0 and 1 using the random number
generator from the RNA library.

=item int_urn FROM, TO

returns random integer in the range [FROM..TO]

=item time_stamp

current date in a string. In Perl you might as well use C<localtime>

=item random_string LENGTH, SYMBOLS

returns a string of length LENGTH using characters from the string
SYMBOLS

=item hamming S1, S2

calculate hamming distance of the strings C<S1> and C<S2>.


=item pack_structure STRUCTURE

pack secondary structure, using a 5:1 compression via base 3
encoding. Returns the packed string.

=item unpack_structure PACKED

unpacks a secondary structure packed with pack_structure

=item make_pair_table STRUCTURE

returns a pair table as a newly allocated (short *) C array, such
that table[i]=j if (i,j) pair or 0 if i is unpaired; table[0]
contains the length of the structure.

=item bp_distance STRUCTURE1, STRUCTURE2

returns the base pair distance of the two STRUCTURES, i.e. the number
of base pairs present in one structure but not in the other. This is
the same as the edit distance with open-pair and close-pair as move set.

=back

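The pair table returned by make_pair_table and the base pair distance
computed by bp_distance can be imitated in plain Perl. Both helpers below
(pair_table, bp_distance_sketch) are hypothetical. They use ordinary Perl
arrays instead of C arrays and assume two structures of equal length:

```perl

  use strict;
  use warnings;

  # Hypothetical helper: pair table of a dot-bracket string with
  # table[0] = length, table[i] = j if i pairs with j, 0 if unpaired.
  sub pair_table {
      my ($struct) = @_;
      my @table = (length $struct);
      my @stack;                          # positions of unmatched '('
      my $i = 0;
      for my $c (split //, $struct) {
          $i++;
          push @table, 0;
          if ($c eq '(') {
              push @stack, $i;
          }
          elsif ($c eq ')') {
              my $j = pop @stack;
              die "unbalanced structure\n" unless defined $j;
              @table[$i, $j] = ($j, $i);
          }
      }
      die "unbalanced structure\n" if @stack;
      return \@table;
  }

  # Hypothetical helper: number of base pairs (i,j), i < j, that occur
  # in one structure but not in the other.
  sub bp_distance_sketch {
      my ($s1, $s2) = @_;
      my ($t1, $t2) = (pair_table($s1), pair_table($s2));
      my $dist = 0;
      for my $i (1 .. $t1->[0]) {
          $dist++ if $t1->[$i] > $i && $t2->[$i] != $t1->[$i];
          $dist++ if $t2->[$i] > $i && $t1->[$i] != $t2->[$i];
      }
      return $dist;
  }

```

Each pair is counted once, at its opening position i. A pair that both
structures share therefore adds nothing to the distance.
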
from PS_plot.h

=over 4

=item PS_rna_plot SEQUENCE, STRUCTURE, FILENAME

write PostScript drawing of structure to FILENAME. Returns 1 on
success, 0 else.

=item PS_rna_plot_a SEQUENCE, STRUCTURE, FILENAME, PRE, POST

write PostScript drawing of structure to FILENAME. The strings PRE and
POST contain PostScript code that is included verbatim in the plot just
before (after) the data.  Returns 1 on success, 0 else.

=item gmlRNA SEQUENCE, STRUCTURE, FILENAME, OPTION

write structure drawing in gml (Graph Meta Language) to
FILENAME. OPTION should be a single character. If uppercase the gml
output will include the SEQUENCE as node labels. If OPTION equals 'x'
or 'X' write graph with coordinates (else only connectivity
information). Returns 1 on success, 0 else.

=item ssv_rna_plot SEQUENCE, STRUCTURE, SSFILE

write structure drawing as coord file for SStructView. Returns 1 on
success, 0 else.

=item xrna_plot SEQUENCE, STRUCTURE, SSFILE

write structure drawing as ".ss" file for further editing in XRNA.
Returns 1 on success, 0 else.

=item PS_dot_plot SEQUENCE, FILENAME

write a PostScript dot plot of the pair probability matrix to
FILENAME. Returns 1 on success, 0 else.

=item $rna_plot_type [int]

Select layout algorithm for structure drawings. Currently available:
0 = simple coordinates, 1 = naview; default 1.

=back

from read_epars.c

=over 4

=item read_parameter_file FILENAME

read energy parameters from FILENAME

=item write_parameter_file FILENAME

write energy parameters to FILENAME

=back

=head2 SWIG helper functions

The package includes generic helper functions to access C arrays
of type C<int>, C<float> and C<double>, such as:

=over 4

=item intP_getitem POINTER, INDEX

return the element INDEX from the array

=item intP_setitem POINTER, INDEX, VALUE

set element INDEX to VALUE

=item new_intP NELEM

allocate a new C array of integers with NELEM elements and return the pointer

=item delete_intP POINTER

deletes the C array by calling free()

=back

Substituting C<intP> with C<floatP>, C<doubleP>, C<ushortP>, or
C<shortP> gives the corresponding functions for arrays of float,
double, unsigned short, and short. You need to know the correct C
type, however, and the functions work only for arrays of simple types.
Note that the shortP... functions were used for unsigned short in previous
versions, while starting with v1.8.3 they can only access signed short arrays.

On the lowest level the C<cdata> function gives direct access to any data
in the form of a Perl string.

=over

=item cdata POINTER, SIZE

copies SIZE bytes at POINTER to a Perl string (with binary data)

=item memmove POINTER, STRING

copies the (binary) string STRING to the memory location pointed to by
POINTER.
Note: memmove is broken in some SWIG versions (e.g. 1.3.31).

=back

In combination with Perl's C<unpack> this provides a generic way to convert
C data structures to Perl. E.g.

  RNA::parse_structure($structure);  # fills the $RNA::loop_degree array
  @ldegrees = unpack "I*", RNA::cdata($RNA::loop_degree,
                                      ($RNA::loops+1) * length(pack("I", 0)));

Warning: using these functions with wrong arguments will corrupt your
memory and lead to a segmentation fault.

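The C<unpack> step can be tried without libRNA by building a stand-in for
the bytes that RNA::cdata would return. In this plain-Perl sketch the
buffer is created with C<pack>, which is an assumption for illustration
only. The size of a native C int is taken from C<pack> rather than
hard-coded as 4:

```perl

  use strict;
  use warnings;

  my $int_size = length pack("I", 0);    # bytes per native unsigned int
  my @c_array  = (3, 1, 2);              # stand-in for a C int array
  my $bytes    = pack("I*", @c_array);   # raw bytes, as cdata would return
  my @values   = unpack("I*", $bytes);   # convert back to a Perl list
  printf "%d bytes decoded into: %s\n", length $bytes, "@values";

```

The byte count passed to cdata has to match the element count times the
element size. Otherwise unpack reads too little data or too much.
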
=head1 AUTHOR

Ivo L. Hofacker <ivo@tbi.univie.ac.at>

=cut