<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
<title>Restriction</title>
<style>
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
span.underline{text-decoration: underline;}
div.column{display: inline-block; vertical-align: top; width: 50%;}
div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
ul.task-list{list-style: none;}
pre > code.sourceCode { white-space: pre; position: relative; }
pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
pre > code.sourceCode > span:empty { height: 1.2em; }
code.sourceCode > span { color: inherit; text-decoration: inherit; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
pre > code.sourceCode { white-space: pre-wrap; }
pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
}
pre.numberSource code
{ counter-reset: source-line 0; }
pre.numberSource code > span
{ position: relative; left: -4em; counter-increment: source-line; }
pre.numberSource code > span > a:first-child::before
{ content: counter(source-line);
position: relative; left: -1em; text-align: right; vertical-align: baseline;
border: none; display: inline-block;
-webkit-touch-callout: none; -webkit-user-select: none;
-khtml-user-select: none; -moz-user-select: none;
-ms-user-select: none; user-select: none;
padding: 0 4px; width: 4em;
color: #aaaaaa;
}
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa; padding-left: 4px; }
div.sourceCode
{ }
@media screen {
pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
}
code span.al { color: #ff0000; font-weight: bold; } /* Alert */
code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code span.at { color: #7d9029; } /* Attribute */
code span.bn { color: #40a070; } /* BaseN */
code span.bu { } /* BuiltIn */
code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code span.ch { color: #4070a0; } /* Char */
code span.cn { color: #880000; } /* Constant */
code span.co { color: #60a0b0; font-style: italic; } /* Comment */
code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code span.do { color: #ba2121; font-style: italic; } /* Documentation */
code span.dt { color: #902000; } /* DataType */
code span.dv { color: #40a070; } /* DecVal */
code span.er { color: #ff0000; font-weight: bold; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #40a070; } /* Float */
code span.fu { color: #06287e; } /* Function */
code span.im { } /* Import */
code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
code span.kw { color: #007020; font-weight: bold; } /* Keyword */
code span.op { color: #666666; } /* Operator */
code span.ot { color: #007020; } /* Other */
code span.pp { color: #bc7a00; } /* Preprocessor */
code span.sc { color: #4070a0; } /* SpecialChar */
code span.ss { color: #bb6688; } /* SpecialString */
code span.st { color: #4070a0; } /* String */
code span.va { color: #19177c; } /* Variable */
code span.vs { color: #4070a0; } /* VerbatimString */
code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
</style>
</head>
<body>
<h1 id="working-with-restriction-enzymes">Working with restriction enzymes</h1>
<h2 id="table-of-contents">Table of contents</h2>
<ol class="incremental" type="1">
<li><a href="#1">The restriction enzymes classes</a>
<ol class="incremental" type="1">
<li><a href="#1.1">Importing the enzymes</a></li>
<li><a href="#1.2">Naming convention</a></li>
<li><a href="#1.3">Searching for restriction sites</a></li>
<li><a href="#1.4">Retrieving the sequences produced by a digestion</a></li>
<li><a href="#1.5">Analysing circular sequences</a></li>
<li><a href="#1.6">Comparing enzymes with each other</a></li>
<li><a href="#1.7">Other facilities provided by the enzyme classes</a></li>
</ol></li>
<li><a href="#2">The RestrictionBatch class: a class to deal with several enzymes</a>
<ol class="incremental" type="1">
<li><a href="#2.1">Creating a RestrictionBatch</a></li>
<li><a href="#2.2">Restricting a RestrictionBatch to a particular supplier</a></li>
<li><a href="#2.3">Adding enzymes to a RestrictionBatch</a></li>
<li><a href="#2.4">Removing enzymes from a RestrictionBatch</a></li>
<li><a href="#2.5">Manipulating RestrictionBatch</a></li>
<li><a href="#2.6">Analysing sequences with a RestrictionBatch</a></li>
<li><a href="#2.7">Other RestrictionBatch methods</a></li>
</ol></li>
<li><a href="#3">AllEnzymes and CommOnly: two preconfigured RestrictionBatches</a></li>
<li><a href="#4">The Analysis class: even simpler restriction analysis</a>
<ol class="incremental" type="1">
<li><a href="#4.1">Setting up an Analysis</a></li>
<li><a href="#4.2">Full restriction analysis</a></li>
<li><a href="#4.3">Changing the title</a></li>
<li><a href="#4.4">Customising the output</a></li>
<li><a href="#4.5">Fancier restriction analysis</a></li>
<li><a href="#4.6">More complex analysis</a></li>
</ol></li>
<li><a href="#5">Advanced features: the FormattedSeq class</a>
<ol class="incremental" type="1">
<li><a href="#5.1">Creating a FormattedSeq</a></li>
<li><a href="#5.2">Unlike Bio.Seq, FormattedSeq retains information about their shape</a></li>
<li><a href="#5.3">Changing the shape of a FormattedSeq</a></li>
<li><a href="#5.4">Using / and // operators with FormattedSeq</a></li>
</ol></li>
<li><a href="#6">More advanced features</a>
<ol class="incremental" type="1">
<li><a href="#6.1">Updating the enzymes from Rebase</a>
<ol class="incremental" type="1">
<li><a href="#6.1.1">Fetching the recent enzyme files manually from Rebase</a></li>
<li><a href="#6.1.2">Fetching the recent enzyme files with rebase_update.py</a></li>
<li><a href="#6.1.3">Compiling a new dictionary with ranacompiler.py</a></li>
</ol></li>
<li><a href="#6.2">Subclassing the class Analysis</a></li>
</ol></li>
<li><a href="#7">Limitation and caveat</a>
<ol class="incremental" type="1">
<li><a href="#7.1">All DNA are non methylated</a></li>
<li><a href="#7.2">No support for star activity</a></li>
<li><a href="#7.3">Safe to use with degenerated DNA</a></li>
<li><a href="#7.4">Non standard bases in DNA are not allowed</a></li>
<li><a href="#7.5">Sites found at the edge of linear DNA might not be accessible in a real digestion</a></li>
<li><a href="#7.6">Restriction reports cutting sites not enzyme recognition sites</a></li>
</ol></li>
<li><a href="#8">Annexe: modifying dir() to use with from Bio.Restriction import *</a></li>
</ol>
<h3 id="the-restriction-enzymes-classes"><a name="1"></a>1. The restriction enzymes classes</h3>
<p>The restriction enzyme package lives in <code>Bio.Restriction</code>. It lets you work with restriction enzymes and perform restriction analyses on your sequences. <code>Restriction</code> makes use of the facilities offered by <strong>REBASE</strong> and contains classes for more than 800 restriction enzymes. This chapter gives a quick overview of the facilities offered by the <code>Restriction</code> package of Biopython. It is constructed as an interactive Python session, and the best way to read it is with a Python shell open alongside you.</p>
<h4 id="importing-the-enzymes"><a name="1.1"></a> 1.1 Importing the enzymes</h4>
<p>To import the enzymes, open a Python shell and type:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1"></a><span class="op">>>></span> <span class="im">from</span> Bio <span class="im">import</span> Restriction</span>
<span id="cb1-2"><a href="#cb1-2"></a><span class="op">>>></span> <span class="bu">dir</span>()</span>
<span id="cb1-3"><a href="#cb1-3"></a>[<span class="st">'Restriction'</span>, <span class="st">'__annotations__'</span>, <span class="st">'__builtins__'</span>, <span class="st">'__doc__'</span>, <span class="st">'__loader__'</span>, <span class="st">'__name__'</span>, <span class="st">'__package__'</span>, <span class="st">'__spec__'</span>]</span>
<span id="cb1-4"><a href="#cb1-4"></a><span class="op">>>></span> Restriction.EcoRI</span>
<span id="cb1-5"><a href="#cb1-5"></a>EcoRI</span>
<span id="cb1-6"><a href="#cb1-6"></a><span class="op">>>></span> Restriction.EcoRI.site</span>
<span id="cb1-7"><a href="#cb1-7"></a><span class="co">'GAATTC'</span></span>
<span id="cb1-8"><a href="#cb1-8"></a><span class="op">>>></span></span></code></pre></div>
<p>You will certainly notice that the package is quite slow to load. This is normal, as each enzyme has its own class and there are a lot of them. This will not affect the speed of Python after the initial import.</p>
<p>I don’t know about you, but I find it quite cumbersome to have to prefix each operation with <code>Restriction.</code>, so here is another way to import the package.</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1"></a><span class="op">>>></span> <span class="im">from</span> Bio.Restriction <span class="im">import</span> <span class="op">*</span></span>
<span id="cb2-2"><a href="#cb2-2"></a><span class="op">>>></span> EcoRI</span>
<span id="cb2-3"><a href="#cb2-3"></a>EcoRI</span>
<span id="cb2-4"><a href="#cb2-4"></a><span class="op">>>></span> EcoRI.site</span>
<span id="cb2-5"><a href="#cb2-5"></a><span class="co">'GAATTC'</span></span>
<span id="cb2-6"><a href="#cb2-6"></a><span class="op">>>></span></span></code></pre></div>
<p>However, this method has one big disadvantage: it is almost impossible to use the command <code>dir()</code> anymore, as there are so many enzymes that the result is hardly readable. A workaround is provided at the end of this tutorial. I let you decide which method you prefer, but in this tutorial I will use the second. If you prefer the first method, you will need to prefix each call to a restriction enzyme with <code>Restriction.</code> in the remainder of the tutorial.</p>
<h4 id="naming-convention"><a name="1.2"></a>1.2 Naming convention</h4>
<p>To access an enzyme, simply enter its name. You must respect the usual naming convention, with the upper case letters and Roman numerals (in upper case as well):</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1"></a><span class="op">>>></span> EcoRI</span>
<span id="cb3-2"><a href="#cb3-2"></a>EcoRI</span>
<span id="cb3-3"><a href="#cb3-3"></a><span class="op">>>></span> ecori</span>
<span id="cb3-4"><a href="#cb3-4"></a></span>
<span id="cb3-5"><a href="#cb3-5"></a>Traceback (most recent call last):</span>
<span id="cb3-6"><a href="#cb3-6"></a> File <span class="st">"<pyshell#25>"</span>, line <span class="dv">1</span>, <span class="kw">in</span> <span class="op">-</span>toplevel<span class="op">-</span></span>
<span id="cb3-7"><a href="#cb3-7"></a> ecori</span>
<span id="cb3-8"><a href="#cb3-8"></a><span class="pp">NameError</span>: name <span class="st">'ecori'</span> <span class="kw">is</span> <span class="kw">not</span> defined</span>
<span id="cb3-9"><a href="#cb3-9"></a><span class="op">>>></span> EcoR1</span>
<span id="cb3-10"><a href="#cb3-10"></a></span>
<span id="cb3-11"><a href="#cb3-11"></a>Traceback (most recent call last):</span>
<span id="cb3-12"><a href="#cb3-12"></a> File <span class="st">"<pyshell#26>"</span>, line <span class="dv">1</span>, <span class="kw">in</span> <span class="op">-</span>toplevel<span class="op">-</span></span>
<span id="cb3-13"><a href="#cb3-13"></a> EcoR1</span>
<span id="cb3-14"><a href="#cb3-14"></a><span class="pp">NameError</span>: name <span class="st">'EcoR1'</span> <span class="kw">is</span> <span class="kw">not</span> defined</span>
<span id="cb3-15"><a href="#cb3-15"></a><span class="op">>>></span> KpnI</span>
<span id="cb3-16"><a href="#cb3-16"></a>KpnI</span>
<span id="cb3-17"><a href="#cb3-17"></a><span class="op">>>></span></span></code></pre></div>
<p><code>ecori</code> and <code>EcoR1</code> are not enzymes; <code>EcoRI</code> and <code>KpnI</code> are.</p>
<h4 id="searching-for-restriction-sites"><a name="1.3"></a>1.3 Searching for restriction sites</h4>
<p>So what can we do with these restriction enzymes? To see that, we will need a DNA sequence. Restriction enzymes support both <code>Bio.Seq.MutableSeq</code> and <code>Bio.Seq.Seq</code> objects. Your sequence must comply with the IUPAC alphabet. That means using A, C, G and T or U, plus N for any base, and various other standard codes like S for C or G, and V for A, C or G.</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1"></a><span class="op">>>></span> <span class="im">from</span> Bio.Seq <span class="im">import</span> Seq</span>
<span id="cb4-2"><a href="#cb4-2"></a><span class="op">>>></span> my_seq <span class="op">=</span> Seq(<span class="st">'AAAAAAAAAAAAAA'</span>)</span></code></pre></div>
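<p>As a rough illustration of the IUPAC requirement, the sketch below checks a plain string against the standard nucleotide codes before it is wrapped in a <code>Seq</code>. The names <code>IUPAC_CODES</code> and <code>is_iupac_dna</code> are hypothetical helpers written for this tutorial, not part of <code>Bio.Restriction</code>; only the code table itself is the standard IUPAC one.</p>

```python
# Standard IUPAC nucleotide codes mapped to the bases they stand for.
# IUPAC_CODES and is_iupac_dna are illustrative names, not Biopython API.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def is_iupac_dna(text):
    """Return True if every character of text is an IUPAC nucleotide code."""
    return set(text.upper()).issubset(IUPAC_CODES)


print(is_iupac_dna("AAAAAAAAAAAAAA"))  # True
print(is_iupac_dna("GAATTCNNSV"))      # True
print(is_iupac_dna("GAATTX"))          # False: X is not a nucleotide code
```

<p>A check of this kind is only a convenience for catching typos early; it says nothing about how the enzymes themselves treat ambiguous bases.</p>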
<p>Searching a sequence for the presence of a restriction site for your preferred enzyme is as simple as:</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1"></a><span class="op">>>></span> EcoRI.search(my_seq)</span>
<span id="cb5-2"><a href="#cb5-2"></a>[]</span></code></pre></div>
<p>The result is a list. Here the list is empty since there is obviously no EcoRI site in <em>my_seq</em>. Let’s try to get a sequence with an EcoRI site.</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1"></a><span class="op">>>></span> ecoseq <span class="op">=</span> my_seq <span class="op">+</span> Seq(EcoRI.site) <span class="op">+</span> my_seq</span>
<span id="cb6-2"><a href="#cb6-2"></a><span class="op">>>></span> ecoseq</span>
<span id="cb6-3"><a href="#cb6-3"></a>Seq(<span class="st">'AAAAAAAAAAAAAAGAATTCAAAAAAAAAAAAAA'</span>)</span>
<span id="cb6-4"><a href="#cb6-4"></a><span class="op">>>></span> EcoRI.search(ecoseq)</span>
<span id="cb6-5"><a href="#cb6-5"></a>[<span class="dv">16</span>]</span></code></pre></div>
<p>We therefore have a site at position 16 of the sequence <em>ecoseq</em>. The position returned by the <code>search</code> method is the first base of the downstream segment produced by a restriction (i.e. the first base after the position where the enzyme will cut). The <code>Restriction</code> package follows biological convention (the first base of a sequence is base 1). There is no need to make difficult conversions between your recorded biological data and the results produced by the enzymes in this package.</p>
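<p>To make the position convention concrete, here is a toy model working on plain strings. It is only a sketch: <code>find_cut_positions</code> is a hypothetical helper, it handles a single unambiguous site on the strand shown, and it does not claim to reproduce how <code>Bio.Restriction</code> searches internally. The value 1 for <code>cut_offset</code> assumes the EcoRI cut G^AATTC.</p>

```python
# Toy model of the position convention; find_cut_positions is a hypothetical
# helper and does not reproduce how Bio.Restriction searches internally.
def find_cut_positions(sequence, site="GAATTC", cut_offset=1):
    """Return 1-based positions of the first base after each cut.

    cut_offset is the number of bases of the site left upstream of the cut
    (1 for EcoRI, which cuts G^AATTC).
    """
    positions = []
    start = sequence.find(site)
    while start != -1:
        # start is 0-based; adding cut_offset moves to the first base after
        # the cut, adding 1 converts to the 1-based biological convention.
        positions.append(start + cut_offset + 1)
        start = sequence.find(site, start + 1)
    return positions


ecoseq = "A" * 14 + "GAATTC" + "A" * 14
print(find_cut_positions(ecoseq))    # [16]
print(find_cut_positions("A" * 14))  # []
```

<p>The site starts at 0-based index 14, the cut falls after the G, and the first base after the cut is therefore base 16 when counting from 1.</p>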
<h4 id="retrieving-the-sequences-produced-by-a-digestion"><a name="1.4"></a>1.4 Retrieving the sequences produced by a digestion</h4>
<p><code>Seq</code> objects, like all Python sequences, follow a different convention: the first base of a sequence is base 0. Therefore, to get the sequences produced by an EcoRI digestion of <em>ecoseq</em>, one should do the following:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1"></a><span class="op">>>></span> ecoseq[:<span class="dv">15</span>], ecoseq[<span class="dv">15</span>:]</span>
<span id="cb7-2"><a href="#cb7-2"></a>(Seq(<span class="st">'AAAAAAAAAAAAAAG'</span>), Seq(<span class="st">'AATTCAAAAAAAAAAAAAA'</span>))</span></code></pre></div>
<p>I hear you thinking “this is a cumbersome and error-prone method to get these sequences”. To simplify your life, <code>Restriction</code> provides another method to get these sequences without hassle: <code>catalyse</code>. This method returns a tuple containing all the fragments produced by a complete digestion of the sequence. Using it is as simple as before:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1"></a><span class="op">>>></span> EcoRI.catalyse(ecoseq)</span>
<span id="cb8-2"><a href="#cb8-2"></a>(Seq(<span class="st">'AAAAAAAAAAAAAAG'</span>), Seq(<span class="st">'AATTCAAAAAAAAAAAAAA'</span>))</span></code></pre></div>
<p>BTW, you can also spell it the American way, <code>catalyze</code>:</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1"></a><span class="op">>>></span> EcoRI.catalyze(ecoseq)</span>
<span id="cb9-2"><a href="#cb9-2"></a>(Seq(<span class="st">'AAAAAAAAAAAAAAG'</span>), Seq(<span class="st">'AATTCAAAAAAAAAAAAAA'</span>))</span></code></pre></div>
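<p>The link between the 1-based positions returned by <code>search</code> and the fragments returned by <code>catalyse</code> can be sketched on plain strings. The function <code>fragments_from_positions</code> below is a hypothetical helper for a linear sequence, not the library’s implementation; it simply subtracts 1 from each position to obtain a Python slice index.</p>

```python
# Illustrative helper (not Biopython API): turn 1-based cut positions into
# the fragments of a complete digestion of a linear sequence.
def fragments_from_positions(sequence, positions):
    """Split a linear sequence at the given 1-based cut positions."""
    fragments = []
    previous = 0
    for position in sorted(positions):
        cut = position - 1  # 1-based position to 0-based slice index
        fragments.append(sequence[previous:cut])
        previous = cut
    fragments.append(sequence[previous:])
    return tuple(fragments)


ecoseq = "A" * 14 + "GAATTC" + "A" * 14
# Position 16 is the value reported by EcoRI.search(ecoseq) above.
left, right = fragments_from_positions(ecoseq, [16])
print(left)   # ends with the G of the site
print(right)  # starts with AATTC
```

<p>With no cut position at all, the helper returns a one-element tuple holding the whole sequence, which mirrors the idea of an undigested molecule.</p>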
<h4 id="analysing-circular-sequences"><a name="1.5"></a>1.5 Analysing circular sequences</h4>
<p>Now, if you have entered the previous commands in your shell, you may have noticed that both <code>search</code> and <code>catalyse</code> can take a second argument, <code>linear</code>, which defaults to <code>True</code>. This argument allows you to simulate circular sequences such as plasmids. Setting <code>linear</code> to <code>False</code> tells the enzyme to search over a circular sequence, including potential sites spanning the boundaries of the sequence.</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1"></a><span class="op">>>></span> EcoRI.search(ecoseq, linear<span class="op">=</span><span class="va">False</span>)</span>
<span id="cb10-2"><a href="#cb10-2"></a>[<span class="dv">16</span>]</span>
<span id="cb10-3"><a href="#cb10-3"></a><span class="op">>>></span> EcoRI.catalyse(ecoseq, linear<span class="op">=</span><span class="va">False</span>)</span>
<span id="cb10-4"><a href="#cb10-4"></a>(Seq(<span class="st">'AATTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAG'</span>),)</span>
<span id="cb10-5"><a href="#cb10-5"></a><span class="op">>>></span> ecoseq <span class="co"># as a reminder</span></span>
<span id="cb10-6"><a href="#cb10-6"></a>Seq(<span class="st">'AAAAAAAAAAAAAAGAATTCAAAAAAAAAAAAAA'</span>)</span></code></pre></div>
<p>OK, this is quite a difference: we only get one fragment, which corresponds to the linearised sequence. The beginning of the sequence has been shifted to take this fact into account. Moreover, we can see another difference:</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1"></a><span class="op">>>></span> new_seq <span class="op">=</span> Seq(<span class="st">'TTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAGAA'</span>)</span>
<span id="cb11-2"><a href="#cb11-2"></a><span class="op">>>></span> EcoRI.search(new_seq)</span>
<span id="cb11-3"><a href="#cb11-3"></a>[]</span>
<span id="cb11-4"><a href="#cb11-4"></a><span class="op">>>></span> EcoRI.search(new_seq, linear<span class="op">=</span><span class="va">False</span>)</span>
<span id="cb11-5"><a href="#cb11-5"></a>[<span class="dv">33</span>]</span></code></pre></div>
<p>As you can see, using <code>linear=False</code> makes a site appear in the sequence <em>new_seq</em>. This site does not exist in a linear sequence, as the EcoRI site is split into two halves at the start and the end of the sequence. In a circular sequence, however, the site is effectively present when the beginning and end of the sequence are joined.</p>
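<p>One simple way to picture the circular case is to glue the first few bases of the sequence onto its end before searching, then wrap the positions back onto the circle. The sketch below does this on plain strings. Both functions are hypothetical helpers, they assume a sequence at least as long as the site and an EcoRI-like cut one base into the site, and they are not meant to describe what <code>Bio.Restriction</code> does internally.</p>

```python
# Toy model of a circular search and of the linearisation by a single cut.
# Both helpers are illustrative names, not Biopython API.
def find_cut_positions_circular(sequence, site="GAATTC", cut_offset=1):
    """Return 1-based cut positions on a circular sequence."""
    length = len(sequence)
    # Append the first len(site) - 1 bases so that a site spanning the
    # junction between the end and the start becomes visible.
    extended = sequence + sequence[: len(site) - 1]
    positions = []
    start = extended.find(site)
    while start != -1:
        # Wrap around the circle, then convert to 1-based numbering.
        positions.append((start + cut_offset) % length + 1)
        start = extended.find(site, start + 1)
    return sorted(positions)


def linearise(sequence, position):
    """Open a circular sequence at a single 1-based cut position."""
    cut = position - 1
    return sequence[cut:] + sequence[:cut]


ecoseq = "A" * 14 + "GAATTC" + "A" * 14
new_seq = "TTC" + "A" * 28 + "GAA"  # EcoRI site split across the ends

print(find_cut_positions_circular(ecoseq))   # [16]
print(find_cut_positions_circular(new_seq))  # [33]
print(linearise(ecoseq, 16))                 # starts with AATTC, ends with G
```

<p>In this model the single fragment is just the original sequence rotated so that it starts at the cut, which is why its beginning appears shifted.</p>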
<h4 id="comparing-enzymes-with-each-others"><a name="1.6"></a>1.6 Comparing enzymes with each other</h4>
<p><code>Restriction</code> enzymes define 4 comparison operators: <code>==</code>, <code>!=</code>, <code>>></code> and <code>%</code>. All of these operators compare two enzymes and return either <code>True</code> or <code>False</code>.</p>
<dl class="incremental">
<dt><code>==</code> (test identity)</dt>
<dd>It will return <code>True</code> if the two sides of the operator are the same. “Same” is defined as: same name, same site, same overhang (i.e. the only thing which is equal to <code>EcoRI</code> is <code>EcoRI</code>).
</dd>
<dt><code>!=</code> (test for different site or cutting)</dt>
<dd>It will return <code>True</code> if the two sides of the operator are different. Two enzymes are not different if the result produced by one enzyme will always be the same as the result produced by the other (i.e. true isoschizomers, while not being the same enzyme, are not different since they are interchangeable).
</dd>
<dt><code>>></code> (test for neoschizomer)</dt>
<dd><code>True</code> if the enzymes recognise the same site, but cut it in a different way (i.e. the enzymes are neoschizomers).
</dd>
<dt><code>%</code> (test compatibility)</dt>
<dd>Test the compatibility of the ends produced by the enzymes (will be <code>True</code> if the fragments produced with one of the enzymes can directly be ligated to fragments produced by the other).
</dd>
</dl>
<p>Let’s use <code>Acc65I</code> and its isoschizomers as an example:</p>
<div class="sourceCode" id="cb12"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1"></a><span class="op">>>></span> Acc65I.isoschizomers()</span>
<span id="cb12-2"><a href="#cb12-2"></a>[Asp718I, KpnI]</span>
<span id="cb12-3"><a href="#cb12-3"></a><span class="op">>>></span> Acc65I.elucidate()</span>
<span id="cb12-4"><a href="#cb12-4"></a><span class="co">'G^GTAC_C'</span></span>
<span id="cb12-5"><a href="#cb12-5"></a><span class="op">>>></span> Asp718I.elucidate()</span>
<span id="cb12-6"><a href="#cb12-6"></a><span class="co">'G^GTAC_C'</span></span>
<span id="cb12-7"><a href="#cb12-7"></a><span class="op">>>></span> KpnI.elucidate()</span>
<span id="cb12-8"><a href="#cb12-8"></a><span class="co">'G_GTAC^C'</span></span>
<span id="cb12-9"><a href="#cb12-9"></a><span class="op">>>></span> <span class="co"># Asp718I and Acc65I are true isoschizomers,</span></span>
<span id="cb12-10"><a href="#cb12-10"></a><span class="op">>>></span> <span class="co"># they recognise the same site and cut it the</span></span>
<span id="cb12-11"><a href="#cb12-11"></a><span class="op">>>></span> <span class="co"># same way.</span></span>
<span id="cb12-12"><a href="#cb12-12"></a><span class="op">>>></span> <span class="co"># KpnI is a neoschizomer of the 2 others.</span></span>
<span id="cb12-13"><a href="#cb12-13"></a><span class="op">>>></span> <span class="co"># Here are the results of the 4 operators</span></span>
<span id="cb12-14"><a href="#cb12-14"></a><span class="op">>>></span> <span class="co"># for each pair of enzymes:</span></span>
<span id="cb12-15"><a href="#cb12-15"></a><span class="op">>>></span></span>
<span id="cb12-16"><a href="#cb12-16"></a><span class="op">>>></span> <span class="co">############# x == y (x is y)</span></span>
<span id="cb12-17"><a href="#cb12-17"></a><span class="op">>>></span> Acc65I <span class="op">==</span> Acc65I <span class="co"># same enzyme => True</span></span>
<span id="cb12-18"><a href="#cb12-18"></a><span class="va">True</span></span>
<span id="cb12-19"><a href="#cb12-19"></a><span class="op">>>></span> Acc65I <span class="op">==</span> KpnI <span class="co"># all other cases => False</span></span>
<span id="cb12-20"><a href="#cb12-20"></a><span class="va">False</span></span>
<span id="cb12-21"><a href="#cb12-21"></a><span class="op">>>></span> Acc65I <span class="op">==</span> Asp718I</span>
<span id="cb12-22"><a href="#cb12-22"></a><span class="va">False</span></span>
<span id="cb12-23"><a href="#cb12-23"></a><span class="op">>>></span> Acc65I <span class="op">==</span> EcoRI</span>
<span id="cb12-24"><a href="#cb12-24"></a><span class="va">False</span></span>
<span id="cb12-25"><a href="#cb12-25"></a><span class="op">>>></span> <span class="co">############ x != y (x and y are not true isoschizomers)</span></span>
<span id="cb12-26"><a href="#cb12-26"></a><span class="op">>>></span> Acc65I <span class="op">!=</span> Acc65I <span class="co"># same enzyme => False</span></span>
<span id="cb12-27"><a href="#cb12-27"></a><span class="va">False</span></span>
<span id="cb12-28"><a href="#cb12-28"></a><span class="op">>>></span> Acc65I <span class="op">!=</span> Asp718I <span class="co"># different enzymes, but cut same manner => False</span></span>
<span id="cb12-29"><a href="#cb12-29"></a><span class="va">False</span></span>
<span id="cb12-30"><a href="#cb12-30"></a><span class="op">>>></span> Acc65I <span class="op">!=</span> KpnI <span class="co"># all other cases => True</span></span>
<span id="cb12-31"><a href="#cb12-31"></a><span class="va">True</span></span>
<span id="cb12-32"><a href="#cb12-32"></a><span class="op">>>></span> Acc65I <span class="op">!=</span> EcoRI</span>
<span id="cb12-33"><a href="#cb12-33"></a><span class="va">True</span></span>
<span id="cb12-34"><a href="#cb12-34"></a><span class="op">>>></span> <span class="co">########### x >> y (x is neoschizomer of y)</span></span>
<span id="cb12-35"><a href="#cb12-35"></a><span class="op">>>></span> Acc65I <span class="op">>></span> Acc65I <span class="co"># same enzyme => False</span></span>
<span id="cb12-36"><a href="#cb12-36"></a><span class="va">False</span></span>
<span id="cb12-37"><a href="#cb12-37"></a><span class="op">>>></span> Acc65I <span class="op">>></span> Asp718I <span class="co"># same site, same cut => False</span></span>
<span id="cb12-38"><a href="#cb12-38"></a><span class="va">False</span></span>
<span id="cb12-39"><a href="#cb12-39"></a><span class="op">>>></span> Acc65I <span class="op">>></span> EcoRI <span class="co"># different site => False</span></span>
<span id="cb12-40"><a href="#cb12-40"></a><span class="va">False</span></span>
<span id="cb12-41"><a href="#cb12-41"></a><span class="op">>>></span> Acc65I <span class="op">>></span> KpnI <span class="co"># same site, different cut => True</span></span>
<span id="cb12-42"><a href="#cb12-42"></a><span class="va">True</span></span>
<span id="cb12-43"><a href="#cb12-43"></a><span class="op">>>></span> <span class="co">########### x % y (fragments produced by x and fragments produced by y</span></span>
<span id="cb12-44"><a href="#cb12-44"></a><span class="op">>>></span> <span class="co"># can be directly ligated to each other)</span></span>
<span id="cb12-45"><a href="#cb12-45"></a><span class="op">>>></span> Acc65I <span class="op">%</span> Asp718I</span>
<span id="cb12-46"><a href="#cb12-46"></a><span class="va">True</span></span>
<span id="cb12-47"><a href="#cb12-47"></a><span class="op">>>></span> Acc65I <span class="op">%</span> Acc65I</span>
<span id="cb12-48"><a href="#cb12-48"></a><span class="va">True</span></span>
<span id="cb12-49"><a href="#cb12-49"></a><span class="op">>>></span> Acc65I <span class="op">%</span> KpnI <span class="co"># KpnI -> 3' overhang, Acc65I -> 5' overhang => False</span></span>
<span id="cb12-50"><a href="#cb12-50"></a><span class="va">False</span></span>
<span id="cb12-51"><a href="#cb12-51"></a><span class="op">>>></span></span>
<span id="cb12-52"><a href="#cb12-52"></a><span class="op">>>></span> SunI.elucidate()</span>
<span id="cb12-53"><a href="#cb12-53"></a><span class="co">'C^GTAC_G'</span></span>
<span id="cb12-54"><a href="#cb12-54"></a><span class="op">>>></span> SunI <span class="op">==</span> Acc65I</span>
<span id="cb12-55"><a href="#cb12-55"></a><span class="va">False</span></span>
<span id="cb12-56"><a href="#cb12-56"></a><span class="op">>>></span> SunI <span class="op">!=</span> Acc65I</span>
<span id="cb12-57"><a href="#cb12-57"></a><span class="va">True</span></span>
<span id="cb12-58"><a href="#cb12-58"></a><span class="op">>>></span> SunI <span class="op">>></span> Acc65I</span>
<span id="cb12-59"><a href="#cb12-59"></a><span class="va">False</span></span>
<span id="cb12-60"><a href="#cb12-60"></a><span class="op">>>></span> SunI <span class="op">%</span> Acc65I <span class="co"># different site, same overhang (5' GTAC) => True</span></span>
<span id="cb12-61"><a href="#cb12-61"></a><span class="va">True</span></span>
<span id="cb12-62"><a href="#cb12-62"></a><span class="op">>>></span> SmaI <span class="op">%</span> EcoRV <span class="co"># 2 Blunt enzymes, all blunt enzymes are compatible => True</span></span>
<span id="cb12-63"><a href="#cb12-63"></a><span class="va">True</span></span></code></pre></div>
<h4 id="other-facilities-provided-by-the-enzyme-classes"><a name="1.7"></a>1.7 Other facilities provided by the enzyme classes</h4>
<p>The <code>Restriction</code> class provides quite a number of other methods. We will not go through all of them, but only take a quick look at the most useful ones.</p>
<p>Not all enzymes digest DNA in the same way. If you want to know more about the way a particular enzyme cuts, you can use the following three methods. They are fairly straightforward and refer to the ends that the enzyme produces: blunt, 5’ overhanging (also called 3’ recessed) sticky end and 3’ overhanging (or 5’ recessed) sticky end.</p>
<div class="sourceCode" id="cb13"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb13-1"><a href="#cb13-1"></a><span class="op">>>></span> EcoRI.is_blunt()</span>
<span id="cb13-2"><a href="#cb13-2"></a><span class="va">False</span></span>
<span id="cb13-3"><a href="#cb13-3"></a><span class="op">>>></span> EcoRI.is_5overhang()</span>
<span id="cb13-4"><a href="#cb13-4"></a><span class="va">True</span></span>
<span id="cb13-5"><a href="#cb13-5"></a><span class="op">>>></span> EcoRI.is_3overhang()</span>
<span id="cb13-6"><a href="#cb13-6"></a><span class="va">False</span></span></code></pre></div>
<p>A more detailed view of the restriction site can be produced using the <code>elucidate()</code> method. The <code>^</code> refers to the position of the cut on the sense strand of the sequence, <code>_</code> to the cut on the antisense (complementary) strand. <code>^_</code> means blunt.</p>
<div class="sourceCode" id="cb14"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb14-1"><a href="#cb14-1"></a><span class="op">>>></span> EcoRI.elucidate()</span>
<span id="cb14-2"><a href="#cb14-2"></a><span class="co">'G^AATT_C'</span></span>
<span id="cb14-3"><a href="#cb14-3"></a><span class="op">>>></span> KpnI.elucidate()</span>
<span id="cb14-4"><a href="#cb14-4"></a><span class="co">'G_GTAC^C'</span></span>
<span id="cb14-5"><a href="#cb14-5"></a><span class="op">>>></span> EcoRV.elucidate()</span>
<span id="cb14-6"><a href="#cb14-6"></a><span class="co">'GAT^_ATC'</span></span></code></pre></div>
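<p>The notation can also be decoded by hand. The following is a minimal sketch in plain Python and is not part of <code>Bio.Restriction</code>; the helper name <code>end_type</code> is hypothetical, and it assumes the string contains exactly one <code>^</code> and one <code>_</code>, as in the outputs above.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">def end_type(pattern):
    """Classify the ends described by an elucidate()-style string."""
    sense = pattern.index("^")      # cut on the sense strand
    antisense = pattern.index("_")  # cut on the complementary strand
    if antisense - sense == 1:
        return "blunt"              # '^_': both strands are cut at the same place
    if min(sense, antisense) == sense:
        return "5' overhang"        # sense strand is cut first
    return "3' overhang"            # complementary strand is cut first


print(end_type("G^AATT_C"))  # 5' overhang
print(end_type("G_GTAC^C"))  # 3' overhang
print(end_type("GAT^_ATC"))  # blunt
</code></pre></div>
<p>For the three enzymes shown here, this agrees with what <code>is_5overhang()</code>, <code>is_3overhang()</code> and <code>is_blunt()</code> report for EcoRI, and with the overhang types given for KpnI and EcoRV elsewhere in this tutorial.</p>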
<p>The method <code>frequency()</code> will give you the statistical frequency of the enzyme site.</p>
<div class="sourceCode" id="cb15"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb15-1"><a href="#cb15-1"></a><span class="op">>>></span> EcoRI.frequency()</span>
<span id="cb15-2"><a href="#cb15-2"></a><span class="dv">4096</span></span>
<span id="cb15-3"><a href="#cb15-3"></a><span class="op">>>></span> XhoII.elucidate()</span>
<span id="cb15-4"><a href="#cb15-4"></a><span class="co">'R^GATC_Y'</span></span>
<span id="cb15-5"><a href="#cb15-5"></a><span class="op">>>></span> XhoII.frequency()</span>
<span id="cb15-6"><a href="#cb15-6"></a><span class="dv">1024</span></span></code></pre></div>
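<p>The two values above are what you would expect if all four bases were equally likely: a fully specified six-base site occurs on average once every 4<sup>6</sup> = 4096 bases, and each two-fold degenerate position (such as <code>R</code> or <code>Y</code>) halves that number. The sketch below reproduces this arithmetic in plain Python. It is an illustration under that assumption, not necessarily the way <code>frequency()</code> is implemented; the names <code>IUPAC_DEGENERACY</code> and <code>expected_spacing</code> are hypothetical.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"># Number of bases each IUPAC nucleotide code can stand for.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def expected_spacing(site):
    """Average distance between occurrences of site in random DNA.

    Assumes the four bases are equally likely and independent.
    """
    matching = 1
    for code in site.upper():
        matching *= IUPAC_DEGENERACY[code]
    return 4 ** len(site) / matching


print(expected_spacing("GAATTC"))  # 4096.0 (EcoRI)
print(expected_spacing("RGATCY"))  # 1024.0 (XhoII)
</code></pre></div>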
<p>To get the length of the recognition sequence of an enzyme, use the built-in function <code>len()</code>:</p>
<div class="sourceCode" id="cb16"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb16-1"><a href="#cb16-1"></a><span class="op">>>></span> <span class="bu">len</span>(EcoRI)</span>
<span id="cb16-2"><a href="#cb16-2"></a><span class="dv">6</span></span>
<span id="cb16-3"><a href="#cb16-3"></a><span class="op">>>></span> BstXI.elucidate()</span>
<span id="cb16-4"><a href="#cb16-4"></a><span class="co">'CCAN_NNNN^NTGG'</span></span>
<span id="cb16-5"><a href="#cb16-5"></a><span class="op">>>></span> <span class="bu">len</span>(BstXI)</span>
<span id="cb16-6"><a href="#cb16-6"></a><span class="dv">12</span></span>
<span id="cb16-7"><a href="#cb16-7"></a><span class="op">>>></span> FokI.site</span>
<span id="cb16-8"><a href="#cb16-8"></a><span class="co">'GGATG'</span></span>
<span id="cb16-9"><a href="#cb16-9"></a><span class="op">>>></span> FokI.elucidate() <span class="co"># FokI cuts well outside its recognition site</span></span>
<span id="cb16-10"><a href="#cb16-10"></a><span class="co">'GGATGNNNNNNNNN^NNNN_N'</span></span>
<span id="cb16-11"><a href="#cb16-11"></a><span class="op">>>></span> <span class="bu">len</span>(FokI) <span class="co"># its length is the length of the recognition site</span></span>
<span id="cb16-12"><a href="#cb16-12"></a><span class="dv">5</span></span></code></pre></div>
<p>Also interesting are the methods dealing with isoschizomers. As a reminder, two enzymes are <em>isoschizomers</em> if they share the same recognition site. Isoschizomers are further divided into <em>equischizomers</em>, which recognise the same sequence and cut it the same way, and <em>neoschizomers</em>, which cut at different positions. <em>Equischizomer</em> is an arbitrary term chosen to designate “isoschizomers_that_are_not_neoschizomers”, as the latter was a bit long. A second set of methods, <code>one_enzyme.is_*schizomer(one_other_enzyme)</code>, lets you test two enzymes against each other.</p>
<div class="sourceCode" id="cb17"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb17-1"><a href="#cb17-1"></a><span class="op">>>></span> Acc65I.isoschizomers()</span>
<span id="cb17-2"><a href="#cb17-2"></a>[Asp718I, KpnI]</span>
<span id="cb17-3"><a href="#cb17-3"></a><span class="op">>>></span> Acc65I.neoschizomers()</span>
<span id="cb17-4"><a href="#cb17-4"></a>[KpnI]</span>
<span id="cb17-5"><a href="#cb17-5"></a><span class="op">>>></span> Acc65I.equischizomers()</span>
<span id="cb17-6"><a href="#cb17-6"></a>[Asp718I]</span>
<span id="cb17-7"><a href="#cb17-7"></a><span class="op">>>></span> KpnI.elucidate()</span>
<span id="cb17-8"><a href="#cb17-8"></a><span class="co">'G_GTAC^C'</span></span>
<span id="cb17-9"><a href="#cb17-9"></a><span class="op">>>></span> Acc65I.elucidate()</span>
<span id="cb17-10"><a href="#cb17-10"></a><span class="co">'G^GTAC_C'</span></span>
<span id="cb17-11"><a href="#cb17-11"></a><span class="op">>>></span> KpnI.is_neoschizomer(Acc65I)</span>
<span id="cb17-12"><a href="#cb17-12"></a><span class="va">True</span></span>
<span id="cb17-13"><a href="#cb17-13"></a><span class="op">>>></span> KpnI.is_neoschizomer(KpnI)</span>
<span id="cb17-14"><a href="#cb17-14"></a><span class="va">False</span></span>
<span id="cb17-15"><a href="#cb17-15"></a><span class="op">>>></span> KpnI.is_isoschizomer(Acc65I)</span>
<span id="cb17-16"><a href="#cb17-16"></a><span class="va">True</span></span>
<span id="cb17-17"><a href="#cb17-17"></a><span class="op">>>></span> KpnI.is_isoschizomer(KpnI)</span>
<span id="cb17-18"><a href="#cb17-18"></a><span class="va">True</span></span>
<span id="cb17-19"><a href="#cb17-19"></a><span class="op">>>></span> KpnI.is_equischizomer(Acc65I)</span>
<span id="cb17-20"><a href="#cb17-20"></a><span class="va">False</span></span>
<span id="cb17-21"><a href="#cb17-21"></a><span class="op">>>></span> KpnI.is_equischizomer(KpnI)</span>
<span id="cb17-22"><a href="#cb17-22"></a><span class="va">True</span></span></code></pre></div>
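<p>The three categories can be told apart from the <code>elucidate()</code> strings alone. The sketch below is a plain-Python illustration of the definitions given above, not the library's implementation; the function name <code>relation</code> is hypothetical, and it compares sites literally, so it ignores degenerate sites that merely overlap.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">def relation(pattern_a, pattern_b):
    """Compare two elucidate()-style strings such as 'G^GTAC_C'."""
    strip_cuts = str.maketrans("", "", "^_")   # drop the cut markers
    if pattern_a.translate(strip_cuts) != pattern_b.translate(strip_cuts):
        return "unrelated"        # different recognition sites
    if pattern_a == pattern_b:
        return "equischizomers"   # same site, same cut
    return "neoschizomers"        # same site, different cut


print(relation("G^GTAC_C", "G_GTAC^C"))  # neoschizomers  (Acc65I vs KpnI)
print(relation("G^GTAC_C", "G^GTAC_C"))  # equischizomers
print(relation("G^GTAC_C", "G^AATT_C"))  # unrelated      (Acc65I vs EcoRI)
</code></pre></div>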
<p><code>suppliers()</code> will get you the list of all the suppliers of the enzyme. <code>all_suppliers()</code> will give you all the suppliers in the database.</p>
<h3 id="the-restrictionbatch-class-a-class-to-deal-with-several-enzymes"><a name="2"></a>2. The RestrictionBatch class: a class to deal with several enzymes</h3>
<p>If you want to make a restriction map of a sequence, using individual enzymes can become tedious and incurs a large overhead, because the sequence is converted to a <code>FormattedSeq</code> again for every enzyme (see <a href="#5">Chapter 5</a>). <code>Restriction</code> provides a class that makes it easier to use a large number of enzymes in one go: <code>RestrictionBatch</code>. <code>RestrictionBatch</code> lets you manipulate many enzymes with a single command. Moreover, all the enzymes in the restriction batch share the same converted sequence, which reduces the overhead.</p>
<h4 id="creating-a-restrictionbatch"><a name="2.1"></a>2.1 Creating a RestrictionBatch</h4>
<p>You can create a restriction batch by passing it a list of enzymes or enzyme names as argument.</p>
<div class="sourceCode" id="cb18"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb18-1"><a href="#cb18-1"></a><span class="op">>>></span> rb <span class="op">=</span> RestrictionBatch([EcoRI])</span>
<span id="cb18-2"><a href="#cb18-2"></a><span class="op">>>></span> rb</span>
<span id="cb18-3"><a href="#cb18-3"></a>RestrictionBatch([<span class="st">'EcoRI'</span>])</span>
<span id="cb18-4"><a href="#cb18-4"></a><span class="op">>>></span> rb2 <span class="op">=</span> RestrictionBatch([<span class="st">'EcoRI'</span>])</span>
<span id="cb18-5"><a href="#cb18-5"></a><span class="op">>>></span> rb2</span>
<span id="cb18-6"><a href="#cb18-6"></a>RestrictionBatch([<span class="st">'EcoRI'</span>])</span>
<span id="cb18-7"><a href="#cb18-7"></a><span class="op">>>></span> rb <span class="op">==</span> rb2</span>
<span id="cb18-8"><a href="#cb18-8"></a><span class="va">True</span></span></code></pre></div>
<p>Adding a new enzyme to a restriction batch is easy:</p>
<div class="sourceCode" id="cb19"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb19-1"><a href="#cb19-1"></a><span class="op">>>></span> rb.add(KpnI)</span>
<span id="cb19-2"><a href="#cb19-2"></a><span class="op">>>></span> rb</span>
<span id="cb19-3"><a href="#cb19-3"></a>RestrictionBatch([<span class="st">'EcoRI'</span>, <span class="st">'KpnI'</span>])</span>
<span id="cb19-4"><a href="#cb19-4"></a><span class="op">>>></span> rb <span class="op">+=</span> EcoRV</span>
<span id="cb19-5"><a href="#cb19-5"></a><span class="op">>>></span> rb</span>
<span id="cb19-6"><a href="#cb19-6"></a>RestrictionBatch([<span class="st">'EcoRI'</span>, <span class="st">'EcoRV'</span>, <span class="st">'KpnI'</span>])</span></code></pre></div>
<p>Another way to create a <code>RestrictionBatch</code> is simply to add restriction enzymes together; this is particularly useful for small batches:</p>
<div class="sourceCode" id="cb20"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb20-1"><a href="#cb20-1"></a><span class="op">>>></span> rb3 <span class="op">=</span> EcoRI <span class="op">+</span> KpnI <span class="op">+</span> EcoRV</span>
<span id="cb20-2"><a href="#cb20-2"></a><span class="op">>>></span> rb3</span>
<span id="cb20-3"><a href="#cb20-3"></a>RestrictionBatch([<span class="st">'EcoRI'</span>, <span class="st">'EcoRV'</span>, <span class="st">'KpnI'</span>])</span></code></pre></div>
<h4 id="restricting-a-restrictionbatch-to-a-particular-supplier"><a name="2.2"></a>2.2 Restricting a RestrictionBatch to a particular supplier</h4>
<p>The Restriction package is based upon the <strong>REBASE</strong> database. This database gives a list of suppliers for each enzyme. It would be a shame not to make use of this facility. You can produce a <code>RestrictionBatch</code> containing only enzymes from one or a few supplier(s). Here is how to do it:</p>
<div class="sourceCode" id="cb21"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb21-1"><a href="#cb21-1"></a><span class="op">>>></span> rb_supp <span class="op">=</span> RestrictionBatch(first<span class="op">=</span>[], suppliers<span class="op">=</span>[<span class="st">'C'</span>,<span class="st">'B'</span>,<span class="st">'E'</span>,<span class="st">'I'</span>,<span class="st">'K'</span>,<span class="st">'J'</span>,<span class="st">'M'</span>,</span>
<span id="cb21-2"><a href="#cb21-2"></a><span class="st">'O'</span>,<span class="st">'N'</span>,<span class="st">'Q'</span>,<span class="st">'S'</span>,<span class="st">'R'</span>,<span class="st">'V'</span>,<span class="st">'Y'</span>,<span class="st">'X'</span>])</span>
<span id="cb21-3"><a href="#cb21-3"></a><span class="op">>>></span> <span class="co"># This will create a RestrictionBatch with all the enzymes</span></span>
<span id="cb21-4"><a href="#cb21-4"></a><span class="op">>>></span> <span class="co"># which possess a supplier.</span></span>
<span id="cb21-5"><a href="#cb21-5"></a><span class="op">>>></span> <span class="bu">len</span>(rb_supp) <span class="co"># May 2020</span></span>
<span id="cb21-6"><a href="#cb21-6"></a><span class="dv">621</span></span></code></pre></div>
<p>The argument <code>suppliers</code> takes a list of one or several single-letter codes corresponding to the supplier(s). The codes are the same as those defined in REBASE. As it would be a pain to have to remember each supplier code, <code>RestrictionBatch</code> provides a method which shows which code stands for which supplier:</p>
<div class="sourceCode" id="cb22"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb22-1"><a href="#cb22-1"></a><span class="op">>>></span> RestrictionBatch.show_codes() <span class="co"># as of May 2016 REBASE release.</span></span>
<span id="cb22-2"><a href="#cb22-2"></a>C <span class="op">=</span> Minotech Biotechnology</span>
<span id="cb22-3"><a href="#cb22-3"></a>B <span class="op">=</span> Life Technologies</span>
<span id="cb22-4"><a href="#cb22-4"></a>E <span class="op">=</span> Agilent Technologies</span>
<span id="cb22-5"><a href="#cb22-5"></a>I <span class="op">=</span> SibEnzyme Ltd.</span>
<span id="cb22-6"><a href="#cb22-6"></a>K <span class="op">=</span> Takara Bio Inc.</span>
<span id="cb22-7"><a href="#cb22-7"></a>J <span class="op">=</span> Nippon Gene Co., Ltd.</span>
<span id="cb22-8"><a href="#cb22-8"></a>M <span class="op">=</span> Roche Applied Science</span>
<span id="cb22-9"><a href="#cb22-9"></a>O <span class="op">=</span> Toyobo Biochemicals</span>
<span id="cb22-10"><a href="#cb22-10"></a>N <span class="op">=</span> New England Biolabs</span>
<span id="cb22-11"><a href="#cb22-11"></a>Q <span class="op">=</span> Molecular Biology Resources <span class="op">-</span> CHIMERx</span>
<span id="cb22-12"><a href="#cb22-12"></a>S <span class="op">=</span> Sigma Chemical Corporation</span>
<span id="cb22-13"><a href="#cb22-13"></a>R <span class="op">=</span> Promega Corporation</span>
<span id="cb22-14"><a href="#cb22-14"></a>V <span class="op">=</span> Vivantis Technologies</span>
<span id="cb22-15"><a href="#cb22-15"></a>Y <span class="op">=</span> SinaClon BioScience Co.</span>
<span id="cb22-16"><a href="#cb22-16"></a>X <span class="op">=</span> EURx Ltd.</span>
<span id="cb22-17"><a href="#cb22-17"></a><span class="op">>>></span> <span class="co"># You can now choose a code and build your RestrictionBatch</span></span></code></pre></div>
<p>This way of producing a <code>RestrictionBatch</code> can drastically reduce the amount of useless output from a restriction analysis, limiting the search to enzymes that you can get hold of and limiting the risks of nervous breakdown. Nothing is more frustrating than to get the perfect enzyme for a sub-cloning only to find it’s not commercially available.</p>
<h4 id="adding-enzymes-to-a-restrictionbatch"><a name="2.3"></a>2.3 Adding enzymes to a RestrictionBatch</h4>
<p>Adding an enzyme that is already present in a batch will not raise an exception, but will have no effect. Sometimes you want to get an enzyme from a <code>RestrictionBatch</code>, or add it to the batch if it is not present. To do so, use the <code>get</code> method with the second argument <code>add</code> set to <code>True</code>.</p>
<div class="sourceCode" id="cb23"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb23-1"><a href="#cb23-1"></a><span class="op">>>></span> rb3</span>
<span id="cb23-2"><a href="#cb23-2"></a>RestrictionBatch([<span class="st">'EcoRI'</span>, <span class="st">'EcoRV'</span>, <span class="st">'KpnI'</span>])</span>
<span id="cb23-3"><a href="#cb23-3"></a><span class="op">>>></span> rb3.add(EcoRI)</span>
<span id="cb23-4"><a href="#cb23-4"></a><span class="op">>>></span> rb3</span>
<span id="cb23-5"><a href="#cb23-5"></a>RestrictionBatch([<span class="st">'EcoRI'</span>, <span class="st">'EcoRV'</span>, <span class="st">'KpnI'</span>])</span>
<span id="cb23-6"><a href="#cb23-6"></a><span class="op">>>></span> rb3.get(EcoRI)</span>
<span id="cb23-7"><a href="#cb23-7"></a>EcoRI</span>
<span id="cb23-8"><a href="#cb23-8"></a><span class="op">>>></span> rb3.get(SmaI)</span>
<span id="cb23-9"><a href="#cb23-9"></a></span>
<span id="cb23-10"><a href="#cb23-10"></a>Traceback (most recent call last):</span>
<span id="cb23-11"><a href="#cb23-11"></a> File <span class="st">"<pyshell#4>"</span>, line <span class="dv">1</span>, <span class="kw">in</span> <span class="op">-</span>toplevel<span class="op">-</span></span>
<span id="cb23-12"><a href="#cb23-12"></a> rb3.get(SmaI)</span>
<span id="cb23-13"><a href="#cb23-13"></a> File <span class="st">"/usr/lib/Python2.3/site-packages/Bio/Restriction/Restriction.py"</span>, line <span class="dv">1800</span>, <span class="kw">in</span> get</span>
<span id="cb23-14"><a href="#cb23-14"></a> <span class="cf">raise</span> <span class="pp">ValueError</span>, <span class="st">'enzyme </span><span class="sc">%s</span><span class="st"> is not in RestrictionBatch'</span><span class="op">%</span>e.<span class="va">__name__</span></span>
<span id="cb23-15"><a href="#cb23-15"></a><span class="pp">ValueError</span>: enzyme SmaI <span class="kw">is</span> <span class="kw">not</span> <span class="kw">in</span> RestrictionBatch</span>
<span id="cb23-16"><a href="#cb23-16"></a><span class="op">>>></span> rb3.get(SmaI, add<span class="op">=</span><span class="va">True</span>)</span>
<span id="cb23-17"><a href="#cb23-17"></a>SmaI</span>
<span id="cb23-18"><a href="#cb23-18"></a><span class="op">>>></span> rb3</span>
<span id="cb23-19"><a href="#cb23-19"></a>RestrictionBatch([<span class="st">'EcoRI'</span>, <span class="st">'EcoRV'</span>, <span class="st">'KpnI'</span>, <span class="st">'SmaI'</span>])</span></code></pre></div>
<h4 id="removing-enzymes-from-a-restrictionbatch"><a name="2.4"></a>2.4 Removing enzymes from a RestrictionBatch</h4>
<p>Removing enzymes from a batch is done using the <code>remove()</code> method. If the enzyme is not present in the batch this will raise a <code>KeyError</code>. If the value you want to remove is not an enzyme this will raise a <code>ValueError</code>.</p>
<div class="sourceCode" id="cb24"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb24-1"><a href="#cb24-1"></a><span class="op">>>></span> rb3.remove(EcoRI)</span>
<span id="cb24-2"><a href="#cb24-2"></a><span class="op">>>></span> rb3</span>
<span id="cb24-3"><a href="#cb24-3"></a>RestrictionBatch([<span class="st">'EcoRV'</span>, <span class="st">'KpnI'</span>, <span class="st">'SmaI'</span>])</span>
<span id="cb24-4"><a href="#cb24-4"></a><span class="op">>>></span> rb3.remove(EcoRI)</span>
<span id="cb24-5"><a href="#cb24-5"></a></span>
<span id="cb24-6"><a href="#cb24-6"></a>Traceback (most recent call last):</span>
<span id="cb24-7"><a href="#cb24-7"></a> File <span class="st">"<pyshell#14>"</span>, line <span class="dv">1</span>, <span class="kw">in</span> <span class="op">-</span>toplevel<span class="op">-</span></span>
<span id="cb24-8"><a href="#cb24-8"></a> rb3.remove(EcoRI)</span>
<span id="cb24-9"><a href="#cb24-9"></a> File <span class="st">"/usr/lib/Python2.3/site-packages/Bio/Restriction/Restriction.py"</span>, line <span class="dv">1839</span>, <span class="kw">in</span> remove</span>
<span id="cb24-10"><a href="#cb24-10"></a> <span class="cf">return</span> Set.remove(<span class="va">self</span>, <span class="va">self</span>.<span class="bu">format</span>(other))</span>
<span id="cb24-11"><a href="#cb24-11"></a> File <span class="st">"/usr/lib/Python2.3/sets.py"</span>, line <span class="dv">534</span>, <span class="kw">in</span> remove</span>
<span id="cb24-12"><a href="#cb24-12"></a> <span class="kw">del</span> <span class="va">self</span>._data[element]</span>
<span id="cb24-13"><a href="#cb24-13"></a><span class="pp">KeyError</span>: EcoRI</span>
<span id="cb24-14"><a href="#cb24-14"></a><span class="op">>>></span> rb3 <span class="op">+=</span> EcoRI</span>
<span id="cb24-15"><a href="#cb24-15"></a><span class="op">>>></span> rb3</span>
<span id="cb24-16"><a href="#cb24-16"></a>RestrictionBatch([<span class="st">'EcoRI'</span>, <span class="st">'EcoRV'</span>, <span class="st">'KpnI'</span>, <span class="st">'SmaI'</span>])</span>
<span id="cb24-17"><a href="#cb24-17"></a><span class="op">>>></span> rb3.remove(<span class="st">'EcoRI'</span>)</span>
<span id="cb24-18"><a href="#cb24-18"></a><span class="op">>>></span> rb3</span>
<span id="cb24-19"><a href="#cb24-19"></a>RestrictionBatch([<span class="st">'EcoRV'</span>, <span class="st">'KpnI'</span>, <span class="st">'SmaI'</span>])</span>
<span id="cb24-20"><a href="#cb24-20"></a><span class="op">>>></span> rb3.remove(<span class="st">'spam'</span>)</span>
<span id="cb24-21"><a href="#cb24-21"></a></span>
<span id="cb24-22"><a href="#cb24-22"></a>Traceback (most recent call last):</span>
<span id="cb24-23"><a href="#cb24-23"></a> File <span class="st">"<pyshell#18>"</span>, line <span class="dv">1</span>, <span class="kw">in</span> <span class="op">-</span>toplevel<span class="op">-</span></span>
<span id="cb24-24"><a href="#cb24-24"></a> rb3.remove(<span class="st">'spam'</span>)</span>
<span id="cb24-25"><a href="#cb24-25"></a> File <span class="st">"/usr/lib/Python2.3/site-packages/Bio/Restriction/Restriction.py"</span>, line <span class="dv">1839</span>, <span class="kw">in</span> remove</span>
<span id="cb24-26"><a href="#cb24-26"></a> <span class="cf">return</span> Set.remove(<span class="va">self</span>, <span class="va">self</span>.<span class="bu">format</span>(other))</span>
<span id="cb24-27"><a href="#cb24-27"></a> File <span class="st">"/usr/lib/Python2.3/site-packages/Bio/Restriction/Restriction.py"</span>, line <span class="dv">1871</span>, <span class="kw">in</span> <span class="bu">format</span></span>
<span id="cb24-28"><a href="#cb24-28"></a> <span class="cf">raise</span> <span class="pp">ValueError</span>, <span class="st">'</span><span class="sc">%s</span><span class="st"> is not a RestrictionType'</span><span class="op">%</span>y.__class__</span>
<span id="cb24-29"><a href="#cb24-29"></a><span class="pp">ValueError</span>: <span class="op"><</span><span class="bu">type</span> <span class="st">'str'</span><span class="op">></span> <span class="kw">is</span> <span class="kw">not</span> a RestrictionType</span></code></pre></div>
<h4 id="manipulating-restrictionbatch"><a name="2.5"></a>2.5 Manipulating RestrictionBatch</h4>
<p>You cannot, however, add batches together with <code>+</code>, as they are Python <code>sets</code>. Use the pipe operator <code>|</code> to take their union instead. You can find the intersection between two batches using <code>&</code> (see the Python documentation about <code>sets</code> for more information).</p>
<div class="sourceCode" id="cb25"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb25-1"><a href="#cb25-1"></a><span class="op">>>></span> rb3 <span class="op">=</span> EcoRI <span class="op">+</span> KpnI <span class="op">+</span> EcoRV</span>
<span id="cb25-2"><a href="#cb25-2"></a><span class="op">>>></span> rb3</span>
<span id="cb25-3"><a href="#cb25-3"></a>RestrictionBatch([<span class="st">'EcoRI'</span>, <span class="st">'EcoRV'</span>, <span class="st">'KpnI'</span>])</span>
<span id="cb25-4"><a href="#cb25-4"></a><span class="op">>>></span> rb4 <span class="op">=</span> SmaI <span class="op">+</span> PstI</span>
<span id="cb25-5"><a href="#cb25-5"></a><span class="op">>>></span> rb4</span>
<span id="cb25-6"><a href="#cb25-6"></a>RestrictionBatch([<span class="st">'PstI'</span>, <span class="st">'SmaI'</span>])</span>
<span id="cb25-7"><a href="#cb25-7"></a><span class="op">>>></span> rb3 <span class="op">+</span> rb4</span>
<span id="cb25-8"><a href="#cb25-8"></a></span>
<span id="cb25-9"><a href="#cb25-9"></a>Traceback (most recent call last):</span>
<span id="cb25-10"><a href="#cb25-10"></a> File <span class="st">"<pyshell#23>"</span>, line <span class="dv">1</span>, <span class="kw">in</span> <span class="op">-</span>toplevel<span class="op">-</span></span>
<span id="cb25-11"><a href="#cb25-11"></a> rb3 <span class="op">+</span> rb4</span>
<span id="cb25-12"><a href="#cb25-12"></a> File <span class="st">"/usr/lib/Python2.3/site-packages/Bio/Restriction/Restriction.py"</span>, line <span class="dv">1829</span>, <span class="kw">in</span> <span class="fu">__add__</span></span>
<span id="cb25-13"><a href="#cb25-13"></a> new.add(other)</span>
<span id="cb25-14"><a href="#cb25-14"></a> File <span class="st">"/usr/lib/Python2.3/site-packages/Bio/Restriction/Restriction.py"</span>, line <span class="dv">1848</span>, <span class="kw">in</span> add</span>
<span id="cb25-15"><a href="#cb25-15"></a> <span class="cf">return</span> Set.add(<span class="va">self</span>, <span class="va">self</span>.<span class="bu">format</span>(other))</span>
<span id="cb25-16"><a href="#cb25-16"></a> File <span class="st">"/usr/lib/Python2.3/site-packages/Bio/Restriction/Restriction.py"</span>, line <span class="dv">1871</span>, <span class="kw">in</span> <span class="bu">format</span></span>
<span id="cb25-17"><a href="#cb25-17"></a> <span class="cf">raise</span> <span class="pp">ValueError</span>, <span class="st">'</span><span class="sc">%s</span><span class="st"> is not a RestrictionType'</span><span class="op">%</span>y.__class__</span>
<span id="cb25-18"><a href="#cb25-18"></a><span class="pp">ValueError</span>: <span class="op"><</span><span class="kw">class</span> <span class="st">'Bio.Restriction.Restriction.RestrictionBatch'</span><span class="op">></span> <span class="kw">is</span> <span class="kw">not</span> a RestrictionType</span>
<span id="cb25-19"><a href="#cb25-19"></a><span class="op">>>></span> rb3 <span class="op">|</span> rb4</span>
<span id="cb25-20"><a href="#cb25-20"></a>RestrictionBatch([<span class="st">'EcoRI'</span>, <span class="st">'EcoRV'</span>, <span class="st">'KpnI'</span>, <span class="st">'PstI'</span>, <span class="st">'SmaI'</span>])</span>
<span id="cb25-21"><a href="#cb25-21"></a><span class="op">>>></span> rb3 <span class="op">&</span> rb4</span>
<span id="cb25-22"><a href="#cb25-22"></a>RestrictionBatch([])</span>
<span id="cb25-23"><a href="#cb25-23"></a><span class="op">>>></span> rb4 <span class="op">+=</span> EcoRI</span>
<span id="cb25-24"><a href="#cb25-24"></a><span class="op">>>></span> rb4</span>
<span id="cb25-25"><a href="#cb25-25"></a>RestrictionBatch([<span class="st">'EcoRI'</span>, <span class="st">'PstI'</span>, <span class="st">'SmaI'</span>])</span>
<span id="cb25-26"><a href="#cb25-26"></a><span class="op">>>></span> rb3 <span class="op">&</span> rb4</span>
<span id="cb25-27"><a href="#cb25-27"></a>RestrictionBatch([<span class="st">'EcoRI'</span>])</span></code></pre></div>
<h4 id="analysing-sequences-with-a-restrictionbatch"><a name="2.6"></a>2.6 Analysing sequences with a RestrictionBatch</h4>
<p>To analyse a sequence for potential sites, you can use the <code>search</code> method of the batch, the same way you did for restriction enzymes. The result is no longer a list, however, but a dictionary. The keys of the dictionary are the names of the enzymes and the values are lists of site positions. <code>RestrictionBatch</code> does not implement a <code>catalyse</code> method, as it would not have a real meaning when used with a large batch.</p>
<div class="sourceCode" id="cb26"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb26-1"><a href="#cb26-1"></a><span class="op">>>></span> new_seq <span class="op">=</span> Seq(<span class="st">'TTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAGAA'</span>)</span>
<span id="cb26-2"><a href="#cb26-2"></a><span class="op">>>></span> rb.search(new_seq)</span>
<span id="cb26-3"><a href="#cb26-3"></a>{<span class="st">'KpnI'</span>: [], <span class="st">'EcoRV'</span>: [], <span class="st">'EcoRI'</span>: []}</span>
<span id="cb26-4"><a href="#cb26-4"></a><span class="op">>>></span> rb.search(new_seq, linear<span class="op">=</span><span class="va">False</span>)</span>
<span id="cb26-5"><a href="#cb26-5"></a>{<span class="st">'KpnI'</span>: [], <span class="st">'EcoRV'</span>: [], <span class="st">'EcoRI'</span>: [<span class="dv">33</span>]}</span></code></pre></div>
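<p>The EcoRI site only shows up with <code>linear=False</code> because it straddles the origin: the sequence ends in <code>GAA</code> and starts with <code>TTC</code>. The plain-Python sketch below illustrates the idea with exact string matching. It is not how <code>Bio.Restriction</code> searches; the function name <code>find_site_starts</code> is hypothetical, and the test sequence is only shaped like <code>new_seq</code> (the number of <code>A</code>s is an assumption of the sketch).</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">def find_site_starts(sequence, site, linear=True):
    """Return the 0-based start index of every exact match of site."""
    text = sequence.upper()
    if not linear:
        # Let matches run across the origin by repeating the first bases.
        text += text[: len(site) - 1]
    starts = []
    index = text.find(site)
    while index != -1:
        starts.append(index)
        index = text.find(site, index + 1)
    return starts


seq = "TTC" + "A" * 28 + "GAA"   # GAA at the end, TTC at the start
print(find_site_starts(seq, "GAATTC"))                # []
print(find_site_starts(seq, "GAATTC", linear=False))  # [31]
</code></pre></div>
<p>Note that this sketch returns 0-based start indices of the site, whereas <code>search</code> reports positions relative to the cut, so the numbers are not directly comparable.</p>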
<h4 id="other-restrictionbatch-methods"><a name="2.7"></a>2.7 Other RestrictionBatch methods</h4>
<p>Amongst the other methods provided by <code>RestrictionBatch</code>, <code>elements()</code>, which returns a list of all the element names sorted alphabetically, is certainly the most useful.</p>
<div class="sourceCode" id="cb27"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb27-1"><a href="#cb27-1"></a><span class="op">>>></span> rb <span class="op">=</span> EcoRI <span class="op">+</span> KpnI <span class="op">+</span> EcoRV</span>
<span id="cb27-2"><a href="#cb27-2"></a><span class="op">>>></span> rb.elements()</span>
<span id="cb27-3"><a href="#cb27-3"></a>[<span class="st">'EcoRI'</span>, <span class="st">'EcoRV'</span>, <span class="st">'KpnI'</span>]</span></code></pre></div>
<p>If you don’t care about the alphabetical order, use the method <code>as_string()</code> to get the same thing a bit faster. The list is not sorted, and its order is arbitrary because Python sets are unordered.</p>
<div class="sourceCode" id="cb28"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb28-1"><a href="#cb28-1"></a><span class="op">>>></span> rb <span class="op">=</span> EcoRI <span class="op">+</span> KpnI <span class="op">+</span> EcoRV</span>
<span id="cb28-2"><a href="#cb28-2"></a><span class="op">>>></span> rb.as_string()</span>
<span id="cb28-3"><a href="#cb28-3"></a>[<span class="st">'EcoRI'</span>, <span class="st">'KpnI'</span>, <span class="st">'EcoRV'</span>]</span></code></pre></div>
<p>Other <code>RestrictionBatch</code> methods are generally used for particular purposes and will not be discussed here. See the <a href="https://github.com/biopython/biopython/tree/master/Bio/Restriction">source</a> if you are interested.</p>
<h3 id="allenzymes-and-commonly-two-preconfigured-restrictionbatches"><a name="3"></a>3. AllEnzymes and CommOnly: two preconfigured RestrictionBatches</h3>
<p>While it is sometimes practical to produce a <code>RestrictionBatch</code> of your own, you will certainly more frequently use the two batches provided with the <code>Restriction</code> package: <code>AllEnzymes</code> and <code>CommOnly</code>. These two batches contain, respectively, all the enzymes in the database and only the enzymes which have a commercial supplier. They are rather big, but that’s what makes them useful. With these batches you can produce a full description of a sequence with a single command.</p>
<div class="sourceCode" id="cb29"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb29-1"><a href="#cb29-1"></a><span class="op">>>></span> <span class="bu">len</span>(AllEnzymes)</span>
<span id="cb29-2"><a href="#cb29-2"></a><span class="dv">778</span></span>
<span id="cb29-3"><a href="#cb29-3"></a><span class="op">>>></span> <span class="bu">len</span>(CommOnly)</span>
<span id="cb29-4"><a href="#cb29-4"></a><span class="dv">622</span></span>
<span id="cb29-5"><a href="#cb29-5"></a><span class="op">>>></span> AllEnzymes.search(new_seq) ...</span></code></pre></div>
<p>There is not much more to say about them: they are ordinary batches, and everything described above for <code>RestrictionBatch</code> applies to them.</p>
<h3 id="the-analysis-class-even-simpler-restriction-analysis"><a name="4"></a>4. The Analysis class: even simpler restriction analysis</h3>
<p><code>RestrictionBatch</code> can give you a dictionary with the sites for all the enzymes in a batch. However, it is sometimes nice to get something a bit easier to read than a Python dictionary, and complex restriction analyses are not easy with <code>RestrictionBatch</code> alone. <code>Analysis</code> provides a series of commands to refine the results obtained from a restriction batch/sequence pair, and some facilities to make the output slightly more human-readable.</p>
<h4 id="setting-up-an-analysis"><a name="4.1"></a>4.1 Setting up an Analysis</h4>
<p>To build a restriction analysis you need a <code>RestrictionBatch</code> and a sequence, and you need to say whether the sequence is linear or circular. The first argument <code>Analysis</code> takes is the restriction batch, the second is the sequence. If the third argument, <code>linear</code>, is not provided, <code>Analysis</code> will assume the sequence is linear.</p>
<div class="sourceCode" id="cb30"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb30-1"><a href="#cb30-1"></a><span class="op">>>></span> new_seq <span class="op">=</span> Seq(<span class="st">'TTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAGAA'</span>)</span>
<span id="cb30-2"><a href="#cb30-2"></a><span class="op">>>></span> rb <span class="op">=</span> RestrictionBatch([EcoRI, KpnI, EcoRV])</span>
<span id="cb30-3"><a href="#cb30-3"></a><span class="op">>>></span> Ana <span class="op">=</span> Analysis(rb, new_seq, linear<span class="op">=</span><span class="va">False</span>)</span>
<span id="cb30-4"><a href="#cb30-4"></a><span class="op">>>></span> Ana</span>
<span id="cb30-5"><a href="#cb30-5"></a>Analysis(RestrictionBatch([<span class="st">'EcoRI'</span>, <span class="st">'EcoRV'</span>, <span class="st">'KpnI'</span>]),Seq(<span class="st">'TTCAAAAAAAAAAAAAAAAAA</span></span>
<span id="cb30-6"><a href="#cb30-6"></a><span class="st">AAAAAAAAAAGAA'</span>),<span class="va">False</span>)</span></code></pre></div>
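<p>To see why the shape of the sequence matters, here is a minimal pure-Python sketch of a site search. It is not how <code>Bio.Restriction</code> is implemented; the helper <code>find_cuts</code> and its <code>cut_offset</code> parameter are hypothetical, and it only handles exact, unambiguous sites. On a circular sequence a site may span the junction between the last and the first base, which is the case for the EcoRI site (<code>GAATTC</code>) in a 34-base sequence shaped like <code>new_seq</code>.</p>

```python
def find_cuts(seq, site, cut_offset, linear=True):
    """Return 1-based positions of the first base after each cut.

    Hypothetical helper for illustration: exact matches only.
    """
    n = len(seq)
    # On a circle, append the start of the sequence so that sites spanning
    # the junction between the last and the first base can be found.
    text = seq if linear else seq + seq[:len(site) - 1]
    cuts = []
    start = text.find(site)
    while start != -1:
        position = start + cut_offset + 1       # 1-based, base after the cut
        if not linear:
            position = (position - 1) % n + 1   # wrap around the circle
        cuts.append(position)
        start = text.find(site, start + 1)
    return sorted(cuts)


new_seq = "TTC" + "A" * 28 + "GAA"   # 34 bases, GAA...TTC only meet on a circle
linear_cuts = find_cuts(new_seq, "GAATTC", 1, linear=True)
circular_cuts = find_cuts(new_seq, "GAATTC", 1, linear=False)
print(linear_cuts, circular_cuts)    # [] [33]
```

<p>The linear search finds nothing, while the circular search reports position 33, the value shown for EcoRI in the examples of this section.</p>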
<h4 id="full-restriction-analysis"><a name="4.2"></a>4.2 Full restriction analysis</h4>
<p>Once you have created your new <code>Analysis</code>, you can use it to get a restriction analysis of your sequence. The way to make a full restriction analysis of the sequence is:</p>
<div class="sourceCode" id="cb31"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb31-1"><a href="#cb31-1"></a><span class="op">>>></span> Ana.full()</span>
<span id="cb31-2"><a href="#cb31-2"></a>{<span class="st">'KpnI'</span>: [], <span class="st">'EcoRV'</span>: [], <span class="st">'EcoRI'</span>: [<span class="dv">33</span>]}</span></code></pre></div>
<p>This is much the same as the output of the <code>RestrictionBatch.search</code> method. You will get output that is easier to read with <code>print_that</code> used without arguments:</p>
<div class="sourceCode" id="cb32"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb32-1"><a href="#cb32-1"></a><span class="op">>>></span> <span class="co"># let's create something a bit more complex to analyse.</span></span>
<span id="cb32-2"><a href="#cb32-2"></a><span class="op">>>></span></span>
<span id="cb32-3"><a href="#cb32-3"></a><span class="op">>>></span> rb <span class="op">=</span> RestrictionBatch([], [<span class="st">'C'</span>]) <span class="co"># we will explain the meaning of the</span></span>
<span id="cb32-4"><a href="#cb32-4"></a><span class="op">>>></span> <span class="co"># double list argument later.</span></span>
<span id="cb32-5"><a href="#cb32-5"></a><span class="op">>>></span></span>
<span id="cb32-6"><a href="#cb32-6"></a><span class="op">>>></span> multi_site <span class="op">=</span> Seq(<span class="st">'AAA'</span> <span class="op">+</span> EcoRI.site <span class="op">+</span> <span class="st">'G'</span> <span class="op">+</span> KpnI.site <span class="op">+</span> EcoRV.site <span class="op">+</span></span>
<span id="cb32-7"><a href="#cb32-7"></a> <span class="st">'CT'</span> <span class="op">+</span> SmaI.site <span class="op">+</span> <span class="st">'GT'</span> <span class="op">+</span> FokI.site <span class="op">+</span> <span class="st">'GAAAGGGC'</span> <span class="op">+</span></span>
<span id="cb32-8"><a href="#cb32-8"></a> EcoRI.site <span class="op">+</span> <span class="st">'ACGT'</span>)</span>
<span id="cb32-9"><a href="#cb32-9"></a><span class="op">>>></span> Analong <span class="op">=</span> Analysis(rb, multi_site)</span>
<span id="cb32-10"><a href="#cb32-10"></a><span class="op">>>></span> Analong.full()</span>
<span id="cb32-11"><a href="#cb32-11"></a>{BglI: [], BstEII: [], AsuII: [], HinfI: [], SfiI: [], PspPI: [], BsiSI: [<span class="dv">27</span>], S</span>
<span id="cb32-12"><a href="#cb32-12"></a>alI: [], SlaI: [], NcoI: [], NotI: [], PstI: [], StyI: [], BseBI: [], PvuII: [],</span>
<span id="cb32-13"><a href="#cb32-13"></a>HindIII: [], BglII: [], ApaLI: [], TaqI: [], BssAI: [], AluI: [], SstI: [], Bse</span>
<span id="cb32-14"><a href="#cb32-14"></a>CI: [], Sau3AI: [], HpaI: [], SnaBI: [], NheI: [], BclI: [], KpnI: [<span class="dv">16</span>], NruI: [</span>
<span id="cb32-15"><a href="#cb32-15"></a>], MspCI: [], BshFI: [], CspAI: [], RsaI: [<span class="dv">14</span>], EcoRV: [<span class="dv">20</span>], SphI: [], BamHI: []</span>
<span id="cb32-16"><a href="#cb32-16"></a>, MboI: [], SgrBI: [], SspI: [], ScaI: [], XbaI: [], SseBI: [], NaeI: [], EcoRI:</span>
<span id="cb32-17"><a href="#cb32-17"></a>[<span class="dv">5</span>, <span class="dv">47</span>], SmaI: [<span class="dv">28</span>], BseAI: []}</span>
<span id="cb32-18"><a href="#cb32-18"></a><span class="op">>>></span></span>
<span id="cb32-19"><a href="#cb32-19"></a><span class="op">>>></span> <span class="co"># The results are here but they are difficult to read. Let's try print_that</span></span>
<span id="cb32-20"><a href="#cb32-20"></a><span class="op">>>></span></span>
<span id="cb32-21"><a href="#cb32-21"></a><span class="op">>>></span> Analong.print_that()</span>
<span id="cb32-22"><a href="#cb32-22"></a></span>
<span id="cb32-23"><a href="#cb32-23"></a>BsiSI : <span class="fl">27.</span></span>
<span id="cb32-24"><a href="#cb32-24"></a>RsaI : <span class="fl">14.</span></span>
<span id="cb32-25"><a href="#cb32-25"></a>EcoRI : <span class="dv">5</span>, <span class="fl">47.</span></span>
<span id="cb32-26"><a href="#cb32-26"></a>EcoRV : <span class="fl">20.</span></span>
<span id="cb32-27"><a href="#cb32-27"></a>KpnI : <span class="fl">16.</span></span>
<span id="cb32-28"><a href="#cb32-28"></a>SmaI : <span class="fl">28.</span></span>
<span id="cb32-29"><a href="#cb32-29"></a></span>
<span id="cb32-30"><a href="#cb32-30"></a> Enzymes which do <span class="kw">not</span> cut the sequence.</span>
<span id="cb32-31"><a href="#cb32-31"></a></span>
<span id="cb32-32"><a href="#cb32-32"></a>AluI BshFI MboI Sau3AI TaqI BseBI HinfI PspPI</span>
<span id="cb32-33"><a href="#cb32-33"></a>ApaLI AsuII BamHI BclI BglII BseAI BseCI BssAI</span>
<span id="cb32-34"><a href="#cb32-34"></a>CspAI HindIII HpaI MspCI NaeI NcoI NheI NruI</span>
<span id="cb32-35"><a href="#cb32-35"></a>PstI PvuII SalI ScaI SgrBI SlaI SnaBI SphI</span>
<span id="cb32-36"><a href="#cb32-36"></a>SseBI SspI SstI StyI XbaI BstEII NotI BglI</span>
<span id="cb32-37"><a href="#cb32-37"></a>SfiI</span></code></pre></div>
<p>Much clearer, isn’t it? The output is optimised for a shell 80 columns wide. If the output seems odd, check that the width of your shell is at least 80 columns.</p>
<h4 id="changing-the-title"><a name="4.3"></a>4.3 Changing the title</h4>
<p>You can provide a title for the analysis and modify the sentence ‘Enzymes which do not cut the sequence’ by setting the two optional arguments of <code>print_that</code>, <code>title</code> and <code>s1</code>. No formatting is done on these strings, so you have to include the newlines (<code>\n</code>) yourself where you want them:</p>
<div class="sourceCode" id="cb33"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb33-1"><a href="#cb33-1"></a><span class="op">>>></span> Analong.print_that(<span class="va">None</span>, title<span class="op">=</span><span class="st">'sequence = multi_site</span><span class="ch">\n\n</span><span class="st">'</span>)</span>
<span id="cb33-2"><a href="#cb33-2"></a></span>
<span id="cb33-3"><a href="#cb33-3"></a>sequence <span class="op">=</span> multi_site</span>
<span id="cb33-4"><a href="#cb33-4"></a></span>
<span id="cb33-5"><a href="#cb33-5"></a>BsiSI : <span class="fl">27.</span></span>
<span id="cb33-6"><a href="#cb33-6"></a>RsaI : <span class="fl">14.</span></span>
<span id="cb33-7"><a href="#cb33-7"></a>EcoRI : <span class="dv">5</span>, <span class="fl">47.</span></span>
<span id="cb33-8"><a href="#cb33-8"></a>EcoRV : <span class="fl">20.</span></span>
<span id="cb33-9"><a href="#cb33-9"></a>KpnI : <span class="fl">16.</span></span>
<span id="cb33-10"><a href="#cb33-10"></a>SmaI : <span class="fl">28.</span></span>
<span id="cb33-11"><a href="#cb33-11"></a></span>
<span id="cb33-12"><a href="#cb33-12"></a> Enzymes which do <span class="kw">not</span> cut the sequence.</span>
<span id="cb33-13"><a href="#cb33-13"></a></span>
<span id="cb33-14"><a href="#cb33-14"></a>AluI BshFI MboI Sau3AI TaqI BseBI HinfI PspPI</span>
<span id="cb33-15"><a href="#cb33-15"></a>ApaLI AsuII BamHI BclI BglII BseAI BseCI BssAI</span>
<span id="cb33-16"><a href="#cb33-16"></a>CspAI HindIII HpaI MspCI NaeI NcoI NheI NruI</span>
<span id="cb33-17"><a href="#cb33-17"></a>PstI PvuII SalI ScaI SgrBI SlaI SnaBI SphI</span>
<span id="cb33-18"><a href="#cb33-18"></a>SseBI SspI SstI StyI XbaI BstEII NotI BglI</span>
<span id="cb33-19"><a href="#cb33-19"></a>SfiI</span>
<span id="cb33-20"><a href="#cb33-20"></a></span>
<span id="cb33-21"><a href="#cb33-21"></a><span class="op">>>></span> Analong.print_that(<span class="va">None</span>, title<span class="op">=</span><span class="st">'sequence = multi_site</span><span class="ch">\n\n</span><span class="st">'</span>,</span>
<span id="cb33-22"><a href="#cb33-22"></a> s1<span class="op">=</span><span class="st">'</span><span class="ch">\n</span><span class="st"> no site:</span><span class="ch">\n\n</span><span class="st">'</span>)</span>
<span id="cb33-23"><a href="#cb33-23"></a></span>
<span id="cb33-24"><a href="#cb33-24"></a>sequence <span class="op">=</span> multi_site</span>
<span id="cb33-25"><a href="#cb33-25"></a></span>
<span id="cb33-26"><a href="#cb33-26"></a>BsiSI : <span class="fl">27.</span></span>
<span id="cb33-27"><a href="#cb33-27"></a>RsaI : <span class="fl">14.</span></span>
<span id="cb33-28"><a href="#cb33-28"></a>EcoRI : <span class="dv">5</span>, <span class="fl">47.</span></span>
<span id="cb33-29"><a href="#cb33-29"></a>EcoRV : <span class="fl">20.</span></span>
<span id="cb33-30"><a href="#cb33-30"></a>KpnI : <span class="fl">16.</span></span>
<span id="cb33-31"><a href="#cb33-31"></a>SmaI : <span class="fl">28.</span></span>
<span id="cb33-32"><a href="#cb33-32"></a></span>
<span id="cb33-33"><a href="#cb33-33"></a> no site:</span>
<span id="cb33-34"><a href="#cb33-34"></a></span>
<span id="cb33-35"><a href="#cb33-35"></a>AluI BshFI MboI Sau3AI TaqI BseBI HinfI PspPI</span>
<span id="cb33-36"><a href="#cb33-36"></a>ApaLI AsuII BamHI BclI BglII BseAI BseCI BssAI</span>
<span id="cb33-37"><a href="#cb33-37"></a>CspAI HindIII HpaI MspCI NaeI NcoI NheI NruI</span>
<span id="cb33-38"><a href="#cb33-38"></a>PstI PvuII SalI ScaI SgrBI SlaI SnaBI SphI</span>
<span id="cb33-39"><a href="#cb33-39"></a>SseBI SspI SstI StyI XbaI BstEII NotI BglI</span>
<span id="cb33-40"><a href="#cb33-40"></a>SfiI</span></code></pre></div>
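<p>The following sketch shows, in plain Python, the kind of work such a printing step does and why the newlines in <code>title</code> and <code>s1</code> are your responsibility. The function <code>format_results</code> is hypothetical and only loosely modelled on the output above; it is not the code behind <code>print_that</code>, and plain strings stand in for the enzymes.</p>

```python
def format_results(results, title="",
                   s1="\n   Enzymes which do not cut the sequence.\n\n"):
    """Hypothetical formatter: cutting enzymes first, then the non-cutters.

    title and s1 are inserted verbatim, so any newline must be in the string.
    """
    cutters = sorted(name for name, sites in results.items() if sites)
    non_cutters = sorted(name for name, sites in results.items() if not sites)
    parts = [title]
    for name in cutters:
        positions = ", ".join(str(p) for p in results[name])
        parts.append(f"{name} : {positions}.\n")
    if non_cutters:
        parts.append(s1)
        parts.append(" ".join(non_cutters) + "\n")
    return "".join(parts)


results = {"KpnI": [16], "EcoRV": [20], "EcoRI": [5, 47], "NotI": [], "BglI": []}
report = format_results(results,
                        title="sequence = multi_site\n\n",
                        s1="\n no site:\n\n")
print(report)
```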
<h4 id="customising-the-output"><a name="4.4"></a>4.4 Customising the output</h4>
<p>You can modify some aspects of the output interactively. There are three main types of output: two listing types (alphabetically sorted and sorted by number of sites) and a map-like type. To change the output, use the method <code>print_as()</code> of <code>Analysis</code>. The change is permanent for the <code>Analysis</code> instance (that is, until the next time you use <code>print_as()</code>). The argument of <code>print_as()</code> is one of the strings <code>'map'</code>, <code>'number'</code> or <code>'alpha'</code>. As you have seen previously, the default behaviour is an alphabetical list (<code>'alpha'</code>).</p>
<div class="sourceCode" id="cb34"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb34-1"><a href="#cb34-1"></a><span class="op">>>></span> Analong.print_as(<span class="st">'map'</span>)</span>
<span id="cb34-2"><a href="#cb34-2"></a><span class="op">>>></span> Analong.print_that()</span>
<span id="cb34-3"><a href="#cb34-3"></a></span>
<span id="cb34-4"><a href="#cb34-4"></a> <span class="dv">5</span> EcoRI</span>
<span id="cb34-5"><a href="#cb34-5"></a> <span class="op">|</span></span>
<span id="cb34-6"><a href="#cb34-6"></a> <span class="op">|</span> <span class="dv">14</span> RsaI</span>
<span id="cb34-7"><a href="#cb34-7"></a> <span class="op">|</span> <span class="op">|</span></span>
<span id="cb34-8"><a href="#cb34-8"></a> <span class="op">|</span> <span class="op">|</span> <span class="dv">16</span> KpnI</span>
<span id="cb34-9"><a href="#cb34-9"></a> <span class="op">|</span> <span class="op">|</span> <span class="op">|</span></span>
<span id="cb34-10"><a href="#cb34-10"></a> <span class="op">|</span> <span class="op">|</span> <span class="op">|</span> <span class="dv">20</span> EcoRV</span>
<span id="cb34-11"><a href="#cb34-11"></a> <span class="op">|</span> <span class="op">|</span> <span class="op">|</span> <span class="op">|</span></span>
<span id="cb34-12"><a href="#cb34-12"></a> <span class="op">|</span> <span class="op">|</span> <span class="op">|</span> <span class="op">|</span> <span class="dv">27</span> BsiSI</span>
<span id="cb34-13"><a href="#cb34-13"></a> <span class="op">|</span> <span class="op">|</span> <span class="op">|</span> <span class="op">|</span> <span class="op">|</span></span>
<span id="cb34-14"><a href="#cb34-14"></a> <span class="op">|</span> <span class="op">|</span> <span class="op">|</span> <span class="op">|</span> <span class="op">|</span><span class="dv">28</span> SmaI</span>
<span id="cb34-15"><a href="#cb34-15"></a> <span class="op">|</span> <span class="op">|</span> <span class="op">|</span> <span class="op">|</span> <span class="op">||</span></span>
<span id="cb34-16"><a href="#cb34-16"></a> <span class="op">|</span> <span class="op">|</span> <span class="op">|</span> <span class="op">|</span> <span class="op">||</span> <span class="dv">47</span> EcoRI</span>
<span id="cb34-17"><a href="#cb34-17"></a> <span class="op">|</span> <span class="op">|</span> <span class="op">|</span> <span class="op">|</span> <span class="op">||</span> <span class="op">|</span></span>
<span id="cb34-18"><a href="#cb34-18"></a>AAAGAATTCGGGTACCGATATCCTCCCGGGGTGGATGGAAAGGGCGAATTCACGT</span>
<span id="cb34-19"><a href="#cb34-19"></a><span class="op">|||||||||||||||||||||||||||||||||||||||||||||||||||||||</span></span>
<span id="cb34-20"><a href="#cb34-20"></a>TTTCTTAAGCCCATGGCTATAGGAGGGCCCCACCTACCTTTCCCGCTTAAGTGCA</span>
<span id="cb34-21"><a href="#cb34-21"></a><span class="dv">1</span> <span class="dv">55</span></span>
<span id="cb34-22"><a href="#cb34-22"></a></span>
<span id="cb34-23"><a href="#cb34-23"></a></span>
<span id="cb34-24"><a href="#cb34-24"></a> Enzymes which do <span class="kw">not</span> cut the sequence.</span>
<span id="cb34-25"><a href="#cb34-25"></a></span>
<span id="cb34-26"><a href="#cb34-26"></a>AluI BshFI MboI Sau3AI TaqI BseBI HinfI PspPI</span>
<span id="cb34-27"><a href="#cb34-27"></a>ApaLI AsuII BamHI BclI BglII BseAI BseCI BssAI</span>
<span id="cb34-28"><a href="#cb34-28"></a>CspAI HindIII HpaI MspCI NaeI NcoI NheI NruI</span>
<span id="cb34-29"><a href="#cb34-29"></a>PstI PvuII SalI ScaI SgrBI SlaI SnaBI SphI</span>
<span id="cb34-30"><a href="#cb34-30"></a>SseBI SspI SstI StyI XbaI BstEII NotI BglI</span>
<span id="cb34-31"><a href="#cb34-31"></a>SfiI</span>
<span id="cb34-32"><a href="#cb34-32"></a></span>
<span id="cb34-33"><a href="#cb34-33"></a><span class="op">>>></span> Analong.print_as(<span class="st">'number'</span>)</span>
<span id="cb34-34"><a href="#cb34-34"></a><span class="op">>>></span> Analong.print_that()</span>
<span id="cb34-35"><a href="#cb34-35"></a></span>
<span id="cb34-36"><a href="#cb34-36"></a></span>
<span id="cb34-37"><a href="#cb34-37"></a></span>
<span id="cb34-38"><a href="#cb34-38"></a>enzymes which cut <span class="dv">1</span> times :</span>
<span id="cb34-39"><a href="#cb34-39"></a></span>
<span id="cb34-40"><a href="#cb34-40"></a>BsiSI : <span class="fl">27.</span></span>
<span id="cb34-41"><a href="#cb34-41"></a>RsaI : <span class="fl">14.</span></span>
<span id="cb34-42"><a href="#cb34-42"></a>EcoRV : <span class="fl">20.</span></span>
<span id="cb34-43"><a href="#cb34-43"></a>KpnI : <span class="fl">16.</span></span>
<span id="cb34-44"><a href="#cb34-44"></a>SmaI : <span class="fl">28.</span></span>
<span id="cb34-45"><a href="#cb34-45"></a></span>
<span id="cb34-46"><a href="#cb34-46"></a></span>
<span id="cb34-47"><a href="#cb34-47"></a>enzymes which cut <span class="dv">2</span> times :</span>
<span id="cb34-48"><a href="#cb34-48"></a></span>
<span id="cb34-49"><a href="#cb34-49"></a>EcoRI : <span class="dv">5</span>, <span class="fl">47.</span></span>
<span id="cb34-50"><a href="#cb34-50"></a></span>
<span id="cb34-51"><a href="#cb34-51"></a> Enzymes which do <span class="kw">not</span> cut the sequence.</span>
<span id="cb34-52"><a href="#cb34-52"></a></span>
<span id="cb34-53"><a href="#cb34-53"></a>AluI BshFI MboI Sau3AI TaqI BseBI HinfI PspPI</span>
<span id="cb34-54"><a href="#cb34-54"></a>ApaLI AsuII BamHI BclI BglII BseAI BseCI BssAI</span>
<span id="cb34-55"><a href="#cb34-55"></a>CspAI HindIII HpaI MspCI NaeI NcoI NheI NruI</span>
<span id="cb34-56"><a href="#cb34-56"></a>PstI PvuII SalI ScaI SgrBI SlaI SnaBI SphI</span>
<span id="cb34-57"><a href="#cb34-57"></a>SseBI SspI SstI StyI XbaI BstEII NotI BglI</span>
<span id="cb34-58"><a href="#cb34-58"></a>SfiI</span>
<span id="cb34-59"><a href="#cb34-59"></a></span>
<span id="cb34-60"><a href="#cb34-60"></a><span class="op">>>></span></span></code></pre></div>
<p>To go back to the previous behaviour:</p>
<div class="sourceCode" id="cb35"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb35-1"><a href="#cb35-1"></a><span class="op">>>></span> Analong.print_as(<span class="st">'alpha'</span>)</span>
<span id="cb35-2"><a href="#cb35-2"></a><span class="op">>>></span> Analong.print_that()</span>
<span id="cb35-3"><a href="#cb35-3"></a></span>
<span id="cb35-4"><a href="#cb35-4"></a>BsiSI : <span class="fl">27.</span></span>
<span id="cb35-5"><a href="#cb35-5"></a>RsaI : <span class="fl">14.</span></span>
<span id="cb35-6"><a href="#cb35-6"></a>EcoRI : <span class="dv">5</span>, <span class="fl">47.</span></span>
<span id="cb35-7"><a href="#cb35-7"></a>EcoRV : <span class="fl">20.</span></span>
<span id="cb35-8"><a href="#cb35-8"></a>etc ...</span></code></pre></div>
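<p>The <code>'number'</code> listing amounts to grouping the enzymes by how many sites they have. A minimal sketch of that grouping, assuming a result dictionary keyed by enzyme name (the helper <code>group_by_cut_count</code> is hypothetical and not part of <code>Bio.Restriction</code>):</p>

```python
from collections import defaultdict


def group_by_cut_count(results):
    """Hypothetical helper: map number of sites -> sorted enzyme names."""
    groups = defaultdict(list)
    for name, sites in results.items():
        if sites:                       # enzymes which do not cut are skipped
            groups[len(sites)].append(name)
    return {count: sorted(names) for count, names in sorted(groups.items())}


results = {"BsiSI": [27], "RsaI": [14], "EcoRI": [5, 47], "EcoRV": [20],
           "KpnI": [16], "SmaI": [28], "NotI": []}
grouped = group_by_cut_count(results)
for count, names in grouped.items():
    print(f"enzymes which cut {count} times : {', '.join(names)}")
```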
<h4 id="fancier-restriction-analysis"><a name="4.5"></a>4.5 Fancier restriction analysis</h4>
<p>I will not go into detail for every single method; here are all the methods that are available. Most are self-explanatory and the others are fairly well documented (use <code>help(Analysis.command_name)</code>). The methods are:</p>
<div class="sourceCode" id="cb36"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb36-1"><a href="#cb36-1"></a>full(<span class="va">self</span>,linear<span class="op">=</span><span class="va">True</span>)</span>
<span id="cb36-2"><a href="#cb36-2"></a>blunt(<span class="va">self</span>,dct <span class="op">=</span> <span class="va">None</span>)</span>
<span id="cb36-3"><a href="#cb36-3"></a>overhang5(<span class="va">self</span>, dct<span class="op">=</span><span class="va">None</span>)</span>
<span id="cb36-4"><a href="#cb36-4"></a>overhang3(<span class="va">self</span>, dct<span class="op">=</span><span class="va">None</span>)</span>
<span id="cb36-5"><a href="#cb36-5"></a>defined(<span class="va">self</span>,dct<span class="op">=</span><span class="va">None</span>)</span>
<span id="cb36-6"><a href="#cb36-6"></a>with_sites(<span class="va">self</span>, dct<span class="op">=</span><span class="va">None</span>)</span>
<span id="cb36-7"><a href="#cb36-7"></a>without_site(<span class="va">self</span>, dct<span class="op">=</span><span class="va">None</span>)</span>
<span id="cb36-8"><a href="#cb36-8"></a>with_N_sites(<span class="va">self</span>, N, dct<span class="op">=</span><span class="va">None</span>)</span>
<span id="cb36-9"><a href="#cb36-9"></a>with_number_list(<span class="va">self</span>, <span class="bu">list</span>, dct<span class="op">=</span><span class="va">None</span>)</span>
<span id="cb36-10"><a href="#cb36-10"></a>with_name(<span class="va">self</span>, names, dct<span class="op">=</span><span class="va">None</span>)</span>
<span id="cb36-11"><a href="#cb36-11"></a>with_site_size(<span class="va">self</span>, site_size, dct<span class="op">=</span><span class="va">None</span>)</span>
<span id="cb36-12"><a href="#cb36-12"></a>only_between(<span class="va">self</span>, start, end, dct<span class="op">=</span><span class="va">None</span>)</span>
<span id="cb36-13"><a href="#cb36-13"></a>between(<span class="va">self</span>,start, end, dct<span class="op">=</span><span class="va">None</span>)</span>
<span id="cb36-14"><a href="#cb36-14"></a>show_only_between(<span class="va">self</span>, start, end, dct<span class="op">=</span><span class="va">None</span>)</span>
<span id="cb36-15"><a href="#cb36-15"></a>only_outside(<span class="va">self</span>, start, end, dct <span class="op">=</span><span class="va">None</span>)</span>
<span id="cb36-16"><a href="#cb36-16"></a>outside(<span class="va">self</span>, start, end, dct<span class="op">=</span><span class="va">None</span>)</span>
<span id="cb36-17"><a href="#cb36-17"></a>do_not_cut(<span class="va">self</span>, start, end, dct <span class="op">=</span><span class="va">None</span>)</span></code></pre></div>
<p>Using these methods is simple:</p>
<div class="sourceCode" id="cb37"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb37-1"><a href="#cb37-1"></a><span class="op">>>></span> new_seq <span class="op">=</span> Seq(<span class="st">'TTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAGAA'</span>)</span>
<span id="cb37-2"><a href="#cb37-2"></a><span class="op">>>></span> rb <span class="op">=</span> RestrictionBatch([EcoRI, KpnI, EcoRV])</span>
<span id="cb37-3"><a href="#cb37-3"></a><span class="op">>>></span> Ana <span class="op">=</span> Analysis(rb, new_seq, linear<span class="op">=</span><span class="va">False</span>)</span>
<span id="cb37-4"><a href="#cb37-4"></a><span class="op">>>></span> Ana</span>
<span id="cb37-5"><a href="#cb37-5"></a>Analysis(RestrictionBatch([<span class="st">'EcoRI'</span>, <span class="st">'EcoRV'</span>, <span class="st">'KpnI'</span>]),Seq(<span class="st">'TTCAAAAAAAAAAAAAAAAAA</span></span>
<span id="cb37-6"><a href="#cb37-6"></a><span class="st">AAAAAAAAAAGAA'</span>),<span class="va">False</span>)</span>
<span id="cb37-7"><a href="#cb37-7"></a><span class="op">>>></span> Ana.blunt() <span class="co"># output only the result for enzymes which cut blunt</span></span>
<span id="cb37-8"><a href="#cb37-8"></a>{<span class="st">'EcoRV'</span>: []}</span>
<span id="cb37-9"><a href="#cb37-9"></a><span class="op">>>></span> Ana.full() <span class="co"># all the enzymes in the RestrictionBatch</span></span>
<span id="cb37-10"><a href="#cb37-10"></a>{<span class="st">'KpnI'</span>: [], <span class="st">'EcoRV'</span>: [], <span class="st">'EcoRI'</span>: [<span class="dv">33</span>]}</span>
<span id="cb37-11"><a href="#cb37-11"></a><span class="op">>>></span> Ana.with_sites() <span class="co"># output only the result for enzymes which have a site</span></span>
<span id="cb37-12"><a href="#cb37-12"></a>{<span class="st">'EcoRI'</span>: [<span class="dv">33</span>]}</span>
<span id="cb37-13"><a href="#cb37-13"></a><span class="op">>>></span> Ana.without_site() <span class="co"># output only the enzymes which have no site</span></span>
<span id="cb37-14"><a href="#cb37-14"></a>{<span class="st">'KpnI'</span>: [], <span class="st">'EcoRV'</span>: []}</span>
<span id="cb37-15"><a href="#cb37-15"></a><span class="op">>>></span> Ana.only_between(<span class="dv">1</span>, <span class="dv">20</span>) <span class="co"># the enzymes which cut between position 1 and 20</span></span>
<span id="cb37-16"><a href="#cb37-16"></a>{}</span>
<span id="cb37-17"><a href="#cb37-17"></a><span class="op">>>></span> Ana.only_between(<span class="dv">20</span>, <span class="dv">34</span>) <span class="co"># etc...</span></span>
<span id="cb37-18"><a href="#cb37-18"></a>{<span class="st">'EcoRI'</span>: [<span class="dv">33</span>]}</span>
<span id="cb37-19"><a href="#cb37-19"></a><span class="op">>>></span> Ana.only_outside(<span class="dv">20</span>, <span class="dv">34</span>)</span>
<span id="cb37-20"><a href="#cb37-20"></a>{}</span>
<span id="cb37-21"><a href="#cb37-21"></a><span class="op">>>></span> Ana.with_name([EcoRI])</span>
<span id="cb37-22"><a href="#cb37-22"></a>{<span class="st">'EcoRI'</span>: [<span class="dv">33</span>]}</span>
<span id="cb37-23"><a href="#cb37-23"></a><span class="op">>>></span></span></code></pre></div>
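<p>Conceptually, each of these methods is a filter over the dictionary of results. The sketch below re-creates three of them as plain functions to make that explicit. It is an illustration under stated assumptions, not the library's code: the function bodies are hypothetical, keys are plain strings, and the inclusive bounds used by <code>only_between</code> here are an assumption of the sketch.</p>

```python
def with_sites(dct):
    """Keep only the enzymes which have at least one site."""
    return {name: sites for name, sites in dct.items() if sites}


def without_site(dct):
    """Keep only the enzymes which have no site."""
    return {name: sites for name, sites in dct.items() if not sites}


def only_between(start, end, dct):
    """Keep the enzymes which cut, and only cut, within [start, end].

    Inclusive bounds are an assumption made for this sketch.
    """
    return {name: sites for name, sites in dct.items()
            if sites and all(start <= site <= end for site in sites)}


full = {"KpnI": [], "EcoRV": [], "EcoRI": [33]}
print(with_sites(full))            # {'EcoRI': [33]}
print(without_site(full))          # {'KpnI': [], 'EcoRV': []}
print(only_between(1, 20, full))   # {}
print(only_between(20, 34, full))  # {'EcoRI': [33]}
```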
<p>To get nicely formatted output, you still use <code>print_that</code>, but this time you pass it the result of the command you want executed:</p>
<div class="sourceCode" id="cb38"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb38-1"><a href="#cb38-1"></a><span class="op">>>></span> Ana.print_that(Ana.blunt())</span>
<span id="cb38-2"><a href="#cb38-2"></a></span>
<span id="cb38-3"><a href="#cb38-3"></a> Enzymes which do <span class="kw">not</span> cut the sequence.</span>
<span id="cb38-4"><a href="#cb38-4"></a></span>
<span id="cb38-5"><a href="#cb38-5"></a>EcoRV</span>
<span id="cb38-6"><a href="#cb38-6"></a></span>
<span id="cb38-7"><a href="#cb38-7"></a><span class="op">>>></span> pt <span class="op">=</span> Ana.print_that</span>
<span id="cb38-8"><a href="#cb38-8"></a><span class="op">>>></span> pt(Ana.with_sites())</span>
<span id="cb38-9"><a href="#cb38-9"></a></span>
<span id="cb38-10"><a href="#cb38-10"></a>EcoRI : <span class="fl">33.</span></span>
<span id="cb38-11"><a href="#cb38-11"></a></span>
<span id="cb38-12"><a href="#cb38-12"></a><span class="op">>>></span> pt(Ana.without_site())</span>
<span id="cb38-13"><a href="#cb38-13"></a></span>
<span id="cb38-14"><a href="#cb38-14"></a> Enzymes which do <span class="kw">not</span> cut the sequence.</span>
<span id="cb38-15"><a href="#cb38-15"></a></span>
<span id="cb38-16"><a href="#cb38-16"></a>EcoRV KpnI</span>
<span id="cb38-17"><a href="#cb38-17"></a></span>
<span id="cb38-18"><a href="#cb38-18"></a><span class="op">>>></span> <span class="co"># etc ...</span></span></code></pre></div>
<h4 id="more-complex-analysis"><a name="4.6"></a>4.6 More complex analysis</h4>
<p>All of these methods (except <code>full()</code>, which always does a full restriction analysis) can be supplied with an additional dictionary. If no dictionary is supplied, a full restriction analysis is used as the starting point. Otherwise the dictionary provided by the argument <code>dct</code> is used. The dictionary must be formatted like the result of <code>RestrictionBatch.search</code>, that is, of the form <code>{'enzyme_name': [position1, position2],...}</code>, where <em>position1</em> and <em>position2</em> are integers. All the methods listed previously output such dictionaries, and their results can be used as a starting point.</p>
<p>Using this approach you can build really complex queries by chaining several methods one after the other. For example, if you want all the enzymes which produce a 5’ overhang and cut the sequence only once, you have two ways to go:</p>
<p>The hard way consists of building a restriction batch containing only 5’ overhang enzymes, using this batch to create a new <code>Analysis</code> instance, and then using the method <code>with_N_sites()</code> as follows:</p>
<div class="sourceCode" id="cb39"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb39-1"><a href="#cb39-1"></a><span class="op">>>></span> rbov5 <span class="op">=</span> RestrictionBatch([x <span class="cf">for</span> x <span class="kw">in</span> rb <span class="cf">if</span> x.is_5overhang()])</span>
<span id="cb39-2"><a href="#cb39-2"></a><span class="op">>>></span> Anaov5 <span class="op">=</span> Analysis(rbov5, new_seq, linear<span class="op">=</span><span class="va">False</span>)</span>
<span id="cb39-3"><a href="#cb39-3"></a><span class="op">>>></span> Anaov5.with_N_sites(<span class="dv">1</span>)</span>
<span id="cb39-4"><a href="#cb39-4"></a>{<span class="st">'EcoRI'</span> : [<span class="dv">33</span>]}</span></code></pre></div>
<p>The easy solution is to chain several <code>Analysis</code> methods. This is possible since each method returns a dictionary as its result and is able to take a dictionary as input:</p>
<div class="sourceCode" id="cb40"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb40-1"><a href="#cb40-1"></a><span class="op">>>></span> Ana.with_N_sites(<span class="dv">1</span>, Ana.overhang5())</span>
<span id="cb40-2"><a href="#cb40-2"></a>{<span class="st">'EcoRI'</span>: [<span class="dv">33</span>]}</span></code></pre></div>
<p>The dictionary is always the last argument whatever the command you use.</p>
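<p>The design that makes chaining possible can be sketched in a few lines: every filter takes an optional dictionary as its last argument, falls back to the full analysis when none is given, and returns a dictionary of the same form. The functions and the <code>OVERHANG</code> and <code>FULL</code> tables below are hypothetical stand-ins written for this illustration; they are not the <code>Analysis</code> implementation.</p>

```python
# Overhang type of the three enzymes used in the examples above.
OVERHANG = {"EcoRI": "5'", "KpnI": "3'", "EcoRV": "blunt"}
# Stand-in for the result of a full analysis of the circular new_seq.
FULL = {"KpnI": [], "EcoRV": [], "EcoRI": [33]}


def overhang5(dct=None):
    """Keep the enzymes which leave a 5' overhang."""
    dct = FULL if dct is None else dct
    return {name: sites for name, sites in dct.items()
            if OVERHANG[name] == "5'"}


def with_n_sites(n, dct=None):
    """Keep the enzymes which have exactly n sites."""
    dct = FULL if dct is None else dct
    return {name: sites for name, sites in dct.items() if len(sites) == n}


# The output of one filter is the input of the next one.
chained = with_n_sites(1, overhang5())
print(chained)   # {'EcoRI': [33]}
```

<p>Because both filters only narrow the dictionary, the order of chaining does not change the result in this sketch.</p>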
<p>Which way to prefer depends on how you will use your <code>Analysis</code> instance. If you are likely to reuse the same batch frequently with different sequences, using a dedicated <code>RestrictionBatch</code> might be faster, as the batch is likely to be smaller. Chaining methods is generally quicker when working in an interactive shell. In a script, the extended syntax may be easier to understand in a few months.</p>
<h3 id="advanced-features-the-formattedseq-class"><a name="5"></a>5. Advanced features: the FormattedSeq class</h3>
<p>Restriction enzymes require a much stricter formatting of DNA sequences than a <code>Bio.Seq</code> object provides. For example, the restriction enzymes expect to find an ungapped (no space) upper-case sequence, while a <code>Bio.Seq</code> object allows sequences to be in lower case and separated by spaces. Therefore, when a restriction enzyme analyses a <code>Bio.Seq</code> object (be it a <code>Seq</code> or a <code>MutableSeq</code>), the object undergoes a conversion. The class <code>FormattedSeq</code> ensures the smooth conversion from a <code>Bio.Seq</code> object to something which can safely be used by the enzyme.</p>
<p>While this conversion is done automatically by the enzymes if you provide them with a <code>Seq</code> or a <code>MutableSeq</code>, there are times when it is more efficient to do the conversion beforehand. Each time a <code>Seq</code> object is passed to an enzyme for analysis you pay an overhead for the conversion. When analysing the same sequence over and over, it is faster to convert the sequence once, store the result, and then use only the converted sequence.</p>
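<p>The convert-once pattern can be illustrated without <code>Bio.Restriction</code> at all. In the sketch below, <code>format_sequence</code> is a hypothetical stand-in for the conversion step (it only upper-cases the text and removes whitespace), and counting occurrences of a recognition site stands in for the analysis; neither reflects what <code>FormattedSeq</code> really does internally.</p>

```python
def format_sequence(raw):
    """Hypothetical stand-in for the conversion: upper case, no whitespace."""
    return "".join(raw.split()).upper()


raw = "ttc aaaa gaattc aaaa gaa"
sites = ["GAATTC", "GGTACC", "GATATC"]   # EcoRI, KpnI, EcoRV recognition sites

# Converting inside the loop repeats the same work for every enzyme.
slow = {site: format_sequence(raw).count(site) for site in sites}

# Converting once and storing the result avoids that overhead.
formatted = format_sequence(raw)
fast = {site: formatted.count(site) for site in sites}

print(formatted, fast)
```

<p>Both approaches give the same counts; only the amount of repeated conversion work differs.</p>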
<h4 id="creating-a-formattedseq"><a name="5.1"></a>5.1 Creating a FormattedSeq</h4>
<p>Creating a <code>FormattedSeq</code> from a <code>Bio.Seq</code> object is simple. The first argument of <code>FormattedSeq</code> is the sequence you wish to convert. You can specify the shape with the second argument, <code>linear</code>; if you don’t, the <code>FormattedSeq</code> will be linear:</p>
<div class="sourceCode" id="cb41"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb41-1"><a href="#cb41-1"></a><span class="op">>>></span> <span class="im">from</span> Bio.Restriction <span class="im">import</span> <span class="op">*</span></span>
<span id="cb41-2"><a href="#cb41-2"></a><span class="op">>>></span> <span class="im">from</span> Bio.Seq <span class="im">import</span> Seq</span>
<span id="cb41-3"><a href="#cb41-3"></a><span class="op">>>></span> seq <span class="op">=</span> Seq(<span class="st">'TTCAAAAAAAAAAGAATTCAAAAGAA'</span>)</span>
<span id="cb41-4"><a href="#cb41-4"></a><span class="op">>>></span> linear_fseq <span class="op">=</span> FormattedSeq(seq, linear<span class="op">=</span><span class="va">True</span>)</span>
<span id="cb41-5"><a href="#cb41-5"></a><span class="op">>>></span> default_fseq <span class="op">=</span> FormattedSeq(seq)</span>
<span id="cb41-6"><a href="#cb41-6"></a><span class="op">>>></span> circular_fseq <span class="op">=</span> FormattedSeq(seq, linear<span class="op">=</span><span class="va">False</span>)</span>
<span id="cb41-7"><a href="#cb41-7"></a><span class="op">>>></span> linear_fseq</span>
<span id="cb41-8"><a href="#cb41-8"></a>FormattedSeq(Seq(<span class="st">'TTCAAAAAAAAAAGAATTCAAAAGAA'</span>), linear<span class="op">=</span><span class="va">True</span>)</span>
<span id="cb41-9"><a href="#cb41-9"></a><span class="op">>>></span> linear_fseq.is_linear()</span>
<span id="cb41-10"><a href="#cb41-10"></a><span class="va">True</span></span>
<span id="cb41-11"><a href="#cb41-11"></a><span class="op">>>></span> default_fseq.is_linear()</span>
<span id="cb41-12"><a href="#cb41-12"></a><span class="va">True</span></span>
<span id="cb41-13"><a href="#cb41-13"></a><span class="op">>>></span> circular_fseq.is_linear()</span>
<span id="cb41-14"><a href="#cb41-14"></a><span class="va">False</span></span>
<span id="cb41-15"><a href="#cb41-15"></a><span class="op">>>></span> circular_fseq</span>
<span id="cb41-16"><a href="#cb41-16"></a>FormattedSeq(Seq(<span class="st">'TTCAAAAAAAAAAGAATTCAAAAGAA'</span>), linear<span class="op">=</span><span class="va">False</span>)</span></code></pre></div>
<h4 id="unlike-bio.seq-formattedseq-retains-information-about-their-shape"><a name="5.2"></a>5.2 Unlike Bio.Seq, FormattedSeq retains information about its shape</h4>
<p>A <code>FormattedSeq</code> retains information about the shape of the sequence. Therefore, unlike with <code>Seq</code> and <code>MutableSeq</code>, you don’t need to specify the shape of the sequence when using <code>search()</code> or <code>catalyse()</code>:</p>
<div class="sourceCode" id="cb42"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb42-1"><a href="#cb42-1"></a><span class="op">>>></span> EcoRI.search(linear_fseq)</span>
<span id="cb42-2"><a href="#cb42-2"></a>[<span class="dv">15</span>]</span>
<span id="cb42-3"><a href="#cb42-3"></a><span class="op">>>></span> EcoRI.search(circular_fseq) <span class="co"># no need to specify the shape</span></span>
<span id="cb42-4"><a href="#cb42-4"></a>[<span class="dv">15</span>, <span class="dv">25</span>]</span></code></pre></div>
<p>In fact, the shape of a <code>FormattedSeq</code> is not altered by the second argument of <code>search()</code> and <code>catalyse()</code>:</p>
<div class="sourceCode" id="cb43"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb43-1"><a href="#cb43-1"></a><span class="op">>>></span> <span class="co"># The shape is locked.</span></span>
<span id="cb43-2"><a href="#cb43-2"></a><span class="op">>>></span> <span class="co"># The 3 following commands give the same results</span></span>
<span id="cb43-3"><a href="#cb43-3"></a><span class="op">>>></span> <span class="co"># which correspond to a circular sequence</span></span>
<span id="cb43-4"><a href="#cb43-4"></a><span class="op">>>></span> EcoRI.search(circular_fseq)</span>
<span id="cb43-5"><a href="#cb43-5"></a>[<span class="dv">15</span>, <span class="dv">25</span>]</span>
<span id="cb43-6"><a href="#cb43-6"></a><span class="op">>>></span> EcoRI.search(circular_fseq, linear<span class="op">=</span><span class="va">True</span>)</span>
<span id="cb43-7"><a href="#cb43-7"></a>[<span class="dv">15</span>, <span class="dv">25</span>]</span>
<span id="cb43-8"><a href="#cb43-8"></a><span class="op">>>></span> EcoRI.search(circular_fseq, linear<span class="op">=</span><span class="va">False</span>)</span>
<span id="cb43-9"><a href="#cb43-9"></a>[<span class="dv">15</span>, <span class="dv">25</span>]</span></code></pre></div>
<h4 id="changing-the-shape-of-a-formattedseq"><a name="5.3"></a>5.3 Changing the shape of a FormattedSeq</h4>
<p>You can, however, change the shape of a <code>FormattedSeq</code>. The methods to use are:</p>
<div class="sourceCode" id="cb44"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb44-1"><a href="#cb44-1"></a>FormattedSeq.to_circular() <span class="op">=></span> new FormattedSeq, shape will be circular.</span>
<span id="cb44-2"><a href="#cb44-2"></a>FormattedSeq.to_linear() <span class="op">=></span> new FormattedSeq, shape will be linear</span>
<span id="cb44-3"><a href="#cb44-3"></a>FormattedSeq.circularise() <span class="op">=></span> change the shape of the FormattedSeq to circular, in place</span>
<span id="cb44-4"><a href="#cb44-4"></a>FormattedSeq.linearise() <span class="op">=></span> change the shape of the FormattedSeq to linear, in place</span></code></pre></div>
<div class="sourceCode" id="cb45"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb45-1"><a href="#cb45-1"></a><span class="op">>>></span> circular_fseq</span>
<span id="cb45-2"><a href="#cb45-2"></a>FormattedSeq(Seq(<span class="st">'TTCAAAAAAAAAAGAATTCAAAAGAA'</span>), linear<span class="op">=</span><span class="va">False</span>)</span>
<span id="cb45-3"><a href="#cb45-3"></a><span class="op">>>></span> circular_fseq.is_linear()</span>
<span id="cb45-4"><a href="#cb45-4"></a><span class="va">False</span></span>
<span id="cb45-5"><a href="#cb45-5"></a><span class="op">>>></span> circular_fseq <span class="op">==</span> linear_fseq</span>
<span id="cb45-6"><a href="#cb45-6"></a><span class="va">False</span></span>
<span id="cb45-7"><a href="#cb45-7"></a><span class="op">>>></span> newseq <span class="op">=</span> circular_fseq.to_linear()</span>
<span id="cb45-8"><a href="#cb45-8"></a><span class="op">>>></span> circular_fseq</span>
<span id="cb45-9"><a href="#cb45-9"></a>FormattedSeq(Seq(<span class="st">'TTCAAAAAAAAAAGAATTCAAAAGAA'</span>), linear<span class="op">=</span><span class="va">False</span>)</span>
<span id="cb45-10"><a href="#cb45-10"></a><span class="op">>>></span> newseq</span>
<span id="cb45-11"><a href="#cb45-11"></a>FormattedSeq(Seq(<span class="st">'TTCAAAAAAAAAAGAATTCAAAAGAA'</span>), linear<span class="op">=</span><span class="va">True</span>)</span>
<span id="cb45-12"><a href="#cb45-12"></a><span class="op">>>></span> circular_fseq.linearise()</span>
<span id="cb45-13"><a href="#cb45-13"></a><span class="op">>>></span> circular_fseq</span>
<span id="cb45-14"><a href="#cb45-14"></a>FormattedSeq(Seq(<span class="st">'TTCAAAAAAAAAAGAATTCAAAAGAA'</span>), linear<span class="op">=</span><span class="va">True</span>)</span>
<span id="cb45-15"><a href="#cb45-15"></a><span class="op">>>></span> circular_fseq.is_linear()</span>
<span id="cb45-16"><a href="#cb45-16"></a><span class="va">True</span></span>
<span id="cb45-17"><a href="#cb45-17"></a><span class="op">>>></span> circular_fseq <span class="op">==</span> linear_fseq</span>
<span id="cb45-18"><a href="#cb45-18"></a><span class="va">True</span></span>
<span id="cb45-19"><a href="#cb45-19"></a><span class="op">>>></span> EcoRI.search(circular_fseq) <span class="co"># which is now linear</span></span>
<span id="cb45-20"><a href="#cb45-20"></a>[<span class="dv">15</span>]</span></code></pre></div>
<h4 id="using-and-operators-with-formattedseq"><a name="5.4"></a>5.4 Using / and // operators with FormattedSeq</h4>
<p>Not having to specify the shape of the sequence to analyse gives you the opportunity to use the shorthand ‘/’ and ‘//’ with restriction enzymes:</p>
<div class="sourceCode" id="cb46"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb46-1"><a href="#cb46-1"></a><span class="op">>>></span> EcoRI<span class="op">/</span>linear_fseq <span class="co"># <=> EcoRI.search(linear_fseq)</span></span>
<span id="cb46-2"><a href="#cb46-2"></a>[<span class="dv">15</span>]</span>
<span id="cb46-3"><a href="#cb46-3"></a><span class="op">>>></span> linear_fseq<span class="op">/</span>EcoRI <span class="co"># <=> EcoRI.search(linear_fseq)</span></span>
<span id="cb46-4"><a href="#cb46-4"></a>[<span class="dv">15</span>]</span>
<span id="cb46-5"><a href="#cb46-5"></a><span class="op">>>></span> EcoRI<span class="op">//</span>linear_fseq <span class="co"># <=> linear_fseq//EcoRI <=> EcoRI.catalyse(linear_fseq)</span></span>
<span id="cb46-6"><a href="#cb46-6"></a>(Seq(<span class="st">'TTCAAAAAAAAAAG'</span>), Seq(<span class="st">'AATTCAAAAGAA'</span>))</span></code></pre></div>
<p>Another way to avoid the overhead due to a repetitive conversion from a <code>Seq</code> object to a <code>FormattedSeq</code> is to use a <a href="#2"><code>RestrictionBatch</code></a>.</p>
<p>To conclude, the performance gain achieved when using a <code>FormattedSeq</code> instead of a <code>Seq</code> is not huge. The analysis of a 10 kb sequence by all the enzymes in <code>AllEnzymes</code> (<code>for x in AllEnzymes: x.search(seq)</code>, 867 enzymes) is 7% faster with a <code>FormattedSeq</code> than with a <code>Seq</code>. Using a <code>RestrictionBatch</code> (<code>AllEnzymes.search(seq)</code>) is about as fast as using a <code>FormattedSeq</code> the first time the search is run. The run time, however, drops dramatically in subsequent runs with the same sequence (a <code>RestrictionBatch</code> keeps the result of its last run in memory as long as the sequence is not changed).</p>
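<p>If you want to check these figures on your own machine, a rough timing sketch along the following lines may help. The test sequence, the helper <code>search_all</code> and the number of repetitions are arbitrary assumptions, and the timings will vary with your hardware and Biopython version:</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import timeit

from Bio.Seq import Seq
from Bio.Restriction import AllEnzymes, FormattedSeq

# Arbitrary 10 kb test sequence (an assumption; any DNA sequence will do).
seq = Seq("GAATTCAAGGTACCTTCCCGGGATATCGCA" * 334)[:10000]
fseq = FormattedSeq(seq)


def search_all(target):
    """Run every enzyme of AllEnzymes on target, one enzyme at a time."""
    return [enzyme.search(target) for enzyme in AllEnzymes]


# One enzyme at a time: Seq (converted on every call) versus FormattedSeq.
seq_time = timeit.timeit(lambda: search_all(seq), number=3)
fseq_time = timeit.timeit(lambda: search_all(fseq), number=3)

# Whole batch: the second run with the same sequence reuses the stored result.
first_run = timeit.timeit(lambda: AllEnzymes.search(seq), number=1)
second_run = timeit.timeit(lambda: AllEnzymes.search(seq), number=1)

print(f"Seq: {seq_time:.3f} s, FormattedSeq: {fseq_time:.3f} s")
print(f"Batch, first run: {first_run:.3f} s, second run: {second_run:.6f} s")
</code></pre></div>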
<h3 id="more-advanced-features"><a name="6"></a>6. More advanced features</h3>
<p>This chapter addresses some more advanced features of the package. Most users can safely ignore it.</p>
<h4 id="updating-the-enzymes-from-rebase"><a name="6.1"></a>6.1 Updating the enzymes from REBASE</h4>
<p>Most people will not need to update the enzymes. The restriction enzyme package is updated with each new release of Biopython. But if you wish to get an update between Biopython releases, here is how to do it.</p>
<p>First, download the two scripts <code>rebase_update.py</code> and <code>ranacompiler.py</code>: go to https://github.com/biopython/biopython/tree/master/Scripts/Restriction, click on the respective file and press the ‘<strong>Raw</strong>’ button at the top right of the code window. Then right-click and save the file. Both scripts must be in the same directory.</p>
<h5 id="fetching-the-recent-enzyme-files-manually-from-rebase"><a name="6.1.1"></a>6.1.1 Fetching the recent enzyme files manually from REBASE</h5>
<p>Each month, <a href="http://rebase.neb.com/rebase/rebase.html">REBASE</a> releases a new compilation of data about restriction enzymes. Although the enzymes do not change that frequently, you may wish to update the restriction enzyme classes. The first step is to get the latest REBASE files. You can find the REBASE releases at http://rebase.neb.com/rebase/rebase.files.html. The files you are interested in are in the EMBOSS format. You can download them directly from the REBASE FTP server using your browser; they are located at ftp://ftp.neb.com/pub/rebase. You will need three files: <code>emboss_e.###</code>, <code>emboss_r.###</code>, and <code>emboss_s.###</code>. The <code>###</code> is a three-digit number encoding the year and month of the release: the first digit is the last digit of the year and the last two digits are the month, so July 2015 is 507, October 2016 is 610, and so on. Download the three files corresponding to the current month and place them in the same folder as your <code>rebase_update.py</code> and <code>ranacompiler.py</code> scripts.</p>
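<p>The naming scheme above can be sketched in a few lines of Python. The helper names <code>rebase_suffix</code> and <code>emboss_file_names</code> are hypothetical and only illustrate the rule; they are not part of the update scripts:</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import datetime


def rebase_suffix(date):
    """Return the release suffix: last digit of the year, then the two-digit month."""
    return f"{date.year % 10}{date.month:02d}"


def emboss_file_names(date):
    """Return the names of the three emboss files for the release of that month."""
    suffix = rebase_suffix(date)
    return [f"emboss_{kind}.{suffix}" for kind in ("e", "r", "s")]


print(rebase_suffix(datetime.date(2015, 7, 1)))       # 507
print(emboss_file_names(datetime.date(2016, 10, 1)))  # ['emboss_e.610', 'emboss_r.610', 'emboss_s.610']
</code></pre></div>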
<h5 id="fetching-the-recent-enzyme-files-with-rebase_update.py"><a name="6.1.2"></a>6.1.2 Fetching the recent enzyme files with rebase_update.py</h5>
<p>Another way to do the same thing is to use the <code>rebase_update.py</code> script. It connects directly to the REBASE FTP server and downloads the latest batch of emboss files. From a DOS or Unix shell do the following:</p>
<div class="sourceCode" id="cb47"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb47-1"><a href="#cb47-1"></a>$ <span class="bu">cd</span> path_to_the_update_script</span>
<span id="cb47-2"><a href="#cb47-2"></a>$ <span class="ex">rebase_update.py</span> -p http://www.somewhere.com:8000</span>
<span id="cb47-3"><a href="#cb47-3"></a></span>
<span id="cb47-4"><a href="#cb47-4"></a><span class="ex">Please</span> wait, trying to connect to Rebase</span>
<span id="cb47-5"><a href="#cb47-5"></a></span>
<span id="cb47-6"><a href="#cb47-6"></a><span class="ex">copying</span> ftp://ftp.neb.com/pub/rebase/emboss_e.407</span>
<span id="cb47-7"><a href="#cb47-7"></a><span class="ex">to</span> /cvsroot/bioPython/Bio/Restriction/Scripts/emboss_e.407</span>
<span id="cb47-8"><a href="#cb47-8"></a><span class="ex">copying</span> ftp://ftp.neb.com/pub/rebase/emboss_s.407</span>
<span id="cb47-9"><a href="#cb47-9"></a><span class="ex">to</span> /cvsroot/bioPython/Bio/Restriction/Scripts/emboss_s.407</span>
<span id="cb47-10"><a href="#cb47-10"></a><span class="ex">copying</span> ftp://ftp.neb.com/pub/rebase/emboss_r.407</span>
<span id="cb47-11"><a href="#cb47-11"></a><span class="ex">to</span> /cvsroot/bioPython/Bio/Restriction/Scripts/emboss_r.407</span></code></pre></div>
<p>A word of explanation: <code>-p</code> is the switch that tells the script you are using a proxy. If you use an FTP proxy, enter its address, followed by the connection port after the ‘<code>:</code>’.</p>
<h5 id="compiling-a-new-dictionary-with-ranacompiler.py"><a name="6.1.3"></a>6.1.3 Compiling a new dictionary with ranacompiler.py</h5>
<p>Once you have the recent emboss files, you can compile a new module containing the data necessary to create the restriction enzymes.</p>
<p>Note: if the emboss files are not present in the current directory or if they are not up to date, <code>ranacompiler.py</code> will invoke the script <a href="#6.1.2"><code>rebase_update.py</code></a>, which needs to be in the same folder. You will need to use the same options as before (i.e. <code>-m</code> and <code>-p</code>). See the previous section on <a href="#6.1.2"><code>rebase_update.py</code></a> for more details.</p>
<p>For simplicity, let’s assume we have put the emboss files in the same folder as the script <code>ranacompiler.py</code>. You may have to change the mode of the file to make it executable:</p>
<div class="sourceCode" id="cb48"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb48-1"><a href="#cb48-1"></a>$ <span class="bu">cd</span> path_to_the_ranacompiler_script</span>
<span id="cb48-2"><a href="#cb48-2"></a>$ <span class="fu">chmod</span> <span class="st">'+x'</span> ranacompiler.py</span></code></pre></div>
<p>Now execute the script:</p>
<div class="sourceCode" id="cb49"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb49-1"><a href="#cb49-1"></a>$ <span class="ex">python</span> ranacompiler.py <span class="co"># or ./ranacompiler.py</span></span></code></pre></div>
<p>You should normally get the following output:</p>
<div class="sourceCode" id="cb50"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb50-1"><a href="#cb50-1"></a>$ <span class="ex">./ranacompiler.py</span></span>
<span id="cb50-2"><a href="#cb50-2"></a></span>
<span id="cb50-3"><a href="#cb50-3"></a> <span class="ex">Using</span> the files : emboss_e.407, emboss_r.407, emboss_s.407</span>
<span id="cb50-4"><a href="#cb50-4"></a></span>
<span id="cb50-5"><a href="#cb50-5"></a><span class="ex">WARNING</span> : HaeIV cut twice with different overhang length each time.</span>
<span id="cb50-6"><a href="#cb50-6"></a> <span class="ex">Unable</span> to deal with this behaviour.</span>
<span id="cb50-7"><a href="#cb50-7"></a> <span class="ex">This</span> enzyme will not be included in the database. Sorry.</span>
<span id="cb50-8"><a href="#cb50-8"></a> <span class="ex">Checking</span> :</span>
<span id="cb50-9"><a href="#cb50-9"></a><span class="ex">Anyway</span>, HaeIV is not commercially available.</span>
<span id="cb50-10"><a href="#cb50-10"></a></span>
<span id="cb50-11"><a href="#cb50-11"></a><span class="ex">WARNING</span> : HpyUM037X has two different sites.</span>
<span id="cb50-12"><a href="#cb50-12"></a></span>
<span id="cb50-13"><a href="#cb50-13"></a></span>
<span id="cb50-14"><a href="#cb50-14"></a><span class="ex">The</span> new database contains 867 enzymes.</span>
<span id="cb50-15"><a href="#cb50-15"></a></span>
<span id="cb50-16"><a href="#cb50-16"></a><span class="ex">Writing</span> the dictionary containing the new Restriction classes...</span>
<span id="cb50-17"><a href="#cb50-17"></a><span class="ex">OK.</span></span>
<span id="cb50-18"><a href="#cb50-18"></a></span>
<span id="cb50-19"><a href="#cb50-19"></a><span class="ex">Writing</span> the dictionary containing the suppliers datas...</span>
<span id="cb50-20"><a href="#cb50-20"></a><span class="ex">OK.</span></span>
<span id="cb50-21"><a href="#cb50-21"></a></span>
<span id="cb50-22"><a href="#cb50-22"></a><span class="ex">Writing</span> the dictionary containing the Restriction types....</span>
<span id="cb50-23"><a href="#cb50-23"></a><span class="ex">OK.</span></span>
<span id="cb50-24"><a href="#cb50-24"></a></span>
<span id="cb50-25"><a href="#cb50-25"></a> <span class="ex">******************************************************************************</span></span>
<span id="cb50-26"><a href="#cb50-26"></a></span>
<span id="cb50-27"><a href="#cb50-27"></a> <span class="ex">Compilation</span> of the new dictionary : OK.</span>
<span id="cb50-28"><a href="#cb50-28"></a> <span class="ex">Installation</span> : No.</span>
<span id="cb50-29"><a href="#cb50-29"></a></span>
<span id="cb50-30"><a href="#cb50-30"></a> <span class="ex">You</span> will find the newly created <span class="st">'Restriction_Dictionary.py'</span> file</span>
<span id="cb50-31"><a href="#cb50-31"></a> <span class="kw">in</span> <span class="ex">the</span> :</span>
<span id="cb50-32"><a href="#cb50-32"></a></span>
<span id="cb50-33"><a href="#cb50-33"></a> <span class="ex">/path/where/you/run/ranacompiler.py</span></span>
<span id="cb50-34"><a href="#cb50-34"></a></span>
<span id="cb50-35"><a href="#cb50-35"></a> <span class="ex">Make</span> a copy of <span class="st">'Restriction_Dictionary.py'</span> and place it with</span>
<span id="cb50-36"><a href="#cb50-36"></a> <span class="ex">the</span> other Restriction libraries.</span>
<span id="cb50-37"><a href="#cb50-37"></a></span>
<span id="cb50-38"><a href="#cb50-38"></a> <span class="ex">note</span> :</span>
<span id="cb50-39"><a href="#cb50-39"></a> <span class="ex">This</span> folder should be :</span>
<span id="cb50-40"><a href="#cb50-40"></a></span>
<span id="cb50-41"><a href="#cb50-41"></a> <span class="ex">path_to_python/site-packages/Bio/Restriction</span></span>
<span id="cb50-42"><a href="#cb50-42"></a></span>
<span id="cb50-43"><a href="#cb50-43"></a> <span class="ex">******************************************************************************</span></span></code></pre></div>
<p>The first line indicates which emboss files have been used for the present compilation. You can safely ignore the warnings as long as <code>Compilation of the new dictionary : OK.</code> is present in the last part of the output; they are there for debugging purposes. The output also reports the number of enzymes in the new module and lists the dictionaries that have been compiled. The last part indicates that the module has been successfully created but not installed. To finish the update you must copy the file <code>Restriction_Dictionary.py</code> into the folder <code>/your_python_path/site-packages/Bio/Restriction/</code> as indicated by the script. Looking into the present folder, you will see two new files: the newly created dictionary <code>Restriction_Dictionary.py</code> and <code>Restriction_Dictionary.old</code>. This last file contains the old dictionary, to which you can revert in case the new file is corrupted. This should not happen, since the script only reports success when it considers the new dictionary good, but if there is a problem it is always nice to know you can revert to the previous setting without having to reinstall the whole thing.</p>
<p>If you wish, the script can install the file for you as well, but you will have to run it as root if your normal user has no write access to your Python installation (and it shouldn’t). Use the command <code>ranacompiler.py -i</code> or <code>ranacompiler.py --install</code> for this.</p>
<p>If anything goes wrong (you have no write access to the destination folder for example) the script will let you know it did not perform the installation. It will however still save the new module in the current directory.</p>
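<p>If you prefer to do the manual copy from Python, a sketch along these lines could be used. The function <code>install_dictionary</code> and the <code>.bak</code> backup name are assumptions made for illustration; they are not what <code>ranacompiler.py</code> itself does:</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import os
import shutil


def install_dictionary(new_file, package_dir):
    """Copy new_file into package_dir as Restriction_Dictionary.py.

    An existing installed copy is first saved next to it with a
    '.bak' suffix (a hypothetical name) so it can be restored.
    """
    target = os.path.join(package_dir, "Restriction_Dictionary.py")
    if os.path.exists(target):
        shutil.copy2(target, target + ".bak")
    shutil.copy2(new_file, target)
    return target


# Typical use (requires write access to the Python installation):
#
#     import Bio.Restriction
#     package_dir = os.path.dirname(Bio.Restriction.__file__)
#     install_dictionary("Restriction_Dictionary.py", package_dir)
</code></pre></div>
<p>Locating the package through <code>Bio.Restriction.__file__</code> avoids having to guess the <code>site-packages</code> path of the interpreter you are actually using.</p>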
<p>As you can see, the script is not very bright and will redo the compilation each time it is invoked, regardless of whether a previous version of the module is already present.</p>
<h4 id="subclassing-the-class-analysis"><a name="6.2"></a>6.2 Subclassing the class Analysis</h4>
<p>As seen previously, you can modify some aspects of the <code>Analysis</code> output interactively. However, if you write your own <code>Analysis</code> class, you may wish to provide output facilities other than those given in this package. Depending on what you want to do, you may get away with simply changing the <code>make_format</code> method of your derived class, or you may need to provide new methods. Rather than going into a long explanation, here is the implementation of a rather useless <code>Analysis</code> class:</p>
<div class="sourceCode" id="cb51"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb51-1"><a href="#cb51-1"></a><span class="op">>>></span> <span class="kw">class</span> UselessAnalysis(Analysis):</span>
<span id="cb51-2"><a href="#cb51-2"></a></span>
<span id="cb51-3"><a href="#cb51-3"></a>    <span class="kw">def</span> <span class="fu">__init__</span>(<span class="va">self</span>, rb<span class="op">=</span>RestrictionBatch(), seq<span class="op">=</span>Seq(<span class="st">''</span>), lin<span class="op">=</span><span class="va">True</span>):</span>
<span id="cb51-4"><a href="#cb51-4"></a>        <span class="co">"""UselessAnalysis -> A class that wastes your time"""</span></span>
<span id="cb51-5"><a href="#cb51-5"></a>        <span class="co">#</span></span>
<span id="cb51-6"><a href="#cb51-6"></a>        <span class="co"># Unless you want to do something more fancy, all</span></span>
<span id="cb51-7"><a href="#cb51-7"></a>        <span class="co"># you need to do here is instantiate Analysis.</span></span>
<span id="cb51-8"><a href="#cb51-8"></a>        <span class="co"># Don't forget the self in __init__</span></span>
<span id="cb51-9"><a href="#cb51-9"></a>        <span class="co">#</span></span>
<span id="cb51-10"><a href="#cb51-10"></a>        Analysis.<span class="fu">__init__</span>(<span class="va">self</span>, rb, seq, lin)</span>
<span id="cb51-11"><a href="#cb51-11"></a></span>
<span id="cb51-12"><a href="#cb51-12"></a>    <span class="kw">def</span> make_format(<span class="va">self</span>, cut<span class="op">=</span>[], t<span class="op">=</span><span class="st">''</span>, nc<span class="op">=</span>[], s<span class="op">=</span><span class="st">''</span>):</span>
<span id="cb51-13"><a href="#cb51-13"></a>        <span class="co">"""not funny"""</span></span>
<span id="cb51-14"><a href="#cb51-14"></a>        <span class="co">#</span></span>
<span id="cb51-15"><a href="#cb51-15"></a>        <span class="co"># Generally, you don't need to do anything else here.</span></span>
<span id="cb51-16"><a href="#cb51-16"></a>        <span class="co"># This tells your new class to default to the</span></span>
<span id="cb51-17"><a href="#cb51-17"></a>        <span class="co"># _make_joke format.</span></span>
<span id="cb51-18"><a href="#cb51-18"></a>        <span class="co">#</span></span>
<span id="cb51-19"><a href="#cb51-19"></a>        <span class="cf">return</span> <span class="va">self</span>._make_joke(cut, t, nc, s)</span>
<span id="cb51-20"><a href="#cb51-20"></a></span>
<span id="cb51-21"><a href="#cb51-21"></a>    <span class="kw">def</span> print_as(<span class="va">self</span>, what<span class="op">=</span><span class="st">'joke'</span>):</span>
<span id="cb51-22"><a href="#cb51-22"></a>        <span class="co">"""Never know somebody might want to change the behaviour of</span></span>
<span id="cb51-23"><a href="#cb51-23"></a><span class="co">        this class."""</span></span>
<span id="cb51-24"><a href="#cb51-24"></a>        <span class="co">#</span></span>
<span id="cb51-25"><a href="#cb51-25"></a>        <span class="co"># add your new option to print_as</span></span>
<span id="cb51-26"><a href="#cb51-26"></a>        <span class="co">#</span></span>
<span id="cb51-27"><a href="#cb51-27"></a>        <span class="cf">if</span> what <span class="op">==</span> <span class="st">'joke'</span>:</span>
<span id="cb51-28"><a href="#cb51-28"></a>            <span class="va">self</span>.make_format <span class="op">=</span> <span class="va">self</span>._make_joke</span>
<span id="cb51-29"><a href="#cb51-29"></a>            <span class="cf">return</span></span>
<span id="cb51-30"><a href="#cb51-30"></a>        <span class="cf">else</span>:</span>
<span id="cb51-31"><a href="#cb51-31"></a>            <span class="co">#</span></span>
<span id="cb51-32"><a href="#cb51-32"></a>            <span class="co"># The other options will be treated as before</span></span>
<span id="cb51-33"><a href="#cb51-33"></a>            <span class="co">#</span></span>
<span id="cb51-34"><a href="#cb51-34"></a>            <span class="cf">return</span> Analysis.print_as(<span class="va">self</span>, what)</span>
<span id="cb51-35"><a href="#cb51-35"></a></span>
<span id="cb51-36"><a href="#cb51-36"></a>    <span class="kw">def</span> _make_joke(<span class="va">self</span>, cut<span class="op">=</span>[], title<span class="op">=</span><span class="st">''</span>, nc<span class="op">=</span>[], s1<span class="op">=</span><span class="st">''</span>):</span>
<span id="cb51-37"><a href="#cb51-37"></a>        <span class="co">"""UA._make_joke(cut, title, nc, s1) -> new analysis output"""</span></span>
<span id="cb51-38"><a href="#cb51-38"></a>        <span class="co">#</span></span>
<span id="cb51-39"><a href="#cb51-39"></a>        <span class="co"># starting your new method with '_make_'</span></span>
<span id="cb51-40"><a href="#cb51-40"></a>        <span class="co"># will give a hint to what it is supposed to do</span></span>
<span id="cb51-41"><a href="#cb51-41"></a>        <span class="co">#</span></span>
<span id="cb51-42"><a href="#cb51-42"></a>        <span class="co"># We will not process the non-cutting enzymes</span></span>
<span id="cb51-43"><a href="#cb51-43"></a>        <span class="co"># Their names are in nc</span></span>
<span id="cb51-44"><a href="#cb51-44"></a>        <span class="co"># s1 is the string printed before them</span></span>
<span id="cb51-45"><a href="#cb51-45"></a>        <span class="co">#</span></span>
<span id="cb51-46"><a href="#cb51-46"></a>        <span class="cf">if</span> <span class="kw">not</span> title:</span>
<span id="cb51-47"><a href="#cb51-47"></a>            title <span class="op">=</span> <span class="st">'</span><span class="ch">\n</span><span class="st">You have guessed right the following enzymes:</span><span class="ch">\n\n</span><span class="st">'</span></span>
<span id="cb51-48"><a href="#cb51-48"></a>        <span class="cf">for</span> name, sites <span class="kw">in</span> cut:</span>
<span id="cb51-49"><a href="#cb51-49"></a>            <span class="co">#</span></span>
<span id="cb51-50"><a href="#cb51-50"></a>            <span class="co"># cut contains:</span></span>
<span id="cb51-51"><a href="#cb51-51"></a>            <span class="co"># - the name of the enzymes which cut the sequence (name)</span></span>
<span id="cb51-52"><a href="#cb51-52"></a>            <span class="co"># - a list of the site positions (sites)</span></span>
<span id="cb51-53"><a href="#cb51-53"></a>            <span class="co">#</span></span>
<span id="cb51-54"><a href="#cb51-54"></a>            guess <span class="op">=</span> <span class="bu">input</span>(<span class="st">"next enzyme is </span><span class="sc">%s</span><span class="st">, Guess how many sites ?</span><span class="ch">\n</span><span class="st">>>> "</span> <span class="op">%</span> name)</span>
<span id="cb51-55"><a href="#cb51-55"></a>            <span class="cf">try</span>:</span>
<span id="cb51-56"><a href="#cb51-56"></a>                guess <span class="op">=</span> <span class="bu">int</span>(guess)</span>
<span id="cb51-57"><a href="#cb51-57"></a>            <span class="cf">except</span> <span class="pp">ValueError</span>:</span>
<span id="cb51-58"><a href="#cb51-58"></a>                guess <span class="op">=</span> <span class="va">None</span></span>
<span id="cb51-59"><a href="#cb51-59"></a>            <span class="cf">if</span> guess <span class="op">==</span> <span class="bu">len</span>(sites):</span>
<span id="cb51-60"><a href="#cb51-60"></a>                <span class="bu">print</span>(<span class="st">'You did guess right. Good. Next.'</span>)</span>
<span id="cb51-61"><a href="#cb51-61"></a>                result <span class="op">=</span> <span class="st">'</span><span class="sc">%i</span><span class="st"> site'</span> <span class="op">%</span> guess</span>
<span id="cb51-62"><a href="#cb51-62"></a>                <span class="cf">if</span> guess <span class="op">></span> <span class="dv">1</span>:</span>
<span id="cb51-63"><a href="#cb51-63"></a>                    result <span class="op">+=</span> <span class="st">'s'</span></span>
<span id="cb51-64"><a href="#cb51-64"></a></span>
<span id="cb51-65"><a href="#cb51-65"></a>                <span class="co">#</span></span>
<span id="cb51-66"><a href="#cb51-66"></a>                <span class="co"># now we format the line. See the PrintFormat module</span></span>
<span id="cb51-67"><a href="#cb51-67"></a>                <span class="co"># for some examples</span></span>
<span id="cb51-68"><a href="#cb51-68"></a>                <span class="co"># PrintFormat.__section_list and _make_map are a good start.</span></span>
<span id="cb51-69"><a href="#cb51-69"></a>                <span class="co">#</span></span>
<span id="cb51-70"><a href="#cb51-70"></a>                title <span class="op">=</span> <span class="st">''</span>.join((title, <span class="bu">str</span>(name).ljust(<span class="va">self</span>.NameWidth),</span>
<span id="cb51-71"><a href="#cb51-71"></a>                                 <span class="st">' : '</span>, result, <span class="st">'.</span><span class="ch">\n</span><span class="st">'</span>))</span>
<span id="cb51-72"><a href="#cb51-72"></a>        <span class="bu">print</span>(<span class="st">'</span><span class="ch">\n</span><span class="st">No more enzyme.'</span>)</span>
<span id="cb51-73"><a href="#cb51-73"></a>        <span class="cf">return</span> title</span>
<span id="cb51-74"><a href="#cb51-74"></a>        <span class="co">#</span></span>
<span id="cb51-75"><a href="#cb51-75"></a>        <span class="co"># If you want to print the non-cutting enzymes, use</span></span>
<span id="cb51-76"><a href="#cb51-76"></a>        <span class="co"># the following return instead of the previous one:</span></span>
<span id="cb51-77"><a href="#cb51-77"></a>        <span class="co">#</span></span>
<span id="cb51-78"><a href="#cb51-78"></a>        <span class="co"># return title + self._make_nocut_only(nc, s1)</span></span>
<span id="cb51-79"><a href="#cb51-79"></a></span>
<span id="cb51-80"><a href="#cb51-80"></a><span class="op">>>></span> <span class="co"># You initiate and use it as before</span></span>
<span id="cb51-81"><a href="#cb51-81"></a><span class="op">>>></span> rb <span class="op">=</span> RestrictionBatch([], [<span class="st">'A'</span>])</span>
<span id="cb51-82"><a href="#cb51-82"></a><span class="op">>>></span> multi_site <span class="op">=</span> Seq(<span class="st">'AAA'</span> <span class="op">+</span> EcoRI.site <span class="op">+</span><span class="st">'G'</span> <span class="op">+</span> KpnI.site <span class="op">+</span> EcoRV.site <span class="op">+</span> <span class="st">'CT'</span> <span class="op">+\</span></span>
<span id="cb51-83"><a href="#cb51-83"></a>SmaI.site <span class="op">+</span> <span class="st">'GT'</span> <span class="op">+</span> FokI.site <span class="op">+</span> <span class="st">'GAAAGGGC'</span> <span class="op">+</span> EcoRI.site <span class="op">+</span> <span class="st">'ACGT'</span>)</span>
<span id="cb51-84"><a href="#cb51-84"></a><span class="op">>>></span></span>
<span id="cb51-85"><a href="#cb51-85"></a><span class="op">>>></span> b <span class="op">=</span> UselessAnalysis(rb, multi_site)</span>
<span id="cb51-86"><a href="#cb51-86"></a><span class="op">>>></span> b.print_that() <span class="co"># Well, I let you discover if you haven't already guessed</span></span></code></pre></div>
<p>Using this example as a template, you should now be able to subclass <code>Analysis</code> as you wish. You will find more implementation details in the module <code>Bio.Restriction.PrintFormat</code>, which contains the class providing all the <code>_make_*</code> methods.</p>
<h3 id="limitation-and-caveat"><a name="7"></a>7. Limitations and caveats</h3>
<p>In particular, the class <code>Analysis</code> is a quick and dirty implementation built on the facilities provided by the package. Please check your results and report any faults.</p>
<p>More generally, <code>Restriction</code> has some other limitations:</p>
<h4 id="all-dna-are-non-methylated"><a name="7.1"></a>7.1 All DNA is treated as non-methylated</h4>
<p>No facility to work with methylated DNA has been implemented yet. As far as the enzyme classes are concerned, all DNA is non-methylated DNA. Support for methylation sensitivity may be added in the future. For now, if your sequence is methylated, you will have to check by other means whether a given site is methylated.</p>
<h4 id="no-support-for-star-activity"><a name="7.2"></a>7.2 No support for star activity</h4>
<p>As with methylation, no support has yet been implemented for finding sites mis-recognised by enzymes under high salt concentration conditions, the so-called star activity. This will be implemented as soon as I can get a good source of information for that.</p>
<h4 id="safe-to-use-with-degenerated-dna"><a name="7.3"></a>7.3 Safe to use with degenerate DNA</h4>
<p>It is safe to use degenerate DNA as input for the query. You will not be flooded with meaningless results. But this comes at a price: GAA<strong><em>N</em></strong>TC, for example, will not be recognised as a potential EcoRI site; in fact, it will not be recognised at all. Degenerate stretches of a sequence are not analysed. If your sequence is not fully determined, you will certainly miss restriction sites:</p>
<div class="sourceCode" id="cb52"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb52-1"><a href="#cb52-1"></a><span class="op">>>></span> a <span class="op">=</span> Seq(<span class="st">'nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnGAATTCrrrrrrrrrrr'</span>)</span>
<span id="cb52-2"><a href="#cb52-2"></a><span class="op">>>></span> EcoRI.search(a)</span>
<span id="cb52-3"><a href="#cb52-3"></a>[<span class="dv">36</span>]</span>
<span id="cb52-4"><a href="#cb52-4"></a><span class="op">>>></span> b <span class="op">=</span> Seq(<span class="st">'nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnGAAnTCrrrrrrrrrrr'</span>)</span>
<span id="cb52-5"><a href="#cb52-5"></a><span class="op">>>></span> EcoRI.search(b)</span>
<span id="cb52-6"><a href="#cb52-6"></a>[]</span></code></pre></div>
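<p>To see why ignoring degenerate positions keeps the output manageable, here is a minimal sketch of the opposite policy, written in plain Python and independent of <code>Bio.Restriction</code>. The function <code>potential_sites</code> and the <code>IUPAC</code> table are hypothetical helpers introduced for illustration only; they are not part of the package. Two positions are treated as compatible when their IUPAC codes share at least one base.</p>

```python
# IUPAC nucleotide codes mapped to the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def potential_sites(sequence, site):
    """Return 0-based start indices where `site` could occur in `sequence`.

    Hypothetical helper: a window matches when every pair of codes
    shares at least one base. Characters outside the table raise KeyError.
    """
    sequence = sequence.upper()
    site = site.upper()
    hits = []
    for start in range(len(sequence) - len(site) + 1):
        window = sequence[start:start + len(site)]
        if all(set(IUPAC[a]) & set(IUPAC[b]) for a, b in zip(window, site)):
            hits.append(start)
    return hits


print(potential_sites("AAGAANTCAA", "GAATTC"))  # [2]
print(len(potential_sites("N" * 10, "GAATTC")))  # 5: every window matches
```

<p>The second call shows the flood of meaningless results mentioned above: in a run of N, every window is a potential site.</p>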
<h4 id="non-standard-bases-in-dna-are-not-allowed"><a name="7.4"></a>7.4 Non-standard bases in DNA are not allowed</h4>
<p>While you can use degenerate DNA, characters outside the standard base alphabet will make the enzymes choke, even if <code>Bio.Seq.Seq</code> accepts them. Whitespace characters (such as spaces and newlines) and digits are an exception: they are removed and do not stop the enzyme from analysing the sequence. You can use them, but the fragments produced by <code>catalyse</code> will have lost any formatting. <code>catalyse</code> tries to keep the original case of the sequence (i.e. lower case sequences generate lower case fragments and upper case sequences generate upper case fragments), but mixed case sequences return upper case fragments:</p>
<div class="sourceCode" id="cb53"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb53-1"><a href="#cb53-1"></a><span class="op">>>></span> c <span class="op">=</span> Seq(<span class="st">'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxGAANTCrrrrrrrrrrr'</span>)</span>
<span id="cb53-2"><a href="#cb53-2"></a><span class="op">>>></span> EcoRI.search(c)</span>
<span id="cb53-3"><a href="#cb53-3"></a></span>
<span id="cb53-4"><a href="#cb53-4"></a>Traceback (most recent call last):</span>
<span id="cb53-5"><a href="#cb53-5"></a> File <span class="st">"<stdin>"</span>, line <span class="dv">1</span>, <span class="kw">in</span> <span class="op"><</span>module<span class="op">></span></span>
<span id="cb53-6"><a href="#cb53-6"></a> File <span class="st">"/usr/lib/python3.6/site-packages/Bio/Restriction/Restriction.py"</span>, line <span class="dv">553</span>, <span class="kw">in</span> search</span>
<span id="cb53-7"><a href="#cb53-7"></a> cls.dna <span class="op">=</span> FormattedSeq(dna, linear)</span>
<span id="cb53-8"><a href="#cb53-8"></a> File <span class="st">"/usr/lib/python3.6/site-packages/Bio/Restriction/Restriction.py"</span>, line <span class="dv">171</span>, <span class="kw">in</span> <span class="fu">__init__</span></span>
<span id="cb53-9"><a href="#cb53-9"></a> <span class="va">self</span>.data <span class="op">=</span> _check_bases(stringy)</span>
<span id="cb53-10"><a href="#cb53-10"></a> File <span class="st">"/usr/lib/python3.6/site-packages/Bio/Restriction/Restriction.py"</span>, line <span class="dv">122</span>, <span class="kw">in</span> _check_bases</span>
<span id="cb53-11"><a href="#cb53-11"></a> <span class="cf">raise</span> <span class="pp">TypeError</span>(<span class="st">"Invalid character found in </span><span class="sc">%s</span><span class="st">"</span> <span class="op">%</span> <span class="bu">repr</span>(seq_string))</span>
<span id="cb53-12"><a href="#cb53-12"></a><span class="pp">TypeError</span>: Invalid character found <span class="kw">in</span> <span class="st">'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXGAANTCRRRRRRRRRRR'</span></span>
<span id="cb53-13"><a href="#cb53-13"></a><span class="op">>>></span> d <span class="op">=</span> Seq(<span class="st">'1 nnnnn nnnnn nnnnn nnnnn nnnnn </span><span class="ch">\n</span><span class="st">\</span></span>
<span id="cb53-14"><a href="#cb53-14"></a><span class="st">26 nnnnn nnnnG AATTC rrrrr rrrrr </span><span class="ch">\n</span><span class="st">\</span></span>
<span id="cb53-15"><a href="#cb53-15"></a><span class="st">51 r'</span>)</span>
<span id="cb53-16"><a href="#cb53-16"></a><span class="op">>>></span> d</span>
<span id="cb53-17"><a href="#cb53-17"></a>Seq(<span class="st">'1 nnnnn nnnnn nnnnn nnnnn nnnnn </span><span class="ch">\n</span><span class="st">26 nnnnn nnnnG AATTC rrrrr rrrrr </span><span class="ch">\n</span><span class="st">51 r'</span>)</span>
<span id="cb53-18"><a href="#cb53-18"></a><span class="op">>>></span> EcoRI.search(d)</span>
<span id="cb53-19"><a href="#cb53-19"></a>[<span class="dv">36</span>]</span>
<span id="cb53-20"><a href="#cb53-20"></a><span class="op">>>></span> EcoRI.catalyse(d)</span>
<span id="cb53-21"><a href="#cb53-21"></a>(Seq(<span class="st">'AATTCRRRRRRRRRRR'</span>), Seq(<span class="st">'NNNNNNNNNNNNNNNNNNNNNNNNNNNN</span></span>
<span id="cb53-22"><a href="#cb53-22"></a><span class="st">NNNNNNG'</span>))</span>
<span id="cb53-23"><a href="#cb53-23"></a><span class="op">>>></span> e <span class="op">=</span> Seq(<span class="st">'nnnnGAATTCrr'</span>)</span>
<span id="cb53-24"><a href="#cb53-24"></a><span class="op">>>></span> f <span class="op">=</span> Seq(<span class="st">'NNNNGAATTCRR'</span>)</span>
<span id="cb53-25"><a href="#cb53-25"></a><span class="op">>>></span> g <span class="op">=</span> Seq(<span class="st">'nnnngaattcrr'</span>)</span>
<span id="cb53-26"><a href="#cb53-26"></a><span class="op">>>></span> EcoRI.catalyse(e)</span>
<span id="cb53-27"><a href="#cb53-27"></a>(Seq(<span class="st">'NNNNG'</span>), Seq(<span class="st">'AATTCRR'</span>))</span>
<span id="cb53-28"><a href="#cb53-28"></a><span class="op">>>></span> EcoRI.catalyse(f)</span>
<span id="cb53-29"><a href="#cb53-29"></a>(Seq(<span class="st">'NNNNG'</span>), Seq(<span class="st">'AATTCRR'</span>))</span>
<span id="cb53-30"><a href="#cb53-30"></a><span class="op">>>></span> EcoRI.catalyse(g)</span>
<span id="cb53-31"><a href="#cb53-31"></a>(Seq(<span class="st">'nnnng'</span>), Seq(<span class="st">'aattcrr'</span>))</span></code></pre></div>
<p>Not allowing letters other than the IUPAC codes might seem drastic, but this is really to limit errors. It is not totally foolproof, but it does help.</p>
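<p>The clean-up rules described in this section can be sketched in plain Python as follows. This is not the package's own code: <code>clean_sequence</code> and <code>IUPAC_CODES</code> are hypothetical names, and the exact set of accepted letters is an assumption (the 15 IUPAC nucleotide codes).</p>

```python
# Assumed set of accepted letters: the 15 IUPAC nucleotide codes.
IUPAC_CODES = set("ACGTRYSWKMBDHVN")


def clean_sequence(raw):
    """Hypothetical helper mimicking the behaviour described in the text.

    Whitespace and digits are dropped, any other non-IUPAC character
    raises TypeError, and mixed case input is returned in upper case.
    """
    stripped = "".join(ch for ch in raw if not (ch.isspace() or ch.isdigit()))
    invalid = sorted(set(stripped.upper()) - IUPAC_CODES)
    if invalid:
        raise TypeError("Invalid character found: %s" % ", ".join(invalid))
    # Keep the case only when it is uniform.
    if stripped.islower() or stripped.isupper():
        return stripped
    return stripped.upper()


print(clean_sequence("1 nnnnn nnnnG AATTC rr\n26 r"))  # NNNNNNNNNGAATTCRRR
print(clean_sequence("nnnngaattcrr"))  # nnnngaattcrr
```

<p>Running such a check before handing a sequence to an enzyme turns the traceback shown above into an error you can catch and report in your own terms.</p>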
<h4 id="sites-found-at-the-edge-of-linear-dna-might-not-be-accessible-in-a-real-digestion"><a name="7.5"></a>7.5 Sites found at the edge of linear DNA might not be accessible in a real digestion</h4>
<p>While sites that clearly fall outside a sequence are not reported, nothing has been done to determine whether a restriction site at the end of a linear sequence is valid:</p>
<div class="sourceCode" id="cb54"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb54-1"><a href="#cb54-1"></a><span class="op">>>></span> d <span class="op">=</span> Seq(<span class="st">'GAATTCAAAAAAAAAAAAAAAAAAAAAAAAAAGGATG'</span>)</span>
<span id="cb54-2"><a href="#cb54-2"></a><span class="op">>>></span> FokI.site <span class="co"># site present</span></span>
<span id="cb54-3"><a href="#cb54-3"></a><span class="co">'GGATG'</span></span>
<span id="cb54-4"><a href="#cb54-4"></a><span class="op">>>></span> FokI.elucidate() <span class="co"># but cut outside the sequence</span></span>
<span id="cb54-5"><a href="#cb54-5"></a><span class="co">'GGATGNNNNNNNNN^NNNN_N'</span></span>
<span id="cb54-6"><a href="#cb54-6"></a><span class="op">>>></span> FokI.search(d) <span class="co"># therefore no site found</span></span>
<span id="cb54-7"><a href="#cb54-7"></a>[]</span>
<span id="cb54-8"><a href="#cb54-8"></a><span class="op">>>></span> EcoRI.search(d)</span>
<span id="cb54-9"><a href="#cb54-9"></a>[<span class="dv">2</span>]</span></code></pre></div>
<p><code>EcoRI</code> finds a site at position 2, even though it is highly unlikely that EcoRI would cut this site in a tube. It is generally considered that about 5 nucleotides must separate the site from the edge of the sequence to be reasonably sure the enzyme will work correctly. This “security margin” varies from one enzyme to another. If in doubt, consult the documentation for the enzyme.</p>
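<p>Since the package does not check this margin, you may want to flag suspicious sites yourself. The sketch below works on plain strings and literal recognition sequences only; <code>site_margins</code> and <code>sites_near_edge</code> are hypothetical helpers, and the default margin of 5 is simply the rule of thumb quoted above, not a value taken from any enzyme's documentation.</p>

```python
def site_margins(sequence, site):
    """Yield (start, left, right) for each literal occurrence of `site`.

    `start` is 0-based; `left` and `right` are the numbers of bases
    between the site and each end of a linear sequence.
    """
    sequence = sequence.upper()
    site = site.upper()
    start = sequence.find(site)
    while start != -1:
        yield start, start, len(sequence) - (start + len(site))
        start = sequence.find(site, start + 1)


def sites_near_edge(sequence, site, margin=5):
    """Return the 0-based starts of sites closer than `margin` to an end."""
    return [start for start, left, right in site_margins(sequence, site)
            if margin > min(left, right)]


d = "GAATTC" + "A" * 26 + "GGATG"
print(sites_near_edge(d, "GAATTC"))  # [0]
print(sites_near_edge("A" * 10 + "GAATTC" + "A" * 10, "GAATTC"))  # []
```

<p>A site flagged this way is not necessarily uncuttable; it is only a candidate for a closer look at the supplier's data.</p>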
<h4 id="restriction-reports-cutting-sites-not-enzyme-recognition-sites"><a name="7.6"></a>7.6 Restriction reports cutting sites not enzyme recognition sites</h4>
<p>Some enzymes cut twice each time they encounter a restriction site. The enzymes in this package report both cuts, not the site. Other software may report only the restriction sites. The output for such enzymes may therefore appear doubled when compared with the results of that software. This is not a bug.</p>
<div class="sourceCode" id="cb55"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb55-1"><a href="#cb55-1"></a><span class="op">>>></span> AloI.cut_twice()</span>
<span id="cb55-2"><a href="#cb55-2"></a><span class="va">True</span></span>
<span id="cb55-3"><a href="#cb55-3"></a><span class="op">>>></span> AloI.fst5 <span class="co"># first cut</span></span>
<span id="cb55-4"><a href="#cb55-4"></a><span class="op">-</span><span class="dv">7</span></span>
<span id="cb55-5"><a href="#cb55-5"></a><span class="op">>>></span> AloI.scd5 <span class="co"># second cut</span></span>
<span id="cb55-6"><a href="#cb55-6"></a><span class="dv">25</span></span>
<span id="cb55-7"><a href="#cb55-7"></a><span class="op">>>></span> AloI.site</span>
<span id="cb55-8"><a href="#cb55-8"></a><span class="co">'GAACNNNNNNTCC'</span></span>
<span id="cb55-9"><a href="#cb55-9"></a><span class="op">>>></span> b <span class="op">=</span> Seq(<span class="st">'AAAAAAAAAAA'</span><span class="op">+</span> AloI.site <span class="op">+</span> <span class="st">'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'</span>)</span>
<span id="cb55-10"><a href="#cb55-10"></a><span class="op">>>></span> b</span>
<span id="cb55-11"><a href="#cb55-11"></a>Seq(<span class="st">'AAAAAAAAAAAGAACNNNNNNTCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'</span>)</span>
<span id="cb55-12"><a href="#cb55-12"></a><span class="op">>>></span> AloI.search(b) <span class="co"># one site, two cuts -> two positions</span></span>
<span id="cb55-13"><a href="#cb55-13"></a>[<span class="dv">5</span>, <span class="dv">37</span>]</span></code></pre></div>
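<p>The two positions in this example are consistent with adding each cut offset to the 1-based start of the recognition site (12 - 7 = 5 and 12 + 25 = 37). The sketch below reproduces that arithmetic on plain strings. It is inferred from the example output only: <code>cut_positions</code> is a hypothetical helper, it handles a single literal occurrence on one strand, and it makes no claim about how the package computes its results.</p>

```python
def cut_positions(sequence, site, offsets):
    """Return sorted cut positions for the first literal occurrence of `site`.

    Hypothetical helper: each position is the 1-based start of the site
    plus one of the cut offsets.
    """
    index = sequence.find(site)
    if index == -1:
        return []
    site_start = index + 1  # convert the 0-based index to a 1-based position
    return sorted(site_start + offset for offset in offsets)


site = "GAACNNNNNNTCC"
b = "A" * 11 + site + "A" * 39
print(cut_positions(b, site, (-7, 25)))  # [5, 37]
```

<p>When comparing against software that reports recognition sites, this means one site at position 12 corresponds to the two reported cuts.</p>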
<h3 id="annex-modifying-dir-to-use-with-from-bio.restriction-import"><a name="8"></a>8. Annex: modifying dir() to use with from Bio.Restriction import *</h3>
<p>Having all the enzymes imported directly into the namespace is useful when working in an interactive shell (even if purists do not recommend it). Here is a little hack to get some sanity back when using dir() in those conditions:</p>
<div class="sourceCode" id="cb56"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb56-1"><a href="#cb56-1"></a><span class="op">>>></span> <span class="co"># we will change the builtin dir() function to get rid of the enzyme names.</span></span>
<span id="cb56-2"><a href="#cb56-2"></a><span class="op">>>></span> <span class="im">import</span> sys</span>
<span id="cb56-3"><a href="#cb56-3"></a><span class="op">>>></span> <span class="kw">def</span> <span class="bu">dir</span>(<span class="bu">object</span><span class="op">=</span><span class="va">None</span>):</span>
<span id="cb56-4"><a href="#cb56-4"></a> <span class="co">"""dir([object]) -> list of string.</span></span>
<span id="cb56-5"><a href="#cb56-5"></a></span>
<span id="cb56-6"><a href="#cb56-6"></a><span class="co"> over-ride the built-in function to get some clarity."""</span></span>
<span id="cb56-7"><a href="#cb56-7"></a> <span class="cf">if</span> <span class="bu">object</span> <span class="kw">is</span> <span class="kw">not</span> <span class="va">None</span>:</span>
<span id="cb56-8"><a href="#cb56-8"></a> <span class="co"># we only want to modify dir(),</span></span>
<span id="cb56-9"><a href="#cb56-9"></a> <span class="co"># so here we return the result of the builtin function.</span></span>
<span id="cb56-10"><a href="#cb56-10"></a> <span class="cf">return</span> __builtins__.<span class="bu">dir</span>(<span class="bu">object</span>)</span>
<span id="cb56-11"><a href="#cb56-11"></a> <span class="cf">else</span>:</span>
<span id="cb56-12"><a href="#cb56-12"></a> <span class="co"># now the part we want to modify.</span></span>
<span id="cb56-13"><a href="#cb56-13"></a> <span class="co"># All the enzymes are in a RestrictionBatch (we will talk about</span></span>
<span id="cb56-14"><a href="#cb56-14"></a> <span class="co"># that later, for the moment simply believe me).</span></span>
<span id="cb56-15"><a href="#cb56-15"></a> <span class="co"># So if we remove from the results of dir() everything which is</span></span>
<span id="cb56-16"><a href="#cb56-16"></a> <span class="co"># in AllEnzymes we will get a much shorter list when we do dir()</span></span>
<span id="cb56-17"><a href="#cb56-17"></a> <span class="co">#</span></span>
<span id="cb56-18"><a href="#cb56-18"></a> <span class="co"># the current level is __main__ ie dir() is equivalent to</span></span>
<span id="cb56-19"><a href="#cb56-19"></a> <span class="co"># ask what's in __main__ at the moment.</span></span>
<span id="cb56-20"><a href="#cb56-20"></a> <span class="co"># we can't access __main__ directly.</span></span>
<span id="cb56-21"><a href="#cb56-21"></a> <span class="co"># so we will use sys.modules['__main__'] to reach it.</span></span>
<span id="cb56-22"><a href="#cb56-22"></a> <span class="co"># the following list comprehension removes from the result of</span></span>
<span id="cb56-23"><a href="#cb56-23"></a> <span class="co"># dir() everything which is also present in AllEnzymes.</span></span>
<span id="cb56-24"><a href="#cb56-24"></a> <span class="co">#</span></span>
<span id="cb56-25"><a href="#cb56-25"></a> <span class="cf">return</span> [x <span class="cf">for</span> x <span class="kw">in</span> __builtins__.<span class="bu">dir</span>(sys.modules[<span class="st">'__main__'</span>])</span>
<span id="cb56-26"><a href="#cb56-26"></a> <span class="cf">if</span> x <span class="kw">not</span> <span class="kw">in</span> AllEnzymes]</span>
<span id="cb56-27"><a href="#cb56-27"></a></span>
<span id="cb56-28"><a href="#cb56-28"></a><span class="op">>>></span> <span class="co"># now let's see if it works.</span></span>
<span id="cb56-29"><a href="#cb56-29"></a><span class="op">>>></span> <span class="bu">dir</span>()</span>
<span id="cb56-30"><a href="#cb56-30"></a>[<span class="st">'AllEnzymes'</span>, <span class="st">'Analysis'</span>, <span class="st">'CommOnly'</span>, <span class="st">'NonComm'</span>, <span class="st">'PrintFormat'</span>, <span class="st">'RanaConfig'</span>,</span>
<span id="cb56-31"><a href="#cb56-31"></a> <span class="st">'Restriction'</span>, <span class="st">'RestrictionBatch'</span>, <span class="st">'Restriction_Dictionary'</span>, <span class="st">'__builtins__'</span>,</span>
<span id="cb56-32"><a href="#cb56-32"></a> <span class="st">'__doc__'</span>, <span class="st">'__name__'</span>, <span class="st">'dir'</span>, <span class="st">'sys'</span>]</span>
<span id="cb56-33"><a href="#cb56-33"></a><span class="op">>>></span> <span class="co"># ok that's much better.</span></span>
<span id="cb56-34"><a href="#cb56-34"></a><span class="op">>>></span> <span class="co"># The enzymes are still there</span></span>
<span id="cb56-35"><a href="#cb56-35"></a><span class="op">>>></span> EcoRI.site</span>
<span id="cb56-36"><a href="#cb56-36"></a><span class="co">'GAATTC'</span></span></code></pre></div>
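<p>If you would rather not shadow the builtin, the same filtering can be done with a separately named function. The sketch below is one possible variant, not part of the package: <code>visible_names</code> is a hypothetical helper, and it uses the standard <code>builtins</code> module instead of <code>__builtins__</code>. The <code>hidden</code> argument can be any container that supports <code>in</code> for names; in the session above you would pass <code>AllEnzymes</code>, as the original function does.</p>

```python
import builtins
import sys


def visible_names(hidden, namespace=None):
    """Return the names in `namespace` that are not in `hidden`.

    Hypothetical helper: `namespace` defaults to the __main__ module,
    which is what a bare dir() inspects in an interactive shell.
    """
    if namespace is None:
        namespace = sys.modules["__main__"]
    return [name for name in builtins.dir(namespace) if name not in hidden]


class Demo:
    """Stand-in namespace holding two fake enzyme names and one variable."""
    EcoRI = 1
    KpnI = 2
    rb = 3


names = visible_names({"EcoRI", "KpnI"}, Demo)
print("rb" in names, "EcoRI" in names)  # True False
```

<p>Because the builtin <code>dir()</code> is left untouched, other code and tools that rely on it keep working as usual.</p>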
</body>
</html>