<!DOCTYPE html
  PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
   
      <title>7.5.&nbsp;Verbose Regular Expressions</title>
      <link rel="stylesheet" href="../diveintopython.css" type="text/css">
      <link rev="made" href="mailto:f8dy@diveintopython.org">
      <meta name="generator" content="DocBook XSL Stylesheets V1.52.2">
      <meta name="keywords" content="Python, Dive Into Python, tutorial, object-oriented, programming, documentation, book, free">
      <meta name="description" content="Python from novice to pro">
      <link rel="home" href="../toc/index.html" title="Dive Into Python">
      <link rel="up" href="index.html" title="Chapter&nbsp;7.&nbsp;Regular Expressions">
      <link rel="previous" href="n_m_syntax.html" title="7.4.&nbsp;Using the {n,m} Syntax">
      <link rel="next" href="phone_numbers.html" title="7.6.&nbsp;Case study: Parsing Phone Numbers">
   </head>
   <body>
      <table id="Header" width="100%" border="0" cellpadding="0" cellspacing="0" summary="">
         <tr>
            <td id="breadcrumb" colspan="5" align="left" valign="top"></td>
            <td id="navigation" align="right" valign="top">&nbsp;&nbsp;&nbsp;<a href="n_m_syntax.html" title="Prev: &#8220;Using the {n,m} Syntax&#8221;">&lt;&lt;</a>&nbsp;&nbsp;&nbsp;<a href="phone_numbers.html" title="Next: &#8220;Case study: Parsing Phone Numbers&#8221;">&gt;&gt;</a></td>
         </tr>
         <tr>
            <td colspan="3" id="logocontainer">
               <h1 id="logo"><a href="../index.html" accesskey="1">Dive Into Python</a></h1>
               <p id="tagline">Python from novice to pro</p>
            </td>
            <td colspan="3" align="right">
            </td>
         </tr>
      </table>
      <div class="section" lang="en">
         <div class="titlepage">
            <div>
               <div>
                  <h2 class="title"><a name="re.verbose"></a>7.5.&nbsp;Verbose Regular Expressions
                  </h2>
               </div>
            </div>
            <div></div>
         </div>
         <div class="abstract">
            <p>So far you've just been dealing with what I'll call &#8220;<span class="quote">compact</span>&#8221; regular expressions.  As you've seen, they are difficult to read, and even if you figure out what one does, that's no guarantee
               that you'll be able to understand it six months later.  What you really need is inline documentation.
            </p>
         </div>
         <p><span class="application">Python</span> allows you to do this with something called <span class="emphasis"><em>verbose regular expressions</em></span>.  A verbose regular expression is different from a compact regular expression in two ways:
         </p>
         <div class="itemizedlist">
            <ul>
               <li>Whitespace is ignored.  Spaces, tabs, and line breaks are not matched as spaces, tabs, and line breaks.  They're
                  not matched at all.  (If you want to match a space in a verbose regular expression, you'll need to escape it by putting a
                  backslash in front of it, or put it inside a character class such as <tt class="literal">[ ]</tt>.)
               </li>
               <li>Comments are ignored.  A comment in a verbose regular expression is just like a comment in Python code: it starts with a <tt class="literal">#</tt> character and goes until the end of the line.  In this case it's a comment within a multi-line string instead of within your
                  source code, but it works the same way.  (To match a literal <tt class="literal">#</tt>, escape it as <tt class="literal">\#</tt>.)
               </li>
            </ul>
         </div>
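         <p>To see both rules in isolation, here is a small standalone sketch. It uses only the standard <tt class="filename">re</tt> module and runs under Python 2 or 3; the patterns and the <tt class="varname">year</tt> name are illustrative, not taken from the book. It also checks that whitespace inside a character class such as <tt class="literal">[ ]</tt> is kept, and that an escaped <tt class="literal">\#</tt> is matched literally instead of starting a comment.
         </p>

```python
import re

# Rule 1: unescaped whitespace in the pattern is ignored in verbose mode,
# so "a b  c" behaves like the compact pattern "abc".
assert re.search(r"a b  c", "abc", re.VERBOSE) is not None
assert re.search(r"a b  c", "abc") is None  # compact mode: the spaces are literal

# To match a real space, escape it with a backslash...
assert re.search(r"a\ b", "a b", re.VERBOSE) is not None
# ...or put it inside a character class, where whitespace is kept.
assert re.search(r"a[ ]b", "a b", re.VERBOSE) is not None

# Rule 2: an unescaped # starts a comment that runs to the end of the line.
year = re.compile(r"""
    \d{4}    # exactly four digits
    """, re.VERBOSE)
assert year.search("Copyright 2004").group() == "2004"

# A literal # must be escaped; here the space between \# and \d is ignored.
assert re.search(r"\# \d", "#5", re.VERBOSE) is not None
```

         <p>Every assertion passes silently; if one of them raised <tt class="literal">AssertionError</tt>, the rule it illustrates would not hold.
         </p>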
         <p>This will be clearer with an example.  Let's revisit the compact regular expression you've been working with and turn
            it into a verbose regular expression.
         </p>
         <div class="example"><a name="d0e18777"></a><h3 class="title">Example&nbsp;7.9.&nbsp;Regular Expressions with Inline Comments</h3><pre class="screen">
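<tt class="prompt">&gt;&gt;&gt; </tt><span class="userinput"><span class='pykeyword'>import</span> re</span>
<tt class="prompt">&gt;&gt;&gt; </tt><span class="userinput">pattern = <span class='pystring'>"""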
    ^                   # beginning of string
    M{0,4}              # thousands - 0 to 4 M's
    (CM|CD|D?C{0,3})    # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
                        #            or 500-800 (D, followed by 0 to 3 C's)
    (XC|XL|L?X{0,3})    # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
                        #        or 50-80 (L, followed by 0 to 3 X's)
    (IX|IV|V?I{0,3})    # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
                        #        or 5-8 (V, followed by 0 to 3 I's)
    $                   # end of string
    """</span></span>
<tt class="prompt">&gt;&gt;&gt; </tt><span class="userinput">re.search(pattern, <span class='pystring'>'M'</span>, re.VERBOSE)</span>                <a name="re.verbose.1.1"></a><img src="../images/callouts/1.png" alt="1" border="0" width="12" height="12">
<span class="computeroutput">&lt;_sre.SRE_Match object at 0x008EEB48&gt;</span>
<tt class="prompt">&gt;&gt;&gt; </tt><span class="userinput">re.search(pattern, <span class='pystring'>'MCMLXXXIX'</span>, re.VERBOSE)</span>        <a name="re.verbose.1.2"></a><img src="../images/callouts/2.png" alt="2" border="0" width="12" height="12">
<span class="computeroutput">&lt;_sre.SRE_Match object at 0x008EEB48&gt;</span>
<tt class="prompt">&gt;&gt;&gt; </tt><span class="userinput">re.search(pattern, <span class='pystring'>'MMMMDCCCLXXXVIII'</span>, re.VERBOSE)</span> <a name="re.verbose.1.3"></a><img src="../images/callouts/3.png" alt="3" border="0" width="12" height="12">
<span class="computeroutput">&lt;_sre.SRE_Match object at 0x008EEB48&gt;</span>
<tt class="prompt">&gt;&gt;&gt; </tt><span class="userinput">re.search(pattern, <span class='pystring'>'M'</span>)</span>                            <a name="re.verbose.1.4"></a><img src="../images/callouts/4.png" alt="4" border="0" width="12" height="12">
</pre><div class="calloutlist">
               <table border="0" summary="Callout list">
                  <tr>
                     <td width="12" valign="top" align="left"><a href="#re.verbose.1.1"><img src="../images/callouts/1.png" alt="1" border="0" width="12" height="12"></a> 
                     </td>
                     <td valign="top" align="left">The most important thing to remember when using verbose regular expressions is that you need to pass an extra argument:
                        <tt class="literal">re.VERBOSE</tt>, a constant defined in the <tt class="filename">re</tt> module that signals that the pattern should be treated as a verbose regular expression.  As you can see, this pattern has
                        quite a bit of whitespace and several comments, all of which are ignored.  Once you strip out the
                        whitespace and the comments, this is exactly the same regular expression you saw in <a href="n_m_syntax.html" title="7.4.&nbsp;Using the {n,m} Syntax">the previous section</a>, but it's a lot more readable.  The string <tt class="literal">M</tt> matches because <tt class="literal">M{0,4}</tt> consumes the single <tt class="literal">M</tt> and the hundreds, tens, and ones groups each match an empty string.
                     </td>
                  </tr>
                  <tr>
                     <td width="12" valign="top" align="left"><a href="#re.verbose.1.2"><img src="../images/callouts/2.png" alt="2" border="0" width="12" height="12"></a> 
                     </td>
                     <td valign="top" align="left">This matches the start of the string, then one of a possible four <tt class="literal">M</tt>, then <tt class="literal">CM</tt>, then <tt class="literal">L</tt> and three of a possible three <tt class="literal">X</tt>, then <tt class="literal">IX</tt>, then the end of the string.
                     </td>
                  </tr>
                  <tr>
                     <td width="12" valign="top" align="left"><a href="#re.verbose.1.3"><img src="../images/callouts/3.png" alt="3" border="0" width="12" height="12"></a> 
                     </td>
                     <td valign="top" align="left">This matches the start of the string, then four of a possible four <tt class="literal">M</tt>, then <tt class="literal">D</tt> and three of a possible three <tt class="literal">C</tt>, then <tt class="literal">L</tt> and three of a possible three <tt class="literal">X</tt>, then <tt class="literal">V</tt> and three of a possible three <tt class="literal">I</tt>, then the end of the string.
                     </td>
                  </tr>
                  <tr>
                     <td width="12" valign="top" align="left"><a href="#re.verbose.1.4"><img src="../images/callouts/4.png" alt="4" border="0" width="12" height="12"></a> 
                     </td>
                     <td valign="top" align="left">This does not match: <tt class="function">re.search</tt> returns <tt class="literal">None</tt>, which the interactive shell doesn't display.  Why?  Because it doesn't have the <tt class="literal">re.VERBOSE</tt> flag, so the <tt class="function">re.search</tt> function treats the pattern as a compact regular expression, with significant whitespace and literal hash marks.  <span class="application">Python</span> can't auto-detect whether a regular expression is verbose; it assumes every regular expression is compact unless you explicitly state that it is verbose.
                     </td>
                  </tr>
               </table>
            </div>
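            <p>One way to gain confidence that the verbose pattern really is the same regular expression as the compact one is to compile both and compare them on many inputs. The sketch below does that; the <tt class="function">to_roman</tt> function is a hypothetical helper written only to generate test numerals (it is not part of the <tt class="filename">re</tt> module or of this section), and the verbose pattern's comments are shortened. Agreement on these inputs is evidence of equivalence, not a proof.
            </p>

```python
import re

# Compact form: the verbose pattern with all whitespace and comments deleted.
COMPACT = r"^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"

# Verbose form (comments shortened from the example above).
VERBOSE = r"""
    ^                   # beginning of string
    M{0,4}              # thousands
    (CM|CD|D?C{0,3})    # hundreds
    (XC|XL|L?X{0,3})    # tens
    (IX|IV|V?I{0,3})    # ones
    $                   # end of string
    """

def to_roman(n):
    """Hypothetical helper: convert an integer in 1..4999 to a Roman numeral."""
    numerals = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
                (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
                (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    result = ""
    for value, letters in numerals:
        # Greedily take the largest value that still fits.
        while n >= value:
            result += letters
            n -= value
    return result

compact = re.compile(COMPACT)
verbose = re.compile(VERBOSE, re.VERBOSE)

# Both patterns accept every numeral from 1 to 4999...
for n in range(1, 5000):
    numeral = to_roman(n)
    assert compact.search(numeral) and verbose.search(numeral), numeral

# ...and both reject the same malformed strings.
for bad in ["MMMMM", "IIII", "VX", "CMCM", "XLX"]:
    assert compact.search(bad) is None and verbose.search(bad) is None, bad
```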
         </div>
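         <p>Passing <tt class="literal">re.VERBOSE</tt> is not the only way to state explicitly that a pattern is verbose. The standard <tt class="filename">re</tt> module also accepts the short alias <tt class="literal">re.X</tt>, and an inline <tt class="literal">(?x)</tt> flag placed at the very start of the pattern string does the same job. A small sketch, with an illustrative pattern:
         </p>

```python
import re

pattern = r"""
    ^ M{0,4} $    # zero to four M's and nothing else
    """

# The long flag name and its one-letter alias behave the same way.
assert re.search(pattern, "MMM", re.VERBOSE) is not None
assert re.search(pattern, "MMM", re.X) is not None

# An inline (?x) flag at the very start of the pattern also turns on verbose mode.
assert re.search("(?x)" + pattern, "MMM") is not None

# With no flag at all, the leading newline and spaces must be matched literally,
# so the pattern fails, just like the last call in the example above.
assert re.search(pattern, "MMM") is None
```

         <p>The inline form can be handy when a pattern is handed to code that doesn't let you pass a flags argument, since the pattern string then carries its own mode.
         </p>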
      </div>
      <table class="Footer" width="100%" border="0" cellpadding="0" cellspacing="0" summary="">
         <tr>
            <td width="35%" align="left"><br><a class="NavigationArrow" href="n_m_syntax.html">&lt;&lt;&nbsp;Using the {n,m} Syntax</a></td>
            <td width="30%" align="center"><br>&nbsp;<span class="divider">|</span>&nbsp;<a href="index.html#re.intro" title="7.1.&nbsp;Diving In">1</a> <span class="divider">|</span> <a href="street_addresses.html" title="7.2.&nbsp;Case Study: Street Addresses">2</a> <span class="divider">|</span> <a href="roman_numerals.html" title="7.3.&nbsp;Case Study: Roman Numerals">3</a> <span class="divider">|</span> <a href="n_m_syntax.html" title="7.4.&nbsp;Using the {n,m} Syntax">4</a> <span class="divider">|</span> <span class="thispage">5</span> <span class="divider">|</span> <a href="phone_numbers.html" title="7.6.&nbsp;Case study: Parsing Phone Numbers">6</a> <span class="divider">|</span> <a href="summary.html" title="7.7.&nbsp;Summary">7</a>&nbsp;<span class="divider">|</span>&nbsp;
            </td>
            <td width="35%" align="right"><br><a class="NavigationArrow" href="phone_numbers.html">Case study: Parsing Phone Numbers&nbsp;&gt;&gt;</a></td>
         </tr>
         <tr>
            <td colspan="3"><br></td>
         </tr>
      </table>
      <div class="Footer">
         <p class="copyright">Copyright &copy; 2000, 2001, 2002, 2003, 2004 <a href="mailto:mark@diveintopython.org">Mark Pilgrim</a></p>
      </div>
   </body>
</html>