File: string_regexp.html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<title>FreeMat: REGEXP Regular Expression Matching Function</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript">
  $(document).ready(initResizable);
</script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
 <tbody>
 <tr style="height: 56px;">
  <td style="padding-left: 0.5em;">
   <div id="projectname">FreeMat
   </div>
  </td>
 </tr>
 </tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.1.1 -->
  <div id="navrow1" class="tabs">
    <ul class="tablist">
      <li><a href="index.html"><span>Main&#160;Page</span></a></li>
      <li class="current"><a href="pages.html"><span>Related&#160;Pages</span></a></li>
    </ul>
  </div>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
  <div id="nav-tree">
    <div id="nav-tree-contents">
    </div>
  </div>
  <div id="splitbar" style="-moz-user-select:none;" 
       class="ui-resizable-handle">
  </div>
</div>
<script type="text/javascript">
$(document).ready(function(){initNavTree('string_regexp.html','');});
</script>
<div id="doc-content">
<div class="header">
  <div class="headertitle">
<div class="title">REGEXP Regular Expression Matching Function </div>  </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><p>Section: <a class="el" href="sec_string.html">String Functions</a> </p>
<h1><a class="anchor" id="Usage"></a>
Usage</h1>
<p>Matches regular expressions in the provided string. This function is complicated, and compatibility with MATLAB's syntax is not perfect. The syntax for its use is </p>
<pre class="fragment">  regexp('str','expr')
</pre><p> which returns a row vector containing the starting index of each substring of <code>str</code> that matches the regular expression described by <code>expr</code>. The second form of <code>regexp</code> returns six outputs in the following order: </p>
<pre class="fragment">  [start stop tokenExtents match tokens names] = regexp('str','expr')
</pre><p> where the meaning of each of the outputs is defined below. </p>
<ul>
<li>
<code>start</code> is a row vector containing the starting index of each substring that matches the regular expression.  </li>
<li>
<code>stop</code> is a row vector containing the ending index of each substring that matches the regular expression.  </li>
<li>
<code>tokenExtents</code> is a cell array containing the starting and ending indices of each substring that matches the <code>tokens</code> in the regular expression. A token is a captured part of the regular expression. If the <code>'once'</code> mode is used, then this output is a <code>double</code> array.  </li>
<li>
<code>match</code> is a cell array containing the text for each substring that matches the regular expression. In <code>'once'</code> mode, this is a string.  </li>
<li>
<code>tokens</code> is a cell array of cell arrays of strings that correspond to the tokens in the regular expression. In <code>'once'</code> mode, this is a cell array of strings.  </li>
<li>
<code>names</code> is a structure array containing the named tokens captured in a regular expression. Each named token is assigned a field in the resulting structure array, and each element of the array corresponds to a different match.  </li>
</ul>
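<p>As a rough illustration of how these outputs relate to one another, the sketch below indexes back into the original string. It reuses the string and pattern from the example on this page; the variable names <code>k</code>, <code>first_tokens</code> and <code>first_token</code> are hypothetical, and the code has not been checked against a particular FreeMat build. </p>
<pre class="fragment">  str = 'quick down town zoo';
  [start stop tokenExtents match tokens names] = regexp(str,'(.)own');
  % start(k) and stop(k) delimit the k-th match, so this substring
  % should be the same text that is stored in match{k}
  for k = 1:numel(start)
    disp(str(start(k):stop(k)));
  end
  % tokens is a cell array of cell arrays: one inner cell array per match
  first_tokens = tokens{1};
  % the text captured by (.) in the first match
  first_token = first_tokens{1};
</pre>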
<p>If you want only some of the outputs, you can use the following variant of <code>regexp</code>: </p>
<pre class="fragment">  [o1 o2 ...] = regexp('str','expr', 'p1', 'p2', ...)
</pre><p> where <code>p1</code> etc. are the names of the outputs (and the order we want the outputs in). As a final variant, you can supply some mode flags to <code>regexp</code> </p>
<pre class="fragment">  [o1 o2 ...] = regexp('str','expr', 'p1', 'p2', ..., 'mode1', 'mode2')
</pre><p> where acceptable <code>mode</code> flags are: </p>
<ul>
<li>
<code>'once'</code> - only the first match is returned.  </li>
<li>
<code>'matchcase'</code> - letter case must match (selected by default for <code>regexp</code>)  </li>
<li>
<code>'ignorecase'</code> - letter case is ignored (selected by default for <code>regexpi</code>)  </li>
<li>
<code>'dotall'</code> - the <code>'.'</code> operator matches any character (default)  </li>
<li>
<code>'dotexceptnewline'</code> - the <code>'.'</code> operator does not match the newline character  </li>
<li>
<code>'stringanchors'</code> - the <code>^</code> and <code>$</code> operators match at the beginning and end (respectively) of a string.  </li>
<li>
<code>'lineanchors'</code> - the <code>^</code> and <code>$</code> operators match at the beginning and end (respectively) of a line.  </li>
<li>
<code>'literalspacing'</code> - the space characters and comment characters <code>#</code> are matched as literals, just like any other ordinary character (default).  </li>
<li>
<code>'freespacing'</code> - all spaces and comments are ignored in the regular expression. You must use '\ ' and '\#' to match spaces and comment characters, respectively.  </li>
</ul>
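<p>The output-selection and mode-flag variants might be combined as in the following sketch. It assumes that the selector strings are spelled exactly like the output names listed earlier (<code>'match'</code> and <code>'start'</code> here); the variable names <code>m</code>, <code>s</code>, <code>m1</code>, <code>s1</code> and <code>idx</code> are hypothetical, and the calls have not been checked against a particular FreeMat build. </p>
<pre class="fragment">  str = 'quick down town zoo';
  % request only the matched text and the start indices, in that order
  [m s] = regexp(str,'(.)own','match','start');
  % add the 'once' flag to keep only the first match;
  % in this mode the match output is a string rather than a cell array
  [m1 s1] = regexp(str,'(.)own','match','start','once');
  % 'ignorecase' lets the lower-case pattern match upper-case text
  idx = regexp('DOWN TOWN','(.)own','start','ignorecase');
</pre>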
<p>Note the following behavior differences between MATLAB's regexp and FreeMat's: </p>
<ul>
<li>
If you have an old version of <code>pcre</code> installed, then named tokens must use the older <code>(?P&lt;name&gt;</code> syntax, instead of the new <code>(?&lt;name&gt;</code> syntax.  </li>
<li>
The <code>pcre</code> library is pickier about named tokens and their appearance in expressions. So, for example, the regexp from the MATLAB manual <code>'(?&lt;first&gt;\w+)\s+(?&lt;last&gt;\w+)|(?&lt;last&gt;\w+),\s+(?&lt;first&gt;\w+)'</code> does not work correctly (as of this writing) because the same named tokens appear multiple times. The workaround is to assign different names to each token, and then collapse the results later.  </li>
</ul>
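<p>The workaround for repeated named tokens could look roughly like the sketch below. The token names <code>first1</code>, <code>last1</code>, <code>last2</code> and <code>first2</code> and the variables <code>first</code> and <code>last</code> are hypothetical. The sketch also assumes that a named token belonging to the alternative that did not match comes back as an empty field, which has not been verified, and that the installed <code>pcre</code> accepts the newer named-token syntax. </p>
<pre class="fragment">  % give every named token a distinct name (hypothetical names)
  expr = '(?&lt;first1&gt;\w+)\s+(?&lt;last1&gt;\w+)|(?&lt;last2&gt;\w+),\s+(?&lt;first2&gt;\w+)';
  [start stop tokenExtents match tokens names] = regexp('Davis, John',expr);
  % collapse the results: use whichever alternative actually matched
  % (assumes tokens of the unmatched alternative are returned empty)
  if isempty(names.first1)
    first = names.first2;
    last = names.last2;
  else
    first = names.first1;
    last = names.last1;
  end
</pre>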
<h1><a class="anchor" id="Example"></a>
Example</h1>
<p>An example of using the <code>regexp</code> function:</p>
<pre class="fragment">--&gt; [start,stop,tokenExtents,match,tokens,named] = regexp('quick down town zoo','(.)own')
start = 
  7 12 

stop = 
 10 15 

tokenExtents = 
 [1x2 double array] [1x2 double array] 

match = 
 [down] [town] 

tokens = 
 [1x1 cell array] [1x1 cell array] 

named = 
  []
</pre> </div></div><!-- contents -->
</div><!-- doc-content -->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
  <ul>
    <li class="navelem"><a class="el" href="index.html">FreeMat Documentation</a></li><li class="navelem"><a class="el" href="sec_string.html">String Functions</a></li>
    <li class="footer">Generated on Thu Jul 25 2013 17:18:29 for FreeMat by
    <a href="http://www.doxygen.org/index.html">
    <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.1.1 </li>
  </ul>
</div>
</body>
</html>