<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<title>FreeMat: REGEXP Regular Expression Matching Function</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript">
$(document).ready(initResizable);
</script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td style="padding-left: 0.5em;">
<div id="projectname">FreeMat
</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.1.1 -->
<div id="navrow1" class="tabs">
<ul class="tablist">
<li><a href="index.html"><span>Main Page</span></a></li>
<li class="current"><a href="pages.html"><span>Related Pages</span></a></li>
</ul>
</div>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
<div id="nav-tree">
<div id="nav-tree-contents">
</div>
</div>
<div id="splitbar" style="-moz-user-select:none;"
class="ui-resizable-handle">
</div>
</div>
<script type="text/javascript">
$(document).ready(function(){initNavTree('string_regexp.html','');});
</script>
<div id="doc-content">
<div class="header">
<div class="headertitle">
<div class="title">REGEXP Regular Expression Matching Function </div> </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><p>Section: <a class="el" href="sec_string.html">String Functions</a> </p>
<h1><a class="anchor" id="Usage"></a>
Usage</h1>
<p>Matches regular expressions in the provided string. This function is complicated, and compatibility with MATLAB's syntax is not perfect. The basic syntax is </p>
<pre class="fragment"> regexp('str','expr')
</pre><p> which returns a row vector containing the starting index of each substring of <code>str</code> that matches the regular expression described by <code>expr</code>. The second form of <code>regexp</code> returns six outputs in the following order: </p>
<pre class="fragment"> [start stop tokenExtents match tokens names] = regexp('str','expr')
</pre><p> where the meaning of each of the outputs is defined below. </p>
<ul>
<li>
<code>start</code> is a row vector containing the starting index of each substring that matches the regular expression. </li>
<li>
<code>stop</code> is a row vector containing the ending index of each substring that matches the regular expression. </li>
<li>
<code>tokenExtents</code> is a cell array containing the starting and ending indices of each substring that matches the <code>tokens</code> in the regular expression. A token is a captured part of the regular expression. If the <code>'once'</code> mode is used, then this output is a <code>double</code> array. </li>
<li>
<code>match</code> is a cell array containing the text for each substring that matches the regular expression. In <code>'once'</code> mode, this is a string. </li>
<li>
<code>tokens</code> is a cell array of cell arrays of strings that correspond to the tokens in the regular expression. In <code>'once'</code> mode, this is a cell array of strings. </li>
<li>
<code>names</code> is a structure array containing the named tokens captured in a regular expression. Each named token is assigned a field in the resulting structure array, and each element of the array corresponds to a different match. </li>
</ul>
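<p>As a minimal sketch of how the nested outputs are indexed: the variable names are arbitrary, and the values noted in the comments follow from the output definitions above rather than from a recorded session. </p>
<pre class="fragment"> % Capture the character that precedes 'own' in each match.
 [start stop tokenExtents match tokens names] = regexp('quick down town zoo','(.)own');
 first_match  = match{1}         % text of the first match, 'down'
 first_token  = tokens{1}{1}     % first token of the first match, 'd'
 first_extent = tokenExtents{1}  % start and end index of that token
</pre>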
<p>If you want only some of the outputs, you can use the following variant of <code>regexp</code>: </p>
<pre class="fragment"> [o1 o2 ...] = regexp('str','expr', 'p1', 'p2', ...)
</pre><p> where <code>p1</code> etc. are the names of the outputs you want, listed in the order in which they should be returned. As a final variant, you can supply mode flags to <code>regexp</code>: </p>
<pre class="fragment"> [o1 o2 ...] = regexp('str','expr', 'p1', 'p2', ..., 'mode1', 'mode2')
</pre><p> where acceptable <code>mode</code> flags are: </p>
<ul>
<li>
<code>'once'</code> - only the first match is returned. </li>
<li>
<code>'matchcase'</code> - letter case must match (selected by default for <code>regexp</code>) </li>
<li>
<code>'ignorecase'</code> - letter case is ignored (selected by default for <code>regexpi</code>) </li>
<li>
<code>'dotall'</code> - the <code>'.'</code> operator matches any character (default) </li>
<li>
<code>'dotexceptnewline'</code> - the <code>'.'</code> operator does not match the newline character </li>
<li>
<code>'stringanchors'</code> - the <code>^</code> and <code>$</code> operators match at the beginning and end (respectively) of a string. </li>
<li>
<code>'lineanchors'</code> - the <code>^</code> and <code>$</code> operators match at the beginning and end (respectively) of a line. </li>
<li>
<code>'literalspacing'</code> - the space characters and comment characters <code>#</code> are matched as literals, just like any other ordinary character (default). </li>
<li>
<code>'freespacing'</code> - all spaces and comments are ignored in the regular expression. You must use <code>'\ '</code> and <code>'\#'</code> to match spaces and comment characters, respectively. </li>
</ul>
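<p>A hedged sketch that combines output selection with mode flags. It assumes the selector strings are <code>'match'</code> and <code>'tokens'</code>, as in MATLAB, and the input string is made up for illustration. </p>
<pre class="fragment"> % Request only the match text and the tokens, ignore letter case,
 % and stop after the first match.
 [m t] = regexp('Quick DOWN town','(.)own','match','tokens','ignorecase','once');
 % In 'once' mode m is a string and t is a cell array of strings,
 % so the first token is t{1} rather than t{1}{1}.
</pre>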
<p>Note the following behavior differences between MATLAB's <code>regexp</code> and FreeMat's: </p>
<ul>
<li>
If you have an old version of <code>pcre</code> installed, then named tokens must use the older <code>(?P&lt;name&gt;</code> syntax instead of the newer <code>(?&lt;name&gt;</code> syntax. </li>
<li>
The <code>pcre</code> library is pickier about named tokens and their appearance in expressions. For example, the regexp from the MATLAB manual <code>'(?&lt;first&gt;\w+)\s+(?&lt;last&gt;\w+)|(?&lt;last&gt;\w+),\s+(?&lt;first&gt;\w+)'</code> does not work correctly (as of this writing) because the same named tokens appear multiple times. The workaround is to assign a different name to each token and then collapse the results afterwards. </li>
</ul>
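<p>The workaround can be sketched as follows. The token names <code>first1</code>, <code>last1</code>, <code>first2</code> and <code>last2</code> are hypothetical. The sketch also assumes that the <code>'names'</code> selector is accepted as in MATLAB, and that tokens belonging to the alternative that did not match come back as empty strings. Both assumptions should be checked against your installation. </p>
<pre class="fragment"> % Give each alternative its own token names ...
 expr = '(?&lt;first1&gt;\w+)\s+(?&lt;last1&gt;\w+)|(?&lt;last2&gt;\w+),\s+(?&lt;first2&gt;\w+)';
 names = regexp('Smith, John',expr,'names');
 % ... then collapse the results: use whichever alternative matched.
 if isempty(names.first1)
   first = names.first2; last = names.last2;
 else
   first = names.first1; last = names.last1;
 end
</pre>
<p>With an old <code>pcre</code>, the same expression would need the older named-token syntax described above. </p>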
<h1><a class="anchor" id="Example"></a>
Example</h1>
<p>An example of using the <code>regexp</code> function with all six outputs:</p>
<pre class="fragment">--> [start,stop,tokenExtents,match,tokens,named] = regexp('quick down town zoo','(.)own')
start =
7 12
stop =
10 15
tokenExtents =
[1x2 double array] [1x2 double array]
match =
[down] [town]
tokens =
[1x1 cell array] [1x1 cell array]
named =
[]
</pre> </div></div><!-- contents -->
</div><!-- doc-content -->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
<ul>
<li class="navelem"><a class="el" href="index.html">FreeMat Documentation</a></li><li class="navelem"><a class="el" href="sec_string.html">String Functions</a></li>
<li class="footer">Generated on Thu Jul 25 2013 17:18:29 for FreeMat by
<a href="http://www.doxygen.org/index.html">
<img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.1.1 </li>
</ul>
</div>
</body>
</html>