1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173
|
= Pattern matching notation
:encoding: UTF-8
:lang: en
//:title: Yash manual - Pattern matching notation
:description: This page describes pattern matching notation supported by yash.
dfn:[Pattern matching notation] is a syntax of dfn:[patterns] that represent
particular sets of strings.
When a string is included in the set of strings a pattern represents,
the pattern is said to dfn:[match] the string.
Whether a pattern matches a string or not is defined as follows.
[[normal]]
== Normal characters
A character that is not link:syntax.html#quotes[quoted] or any of special
characters defined below is a normal character, which matches the character
itself.
For example, the pattern +abc+ matches the string +abc+, and not any other
strings.
[[single]]
== Single-character wildcard
The character +?+ matches any single character.
For example, the pattern +a?c+ matches any three-character strings that starts
with +a+ and ends with +c+, such as +aac+, +abc+, and +a;c+.
[[multiple]]
== Multi-character wildcard
The character +*+ matches any strings (of any length, including the empty
string).
For example, the pattern +a*c+ matches any string that starts with +a+
and ends with +c+, such as +ac+, +abc+, and +a;xyz;c+.
[[bracket]]
== Bracket expression
A pattern that is enclosed by brackets (+[+ and +]+) is a dfn:[bracket
expression].
A bracket expression must have at least one character between the brackets.
The characters between the brackets are interpreted as a dfn:[bracket
expression pattern], which is a below-defined special notation for bracket
expression.
A bracket expression pattern represents a set of characters.
The bracket expression matches any one of the characters in the set the
bracket expression pattern represents.
If the opening bracket (+[+) is followed by an exclamation mark (+!+), the
exclamation is not treated as part of the bracket expression pattern and the
whole bracket expression instead matches a character that is _not_ included in
the set the bracket expression pattern represents.
If the opening bracket is followed by a caret (+^+), it is treated like an
exclamation mark as above (but shells other than yash may treat the caret
differently).
If the opening bracket (or the following exclamation or caret, if any) is
followed by a closing bracket (+]+), it is treated as part of the bracket
expression pattern rather than the end of the bracket expression.
You cannot link:syntax.html#quotes[quote] characters in the bracket expression
pattern because quotation is treated before bracket expression.
An opening bracket in a pattern is treated as a normal character if it is not
the beginning of a valid bracket expression.
[[bra-normal]]
== Normal characters (in bracket expression pattern)
A character that is not any of special characters defined below is a normal
character, which represents the character itself.
For example, the bracket expression pattern +abc+ represents the set of the
three characters +a+, +b+, and +c+. The bracket expression +[abc]+ therefore
matches any of the three characters.
[[bra-range]]
== Range expressions
A hyphen preceded and followed by a character (or <<bra-colsym,collating
symbol>>) is a dfn:[range expression], which represents the set of the two
characters and all characters between the two in the collation order.
A dfn:[collation order] is an order of characters that is defined in the
locale data.
If a hyphen is followed by a closing bracket (+]+), the bracket is treated as
the end of the bracket expression and the hyphen as a normal character.
For example, the range expression +3-5+ represents the set of the three
characters +3+, +4+, and +5+. The bracket expression +[3-5-]+ therefore
matches one of the four characters +3+, +4+, +5+, and +-+.
[[bra-colsym]]
== Collating symbols
A dfn:[collating symbol] allows more than one character to be treated as a
single character in matching.
A collating symbol is made up of one or more characters enclosed by the
special brackets +[.+ and +.]+.
One or more characters that are treated as a single character in matching are
called a dfn:[collating element].
Precisely, a bracket expression pattern represents a set of collating elements
and a bracket expression matches a collating element rather than a character,
but we do not differentiate them for brevity here.
For example, the character combination ``ch'' was treated as a single
character in the traditional Spanish language.
If this character combination is registered as a collating element in the
locale data, the bracket expression +[[.ch.]df]+ matches one of +ch+, +d+, and
+f+.
[[bra-eqclass]]
== Equivalence classes
An dfn:[equivalence class] represents a set of characters that are considered
_equivalent_.
A equivalence class is made up of a character (or more precisely, a collating
element) enclosed by the special brackets +[=+ and +=]+.
An equivalence class represents the set of characters that consists of the
character enclosed by the brackets and the characters that are in the same
primary equivalence class as the enclosed character.
The shell consults the locale data for the definition of equivalence classes
in the current locale.
For example, if the six characters +a+, +à+, +á+, +â+,
+ã+, +ä+ are defined to be in the same primary equivalence class,
the bracket expressions +[[=a=]]+, +[[=à=]]+, and +[[=á=]]+ match
one of the six.
[[bra-chclass]]
== Character classes
A dfn:[character class] represents a predefined set of characters.
A character class is made up of a class name enclosed by the special brackets
+[:+ and +:]+.
The shell consults the locale data for which class a character belongs to.
The following character classes can be used in all locales:
+[:lower:]+::
set of lowercase letters
+[:upper:]+::
set of uppercase letters
+[:alpha:]+::
set of letters, including the +[:lower:]+ and +[:upper:]+ classes.
+[:digit:]+::
set of decimal digits
+[:xdigit:]+::
set of hexadecimal digits
+[:alnum:]+::
set of letters and digits, including the +[:alpha:]+ and +[:digit:]+ classes.
+[:blank:]+::
set of blank characters, not including the newline character
+[:space:]+::
set of space characters, including the newline character
+[:punct:]+::
set of punctuations
+[:print:]+::
set of printable characters
+[:cntrl:]+::
set of control characters
For example, the bracket expression `[[:lower:][:upper:]]` matches a lower or
upper case character.
In addition to the classes listed above, other classes may be used depending
on the definition of the current locale.
// vim: set filetype=asciidoc textwidth=78 expandtab:
|