1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163
|
[;1m split(Subject, RE)[0m
There is no documentation for split(Subject, RE, [])
[;1m split(Subject, RE, Options)[0m
Splits the input into parts by finding tokens according to the
regular expression supplied.
The splitting is basically done by running a global regular
expression match and dividing the initial string wherever a match
occurs. The matching part of the string is removed from the
output.
As in [;;4mrun/3[0m, an [;;4mmp/0[0m compiled with option [;;4municode[0m requires [;;4m[0m
[;;4mSubject[0m to be a Unicode [;;4mcharlist()[0m. If compilation is done
implicitly and the [;;4municode[0m compilation option is specified to
this function, both the regular expression and [;;4mSubject[0m are to be
specified as valid Unicode [;;4mcharlist()[0ms.
The result is given as a list of "strings", the preferred data
type specified in option [;;4mreturn[0m (default [;;4miodata[0m).
If subexpressions are specified in the regular expression, the
matching subexpressions are returned in the resulting list as
well. For example:
re:split("Erlang","[ln]",[{return,list}]).
gives
["Er","a","g"]
while
re:split("Erlang","([ln])",[{return,list}]).
gives
["Er","l","a","n","g"]
The text matching the subexpression (marked by the parentheses in
the regular expression) is inserted in the result list where it
was found. This means that concatenating the result of a split
where the whole regular expression is a single subexpression (as
in the last example) always results in the original string.
As there is no matching subexpression for the last part in the
example (the "g"), nothing is inserted after that. To make the
group of strings and the parts matching the subexpressions more
obvious, one can use option [;;4mgroup[0m, which groups together the
part of the subject string with the parts matching the
subexpressions when the string was split:
re:split("Erlang","([ln])",[{return,list},group]).
gives
[["Er","l"],["a","n"],["g"]]
Here the regular expression first matched the "l", causing "Er" to
be the first part in the result. When the regular expression
matched, the (only) subexpression was bound to the "l", so the "l"
is inserted in the group together with "Er". The next match is of
the "n", making "a" the next part to be returned. As the
subexpression is bound to substring "n" in this case, the "n" is
inserted into this group. The last group consists of the remaining
string, as no more matches are found.
By default, all parts of the string, including the empty strings,
are returned from the function, for example:
re:split("Erlang","[lg]",[{return,list}]).
gives
["Er","an",[]]
as the matching of the "g" in the end of the string leaves an
empty rest, which is also returned. This behavior differs from the
default behavior of the split function in Perl, where empty
strings at the end are by default removed. To get the "trimming"
default behavior of Perl, specify [;;4mtrim[0m as an option:
re:split("Erlang","[lg]",[{return,list},trim]).
gives
["Er","an"]
The "trim" option says; "give me as many parts as possible except
the empty ones", which sometimes can be useful. You can also
specify how many parts you want, by specifying [;;4m{parts,[0mN[;;4m}[0m:
re:split("Erlang","[lg]",[{return,list},{parts,2}]).
gives
["Er","ang"]
Notice that the last part is "ang", not "an", as splitting was
specified into two parts, and the splitting stops when enough
parts are given, which is why the result differs from that of [;;4m[0m
[;;4mtrim[0m.
More than three parts are not possible with this indata, so
re:split("Erlang","[lg]",[{return,list},{parts,4}]).
gives the same result as the default, which is to be viewed as "an
infinite number of parts".
Specifying [;;4m0[0m as the number of parts gives the same effect as
option [;;4mtrim[0m. If subexpressions are captured, empty
subexpressions matched at the end are also stripped from the
result if [;;4mtrim[0m or [;;4m{parts,0}[0m is specified.
The [;;4mtrim[0m behavior corresponds exactly to the Perl default. [;;4m[0m
[;;4m{parts,N}[0m, where N is a positive integer, corresponds exactly to
the Perl behavior with a positive numerical third parameter. The
default behavior of [;;4msplit/3[0m corresponds to the Perl behavior
when a negative integer is specified as the third parameter for
the Perl routine.
Summary of options not previously described for function [;;4mrun/3[0m:
• [;;4m{return,ReturnType}[0m - Specifies how the parts of the
original string are presented in the result list. Valid
types:
○ [;;4miodata[0m - The variant of [;;4miodata/0[0m that gives the
least copying of data with the current implementation
(often a binary, but do not depend on it).
○ [;;4mbinary[0m - All parts returned as binaries.
○ [;;4mlist[0m - All parts returned as lists of characters
("strings").
• [;;4mgroup[0m - Groups together the part of the string with the
parts of the string matching the subexpressions of the
regular expression.
The return value from the function is in this case a [;;4mlist/0[0m
of [;;4mlist/0[0ms. Each sublist begins with the string picked out
of the subject string, followed by the parts matching each
of the subexpressions in order of occurrence in the regular
expression.
• [;;4m{parts,N}[0m - Specifies the number of parts the subject
string is to be split into.
The number of parts is to be a positive integer for a
specific maximum number of parts, and [;;4minfinity[0m for the
maximum number of parts possible (the default). Specifying [;;4m[0m
[;;4m{parts,0}[0m gives as many parts as possible disregarding
empty parts at the end, the same as specifying [;;4mtrim[0m.
• [;;4mtrim[0m - Specifies that empty parts at the end of the result
list are to be disregarded. The same as specifying [;;4m[0m
[;;4m{parts,0}[0m. This corresponds to the default behavior of the [;;4m[0m
[;;4msplit[0m built-in function in Perl.
|