File: stdlib_re_split_func.txt

package info (click to toggle)
erlang 1%3A27.3.4.1%2Bdfsg-1
  • links: PTS, VCS
  • area: main
  • in suites: trixie
  • size: 225,000 kB
  • sloc: erlang: 1,658,966; ansic: 405,769; cpp: 177,850; xml: 82,435; makefile: 15,031; sh: 14,401; lisp: 9,812; java: 8,603; asm: 6,541; perl: 5,836; python: 5,484; sed: 72
file content (163 lines) | stat: -rw-r--r-- 6,320 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163

  split(Subject, RE)

  There is no documentation for split(Subject, RE, [])

  split(Subject, RE, Options)

  Splits the input into parts by finding tokens according to the
  regular expression supplied.

  The splitting is basically done by running a global regular
  expression match and dividing the initial string wherever a match
  occurs. The matching part of the string is removed from the
  output.

  As in run/3, an mp/0 compiled with option unicode requires 
  Subject to be a Unicode charlist(). If compilation is done
  implicitly and the unicode compilation option is specified to
  this function, both the regular expression and Subject are to be
  specified as valid Unicode charlist()s.

  The result is given as a list of "strings", the preferred data
  type specified in option return (default iodata).

  If subexpressions are specified in the regular expression, the
  matching subexpressions are returned in the resulting list as
  well. For example:

    re:split("Erlang","[ln]",[{return,list}]).

  gives

    ["Er","a","g"]

  while

    re:split("Erlang","([ln])",[{return,list}]).

  gives

    ["Er","l","a","n","g"]

  The text matching the subexpression (marked by the parentheses in
  the regular expression) is inserted in the result list where it
  was found. This means that concatenating the result of a split
  where the whole regular expression is a single subexpression (as
  in the last example) always results in the original string.

  As there is no matching subexpression for the last part in the
  example (the "g"), nothing is inserted after that. To make the
  group of strings and the parts matching the subexpressions more
  obvious, one can use option group, which groups together the
  part of the subject string with the parts matching the
  subexpressions when the string was split:

    re:split("Erlang","([ln])",[{return,list},group]).

  gives

    [["Er","l"],["a","n"],["g"]]

  Here the regular expression first matched the "l", causing "Er" to
  be the first part in the result. When the regular expression
  matched, the (only) subexpression was bound to the "l", so the "l"
  is inserted in the group together with "Er". The next match is of
  the "n", making "a" the next part to be returned. As the
  subexpression is bound to substring "n" in this case, the "n" is
  inserted into this group. The last group consists of the remaining
  string, as no more matches are found.

  By default, all parts of the string, including the empty strings,
  are returned from the function, for example:

    re:split("Erlang","[lg]",[{return,list}]).

  gives

    ["Er","an",[]]

  as the matching of the "g" in the end of the string leaves an
  empty rest, which is also returned. This behavior differs from the
  default behavior of the split function in Perl, where empty
  strings at the end are by default removed. To get the "trimming"
  default behavior of Perl, specify trim as an option:

    re:split("Erlang","[lg]",[{return,list},trim]).

  gives

    ["Er","an"]

  The "trim" option says; "give me as many parts as possible except
  the empty ones", which sometimes can be useful. You can also
  specify how many parts you want, by specifying {parts,N}:

    re:split("Erlang","[lg]",[{return,list},{parts,2}]).

  gives

    ["Er","ang"]

  Notice that the last part is "ang", not "an", as splitting was
  specified into two parts, and the splitting stops when enough
  parts are given, which is why the result differs from that of 
  trim.

  More than three parts are not possible with this indata, so

    re:split("Erlang","[lg]",[{return,list},{parts,4}]).

  gives the same result as the default, which is to be viewed as "an
  infinite number of parts".

  Specifying 0 as the number of parts gives the same effect as
  option trim. If subexpressions are captured, empty
  subexpressions matched at the end are also stripped from the
  result if trim or {parts,0} is specified.

  The trim behavior corresponds exactly to the Perl default. 
  {parts,N}, where N is a positive integer, corresponds exactly to
  the Perl behavior with a positive numerical third parameter. The
  default behavior of split/3 corresponds to the Perl behavior
  when a negative integer is specified as the third parameter for
  the Perl routine.

  Summary of options not previously described for function run/3:

   • {return,ReturnType} - Specifies how the parts of the
     original string are presented in the result list. Valid
     types:

      ○ iodata - The variant of iodata/0 that gives the
        least copying of data with the current implementation
        (often a binary, but do not depend on it).

      ○ binary - All parts returned as binaries.

      ○ list - All parts returned as lists of characters
        ("strings").

   • group - Groups together the part of the string with the
     parts of the string matching the subexpressions of the
     regular expression.

     The return value from the function is in this case a list/0
     of list/0s. Each sublist begins with the string picked out
     of the subject string, followed by the parts matching each
     of the subexpressions in order of occurrence in the regular
     expression.

   • {parts,N} - Specifies the number of parts the subject
     string is to be split into.

     The number of parts is to be a positive integer for a
     specific maximum number of parts, and infinity for the
     maximum number of parts possible (the default). Specifying 
     {parts,0} gives as many parts as possible disregarding
     empty parts at the end, the same as specifying trim.

   • trim - Specifies that empty parts at the end of the result
     list are to be disregarded. The same as specifying 
     {parts,0}. This corresponds to the default behavior of the 
     split built-in function in Perl.