<?xml version="1.0" encoding="UTF-8"?>
<!--

Documentation for LCL (Lazarus Component Library) and LazUtils (Lazarus 
Utilities) are published under the Creative Commons Attribution-ShareAlike 4.0 
International public license.

https://creativecommons.org/licenses/by-sa/4.0/legalcode.txt
https://gitlab.com/freepascal.org/lazarus/lazarus/-/blob/main/docs/cc-by-sa-4-0.txt

Copyright (c) 1997-2025, by the Lazarus Development Team.

-->
<fpdoc-descriptions>
<package name="lazutils">
<!--
====================================================================
LazUnicode
====================================================================
-->
<module name="LazUnicode">
<short>
Provides encoding-agnostic Unicode string manipulation functions and an 
enumerator.
</short>
<descr>
<p>
<file>lazunicode.pas</file> provides encoding-agnostic Unicode string 
manipulation functions and an enumerator. It works transparently with UTF-8 
and UTF-16 encodings, and allows one codebase to work for:
</p>
<ol>
<li>Lazarus using its default UTF-8 encoding</li>
<li>
Future FPC and Lazarus versions with Delphi-compatible UTF-16 encoding
</li>
<li>
Delphi compatibility where String is defined as UnicodeString
</li>
</ol>
<remark>
The behavior of the helper functions is altered using the <var>{$ModeSwitch 
UnicodeStrings}</var> directive; the correct routines for handling UTF-8 or 
UTF-16 are called based on the mode switch value.
</remark>
<p>
<file>lazunicode.pas</file> is part of the <file>LazUtils</file> package.
</p>
</descr>

<!-- unresolved externals -->
<element name="Classes"/>
<element name="SysUtils"/>
<element name="character"/>
<element name="LazUTF16"/>
<element name="LazUTF8"/>

<!-- function Visibility: default -->
<element name="CodePointCopy">
<short>
Copies the specified number of codepoints starting at a character position.
</short>
<descr>
<p>
Copies the number of codepoints in <var>CharCount</var> from <var>s</var>, 
starting at the character position in <var>StartCharIndex</var>. For 
platforms that require UTF-16, <var>UTF16Copy</var> is called. For other 
platforms, <var>UTF8Copy</var> is called.
</p>
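<p>
The following is a minimal usage sketch, not taken from the unit itself. It 
assumes the default UTF-8 string encoding and a source file saved as UTF-8; 
the program name and the sample text are illustrative only.
</p>
<code>
program CopyDemo;
{$mode objfpc}{$H+}
uses
  LazUnicode;
var
  s: String;
begin
  s := 'naïve café';
  // Positions and counts are given in codepoints, not in bytes:
  // copy 4 codepoints starting at codepoint position 7.
  WriteLn(CodePointCopy(s, 7, 4));
end.
</code>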
</descr>
<seealso>
<link id="#lazutils.lazutf16.UTF16Copy">UTF16Copy</link>
<link id="#lazutils.lazutf8.UTF8Copy">UTF8Copy</link>
</seealso>
</element>
<!-- function result Visibility: default -->
<element name="CodePointCopy.Result">
<short>Values copied from the string.</short>
</element>
<!-- argument Visibility: default -->
<element name="CodePointCopy.s">
<short>UTF-encoded string values.</short>
</element>
<!-- argument Visibility: default -->
<element name="CodePointCopy.StartCharIndex">
<short>Initial character position.</short>
</element>
<!-- argument Visibility: default -->
<element name="CodePointCopy.CharCount">
<short>Number of codepoints to copy.</short>
</element>

<!-- function Visibility: default -->
<element name="CodePointLength">
<short>
Gets the number of codepoints in the specified string.
</short>
<descr>
Gets the number of codepoints in the specified string. For platforms that 
require UTF-16, UTF16Length is called to get the return value for the 
function. For other platforms, UTF8LengthFast is called to get the number of 
codepoints.
</descr>
<seealso></seealso>
</element>
<!-- function result Visibility: default -->
<element name="CodePointLength.Result">
<short>Number of codepoints in the string.</short>
</element>
<!-- argument Visibility: default -->
<element name="CodePointLength.s">
<short>UTF-encoded values examined in the function.</short>
</element>

<!-- function Visibility: default -->
<element name="CodePointPos">
<short>
Gets the position where the search value is found in a string.
</short>
<descr>
<p>
Gets the position in SearchInText where SearchForText is found. StartPos 
indicates the initial character position (codepoint) in SearchInText used for 
the comparison. The default value is 1.
</p>
<p>
The return value contains the character position (codepoint) where the search 
value was found. The return value is 0 (zero) if SearchForText is not found 
in the string. For platforms that require UTF-16, UTF16Pos is called to get 
the return value. For other platforms, UTF8Pos is called to get the character 
position (codepoint).
</p>
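<p>
The following is a minimal usage sketch, not taken from the unit itself. It 
assumes the default UTF-8 string encoding and a source file saved as UTF-8; 
the program name, the variable name, and the sample text are illustrative 
only.
</p>
<code>
program PosDemo;
{$mode objfpc}{$H+}
uses
  LazUnicode;
var
  Idx: NativeInt;
begin
  // The result is a codepoint position, so multi-byte characters
  // in front of the match each count as one position.
  Idx := CodePointPos('fé', 'café');
  if Idx &gt; 0 then
    WriteLn('Found at codepoint position ', Idx)
  else
    WriteLn('Not found');
end.
</code>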
</descr>
<errors></errors>
<seealso></seealso>
</element>
<!-- function result Visibility: default -->
<element name="CodePointPos.Result">
<short>
Character position (codepoint) where the search value was found in the string.
</short>
</element>
<!-- argument Visibility: default -->
<element name="CodePointPos.SearchForText">
<short>Values to locate in the string.</short>
</element>
<!-- argument Visibility: default -->
<element name="CodePointPos.SearchInText">
<short>String to search for the specified values.</short>
</element>
<!-- argument Visibility: default -->
<element name="CodePointPos.StartPos">
<short>Initial character position (codepoint) used in the comparison.</short>
</element>

<!-- function Visibility: default -->
<element name="CodePointSize">
<short>
Gets the number of code units needed for a codepoint in the specified value.
</short>
<descr>
Gets the number of code units needed for the codepoint specified in p. A code 
unit is a byte for UTF-8, or a 16-bit value for UTF-16. For platforms that 
require UTF-16, TCharacter.IsHighSurrogate is called to get the return value. 
For other platforms, UTF8CodepointSizeFast is called to get the number of 
bytes for the codepoint. The return value is 1 or 2 for UTF-16-enabled 
platforms, or in the range 1..4 for UTF-8-enabled platforms. The return value 
can be 0 (zero) if p contains an empty string ('') or a malformed codepoint.
</descr>
<seealso></seealso>
</element>
<!-- function result Visibility: default -->
<element name="CodePointSize.Result">
<short>Number of bytes required for a codepoint.</short>
</element>
<!-- argument Visibility: default -->
<element name="CodePointSize.p">
<short>String with the codepoint to examine in the function.</short>
</element>

<!-- function Visibility: default -->
<element name="IsCombining">
<short>
Determines if the specified value is a combining codepoint.
</short>
<descr>
Determines if the specified value is a combining codepoint. Please note that 
there are many more rules for combining codepoints. The diacritical marks 
handled in the function are only a subset of the possible Unicode values. For 
platforms that require UTF-16, UTF16IsCombining is called to get the return 
value for the specified codepoint. For other platforms, UTF8IsCombining is 
called to examine the codepoint.
</descr>
<seealso></seealso>
</element>
<!-- function result Visibility: default -->
<element name="IsCombining.Result">
<short>
<b>True</b> when the codepoint represents a Unicode combining character.
</short>
</element>
<!-- argument Visibility: default -->
<element name="IsCombining.AChar">
<short>Codepoint to examine in the function.</short>
</element>

<!-- function Visibility: default -->
<element name="UnicodeToWinCP">
<short>
Converts the specified value to the Windows system codepage.
</short>
<descr>
Converts the specified value to the Windows system codepage. The Unicode 
encoding used in s depends on the modeswitch value. For platforms that 
require UTF-16, UTF16ToUTF8 and UTF8ToWinCP are called to get the return 
value for the function, except when String is defined as UnicodeString. No 
conversion is required in that situation. For other platforms, UTF8ToWinCP is 
called to get the return value.
</descr>
<errors></errors>
<seealso></seealso>
</element>
<!-- function result Visibility: default -->
<element name="UnicodeToWinCP.Result">
<short>Values after conversion to the Windows code page.</short>
</element>
<!-- argument Visibility: default -->
<element name="UnicodeToWinCP.s">
<short>Unicode values to convert in the function.</short>
</element>

<!-- function Visibility: default -->
<element name="WinCPToUnicode">
<short>
Converts the specified string to Unicode.
</short>
<descr>
Converts the specified value from the Windows system codepage to Unicode. The 
Unicode encoding used depends on the modeswitch value. For platforms that 
require UTF-16, WinCPToUTF8 and UTF8ToUTF16 are called to get the return 
value for the function, except when String is defined as UnicodeString; no 
conversion is required in that situation. For other platforms, WinCPToUTF8 is 
called to get the return value.
</descr>
<errors></errors>
<seealso></seealso>
</element>
<!-- function result Visibility: default -->
<element name="WinCPToUnicode.Result">
<short>Unicode values for the specified string.</short>
</element>
<!-- argument Visibility: default -->
<element name="WinCPToUnicode.s">
<short>String with Windows code page values.</short>
</element>

<element name="StringOfCodePoint">
<short>
Creates a string with the specified number of codepoints.
</short>
<descr>
Creates a string with the specified number of codepoints. It is similar to 
StringOfChar, but repeats a codepoint instead of a single Char value. For 
platforms that require UTF-16, the value in ACodePoint is concatenated 
repeatedly until the number of codepoints in N has been created. For other 
platforms, Utf8StringOfChar is called to get the return value for the 
function.
</descr>
<seealso></seealso>
</element>
<element name="StringOfCodePoint.Result">
<short>String with the specified number of codepoints.</short>
</element>
<element name="StringOfCodePoint.ACodePoint">
<short>Codepoint to use when creating the string.</short>
</element>
<element name="StringOfCodePoint.N">
<short>Number of codepoints required in the string.</short>
</element>

<!-- class Visibility: default -->
<element name="TUnicodeEnumeratorBase">
<short>Base class for a Unicode character enumerator.</short>
<descr>
Base class for a Unicode character enumerator.
</descr>
<errors></errors>
<seealso></seealso>
</element>

<!-- variable Visibility: private -->
<element name="TUnicodeEnumeratorBase.fSrcPos"/>
<element name="TUnicodeEnumeratorBase.fEndPos"/>
<element name="TUnicodeEnumeratorBase.fCurOne"/>
<element name="TUnicodeEnumeratorBase.fCurTwo"/>
<element name="TUnicodeEnumeratorBase.fCurThree"/>
<element name="TUnicodeEnumeratorBase.fCurFour"/>
<element name="TUnicodeEnumeratorBase.fCurrent"/>
<element name="TUnicodeEnumeratorBase.fCurrentCodeUnitCount"/>

<element name="TUnicodeEnumeratorBase.UpdateCurrent">
<short>
Copies byte values for the Current character (codepoint).
</short>
<descr>
Copies byte values used in Current for the character (codepoint) when 
MoveNext is called to go to the next character. aCount contains the number of 
byte values needed for the Unicode codepoint. UpdateCurrent increments the 
internal pointer used to access values in the enumerator by the number of 
bytes in aCount.
</descr>
<errors>
<p>
Raises an assertion error if the number of bytes in aCount is 0 (zero). 
Raised with the message 'TUnicodeEnumeratorBase.UpdateCurrent: aCount=0'.
</p>
<p>
Raises an assertion error if the number of bytes copied to Current is 
different than the value in aCount. Raised with the message 
'TUnicodeEnumeratorBase.UpdateCurrent: Length(fCurrent)&lt;&gt;aCount.'.
</p>
</errors>
</element>
<element name="TUnicodeEnumeratorBase.UpdateCurrent.aCount">
<short>Number of bytes needed for the codepoint.</short>
</element>

<!-- constructor Visibility: public -->
<element name="TUnicodeEnumeratorBase.Create">
<short>
Constructor for the class instance.
</short>
<descr>
Create initializes the internal member variables used to access byte values 
for Unicode codepoints. A is the string with the codepoints traversed using 
the enumerator.
</descr>
<seealso></seealso>
</element>
<!-- argument Visibility: default -->
<element name="TUnicodeEnumeratorBase.Create.A">
<short>Unicode string for the enumerator.</short>
</element>

<!-- property Visibility: public -->
<element name="TUnicodeEnumeratorBase.Current">
<short>
Byte values for the current codepoint in the enumerator.
</short>
<descr>
Current is a read-only String property which provides access to the byte 
values for the current codepoint in the enumerator. Current is updated in 
UpdateCurrent when the MoveNext method is called.
</descr>
<seealso></seealso>
</element>

<!-- property Visibility: public -->
<element name="TUnicodeEnumeratorBase.CurrentCodeUnitCount">
<short>
Number of bytes in the Current codepoint.
</short>
<descr>
CurrentCodeUnitCount is a read-only Integer property which contains the 
number of bytes needed for the codepoint in Current. CurrentCodeUnitCount is 
updated in UpdateCurrent when MoveNext is called.
</descr>
<seealso></seealso>
</element>

<!-- class Visibility: default -->
<element name="TCodePointEnumerator">
<short>
Base class for a Unicode codepoint enumerator.
</short>
<descr>
Base class for a Unicode codepoint enumerator. TCodePointEnumerator allows 
traversal of Unicode codepoints. It uses UTF-8 or UTF-16 encoding depending 
on the value in <var>$ModeSwitch</var>. It extends the ancestor class to 
provide navigation in the enumerator using the MoveNext method.
</descr>
<seealso></seealso>
</element>

<!-- function Visibility: public -->
<element name="TCodePointEnumerator.MoveNext">
<short>
Provides navigation to the next codepoint in the enumerator.
</short>
<descr>
Provides navigation to the next Unicode codepoint in the enumerator. The 
return value contains <b>True</b> when more characters (codepoints) are 
available to the enumerator. UpdateCurrent is called using the value from 
CodePointSize to store the value for the Current property.
</descr>
<seealso></seealso>
</element>
<!-- function result Visibility: public -->
<element name="TCodePointEnumerator.MoveNext.Result">
<short><b>True</b> when more characters (codepoints) are available.</short>
</element>

<!-- class Visibility: default -->
<element name="TUnicodeCharacterEnumerator">
<short>
Implements an enumerator for Unicode codepoints.
</short>
<descr>
Implements an enumerator for Unicode codepoints. TUnicodeCharacterEnumerator 
allows traversal of characters (codepoints) in a Unicode-encoded string. 
Values use either UTF-16 or UTF-8 encoding depending on the value for 
<var>$ModeSwitch</var>. An overridden MoveNext method is provided to handle 
combining diacritical marks in the Unicode codepoints.
</descr>
<seealso></seealso>
</element>

<!-- variable Visibility: private -->
<element name="TUnicodeCharacterEnumerator.fCurrentCodePointCount"/>

<!-- property Visibility: public -->
<element name="TUnicodeCharacterEnumerator.CurrentCodePointCount">
<short>
Number of codepoints used for the Current character.
</short>
<descr>
CurrentCodePointCount is a read-only Integer property that indicates the 
number of codepoints which make up the Current character. 
CurrentCodePointCount is updated in the MoveNext method, and includes any 
combining diacritical marks which follow the base codepoint.
</descr>
<seealso></seealso>
</element>

<!-- function Visibility: public -->
<element name="TUnicodeCharacterEnumerator.MoveNext">
<short>
Adds support for combining diacritical marks when moving to the next 
codepoint.
</short>
<descr>
<p>
MoveNext is an overridden method which adds support for combining diacritical 
marks when moving to the next codepoint for the enumerator. The return value 
is <b>True</b> when more characters (codepoints) are available to the 
enumerator. MoveNext updates the value in CurrentCodeUnitCount, and includes 
combining diacritical marks in the byte count. MoveNext calls UpdateCurrent 
to store the value for the Current property.
</p>
<remark>
MoveNext does not call the inherited method.
</remark>
</descr>
<seealso></seealso>
</element>
<!-- function result Visibility: public -->
<element name="TUnicodeCharacterEnumerator.MoveNext.Result">
<short>
<b>True</b> when more characters (codepoints) are available to the enumerator.
</short>
</element>

<!-- operator Visibility: default -->
<element name="enumerator(string):tunicodecharacterenumerator">
<short>
Enumerator which combines diacritical marks.
</short>
<descr>
<p>
The enumerator operator enables For ... In loops over a String value. This 
enumerator keeps combining diacritical marks together with the preceding 
codepoint in the String argument for the operator. It is used by default, 
although Unicode defines more rules for combining codepoints than the ones 
handled here. Diacritical marks cover the rules needed for most western 
languages.
</p>
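<p>
The following is a minimal usage sketch, not taken from the unit itself. It 
assumes that the loop variable is declared as String so that the operator is 
selected for the loop, and that the source file is saved as UTF-8; the 
program name and the sample text are illustrative only.
</p>
<code>
program EnumDemo;
{$mode objfpc}{$H+}
uses
  LazUnicode;
var
  s, ch: String;
begin
  s := 'café';
  // Each iteration yields one character as a String, which may hold
  // several code units (and any combining diacritical marks).
  for ch in s do
    WriteLn(ch, ' uses ', Length(ch), ' code unit(s)');
end.
</code>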
</descr>
</element>

</module>
<!-- LazUnicode -->

</package>
</fpdoc-descriptions>