'\" t
.\"     Title: unicode_word_break
.\"    Author: Sam Varshavchik
.\" Generator: DocBook XSL Stylesheets vsnapshot <http://docbook.sf.net/>
.\"      Date: 08/26/2025
.\"    Manual: Courier Unicode Library
.\"    Source: Courier Unicode Library
.\"  Language: English
.\"
.TH "UNICODE_WORD_BREAK" "3" "08/26/2025" "Courier Unicode Library" "Courier Unicode Library"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
unicode_wb_init, unicode_wb_next, unicode_wb_next_cnt, unicode_wb_end, unicode_wbscan_init, unicode_wbscan_next, unicode_wbscan_end, unicode_word_break \- calculate word breaks
.SH "SYNOPSIS"
.sp
.ft B
.nf
#include <courier\-unicode\&.h>
.fi
.ft
.HP \w'unicode_wb_info_t\ unicode_wb_init('u
.BI "unicode_wb_info_t unicode_wb_init(int\ (*" "cb_func" ")(int,\ void\ *), void\ *" "cb_arg" ");"
.HP \w'int\ unicode_wb_next('u
.BI "int unicode_wb_next(unicode_wb_info_t\ " "wb" ", char32_t\ " "c" ");"
.HP \w'int\ unicode_wb_next_cnt('u
.BI "int unicode_wb_next_cnt(unicode_wb_info_t\ " "wb" ", const\ char32_t\ *" "cptr" ", size_t\ " "cnt" ");"
.HP \w'int\ unicode_wb_end('u
.BI "int unicode_wb_end(unicode_wb_info_t\ " "wb" ");"
.HP \w'unicode_wbscan_info_t\ unicode_wbscan_init('u
.BI "unicode_wbscan_info_t unicode_wbscan_init(void);"
.HP \w'int\ unicode_wbscan_next('u
.BI "int unicode_wbscan_next(unicode_wbscan_info_t\ " "wbs" ", char32_t\ " "c" ");"
.HP \w'size_t\ unicode_wbscan_end('u
.BI "size_t unicode_wbscan_end(unicode_wbscan_info_t\ " "wbs" ");"
.SH "DESCRIPTION"
.PP
These functions implement the unicode word breaking algorithm\&. Invoke
\fBunicode_wb_init\fR() to initialize the word breaking algorithm\&. The first parameter is a callback function\&. The second parameter is an opaque pointer\&. The callback function gets invoked with two parameters\&. The first parameter is the word breaking status of one character\&. The second parameter is the opaque pointer that was given to
\fBunicode_wb_init\fR(); these functions do not interpret the opaque pointer in any way\&.
.PP
\fBunicode_wb_init\fR() returns an opaque handle\&. Repeated invocations of
\fBunicode_wb_next\fR(), each passing the handle and one unicode character, define the sequence of unicode characters over which the word breaking calculation takes place\&.
\fBunicode_wb_next_cnt\fR() is a shortcut for invoking
\fBunicode_wb_next\fR() repeatedly over an array
cptr
containing
cnt
unicode characters\&.
.PP
\fBunicode_wb_end\fR() denotes the end of the unicode character sequence\&. After the call to
\fBunicode_wb_end\fR() the word breaking
unicode_wb_info_t
handle is no longer valid\&.
.PP
Between the call to
\fBunicode_wb_init\fR() and
\fBunicode_wb_end\fR(), the callback function gets invoked exactly once for each unicode character given to
\fBunicode_wb_next\fR() or
\fBunicode_wb_next_cnt\fR()\&. Usually each call to
\fBunicode_wb_next\fR() results in the callback function getting invoked immediately, but this is not guaranteed\&. It\*(Aqs possible that a call to
\fBunicode_wb_next\fR() returns without invoking the callback function, and some subsequent call to
\fBunicode_wb_next\fR() (or
\fBunicode_wb_end\fR()) invokes the callback function more than once, to catch things up\&. The contract is that before
\fBunicode_wb_end\fR() returns, the callback function gets invoked exactly once for each character in the unicode sequence defined by the intervening calls to
\fBunicode_wb_next\fR() and
\fBunicode_wb_next_cnt\fR(), unless an error occurs\&.
.PP
Each call to the callback function reports the calculated word breaking status of the corresponding character in the unicode character sequence\&. If the first parameter to the callback function is non\-zero, a word break is permitted
\fIbefore\fR
the corresponding character\&. A zero value indicates that a word break is prohibited
\fIbefore\fR
the corresponding character\&.
.PP
The callback function should return 0\&. A non\-zero value indicates to the word breaking algorithm that an error has occurred\&.
\fBunicode_wb_next\fR() and
\fBunicode_wb_next_cnt\fR() return zero either if they never invoked the callback function, or if each call to the callback function returned zero\&. A non\-zero return from the callback function results in
\fBunicode_wb_next\fR() and
\fBunicode_wb_next_cnt\fR() immediately returning the same value\&.
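.PP
The callback contract described above can be sketched as follows\&. This is only an illustration: the
struct wb_flags
type and the
record_wb_flag
function are hypothetical names that are not part of the library\&. The sketch assumes that the caller wants to store one flag per character in a fixed\-size buffer, and it reports an error by returning a non\-zero value when that buffer is full\&.

```c
#include <stddef.h>

/* Hypothetical collector, not part of the library. */
struct wb_flags {
	unsigned char *flags;	/* one entry per character */
	size_t capacity;	/* number of entries available in flags[] */
	size_t count;		/* number of entries filled in so far */
};

/*
 * Has the callback prototype shown in the SYNOPSIS: int (*)(int, void *).
 * Stores 1 if a word break is permitted before the corresponding
 * character, 0 if it is prohibited.
 * Returns 0 on success, or -1 if the buffer is already full.
 */
static int record_wb_flag(int can_break, void *arg)
{
	struct wb_flags *out = arg;

	if (out->count >= out->capacity)
		return -1;	/* non-zero return value: an error occurred */

	out->flags[out->count++] = can_break ? 1 : 0;
	return 0;
}
```

.PP
A pointer to a
struct wb_flags
object would be passed as the
\fIcb_arg\fR
parameter, and
record_wb_flag
as the
\fIcb_func\fR
parameter\&.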
.PP
\fBunicode_wb_end\fR() must be invoked to destroy the word breaking handle even if
\fBunicode_wb_next\fR() and
\fBunicode_wb_next_cnt\fR() returned an error indication\&. It\*(Aqs also possible that, under normal circumstances,
\fBunicode_wb_end\fR() invokes the callback function one or more times\&. The return value from
\fBunicode_wb_end\fR() has the same meaning as from
\fBunicode_wb_next\fR() and
\fBunicode_wb_next_cnt\fR(); however, in all cases, after
\fBunicode_wb_end\fR() returns, the word breaking handle is no longer valid\&.
.SS "Word scan"
.PP
\fBunicode_wbscan_init\fR(),
\fBunicode_wbscan_next\fR() and
\fBunicode_wbscan_end\fR()
scan for the next word boundary in a unicode character sequence\&.
\fBunicode_wbscan_init\fR() obtains a handle, then
\fBunicode_wbscan_next\fR() gets repeatedly invoked to define the unicode character sequence\&.
\fBunicode_wbscan_end\fR() deallocates the handle and returns the number of leading characters in the unicode character sequence up to the first word break\&.
.PP
A non\-zero return value from
\fBunicode_wbscan_next\fR() indicates that the word boundary is already known, and any further calls to
\fBunicode_wbscan_next\fR() will be ignored\&.
\fBunicode_wbscan_end\fR() must still be called, to obtain the unicode character count\&.
.SH "SEE ALSO"
.PP
\m[blue]\fBTR\-29\fR\m[]\&\s-2\u[1]\d\s+2,
\fBcourier-unicode\fR(7),
\fBunicode::wordbreak\fR(3),
\fBunicode_convert_tocase\fR(3),
\fBunicode_line_break\fR(3),
\fBunicode_grapheme_break\fR(3)\&.
.SH "AUTHOR"
.PP
\fBSam Varshavchik\fR
.RS 4
Author
.RE
.SH "NOTES"
.IP " 1." 4
TR-29
.RS 4
\%https://www.unicode.org/reports/tr29/tr29-45.html
.RE