'\" t
.\"     Title: unicode_line_break
.\"    Author: Sam Varshavchik
.\" Generator: DocBook XSL Stylesheets vsnapshot <http://docbook.sf.net/>
.\"      Date: 05/18/2024
.\"    Manual: Courier Unicode Library
.\"    Source: Courier Unicode Library
.\"  Language: English
.\"
.TH "UNICODE_LINE_BREAK" "3" "05/18/2024" "Courier Unicode Library" "Courier Unicode Library"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
unicode_line_break, unicode_lb_init, unicode_lb_set_opts, unicode_lb_next, unicode_lb_next_cnt, unicode_lb_end, unicode_lbc_init, unicode_lbc_set_opts, unicode_lbc_next, unicode_lbc_next_cnt, unicode_lbc_end \- calculate mandatory or allowed line breaks
.SH "SYNOPSIS"
.sp
.ft B
.nf
#include <courier\-unicode\&.h>
.fi
.ft
.HP \w'unicode_lb_info_t\ unicode_lb_init('u
.BI "unicode_lb_info_t unicode_lb_init(int\ (*" "cb_func" ")(int,\ void\ *), void\ *" "cb_arg" ");"
.HP \w'void\ unicode_lb_set_opts('u
.BI "void unicode_lb_set_opts(unicode_lb_info_t\ " "lb" ", int\ " "opts" ");"
.HP \w'int\ unicode_lb_next('u
.BI "int unicode_lb_next(unicode_lb_info_t\ " "lb" ", char32_t\ " "c" ");"
.HP \w'int\ unicode_lb_next_cnt('u
.BI "int unicode_lb_next_cnt(unicode_lb_info_t\ " "lb" ", const\ char32_t\ *" "cptr" ", size_t\ " "cnt" ");"
.HP \w'int\ unicode_lb_end('u
.BI "int unicode_lb_end(unicode_lb_info_t\ " "lb" ");"
.HP \w'unicode_lbc_info_t\ unicode_lbc_init('u
.BI "unicode_lbc_info_t unicode_lbc_init(int\ (*" "cb_func" ")(int,\ char32_t,\ void\ *), void\ *" "cb_arg" ");"
.HP \w'void\ unicode_lbc_set_opts('u
.BI "void unicode_lbc_set_opts(unicode_lbc_info_t\ " "lb" ", int\ " "opts" ");"
.HP \w'int\ unicode_lbc_next('u
.BI "int unicode_lbc_next(unicode_lbc_info_t\ " "lb" ", char32_t\ " "c" ");"
.HP \w'int\ unicode_lbc_next_cnt('u
.BI "int unicode_lbc_next_cnt(unicode_lbc_info_t\ " "lb" ", const\ char32_t\ *" "cptr" ", size_t\ " "cnt" ");"
.HP \w'int\ unicode_lbc_end('u
.BI "int unicode_lbc_end(unicode_lbc_info_t\ " "lb" ");"
.SH "DESCRIPTION"
.PP
These functions implement the Unicode line breaking algorithm\&. Invoke
\fBunicode_lb_init\fR() to initialize the line breaking algorithm\&. Its first parameter is a callback function, and its second parameter is an opaque pointer\&. The callback function gets invoked with two parameters\&. The first callback parameter is one of three values:
UNICODE_LB_MANDATORY,
UNICODE_LB_NONE, or
UNICODE_LB_ALLOWED, as described below\&. The second callback parameter is the opaque pointer that was passed to
\fBunicode_lb_init\fR(); these functions do not interpret the opaque pointer in any way\&.
.PP
\fBunicode_lb_init\fR() returns an opaque handle\&. Repeated invocations of
\fBunicode_lb_next\fR(), each passing the handle and one Unicode character, define the sequence of Unicode characters over which the line breaking calculation takes place\&.
\fBunicode_lb_next_cnt\fR() is a shortcut for invoking
\fBunicode_lb_next\fR() repeatedly over an array
cptr
containing
cnt
Unicode characters\&.
.PP
\fBunicode_lb_end\fR() denotes the end of the Unicode character sequence\&. After the call to
\fBunicode_lb_end\fR() the line breaking
unicode_lb_info_t
handle is no longer valid\&.
.PP
Between the call to
\fBunicode_lb_init\fR() and
\fBunicode_lb_end\fR(), the callback function gets invoked exactly once for each Unicode character given to
\fBunicode_lb_next\fR() or
\fBunicode_lb_next_cnt\fR()\&. Usually each call to
\fBunicode_lb_next\fR() results in the callback function getting invoked immediately, but this is not guaranteed\&. It\*(Aqs possible that a call to
\fBunicode_lb_next\fR() returns without invoking the callback function, and some subsequent call to
\fBunicode_lb_next\fR() (or
\fBunicode_lb_end\fR()) invokes the callback function more than once, to catch up\&. The contract is that before
\fBunicode_lb_end\fR() returns, the callback function gets invoked exactly as many times as there are characters in the Unicode sequence defined by the intervening calls to
\fBunicode_lb_next\fR() and
\fBunicode_lb_next_cnt\fR(), unless an error occurs\&.
.PP
Each call to the callback function reports the calculated line breaking status of the corresponding character in the Unicode character sequence:
.PP
UNICODE_LB_MANDATORY
.RS 4
A line break is MANDATORY
\fIbefore\fR
the corresponding character\&.
.RE
.PP
UNICODE_LB_NONE
.RS 4
A line break is PROHIBITED
\fIbefore\fR
the corresponding character\&.
.RE
.PP
UNICODE_LB_ALLOWED
.RS 4
A line break is OPTIONAL
\fIbefore\fR
the corresponding character\&.
.RE
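.PP
As an illustration of how a caller might act on these three values, the sketch below takes an array holding the status reported for each character and picks the offsets at which output lines start\&. It is not part of the library:
\fBwrap_lines\fR() and its parameters are hypothetical, every character is assumed to occupy one column, and the fallback definitions only stand in for the constants that
<courier\-unicode\&.h>
provides (their values here are assumptions)\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```c
#include <stddef.h>

/*
 * Placeholders so that this sketch compiles without <courier-unicode.h>.
 * Real code includes the header; the values below are assumptions.
 */
#ifndef UNICODE_LB_MANDATORY
#define UNICODE_LB_MANDATORY (-1)
#define UNICODE_LB_NONE 0
#define UNICODE_LB_ALLOWED 1
#endif

/*
 * status[i] is the value reported for character i. Stores the index of the
 * first character of each line in starts[] and returns the number of lines.
 * Each character is assumed to be one column wide.
 */
size_t wrap_lines(const int *status, size_t n, size_t width,
                  size_t *starts, size_t max_starts)
{
    size_t nlines = 0;
    size_t line_start = 0;
    size_t candidate = 0;   /* latest optional break seen so far */
    size_t i;

    if (n == 0 || max_starts == 0)
        return 0;
    starts[nlines++] = 0;

    for (i = 1; i < n; i++) {
        size_t brk = 0;     /* 0 means no break chosen at this step */

        if (status[i] == UNICODE_LB_MANDATORY) {
            brk = i;        /* always start a new line here */
        } else {
            if (status[i] == UNICODE_LB_ALLOWED)
                candidate = i;
            /* Line is full: fall back to the last optional break, if any. */
            if (i - line_start >= width && candidate > line_start)
                brk = candidate;
        }

        if (brk != 0) {
            if (nlines == max_starts)
                break;
            starts[nlines++] = brk;
            line_start = brk;
        }
    }
    return nlines;
}
```
.fi
.if n \{\
.RE
.\}
.PP
In this sketch a new line starts before every character reported as UNICODE_LB_MANDATORY, an optional break is taken only once the current line has reached
width
columns, and a character reported as UNICODE_LB_NONE never starts a line\&. A run with no break opportunity is simply allowed to overflow\&.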
.PP
The callback function should return 0\&. A non\-zero value indicates to the line breaking algorithm that an error has occurred\&.
\fBunicode_lb_next\fR() and
\fBunicode_lb_next_cnt\fR() return zero either if they never invoked the callback function, or if each call to the callback function returned zero\&. A non\-zero return from the callback function results in
\fBunicode_lb_next\fR() and
\fBunicode_lb_next_cnt\fR() immediately returning the same value\&.
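.PP
The return value convention can be sketched with a callback that saves each status into a caller\-owned array and reports an error when the array is full\&. The names
\fBrecord_status\fR()
and
struct status_log
are hypothetical and chosen only for this example; the one detail taken from the library is the callback type accepted by
\fBunicode_lb_init\fR()\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```c
#include <stddef.h>

/* Hypothetical caller-owned buffer that the opaque pointer refers to. */
struct status_log {
    int    *status;     /* one slot per character */
    size_t  used;       /* slots filled so far */
    size_t  capacity;   /* total slots available */
};

/* Has the int (*)(int, void *) type that unicode_lb_init() expects. */
int record_status(int status, void *arg)
{
    struct status_log *dest = arg;

    if (dest->used == dest->capacity)
        return -1;      /* non-zero: signals an error to the library */

    dest->status[dest->used++] = status;
    return 0;           /* zero: processing continues */
}
```
.fi
.if n \{\
.RE
.\}
.PP
A caller would pass
\fBrecord_status\fR
and a pointer to its
struct status_log
to
\fBunicode_lb_init\fR()\&. If the array fills up, the \-1 returned here is what
\fBunicode_lb_next\fR() or
\fBunicode_lb_next_cnt\fR() hands back to that caller\&.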
.PP
\fBunicode_lb_end\fR() must be invoked to destroy the line breaking handle even if
\fBunicode_lb_next\fR() or
\fBunicode_lb_next_cnt\fR() returned an error indication\&. It\*(Aqs also possible that, under normal circumstances,
\fBunicode_lb_end\fR() invokes the callback function one or more times\&. The return value from
\fBunicode_lb_end\fR() has the same meaning as from
\fBunicode_lb_next\fR() and
\fBunicode_lb_next_cnt\fR(); however, in all cases, the line breaking handle is no longer valid after
\fBunicode_lb_end\fR() returns\&.
.SS "Alternative callback function"
.PP
\fBunicode_lbc_init\fR(),
\fBunicode_lbc_next\fR(),
\fBunicode_lbc_next_cnt\fR(), and
\fBunicode_lbc_end\fR() are alternative functions that implement the same algorithm\&. The only difference is that the callback function receives an extra parameter, the Unicode character value to which the line breaking status applies, passed through from the input Unicode character sequence\&.
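.PP
Because the character arrives together with its status, a callback for the alternative interface does not need to track its position in the input\&. The following sketch stores each pair in a caller\-owned table\&. The names
\fBrecord_char_status\fR(),
struct char_status
and
struct char_status_table
are hypothetical; only the callback type is taken from
\fBunicode_lbc_init\fR()\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```c
#include <stddef.h>
#include <uchar.h>

/* Hypothetical record pairing a code point with its reported status. */
struct char_status {
    char32_t ch;
    int      status;
};

/* Hypothetical caller-owned table that the opaque pointer refers to. */
struct char_status_table {
    struct char_status *rows;
    size_t              used;
    size_t              capacity;
};

/* Has the int (*)(int, char32_t, void *) type of unicode_lbc_init(). */
int record_char_status(int status, char32_t ch, void *arg)
{
    struct char_status_table *table = arg;

    if (table->used == table->capacity)
        return 1;       /* non-zero: table is full, report an error */

    table->rows[table->used].ch = ch;
    table->rows[table->used].status = status;
    table->used++;
    return 0;
}
```
.fi
.if n \{\
.RE
.\}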
.SS "Options"
.PP
\fBunicode_lb_set_opts\fR() and
\fBunicode_lbc_set_opts\fR() enable non\-default options for the line breaking algorithm\&. These functions must be called immediately after
\fBunicode_lb_init\fR() or
\fBunicode_lbc_init\fR(), and before any other function\&.
\fIopts\fR
is a bitmask that can contain the following values:
.PP
UNICODE_LB_OPT_PRBREAK
.RS 4
Enables a modified
LB24
rule\&. This prevents a line break before plus signs, as in
\(lqC++\(rq\&. This flag adds the following rules to the LB24 rule:
.sp
.if n \{\
.RS 4
.\}
.nf
			PR x PR

			AL x PR

			ID x PR
.fi
.if n \{\
.RE
.\}
.RE
.PP
UNICODE_LB_OPT_SYBREAK
.RS 4
Tailored breaking rules for the
\(lq/\(rq
character\&. This prevents breaking after the
\(lq/\(rq
character (as in URLs), and includes an exception to the
\(lqx SY\(rq
rule in
LB13\&. This flag adds the following rules to the LB24 rule:
.sp
.if n \{\
.RS 4
.\}
.nf
			SY x EX

			SY x AL

			SY x ID

			SP \(di SY, which takes precedence over "x SY"\&.
.fi
.if n \{\
.RE
.\}
.RE
.PP
UNICODE_LB_OPT_DASHWJ
.RS 4
This flag reclassifies
U+2013
(en dash) and
U+2014
(em dash) as class
WJ, prohibiting breaks before and after these characters\&.
.RE
.SH "SEE ALSO"
.PP
\fBcourier-unicode\fR(7),
\fBunicode::linebreak\fR(3),
\m[blue]\fBTR\-14\fR\m[]\&\s-2\u[1]\d\s+2
.SH "AUTHOR"
.PP
\fBSam Varshavchik\fR
.RS 4
Author
.RE
.SH "NOTES"
.IP " 1." 4
TR-14
.RS 4
\%https://www.unicode.org/reports/tr14/tr14-51.html
.RE