File: unicode_wb_init.3

package info (click to toggle)
courier-unicode 2.4.0-4
links: PTS, VCS
area: main
in suites: forky, sid
size: 5,572 kB
sloc: ansic: 83,912; sh: 4,230; cpp: 2,596; perl: 1,023; makefile: 663
file content (139 lines) | stat: -rw-r--r-- 6,802 bytes
'\" t
.\"     Title: unicode_word_break
.\"    Author: Sam Varshavchik
.\" Generator: DocBook XSL Stylesheets vsnapshot <http://docbook.sf.net/>
.\"      Date: 08/26/2025
.\"    Manual: Courier Unicode Library
.\"    Source: Courier Unicode Library
.\"  Language: English
.\"
.TH "UNICODE_WORD_BREAK" "3" "08/26/2025" "Courier Unicode Library" "Courier Unicode Library"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
unicode_wb_init, unicode_wb_next, unicode_wb_next_cnt, unicode_wb_end, unicode_wbscan_init, unicode_wbscan_next, unicode_wbscan_end, unicode_word_break \- calculate word breaks
.SH "SYNOPSIS"
.sp
.ft B
.nf
#include <courier\-unicode\&.h>
.fi
.ft
.HP \w'unicode_wb_info_t\ unicode_wb_init('u
.BI "unicode_wb_info_t unicode_wb_init(int\ (*" "cb_func" ")(int,\ void\ *), void\ *" "cb_arg" ");"
.HP \w'int\ unicode_wb_next('u
.BI "int unicode_wb_next(unicode_wb_info_t\ " "wb" ", char32_t\ " "c" ");"
.HP \w'int\ unicode_wb_next_cnt('u
.BI "int unicode_wb_next_cnt(unicode_wb_info_t\ " "wb" ", const\ char32_t\ *" "cptr" ", size_t\ " "cnt" ");"
.HP \w'int\ unicode_wb_end('u
.BI "int unicode_wb_end(unicode_wb_info_t\ " "wb" ");"
.HP \w'unicode_wbscan_info_t\ unicode_wbscan_init('u
.BI "unicode_wbscan_info_t unicode_wbscan_init(void);"
.HP \w'int\ unicode_wbscan_next('u
.BI "int unicode_wbscan_next(unicode_wbscan_info_t\ " "wbs" ", char32_t\ " "c" ");"
.HP \w'size_t\ unicode_wbscan_end('u
.BI "size_t unicode_wbscan_end(unicode_wbscan_info_t\ " "wbs" ");"
.SH "DESCRIPTION"
.PP
These functions implement the unicode word breaking algorithm\&. Invoke
\fBunicode_wb_init\fR() to initialize the word breaking algorithm\&. The first parameter is a callback function\&. The second parameter is an opaque pointer\&. The callback function gets invoked with two parameters\&. The second parameter is the opaque pointer that was given to
\fBunicode_wb_init\fR(); and the opaque pointer is not subject to any further interpretation by these functions\&.
.PP
\fBunicode_wb_init\fR() returns an opaque handle\&. Repeated invocations of
\fBunicode_wb_next\fR(), passing the handle, and one unicode character defines a sequence of unicode characters over which the word breaking algorithm calculation takes place\&.
\fBunicode_wb_next_cnt\fR() is a shortcut for invoking
\fBunicode_wb_next\fR() repeatedly over an array
cptr
containing
cnt
unicode characters\&.
.PP
\fBunicode_wb_end\fR() denotes the end of the unicode character sequence\&. After the call to
\fBunicode_wb_end\fR() the word breaking
unicode_wb_info_t
handle is no longer valid\&.
.PP
Between the call to
\fBunicode_wb_init\fR() and
\fBunicode_wb_end\fR(), the callback function gets invoked exactly once for each unicode character given to
\fBunicode_wb_next\fR() or
\fBunicode_wb_next_cnt\fR()\&. Usually each call to
\fBunicode_wb_next\fR() results in the callback function getting invoked immediately, but it does not have to be\&. It\*(Aqs possible that a call to
\fBunicode_wb_next\fR() returns without invoking the callback function, and some subsequent call to
\fBunicode_wb_next\fR() (or
\fBunicode_wb_end\fR()) invokes the callback function more than once, to catch things up\&. The contract is that before
\fBunicode_wb_end\fR() returns, the callback function gets invoked the exact number of times as the number of characters in the unicode sequence defined by the intervening calls to
\fBunicode_wb_next\fR() and
\fBunicode_wb_next_cnt\fR(), unless an error occurs\&.
.PP
Each call to the callback function reports the calculated wordbreaking status of the corresponding character in the unicode character sequence\&. If the parameter to the callback function is non zero, a word break is permitted
\fIbefore\fR
the corresponding character\&. A zero value indicates that a word break is prohibited
\fIbefore\fR
the corresponding character\&.
.PP
The callback function should return 0\&. A non\-zero value indicates to the word breaking algorithm that an error has occurred\&.
\fBunicode_wb_next\fR() and
\fBunicode_wb_next_cnt\fR() return zero either if they never invoked the callback function, or if each call to the callback function returned zero\&. A non zero return from the callback function results in
\fBunicode_wb_next\fR() and
\fBunicode_wb_next_cnt\fR() immediately returning the same value\&.
.PP
\fBunicode_wb_end\fR() must be invoked to destroy the word breaking handle even if
\fBunicode_wb_next\fR() and
\fBunicode_wb_next_cnt\fR() returned an error indication\&. It\*(Aqs also possible that, under normal circumstances,
\fBunicode_wb_end\fR() invokes the callback function one or more times\&. The return value from
\fBunicode_wb_end\fR() has the same meaning as from
\fBunicode_wb_next\fR() and
\fBunicode_wb_next_cnt\fR(); however in all cases after
\fBunicode_wb_end\fR() returns the line breaking handle is no longer valid\&.
.SS "Word scan"
.PP
\fBunicode_wbscan_init\fR(),
\fBunicode_wbscan_next\fR() and
\fBunicode_wbscan_end\fR
scan for the next word boundary in a unicode character sequence\&.
\fBunicode_wbscan_init\fR() obtains a handle, then
\fBunicode_wbscan_next\fR() gets repeatedly invoked to define the unicode character sequence\&.
\fBunicode_wbscan_end\fR() deallocates the handle and returns the number of leading characters in the unicode character sequence up to the first word break\&.
.PP
A non\-0 return value from
\fBunicode_wbscan_next\fR() indicates that the word boundary is already known, and any further calls to
\fBunicode_wbscan_next\fR() will be ignored\&.
\fBunicode_wbscan_end\fR() must still be called, to obtain the unicode character count\&.
.SH "SEE ALSO"
.PP
\m[blue]\fBTR\-29\fR\m[]\&\s-2\u[1]\d\s+2,
\fBcourier-unicode\fR(7),
\fBunicode::wordbreak\fR(3),
\fBunicode_convert_tocase\fR(3),
\fBunicode_line_break\fR(3),
\fBunicode_grapheme_break\fR(3)\&.
.SH "AUTHOR"
.PP
\fBSam Varshavchik\fR
.RS 4
Author
.RE
.SH "NOTES"
.IP " 1." 4
TR-29
.RS 4
\%https://www.unicode.org/reports/tr29/tr29-45.html
.RE