1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162
|
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html401/loose.dtd">
<html>
<!-- Created on October, 16 2024 by texi2html 1.78a -->
<!--
Written by: Lionel Cons <Lionel.Cons@cern.ch> (original author)
Karl Berry <karl@freefriends.org>
Olaf Bachmann <obachman@mathematik.uni-kl.de>
and many others.
Maintained by: Many creative people.
Send bugs and suggestions to <texi2html-bug@nongnu.org>
-->
<head>
<title>GNU libunistring: 2. Conventions</title>
<meta name="description" content="GNU libunistring: 2. Conventions">
<meta name="keywords" content="GNU libunistring: 2. Conventions">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="texi2html 1.78a">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
pre.display {font-family: serif}
pre.format {font-family: serif}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: serif; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: serif; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.roman {font-family:serif; font-weight:normal;}
span.sansserif {font-family:sans-serif; font-weight:normal;}
ul.toc {list-style: none}
-->
</style>
</head>
<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
<table cellpadding="1" cellspacing="1" border="0">
<tr><td valign="middle" align="left">[<a href="libunistring_1.html#SEC1" title="Beginning of this chapter or previous chapter"> << </a>]</td>
<td valign="middle" align="left">[<a href="libunistring_3.html#SEC9" title="Next chapter"> >> </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="libunistring_toc.html#SEC_Top" title="Cover (top) of document">Top</a>]</td>
<td valign="middle" align="left">[<a href="libunistring_toc.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
<td valign="middle" align="left">[<a href="libunistring_21.html#SEC94" title="Index">Index</a>]</td>
<td valign="middle" align="left">[<a href="libunistring_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
</tr></table>
<hr size="2">
<a name="Conventions"></a>
<a name="SEC8"></a>
<h1 class="chapter"> <a href="libunistring_toc.html#TOC8">2. Conventions</a> </h1>
<p>This chapter explains conventions valid throughout the libunistring library.
</p>
<a name="IDX14"></a>
<p>Variables of type <code>char *</code> denote C strings in locale encoding.
See <a href="libunistring_1.html#SEC4">Locale encodings</a>.
</p>
<p>Variables of type <code>uint8_t *</code> denote UTF-8 strings. Their units
are bytes.
</p>
<p>Variables of type <code>uint16_t *</code> denote UTF-16 strings, without byte
order mark. Their units are 2-byte words.
</p>
<p>Variables of type <code>uint32_t *</code> denote UTF-32 strings, without byte
order mark. Their units are 4-byte words.
</p>
<p>Argument pairs <code>(<var>s</var>, <var>n</var>)</code> denote a string
<code><var>s</var>[0..<var>n</var>-1]</code> with exactly <var>n</var> units.<a name="DOCF1" href="libunistring_fot.html#FOOT1">(1)</a>
</p>
<p>All functions with prefix ‘<samp>ulc_</samp>’ operate on C strings in locale
encoding.
</p>
<p>All functions with prefix ‘<samp>u8_</samp>’ operate on UTF-8 strings.
</p>
<p>All functions with prefix ‘<samp>u16_</samp>’ operate on UTF-16 strings.
</p>
<p>All functions with prefix ‘<samp>u32_</samp>’ operate on UTF-32 strings.
</p>
<p>For every function with prefix ‘<samp>u8_</samp>’, operating on UTF-8 strings,
there is also a corresponding function with prefix ‘<samp>u16_</samp>’,
operating on UTF-16 strings, and a corresponding function with prefix
‘<samp>u32_</samp>’, operating on UTF-32 strings. Their description is
analogous; in this documentation we describe only the function that
operates on UTF-8 strings, for brevity.
</p>
<p>A declaration with a variable <var>n</var> denotes the three concrete
declarations with <var>n</var> = 8, <var>n</var> = 16, <var>n</var> = 32.
</p>
<p>All parameters starting with ‘<samp>str</samp>’ and the parameters of
functions starting with <code>u8_str</code>/<code>u16_str</code>/<code>u32_str</code>
denote a NUL terminated string.
</p>
<a name="IDX15"></a>
<p>Error values are always returned through the <code>errno</code> variable,
usually with a return value that indicates the presence of an error
(NULL for functions that return an pointer, or -1 for functions that
return an <code>int</code>).
</p>
<p>Functions returning a string result take a
<code>(<var>resultbuf</var>, <var>lengthp</var>)</code>
argument pair. If <var>resultbuf</var> is not NULL and the result fits
into <code>*<var>lengthp</var></code> units, it is put in <var>resultbuf</var>, and
<var>resultbuf</var> is returned. Otherwise, a freshly allocated string
is returned. In both cases, <code>*<var>lengthp</var></code> is set to the
length (number of units) of the returned string. In case of error,
NULL is returned and <code>errno</code> is set.
</p>
<p>To invoke such a function:
</p><ul>
<li>
First ask yourself whether you want to accept the overhead of a
<code>malloc</code> invocation even for a small-sized result.
If yes, pass NULL as <var>resultbuf</var>.
If no, allocate an array of units on the stack, typically between 50 and
4000 bytes large; pass this array as <var>resultbuf</var>; and initialize
<code>*<var>lengthp</var></code> to the number of units of this array.
</li><li>
Upon return from such a function, look at the return value:
NULL means an error; look at the value of <code>errno</code> in this case.
Otherwise, the return value is the result, with <code>*<var>lengthp</var></code>
units. Note that the function has <em>not</em> added an extra NUL
character at the end.
</li><li>
Finally, do memory management. You know that the result was
<code>malloc</code>-allocated if it is <code>!= NULL</code> and
<code>!= <var>resultbuf</var></code>.
</li></ul>
<hr size="6">
<table cellpadding="1" cellspacing="1" border="0">
<tr><td valign="middle" align="left">[<a href="libunistring_1.html#SEC1" title="Beginning of this chapter or previous chapter"> << </a>]</td>
<td valign="middle" align="left">[<a href="libunistring_3.html#SEC9" title="Next chapter"> >> </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="libunistring_toc.html#SEC_Top" title="Cover (top) of document">Top</a>]</td>
<td valign="middle" align="left">[<a href="libunistring_toc.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
<td valign="middle" align="left">[<a href="libunistring_21.html#SEC94" title="Index">Index</a>]</td>
<td valign="middle" align="left">[<a href="libunistring_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
</tr></table>
<p>
<font size="-1">
This document was generated by <em>Bruno Haible</em> on <em>October, 16 2024</em> using <a href="https://www.nongnu.org/texi2html/"><em>texi2html 1.78a</em></a>.
</font>
<br>
</p>
</body>
</html>
|