<?xml version="1.0" encoding="utf-8"?>
<!-- $Revision: 1.1 $ -->
<refentry xml:id="collator.setstrength" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink">
<refnamediv>
<refname>Collator::setStrength</refname>
<refname>collator_set_strength</refname>
<refpurpose>Set collation strength</refpurpose>
</refnamediv>
<refsect1 role="description">
&reftitle.description;
<para>
Object oriented style
</para>
<methodsynopsis>
<type>bool</type>
<methodname>Collator::setStrength</methodname>
<methodparam><type>integer</type><parameter>strength</parameter></methodparam>
</methodsynopsis>
<para>
Procedural style
</para>
<methodsynopsis>
<type>bool</type>
<methodname>collator_set_strength</methodname>
<methodparam><type>Collator</type><parameter>coll</parameter></methodparam>
<methodparam><type>integer</type><parameter>strength</parameter></methodparam>
</methodsynopsis>
<para>
The
<link xlink:href="&url.icu.home;">ICU</link>
Collation Service supports several levels of comparison (named "Levels", but
also known as "Strengths"). These levels enable ICU to sort strings precisely
according to local conventions. Because the levels can also be applied
selectively, a search for a string in text can be performed with various
matching conditions.
</para>
<para>
<orderedlist>
<listitem>
<para>
<emphasis>Primary Level</emphasis>:
Typically, this is used to denote differences between base characters
(for example, "a" &lt; "b"). It is the strongest difference. For
example, dictionaries are divided into different sections by base
character. This is also called the level 1 strength.
</para>
</listitem>
<listitem>
<para>
<emphasis>Secondary Level</emphasis>:
Accents in the characters are considered secondary differences (for
example, "as" &lt; "às" &lt; "at"). Other differences between letters
can also be considered secondary differences, depending on the language.
A secondary difference is ignored when there is a primary difference
anywhere in the strings. This is also called the level 2 strength.
<note>
<para>
In some languages (such as Danish), certain accented letters are
considered to be separate base characters. In most languages, however,
an accented letter only has a secondary difference from the unaccented
version of that letter.
</para>
</note>
</para>
</listitem>
<listitem>
<para>
<emphasis>Tertiary Level</emphasis>:
Upper and lower case differences in characters are distinguished at
the tertiary level (for example, "ao" &lt; "Ao" &lt; "aò"). In addition,
a variant of a letter differs from the base form on the tertiary level
(such as "A" and "Ⓐ"). Another example is the difference between large
and small Kana. A tertiary difference is ignored when there is a primary
or secondary difference anywhere in the strings. This is also called the
level 3 strength.
</para>
</listitem>
<listitem>
<para>
<emphasis>Quaternary Level</emphasis>:
When punctuation is ignored (see Ignoring Punctuations) at levels 1-3,
an additional level can be used to distinguish words with and without
punctuation (for example, "ab" &lt; "a-b" &lt; "aB"). This difference is
ignored when there is a primary, secondary or tertiary difference. This
is also known as the level 4 strength. The quaternary level should only
be used if ignoring punctuation is required or when processing Japanese
text (see Hiragana processing).
</para>
</listitem>
<listitem>
<para>
<emphasis>Identical Level</emphasis>:
When all other levels are equal, the identical level is used as a
tiebreaker. The Unicode code point values of the NFD form of each string
are compared at this level, in case there is no difference at
levels 1-4. For example, Hebrew cantillation marks are only distinguished
at this level. This level should be used sparingly, because two strings
that differ only in their code point values are extremely rare.
Using this level substantially decreases the performance of both
incremental comparison and sort key generation, and increases
the sort key length. It is also known as the level 5 strength.
</para>
</listitem>
</orderedlist>
</para>
<para>
For example, people may choose to ignore accents or ignore accents and case
when searching for text. Almost all characters are distinguished by the
first three levels, and in most locales the default value is thus Tertiary.
However, if Alternate is set to Shifted, then the Quaternary strength
can be used to break ties among whitespace, punctuation, and symbols that
would otherwise be ignored. If very fine distinctions among characters are
required, then the Identical strength can be used (for example, Identical
strength distinguishes between the Mathematical Bold Small A and the
Mathematical Italic Small A). However, using a level higher than Tertiary,
such as the Identical strength, results in significantly longer sort keys
and slower string comparison for equal strings.
</para>
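<para>
<example>
<title>Comparing strings at different strengths (illustrative sketch)</title>
<para>
The following sketch is one way to observe the levels described above. It
assumes that the intl extension is loaded, that the script is saved as UTF-8,
and that locale data for <literal>en_US</literal> is available. The word
pairs are illustrative only and are not part of the API.
</para>
<programlisting role="php">
<![CDATA[
<?php
// Sketch only: assumes ext/intl is loaded and 'en_US' locale data exists.
$coll = new Collator( 'en_US' );

// Illustrative pairs, each chosen to differ at exactly one level.
$pairs = array(
    'accent' => array( 'role', 'rôle' ), // secondary difference
    'case'   => array( 'role', 'Role' ), // tertiary difference
    'base'   => array( 'role', 'rule' ), // primary difference
);

$strengths = array(
    'PRIMARY'   => Collator::PRIMARY,
    'SECONDARY' => Collator::SECONDARY,
    'TERTIARY'  => Collator::TERTIARY,
);

foreach ( $strengths as $name => $strength ) {
    $coll->setStrength( $strength );
    foreach ( $pairs as $label => $pair ) {
        // compare() returns 0 when the strings are equal at this strength.
        $equal = ( $coll->compare( $pair[0], $pair[1] ) === 0 );
        echo "$name $label: " . ( $equal ? 'match' : 'differ' ) . "\n";
    }
}
?>
]]>
</programlisting>
&example.outputs;
<screen>
<![CDATA[
PRIMARY accent: match
PRIMARY case: match
PRIMARY base: differ
SECONDARY accent: differ
SECONDARY case: match
SECONDARY base: differ
TERTIARY accent: differ
TERTIARY case: differ
TERTIARY base: differ
]]>
</screen>
</example>
</para>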
</refsect1>
<refsect1 role="parameters">
&reftitle.parameters;
<para>
<variablelist>
<varlistentry>
<term><parameter>coll</parameter></term>
<listitem>
<para>
<classname>Collator</classname> object.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>strength</parameter></term>
<listitem>
<para>Strength to set.</para>
<para>
Possible values are:
<itemizedlist>
<listitem>
<para>
<constant>Collator::PRIMARY</constant>
</para>
</listitem>
<listitem>
<para>
<constant>Collator::SECONDARY</constant>
</para>
</listitem>
<listitem>
<para>
<constant>Collator::TERTIARY</constant>
</para>
</listitem>
<listitem>
<para>
<constant>Collator::QUATERNARY</constant>
</para>
</listitem>
<listitem>
<para>
<constant>Collator::IDENTICAL</constant>
</para>
</listitem>
<listitem>
<para>
<constant>Collator::DEFAULT</constant>
</para>
</listitem>
</itemizedlist>
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
</refsect1>
<refsect1 role="returnvalues">
&reftitle.returnvalues;
<para>
&return.success;
</para>
</refsect1>
<refsect1 role="examples">
&reftitle.examples;
<para>
<example>
<title><function>collator_set_strength</function> example</title>
<programlisting role="php">
<![CDATA[
<?php
$arr = array( 'aò', 'Ao', 'ao' );
$coll = collator_create( 'en_US' );
// Sort array using default strength.
collator_sort( $coll, $arr );
var_export( $arr );
// Sort array using primary strength.
collator_set_strength( $coll, Collator::PRIMARY );
collator_sort( $coll, $arr );
var_export( $arr );
?>
]]>
</programlisting>
&example.outputs;
<screen>
<![CDATA[
array (
0 => 'ao',
1 => 'Ao',
2 => 'aò',
)
array (
0 => 'aò',
1 => 'Ao',
2 => 'ao',
)
]]>
</screen>
</example>
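<example>
<title>Reading the strength back after setting it (illustrative sketch)</title>
<para>
A minimal sketch, assuming that the intl extension is loaded and that the
script is saved as UTF-8, of confirming a change with
<function>collator_get_strength</function>. It reuses strings from the
<function>collator_set_strength</function> example and checks the boolean
return value before relying on the new strength.
</para>
<programlisting role="php">
<![CDATA[
<?php
$coll = collator_create( 'en_US' );

// collator_set_strength() reports success as a boolean.
if ( collator_set_strength( $coll, Collator::SECONDARY ) ) {
    // The value read back should be the one that was just set.
    var_dump( collator_get_strength( $coll ) === Collator::SECONDARY );

    // Secondary strength ignores case (a tertiary difference) ...
    var_dump( collator_compare( $coll, 'ao', 'Ao' ) );

    // ... but still distinguishes accents (a secondary difference).
    var_dump( collator_compare( $coll, 'ao', 'aò' ) );
}
?>
]]>
</programlisting>
&example.outputs;
<screen>
<![CDATA[
bool(true)
int(0)
int(-1)
]]>
</screen>
</example>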
</para>
</refsect1>
<refsect1 role="seealso">
&reftitle.seealso;
<para>
<simplelist>
<member><link linkend="intl.collator-constants"><classname>Collator</classname> constants</link></member>
<member><function>collator_get_strength</function></member>
</simplelist>
</para>
</refsect1>
</refentry>
<!-- Keep this comment at the end of the file
Local variables:
mode: sgml
sgml-omittag:t
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
indent-tabs-mode:nil
sgml-parent-document:nil
sgml-default-dtd-file:"../../../../manual.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:nil
sgml-local-ecat-files:nil
End:
vim600: syn=xml fen fdm=syntax fdl=2 si
vim: et tw=78 syn=sgml
vi: ts=1 sw=1
-->