.TH hunspell 3 "2017-11-20"
.LO 1
.hy 0
.SH NAME
\fBhunspell\fR - spell checking, stemming, morphological generation and analysis
.SH SYNOPSIS
\fB#include <hunspell.hxx> /* or */\fR
.br
\fB#include <hunspell.h>\fR
.br
.sp
.BI "Hunspell(const char *" affpath ", const char *" dpath );
.sp
.BI "Hunspell(const char *" affpath ", const char *" dpath ", const char * " key );
.sp
.BI "~Hunspell(" );
.sp
.BI "int add_dic(const char *" dpath );
.sp
.BI "int add_dic(const char *" dpath ", const char *" key );
.sp
.BI "int spell(const char *" word );
.sp
.BI "int spell(const char *" word ", int *" info ", char **" root );
.sp
.BI "int suggest(char***" slst ", const char *" word );
.sp
.BI "int analyze(char***" slst ", const char *" word );
.sp
.BI "int stem(char***" slst ", const char *" word );
.sp
.BI "int stem(char***" slst ", char **" morph ", int " n );
.sp
.BI "int generate(char***" slst ", const char *" word ", const char *" word2 );
.sp
.BI "int generate(char***" slst ", const char *" word ", char **" desc ", int " n );
.sp
.BI "void free_list(char ***" slst ", int " n );
.sp
.BI "int add(const char *" word );
.sp
.BI "int add_with_affix(const char *" word ", const char *" example );
.sp
.BI "int remove(const char *" word );
.sp
.BI "char * get_dic_encoding(" );
.sp
.BI "const char * get_wordchars(" );
.sp
.BI "unsigned short * get_wordchars_utf16(int *" len );
.sp
.BI "struct cs_info * get_csconv(" );
.sp
.BI "const char * get_version(" );
.SH DESCRIPTION
The \fBHunspell\fR library routines give the user word-level
linguistic functions: spell checking and correction, stemming,
morphological generation and analysis in item-and-arrangement style.
.PP
The optional C header \fBhunspell.h\fR provides a C interface to the C++
library. It replaces the constructor and destructor with Hunspell_create
and Hunspell_destroy, and its wrapper functions take an extra HunHandle
parameter (the allocated object).
.PP
The basic spelling functions, \fBspell()\fR and \fBsuggest()\fR, can
also perform stemming, morphological generation and analysis when
they are given XML input (see XML API).
.
.SS Constructor and destructor
Hunspell's constructor takes the paths of the affix and dictionary files.
(On WIN32, use UTF-8 encoded paths that start with the long path prefix \\\\?\\ so that the character encoding is handled independently of the system and very long path names work, too.)
See the \fBhunspell\fR(4) manual page for the dictionary format.
The optional \fBkey\fR parameter is for dictionaries encrypted by
the \fBhzip\fR tool of the Hunspell distribution.
.
.SS Extra dictionaries
The add_dic() function loads an extra dictionary file.
The extra dictionaries use the affix file of the allocated Hunspell
object. The maximum number of extra dictionaries is fixed in the source code (20).
.
.SS Spelling and correction
The spell() function returns a non-zero value if the spell checker
recognises the input word, and zero if it does not. The optional output
parameters return a bit array (info) and the root word of the input word.
The info bits, tested with the SPELL_COMPOUND, SPELL_FORBIDDEN and SPELL_WARN
macros, mark compound words, explicitly forbidden words and probably bad words.
From version 1.3, the non-zero return value is 2 for dictionary
words with the flag "WARN" (probably bad words).
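.PP
As a minimal sketch of the return-value convention described above, the
helper below maps the integer returned by spell() to a three-way outcome.
The \fBSpellOutcome\fR type and the \fBclassify_spell_result()\fR and
\fBcheck_classification()\fR functions are hypothetical names introduced
for illustration; they are not part of the Hunspell API.
.PP
.RS
.nf
```cpp
#include <cassert>

// Hypothetical three-way view of the integer returned by spell().
enum class SpellOutcome { Rejected, Accepted, AcceptedWithWarning };

// Zero means the word was not recognised. From version 1.3, the value 2
// marks dictionary words that carry the WARN flag. Any other non-zero
// value is treated as a plain accepted word.
SpellOutcome classify_spell_result(int rv) {
    if (rv == 0) return SpellOutcome::Rejected;
    if (rv == 2) return SpellOutcome::AcceptedWithWarning;
    return SpellOutcome::Accepted;
}

// Small self-check of the mapping.
void check_classification() {
    assert(classify_spell_result(0) == SpellOutcome::Rejected);
    assert(classify_spell_result(1) == SpellOutcome::Accepted);
    assert(classify_spell_result(2) == SpellOutcome::AcceptedWithWarning);
}
```
.fi
.RE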
.PP
The suggest() function takes two parameters: a reference variable
for the output suggestion list, and the input word. It returns
the number of suggestions. The reference variable
receives the address of the newly allocated suggestion list, or NULL
if suggest() returns zero. The maximum number of suggestions
is fixed in the source code.
.PP
The spell() and suggest() functions also recognize XML input; see the XML API section.
.
.SS Morphological functions
The plain stem() and analyze() functions work like suggest(), but
return stems and the results of the morphological
analysis instead of suggestions. The plain generate() expects a second word as well. This extra word
and its affixation serve as the model for generating
the requested forms of the first word.
.PP
The extended stem() and generate() use the results of a
morphological analysis:
.PP
.RS
.nf
char ** result, ** result2;
int n1 = analyze(&result, "words");
int n2 = stem(&result2, result, n1);
.fi
.RE
.PP
The morphological annotation of the Hunspell library uses fixed
field identifiers (two letters and a colon); see the
\fBhunspell\fR(4) manual page.
.PP
.RS
.nf
char ** result;
char affix[] = "is:plural"; // the description also depends on the dictionary
char * desc[] = { affix };  // generate() expects an array of descriptions
int n = generate(&result, "word", desc, 1);
for (int i = 0; i < n; i++) printf("%s\\n", result[i]);
.fi
.RE
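.PP
The fixed field identifiers mentioned above make the annotation easy to
take apart. The following is a sketch only: it assumes that the fields of
one analysis string are separated by white space, and the \fBField\fR
alias and the \fBsplit_fields()\fR and \fBcheck_split_fields()\fR
functions are hypothetical names, not part of the Hunspell API.
.PP
.RS
.nf
```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One morphological field: two-letter identifier and its value.
using Field = std::pair<std::string, std::string>;

// Split an analysis string such as "st:word is:plural" into fields.
// Tokens that do not have a colon in the third position are skipped.
std::vector<Field> split_fields(const std::string& analysis) {
    std::vector<Field> fields;
    std::istringstream in(analysis);
    std::string token;
    while (in >> token) {
        if (token.size() >= 3 && token[2] == ':')
            fields.emplace_back(token.substr(0, 2), token.substr(3));
    }
    return fields;
}

// Small self-check with a made-up analysis string.
void check_split_fields() {
    std::vector<Field> f = split_fields(" st:word is:plural");
    assert(f.size() == 2);
    assert(f[0].first == "st" && f[0].second == "word");
    assert(f[1].first == "is" && f[1].second == "plural");
}
```
.fi
.RE
.PP
Each string returned by analyze() could be passed to such a helper before
free_list() releases the list.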
.PP
.SS Memory deallocation
The free_list() function frees the memory allocated by the suggest(),
analyze(), generate() and stem() functions.
.SS Other functions
The add(), add_with_affix() and remove() functions are helpers for a
personal dictionary implementation: they add words to and remove words from the
base dictionary at run time. The add_with_affix() function uses a second root word
as the model for the enabled affixation and compounding of the new word.
.PP
The get_dic_encoding() function returns the character encoding defined
in the affix file with the "SET" keyword, or "ISO8859-1" if none is defined.
.PP
The get_csconv() function returns the 8-bit character case table of the
encoding of the dictionary.
.PP
The get_wordchars() and get_wordchars_utf16() functions return the
extra word characters for tokenization, defined in the affix file by
the "WORDCHARS" keyword.
.PP
The get_version() function returns the version string of the library.
.SS XML API
The spell() function returns non-zero for the "<?xml?>" input,
which indicates that the XML API is supported.
.PP
The suggest() function stems, analyzes and generates the forms of the
input word, or adds it to the dictionary, if the input is given in one of the following "SPELLML" syntaxes:
.PP
.RS
.nf
<?xml?>
<query type="analyze">
<word>dogs</word>
</query>
.fi
.RE
.PP
.PP
.RS
.nf
<?xml?>
<query type="stem">
<word>dogs</word>
</query>
.fi
.RE
.PP
.PP
.RS
.nf
<?xml?>
<query type="generate">
<word>dog</word>
<word>cats</word>
</query>
.fi
.RE
.PP
.PP
.RS
.nf
<?xml?>
<query type="generate">
<word>dog</word>
<code><a>is:pl</a><a>is:poss</a></code>
</query>
.fi
.RE
.PP
.PP
.RS
.nf
<?xml?>
<query type="add">
<word>word</word>
</query>
.fi
.RE
.PP
.PP
.RS
.nf
<?xml?>
<query type="add">
<word>word</word>
<word>model_word_for_affixation_and_compounding</word>
</query>
.fi
.RE
.PP
The outputs of the type="stem" query and the stem() library function
are the same. The output of the type="analyze" query is a string containing
a <code><a>result1</a><a>result2</a>...</code> element. This
element can be used in the second syntax of the type="generate" query.
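.PP
The query syntaxes above can be assembled and their replies taken apart
with plain string handling. The sketch below is an illustration under
stated assumptions: \fBxml_escape()\fR, \fBspellml_query()\fR and
\fBspellml_results()\fR are hypothetical helpers, not part of the Hunspell
API, and the escaping of "&" and "<" assumes that the library decodes
these two entities in <word> elements.
.PP
.RS
.nf
```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Line feed, written as a code point so the listing needs no escapes.
static const char kLineFeed = 10;

// Replace the characters that would otherwise be read as XML markup.
std::string xml_escape(const std::string& text) {
    std::string out;
    for (char c : text) {
        if (c == '&') out += "&amp;";
        else if (c == '<') out += "&lt;";
        else out += c;
    }
    return out;
}

// Build a query of the given type ("analyze", "stem", "generate" or
// "add") with one <word> element per entry, laid out like the examples.
std::string spellml_query(const std::string& type,
                          const std::vector<std::string>& words) {
    assert(!type.empty());
    std::string q = "<?xml?>";
    q += kLineFeed;
    q += "<query type=";
    q += '"';
    q += type;
    q += '"';
    q += ">";
    q += kLineFeed;
    for (const std::string& w : words) {
        q += "<word>" + xml_escape(w) + "</word>";
        q += kLineFeed;
    }
    q += "</query>";
    return q;
}

// Collect the text of every <a>...</a> element of an analysis reply.
std::vector<std::string> spellml_results(const std::string& reply) {
    std::vector<std::string> items;
    const std::string open = "<a>";
    const std::string close = "</a>";
    std::size_t pos = 0;
    while ((pos = reply.find(open, pos)) != std::string::npos) {
        pos += open.size();
        std::size_t end = reply.find(close, pos);
        if (end == std::string::npos) break;  // unterminated element
        items.push_back(reply.substr(pos, end - pos));
        pos = end + close.size();
    }
    return items;
}
```
.fi
.RE
.PP
The string built by the hypothetical spellml_query() would be passed as the
word argument of suggest(), for example spellml_query("generate", {"dog", "cats"})
for the first type="generate" syntax.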
.SH EXAMPLE
See analyze.cxx in the Hunspell distribution.
.SH AUTHORS
Hunspell is based on Ispell's spell checking algorithms and OpenOffice.org's MySpell source code.
.PP
The author of International Ispell is Geoff Kuenning.
.PP
The author of MySpell is Kevin Hendricks.
.PP
The author of Hunspell is László Németh.
.PP
The author of the original C API is Caolan McNamara.
.PP
The author of the Aspell table-driven phonetic transcription algorithm and code is Björn Jacke.
.PP
See also the THANKS and Changelog files of the Hunspell distribution.