# About Nuspell
Nuspell is a fast and safe spelling checker software program. It is designed
for languages with rich morphology and complex word compounding.
Nuspell is written in modern C++ and it supports Hunspell dictionaries.
Main features of the Nuspell spelling checker:
- Provides a software library and a command-line tool.
- Suggests high-quality spelling corrections.
- Backward compatible with the Hunspell dictionary file format.
- Up to 3.5 times faster than Hunspell.
- Full Unicode support backed by ICU.
- Twofold affix stripping (for agglutinative languages, like Azeri,
Basque, Estonian, Finnish, Hungarian, Turkish, etc.).
- Supports complex compounds (for example, Hungarian, German and Dutch).
- Supports advanced features, for example: special casing rules
(Turkish dotted i or German sharp s), conditional affixes, circumfixes,
fogemorphemes, forbidden words, pseudoroots and homonyms.
- Free and open source software. Licensed under GNU LGPL v3 or later.
# Building Nuspell
## Dependencies
Build-only dependencies:
- C++17 compiler with support for `std::filesystem`, e.g. GCC >= v9
- CMake >= v3.12
- Catch2 >= v3.1.1 (It is only needed when building the tests. If it is not
available as a system package, then CMake will download it using
`FetchContent`.)
- Getopt (It is needed only on Windows + MSVC and only when the CLI tool or
the tests are built. It is available in vcpkg. Other platforms provide
it out of the box.)
- Pandoc (optional, needed for building the man-page)
Run-time (and build-time) dependencies:
- ICU4C
Recommended tools for developers: qtcreator, ninja, clang-format, gdb,
vim, doxygen.
## Building on GNU/Linux and Unixes
We first need to download the dependencies. Some may already be
preinstalled.
For Ubuntu and Debian:
```bash
sudo apt install g++ cmake libicu-dev catch2 pandoc
```
Then run the following commands inside the Nuspell directory:
```bash
mkdir build
cd build
cmake ..
make
sudo make install
```
<!--sudo ldconfig-->
For a faster build, run `make -j`, or use Ninja instead
of Make.
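
A bare `make -j` places no limit on the number of parallel jobs, which can exhaust memory on smaller machines. The sketch below picks an explicit job count instead. The variable name `jobs` and the fallback value of 2 are assumptions made for illustration.

```bash
# Detect the number of CPU cores: nproc on GNU/Linux, sysctl on macOS and the BSDs.
# If neither is available, fall back to 2 jobs (an arbitrary, assumed default).
jobs=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 2)
echo "Suggested build command: make -j${jobs}"
```

Run the printed command inside the `build` directory. With Ninja, passing `-G Ninja` to `cmake` and then running `ninja` builds in parallel by default.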
If you are making a Linux distribution package (deb, rpm) you need
some additional options on the CMake invocation. For example:
```bash
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr
```
## Building on OSX and macOS
1. Install Apple's Command-line tools.
2. Install Homebrew package manager.
3. Install dependencies with the following commands.
<!-- end list -->
```bash
brew install cmake icu4c catch2 pandoc
export ICU_ROOT=$(brew --prefix icu4c)
```
Then run the standard cmake and make. See above. The `ICU_ROOT` variable
is needed because icu4c is a keg-only package in Homebrew and CMake
cannot find it by default. Alternatively, you can use `-DICU_ROOT=...` on
the cmake command line.
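
As a rough sketch of the command-line alternative, the helper below passes the Homebrew prefix straight to CMake instead of exporting it. The function name `configure_with_brew_icu` is hypothetical, and it is assumed to be run from inside the `build` directory.

```bash
# Hypothetical helper: configure Nuspell with ICU taken from Homebrew.
# Assumes the current directory is the build directory.
configure_with_brew_icu() {
    local icu_prefix
    # Ask Homebrew where the keg-only icu4c package is installed.
    icu_prefix=$(brew --prefix icu4c) || return 1
    # Pass the location to CMake on the command line; extra arguments are forwarded.
    cmake .. -DICU_ROOT="$icu_prefix" "$@"
}
```

Calling `configure_with_brew_icu -DCMAKE_BUILD_TYPE=Release` would forward the extra option to `cmake` unchanged.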
If you want to build with GCC instead of Clang, you need to pull GCC
with Homebrew and rebuild all the dependencies with it. See the Homebrew
manuals.
## Building on Windows
### Compiling with Visual C++
1. Install Visual Studio 2017 or newer. Alternatively, you can use
Visual Studio Build Tools.
2. Install Git for Windows and CMake.
3. Install Vcpkg in some folder, e.g. in `c:\vcpkg`.
4. Install Pandoc. You can manually install or use `choco install pandoc`.
5. Run the commands below. Vcpkg will work in manifest mode and it will
automatically install the dependencies.
<!-- end list -->
```bat
mkdir build
cd build
cmake .. -DCMAKE_TOOLCHAIN_FILE=c:\vcpkg\scripts\buildsystems\vcpkg.cmake -A x64
cmake --build .
```
### Compiling with Mingw64 and MSYS2
Download MSYS2, update everything and install the following packages:
```bash
pacman -S base-devel mingw-w64-x86_64-toolchain mingw-w64-x86_64-icu \
mingw-w64-x86_64-cmake mingw-w64-x86_64-catch
```
Then from inside the Nuspell folder run:
```bash
mkdir build
cd build
cmake .. -G "Unix Makefiles" -DBUILD_DOCS=OFF
make
make install
```
### Building in Cygwin environment
Download the above-mentioned dependencies with the Cygwin package manager.
Then compile the same way as on Linux. Cygwin builds depend on
`cygwin1.dll`.
## Building on FreeBSD
Install the following required packages:
```bash
pkg install cmake icu catch2 pandoc
```
Then run the standard cmake and make as on Linux. See above.
# Using the software
## Using the command-line tool
The main executable is located in `src/nuspell`.
After compiling and installing you can run the Nuspell spell checker
with a Nuspell, Hunspell or Myspell dictionary:

    nuspell -d en_US text.txt

For more details see the [man-page](docs/nuspell.1.md).
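
For quick experiments it can be handy to check a few words without creating a file. The sketch below assumes that `nuspell` reads standard input when no file is given (check the man-page to confirm) and that an `en_US` dictionary is installed. The function name `check_words` is hypothetical.

```bash
# Hypothetical helper: spell check the words given as arguments.
check_words() {
    # Print one word per line and feed the lines to nuspell on standard input.
    printf '%s\n' "$@" | nuspell -d en_US
}
```

For example, `check_words hello wrold` would check both words against the `en_US` dictionary.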
<!-- old hunspell v1 stuff
The src/tools directory contains ten executables after compiling.
- The main executable:
- nuspell: main program for spell checking and others (see manual)
- Example tools:
- analyze: example of spell checking, stemming and morphological
analysis
- chmorph: example of automatic morphological generation and
conversion
- example: example of spell checking and suggestion
- Tools for dictionary development:
- affixcompress: dictionary generation from large (millions of
words) vocabularies
- makealias: alias compression (Nuspell only, not back compatible
with MySpell)
- wordforms: word generation (Nuspell version of unmunch)
- ~~hunzip: decompressor of hzip format~~ (DEPRECATED)
- ~~hzip: compressor of hzip format~~ (DEPRECATED)
- munch (DEPRECATED, use affixcompress): dictionary generation
from vocabularies (it needs an affix file, too).
- unmunch (DEPRECATED, use wordforms): list all recognized words
of a MySpell dictionary
-->
## Using the Library
Sample program:
```cpp
#include <filesystem>
#include <iostream>
#include <string>
#include <vector>

#include <nuspell/dictionary.hxx>
#include <nuspell/finder.hxx>

using namespace std;

int main()
{
    // Collect the default dictionary search directories and look for en_US.
    auto dirs = vector<filesystem::path>();
    nuspell::append_default_dir_paths(dirs);
    auto dict_path = nuspell::search_dirs_for_one_dict(dirs, "en_US");
    if (empty(dict_path))
        return 1; // Return error because we can not find the requested
                  // dictionary.

    // Load the dictionary; loading throws on failure.
    auto dict = nuspell::Dictionary();
    try {
        dict.load_aff_dic(dict_path);
    }
    catch (const nuspell::Dictionary_Loading_Error& e) {
        cerr << e.what() << '\n';
        return 1;
    }

    // Check each word read from standard input and print suggestions
    // for the misspelled ones.
    auto word = string();
    auto sugs = vector<string>();
    while (cin >> word) {
        if (dict.spell(word)) {
            cout << "Word \"" << word << "\" is ok.\n";
            continue;
        }
        cout << "Word \"" << word << "\" is incorrect.\n";
        dict.suggest(word, sugs);
        if (sugs.empty())
            continue;
        cout << "  Suggestions are: ";
        for (auto& sug : sugs)
            cout << sug << ' ';
        cout << '\n';
    }
}
```
On the command line you can link like this:
```bash
g++ example.cxx -std=c++17 -lnuspell -licuuc -licudata
# or better, use pkg-config
g++ example.cxx -std=c++17 $(pkg-config --cflags --libs nuspell)
```
Within CMake you can use `find_package()` to link. For example:
```cmake
find_package(Nuspell)
add_executable(myprogram main.cpp)
target_link_libraries(myprogram Nuspell::nuspell)
```
# Dictionaries
Myspell, Hunspell and Nuspell dictionaries:
<https://github.com/nuspell/nuspell/wiki/Dictionaries-and-Contacts>
# Advanced topics
## Debugging Nuspell
First, always install the debugger:
```bash
sudo apt install gdb
```
For debugging we need to create a debug build and then we need to start
`gdb`.
```bash
mkdir debug
cd debug
cmake .. -DCMAKE_BUILD_TYPE=Debug
make -j
gdb src/nuspell/nuspell
```
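
To debug a particular invocation, the program arguments can be handed to `gdb` with its `--args` option. The following is a minimal sketch, meant to be run from the `debug` directory. The function name `debug_nuspell` is hypothetical, and the dictionary and file names in the usage note are only examples.

```bash
# Hypothetical helper: start gdb on the freshly built CLI tool and
# forward all arguments to the program being debugged.
debug_nuspell() {
    gdb --args src/nuspell/nuspell "$@"
}
```

`debug_nuspell -d en_US text.txt` would open `gdb` with those arguments preset, so typing `run` starts the check with them.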
We recommend debugging
[with an IDE](https://github.com/nuspell/nuspell/wiki/IDE-Setup).
## Testing
To run the tests, run the following command after building:

    ctest

# See also
Full documentation in the [wiki](https://github.com/nuspell/nuspell/wiki).
API Documentation for developers can be generated from the source files
by running:

    doxygen

The result can be viewed by opening `doxygen/html/index.html` in a web
browser.