File: README.md

package info (click to toggle)
node-websocket 1.0.35%2B~cs11.0.30-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 1,132 kB
  • sloc: cpp: 5,584; javascript: 3,787; ansic: 107; makefile: 15; sh: 7
file content (109 lines) | stat: -rw-r--r-- 3,408 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
# is_utf8

Most strings online are in unicode using the UTF-8 encoding. Validating strings
quickly before accepting them is important.

## How to use is_utf8

This is a simple one-source file library to validate UTF-8 strings at high
speeds using SIMD instructions. It works on all platforms (ARM, x64).

Build and link `is_utf8.cpp` with your project. Code usage:

```C++
  #include "is_utf8.h"

  char * mystring = ...
  bool is_it_valid = is_utf8(mystring, thestringlength);
```

It should be able to validate strings using less than 1 cycle per input byte.

## Requirements

- C++11 compatible compiler. We support LLVM clang, GCC, Visual Studio. (Our
  optional benchmark tool requires C++17.)
- For high speed, you should have a recent 64-bit system (e.g., ARM or x64).
- If you rely on CMake, you should use a recent CMake (at least 3.15).
- AVX-512 support require a processor with AVX512-VBMI2 (Ice Lake or better) and
  a recent compiler (GCC 8 or better, Visual Studio 2019 or better, LLVM clang 6
  or better). You need a correspondingly recent assembler such as gas (2.30+) or
  nasm (2.14+): recent compilers usually come with recent assemblers. If you mix
  a recent compiler with an incompatible/old assembler (e.g., when using a
  recent compiler with an old Linux distribution), you may get errors at build
  time because the compiler produces instructions that the assembler does not
  recognize: you should update your assembler to match your compiler (e.g.,
  upgrade binutils to version 2.30 or better under Linux) or use an older
  compiler matching the capabilities of your assembler.

## Build with CMake

```
cmake -B build
cmake --build build
cd build
ctest .
```

Visual Studio users must specify whether they want to build the Release or Debug
version.

To run benchmarks, build and execute the `bench` command.

```
cmake -B build
cmake --build build
./build/benchmarks/bench
```

Instructions are similar for Visual Studio users.

## Real-word usage

This C++ library is part of the JavaScript package
[utf-8-validate](https://github.com/websockets/utf-8-validate). The
utf-8-validate package is routinely downloaded more than
[a million times per week](https://www.npmjs.com/package/utf-8-validate).

If you are using Node JS (19.4.0 or better), you already have access to this
function as
[`buffer.isUtf8(input)`](https://nodejs.org/api/buffer.html#bufferisutf8input).

## Reference

- John Keiser, Daniel Lemire,
  [Validating UTF-8 In Less Than One Instruction Per Byte](https://arxiv.org/abs/2010.03090),
  Software: Practice & Experience 51 (5), 2021

## Want more?

If you want a wide range of fast Unicode function for production use, you can
rely on the simdutf library. It is as simple as the following:

```C++
#include "simdutf.cpp"
#include "simdutf.h"

int main(int argc, char *argv[]) {
  const char *source = "1234";
  // 4 == strlen(source)
  bool validutf8 = simdutf::validate_utf8(source, 4);
  if (validutf8) {
    std::cout << "valid UTF-8" << std::endl;
  } else {
    std::cerr << "invalid UTF-8" << std::endl;
    return EXIT_FAILURE;
  }
}
```

See https://github.com/simdutf/

## License

This library is distributed under the terms of any of the following licenses, at
your option:

- Apache License (Version 2.0) [LICENSE-APACHE](LICENSE-APACHE),
- Boost Software License [LICENSE-BOOST](LICENSE-BOOST), or
- MIT License [LICENSE-MIT](LICENSE-MIT).