[/
  Copyright Nick Thompson 2020
  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]

[section:rsqrt Reciprocal square root]

[h4 Synopsis]

    #include <boost/math/special_functions/rsqrt.hpp>

    namespace boost::math {

    template<class Real>
    Real rsqrt(Real const & x);

    } // namespace boost::math


The function `rsqrt` computes the reciprocal square root 1/[sqrt]/x/.
Those in the game programming community might suspect this is a fast, low-precision wrapper around the [@https://www.felixcloutier.com/x86/rsqrtss rsqrtss] instruction.
It is not: we /tried/ this instruction, but found no performance benefit to using it.
However, the /trick/ of computing a low-precision reciprocal square root and then bootstrapping to higher precision via Newton's method /does/ work, although it only yields a performance benefit for quad and higher precision.
You may of course call `rsqrt` with `float`, `double`, and `long double` arguments, but be aware that there is no performance benefit to doing so.
For quad precision and higher, the savings are significant.
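
The bootstrapping idea can be sketched with the standard library alone.
The following is a minimal illustration, not the Boost.Math implementation: the name `rsqrt_newton_sketch`, the `float` seed, the range reduction via `std::frexp`, the fixed count of two Newton steps, and the handling of zero and negative arguments are all assumptions made for this example.
It refines a `float` estimate to `double`, a precision at which no speedup should be expected; only the structure is meant to carry over to quad and higher precision.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper for illustration only; not part of Boost.Math.
// Computes 1/sqrt(x) by refining a float-precision seed with Newton's method.
double rsqrt_newton_sketch(double x)
{
    // Special values. NaN -> NaN and +infinity -> 0 follow the documentation;
    // the results for negative and zero arguments are assumptions of this sketch.
    if (std::isnan(x)) return x;
    if (x < 0) return std::numeric_limits<double>::quiet_NaN();
    if (x == 0) return std::numeric_limits<double>::infinity();
    if (std::isinf(x)) return 0.0;

    // Range reduction: x = m * 2^e with e even and m in [0.5, 2),
    // so that m always fits comfortably in a float.
    int e = 0;
    double m = std::frexp(x, &e);      // m in [0.5, 1)
    if (e % 2 != 0) { m *= 2; --e; }   // make the exponent even

    // Low-precision seed: roughly 7 correct decimal digits.
    double y = 1.0f / std::sqrt(static_cast<float>(m));

    // Newton's method applied to f(y) = 1/y^2 - m gives
    // y <- y * (3 - m*y*y) / 2.
    for (int i = 0; i < 2; ++i)
    {
        y = y * (3.0 - m * y * y) / 2;
    }

    // Undo the range reduction: 1/sqrt(m * 2^e) = (1/sqrt(m)) * 2^(-e/2).
    return std::ldexp(y, -e / 2);
}
```

Each Newton step roughly doubles the number of correct digits, which is why a seed with about 7 correct digits needs only two steps to reach `double` accuracy.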


Example usage:

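    #include <boost/math/special_functions/rsqrt.hpp>
    #include <boost/multiprecision/float128.hpp>

    using boost::multiprecision::float128;
    float128 x = 0.1Q; // the Q literal suffix is a GCC extension
    float128 y = boost::math::rsqrt(x);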

The reciprocal square root of +\u221E is zero, and the reciprocal square root of a NaN is a NaN.
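
These two special values agree with what IEEE 754 arithmetic gives for the plain expression `1 / sqrt(x)`.
The sketch below checks this using only the standard library; `naive_rsqrt` and `special_values_hold` are hypothetical names introduced for illustration and are not part of Boost.Math.

```cpp
#include <cmath>
#include <limits>

// Hypothetical stand-in for illustration; this is not boost::math::rsqrt.
template<class Real>
Real naive_rsqrt(Real const & x)
{
    using std::sqrt;
    return 1 / sqrt(x);
}

// Returns true if the reciprocal square root of +infinity is zero
// and the reciprocal square root of a NaN is a NaN for the type Real.
template<class Real>
bool special_values_hold()
{
    using std::isnan;
    Real const inf = std::numeric_limits<Real>::infinity();
    Real const nan = std::numeric_limits<Real>::quiet_NaN();
    return naive_rsqrt(inf) == 0 && isnan(naive_rsqrt(nan));
}
```

Calling `special_values_hold<double>()` is expected to return `true` on a platform with IEEE 754 arithmetic, provided the compiler is not run with options that relax floating-point semantics.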

[$../graphs/rsqrt_quad_0_100.svg]


Performance:

```
Running ./reporting/performance/rsqrt_performance.x
Run on (16 X 4300 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 1024 KiB (x8)
  L3 Unified 11264 KiB (x1)
Load Average: 0.43, 0.49, 0.46
----------------------------------------------------------------------------------
Benchmark                                        Time             CPU   Iterations
----------------------------------------------------------------------------------
Rsqrt<float>                                  1.35 ns         1.35 ns    503364351
Rsqrt<double>                                 2.25 ns         2.25 ns    309753242
Rsqrt<long double>                            2.68 ns         2.68 ns    261382652
Rsqrt<float128>                                182 ns          182 ns      3756956
Rsqrt<number<mpfr_float_backend<100>>>         299 ns          299 ns      2494027
Rsqrt<number<mpfr_float_backend<200>>>         412 ns          412 ns      1589284
Rsqrt<number<mpfr_float_backend<300>>>         617 ns          617 ns      1067473
Rsqrt<number<mpfr_float_backend<400>>>         812 ns          812 ns       830564
Rsqrt<number<mpfr_float_backend<1000>>>       3183 ns         3183 ns       221079
Rsqrt<cpp_bin_float_50>                       4321 ns         4321 ns       163243
Rsqrt<cpp_bin_float_100>                      9393 ns         9393 ns        72967
```

[endsect]