1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692

Copyright 19992019 Free Software Foundation, Inc.
Contributed by the AriC and Caramba projects, INRIA.
This file is part of the GNU MPFR Library.
The GNU MPFR Library is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 3 of the License, or (at your
option) any later version.
The GNU MPFR Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
License for more details.
You should have received a copy of the GNU Lesser General Public License
along with the GNU MPFR Library; see the file COPYING.LESSER. If not, see
https://www.gnu.org/licenses/ or write to the Free Software Foundation, Inc.,
51 Franklin St, Fifth Floor, Boston, MA 021101301, USA.
Table of contents:
1. Documentation
2. Compiler/library detection
3. Changes in existing functions
4. New functions to implement
5. Efficiency
6. Miscellaneous
7. Portability
##############################################################################
1. Documentation
##############################################################################
 add a description of the algorithms used and a proof of correctness
##############################################################################
2. Compiler/library detection
##############################################################################
 if we want to distinguish GMP and MPIR, we can check at configure time
the following symbols which are only defined in MPIR:
#define __MPIR_VERSION 0
#define __MPIR_VERSION_MINOR 9
#define __MPIR_VERSION_PATCHLEVEL 0
There is also a library symbol mpir_version, which should match VERSION, set
by configure, for example 0.9.0.
 update ICC detection.
* Use only __INTEL_COMPILER instead of the obsolete macro __ICC?
* Possibly enable some features with ICC. For instance, removing
"&& !defined(__ICC)" from src/mpfr.h works with ICC 15.0.0 20140723.
##############################################################################
3. Changes in existing functions
##############################################################################
 export mpfr_overflow and mpfr_underflow as public functions
 many functions currently taking into account the precision of the *input*
variable to set the initial working precision (acosh, asinh, cosh, ...).
This is nonsense since the "average" working precision should only depend
on the precision of the *output* variable (and maybe on the *value* of
the input in case of cancellation).
> remove those dependencies from the input precision.
 mpfr_can_round:
change the meaning of the 2nd argument (err). Currently the error is
at most 2^(MPFR_EXP(b)err), i.e. err is the relative shift wrt the
most significant bit of the approximation. I propose that the error
is now at most 2^err ulps of the approximation, i.e.
2^(MPFR_EXP(b)MPFR_PREC(b)+err).
 mpfr_set_q first tries to convert the numerator and the denominator
to mpfr_t. But this conversion may fail even if the correctly rounded
result is representable. New way to implement:
Function q = a/b. nq = PREC(q) na = PREC(a) nb = PREC(b)
If na < nb
a < a*2^(nbna)
n < nanb+ (HIGH(a,nb) >= b)
if (n >= nq)
bb < b*2^(nnq)
a = q*bb+r > q has exactly n bits.
else
aa < a*2^(nqn)
aa = q*b+r > q has exactly n bits.
If RNDN, takes nq+1 bits. (See also the new division function).
 revisit the conversion functions between a MPFR number and a native
floatingpoint value.
* Consequences if some exception is trapped?
* Specify under which conditions (current rounding direction and
precision of the FPU, whether a format has been recognized...),
correct rounding is guaranteed. Fix the code if need be. Do not
forget subnormals.
* Provide mpfr_buildopt_* functions to tell whether the format of a
native type (float / double / long double) has been recognized and
which format it is?
* For functions that return a native floatingpoint value (mpfr_get_flt,
mpfr_get_d, mpfr_get_ld, mpfr_get_decimal64), in case of underflow or
overflow, follow the convention used for the functions in <math.h>?
See §7.12.1 "Treatment of error conditions" of ISO C11, which provides
two ways of handling error conditions, depending on math_errhandling:
errno (to be set to ERANGE here) and floatingpoint exceptions.
If floatingpoint exceptions need to be generated, do not use
feraiseexcept(), as this function may require the math library (lm);
use a floatingpoint expression instead, such as DBL_MIN * DBL_MIN
(underflow) or DBL_MAX * DBL_MAX (overflow), which are probably safe
as used in the GNU libc implementation.
* For testing the lack of subnormal support:
see the mfpu GCC option for ARM and
https://en.wikipedia.org/wiki/Denormal_number#Disabling_denormal_floats_at_the_code_level
##############################################################################
4. New functions to implement
##############################################################################
 implement new functions from the C++17 standard:
http://en.cppreference.com/w/cpp/numeric/special_math
assoc_laguerre, assoc_legendre, comp_ellint_1, comp_ellint_2, comp_ellint_3,
cyl_bessel_i, cyl_bessel_j, cyl_bessel_k, cyl_neumann, ellint_1, ellint_2,
ellint_3, hermite, legendre, laguerre, sph_bessel, sph_legendre,
sph_neumann.
Already in mpfr4: beta and riemann_zeta.
See also https://isocpp.org/files/papers/P0226R1.pdf and §29.9.5 in the
C++17 draft:
https://github.com/cplusplus/draft/blob/master/source/numerics.tex
 implement mpfr_get_decimal128 and mpfr_set_decimal128
 implement mpfr_log_ui to compute log(n) for an unsigned long n.
We can write for argument reduction n = 2^k * n/2^k, where
2/3 <= n/2^k < 4/3, i.e., k = floor(log2(3n))1, thus
log(n) = k*log(2) + log(n/2^k), and we can use binary splitting on the
Taylor expansion of log(1+x) to compute log(n/2^k), where at most
p*log(2)/log(3) terms are needed for precision p.
Other idea (from Fredrik Johansson): compute log(m) + log(n/m) where
m=2^a*3^b*5^c*7^d and m is close to n.
 implement mpfr_q_sub, mpfr_z_div, mpfr_q_div?
 implement mpfr_pow_q and variants with two integers (native or mpz)
instead of a rational? See IEEE P1788.
 implement functions for random distributions, see for example
https://sympa.inria.fr/sympa/arc/mpfr/201001/msg00034.html
(suggested by Charles Karney <ckarney@Sarnoff.com>, 18 Jan 2010):
* a Bernoulli distribution with prob p/q (exact)
* a general discrete distribution (i with prob w[i]/sum(w[i]) (Walker
algorithm, but make it exact)
* a uniform distribution in (a,b)
* exponential distribution (mean lambda) (von Neumann's method?)
* normal distribution (mean m, s.d. sigma) (ratio method?)
 wanted for Magma [John Cannon <john@maths.usyd.edu.au>, Tue, 19 Apr 2005]:
HypergeometricU(a,b,s) = 1/gamma(a)*int(exp(su)*u^(a1)*(1+u)^(ba1),
u=0..infinity)
JacobiThetaNullK
PolylogP, PolylogD, PolylogDold: see http://arxiv.org/abs/math.CA/0702243
and the references herein.
JBessel(n, x) = BesselJ(n+1/2, x)
KBessel, KBessel2 [2nd kind]
JacobiTheta
(see http://www.ams.org/journals/mcom/000000000/S002557182017032452/home.html)
LogIntegral
ExponentialIntegralEn (formula 5.1.4 of Abramowitz and Stegun)
DawsonIntegral
GammaD(x) = Gamma(x+1/2)
 new functions of IEEE 7542008, and more generally functions of the
C binding draft TS 186614:
http://www.openstd.org/jtc1/sc22/wg14/www/docs/n1946.pdf
Some propositions about rootn: mpfr_rootn_si, mpfr_rootn_sj, mpfr_rootn_z,
and versions with an unsigned integer: mpfr_rootn_ui (now implemented, as
similar to mpfr_root) and mpfr_rootn_uj.
 functions defined in the LIA2 standard
+ minimum and maximum (5.2.2): max, min, max_seq, min_seq, mmax_seq
and mmin_seq (mpfr_min and mpfr_max correspond to mmin and mmax);
+ rounding_rest, floor_rest, ceiling_rest (5.2.4);
+ remr (5.2.5): x  round(x/y) y;
+ error functions from 5.2.7 (if useful in MPFR);
+ power1pm1 (5.3.6.7): (1 + x)^y  1;
+ logbase (5.3.6.12): \log_x(y);
+ logbase1p1p (5.3.6.13): \log_{1+x}(1+y);
+ rad (5.3.9.1): x  round(x / (2 pi)) 2 pi = remr(x, 2 pi);
+ axis_rad (5.3.9.1) if useful in MPFR;
+ cycle (5.3.10.1): rad(2 pi x / u) u / (2 pi) = remr(x, u);
+ axis_cycle (5.3.10.1) if useful in MPFR;
+ sinu, cosu, tanu, cotu, secu, cscu, cossinu, arcsinu, arccosu,
arctanu, arccotu, arcsecu, arccscu (5.3.10.{2..14}):
sin(x 2 pi / u), etc.;
[from which sinpi(x) = sin(Pi*x), ... are trivial to implement, with u=2.]
+ arcu (5.3.10.15): arctan2(y,x) u / (2 pi);
+ rad_to_cycle, cycle_to_rad, cycle_to_cycle (5.3.11.{1..3}).
 From GSL, missing special functions (if useful in MPFR):
(cf https://www.gnu.org/software/gsl/manual/gslref.html#SpecialFunctions)
+ The Airy functions Ai(x) and Bi(x) defined by the integral representations:
* Ai(x) = (1/\pi) \int_0^\infty \cos((1/3) t^3 + xt) dt
* Bi(x) = (1/\pi) \int_0^\infty (e^((1/3) t^3) + \sin((1/3) t^3 + xt)) dt
* Derivatives of Airy Functions
+ The Bessel functions for n integer and n fractional:
* Regular Modified Cylindrical Bessel Functions I_n
* Irregular Modified Cylindrical Bessel Functions K_n
* Regular Spherical Bessel Functions j_n: j_0(x) = \sin(x)/x,
j_1(x)= (\sin(x)/x\cos(x))/x & j_2(x)= ((3/x^21)\sin(x)3\cos(x)/x)/x
Note: the "spherical" Bessel functions are solutions of
x^2 y'' + 2 x y' + [x^2  n (n+1)] y = 0 and satisfy
j_n(x) = sqrt(Pi/(2x)) J_{n+1/2}(x). They should not be mixed with the
classical Bessel Functions, also noted j0, j1, jn, y0, y1, yn in C99
and mpfr.
Cf https://en.wikipedia.org/wiki/Bessel_function#Spherical_Bessel_functions
*Irregular Spherical Bessel Functions y_n: y_0(x) = \cos(x)/x,
y_1(x)= (\cos(x)/x+\sin(x))/x &
y_2(x)= (3/x^3+1/x)\cos(x)(3/x^2)\sin(x)
* Regular Modified Spherical Bessel Functions i_n:
i_l(x) = \sqrt{\pi/(2x)} I_{l+1/2}(x)
* Irregular Modified Spherical Bessel Functions:
k_l(x) = \sqrt{\pi/(2x)} K_{l+1/2}(x).
+ Clausen Function:
Cl_2(x) =  \int_0^x dt \log(2 \sin(t/2))
Cl_2(\theta) = \Im Li_2(\exp(i \theta)) (dilogarithm).
+ Dawson Function: \exp(x^2) \int_0^x dt \exp(t^2).
+ Debye Functions: D_n(x) = n/x^n \int_0^x dt (t^n/(e^t  1))
+ Elliptic Integrals:
* Definition of Legendre Forms:
F(\phi,k) = \int_0^\phi dt 1/\sqrt((1  k^2 \sin^2(t)))
E(\phi,k) = \int_0^\phi dt \sqrt((1  k^2 \sin^2(t)))
P(\phi,k,n) = \int_0^\phi dt 1/((1 + n \sin^2(t))\sqrt(1  k^2 \sin^2(t)))
* Complete Legendre forms are denoted by
K(k) = F(\pi/2, k)
E(k) = E(\pi/2, k)
* Definition of Carlson Forms
RC(x,y) = 1/2 \int_0^\infty dt (t+x)^(1/2) (t+y)^(1)
RD(x,y,z) = 3/2 \int_0^\infty dt (t+x)^(1/2) (t+y)^(1/2) (t+z)^(3/2)
RF(x,y,z) = 1/2 \int_0^\infty dt (t+x)^(1/2) (t+y)^(1/2) (t+z)^(1/2)
RJ(x,y,z,p) = 3/2 \int_0^\infty dt
(t+x)^(1/2) (t+y)^(1/2) (t+z)^(1/2) (t+p)^(1)
+ Elliptic Functions (Jacobi)
+ Nrelative exponential:
exprel_N(x) = N!/x^N (\exp(x)  \sum_{k=0}^{N1} x^k/k!)
+ exponential integral:
E_2(x) := \Re \int_1^\infty dt \exp(xt)/t^2.
Ei_3(x) = \int_0^x dt \exp(t^3) for x >= 0.
Ei(x) :=  PV(\int_{x}^\infty dt \exp(t)/t)
+ Hyperbolic/Trigonometric Integrals
Shi(x) = \int_0^x dt \sinh(t)/t
Chi(x) := Re[ \gamma_E + \log(x) + \int_0^x dt (\cosh[t]1)/t]
Si(x) = \int_0^x dt \sin(t)/t
Ci(x) = \int_x^\infty dt \cos(t)/t for x > 0
AtanInt(x) = \int_0^x dt \arctan(t)/t
[ \gamma_E is the Euler constant ]
+ FermiDirac Function:
F_j(x) := (1/r\Gamma(j+1)) \int_0^\infty dt (t^j / (\exp(tx) + 1))
+ Pochhammer symbol (a)_x := \Gamma(a + x)/\Gamma(a) : see [Smith01] in
algorithms.bib
logarithm of the Pochhammer symbol
+ Gegenbauer Functions
+ Laguerre Functions
+ Eta Function: \eta(s) = (12^{1s}) \zeta(s)
Hurwitz zeta function: \zeta(s,q) = \sum_0^\infty (k+q)^{s}.
+ Lambert W Functions, W(x) are defined to be solutions of the equation:
W(x) \exp(W(x)) = x.
This function has multiple branches for x < 0 (2 funcs W0(x) and Wm1(x))
From Fredrik Johansson:
See https://cs.uwaterloo.ca/research/tr/1993/03/W.pdf, in particular
formulas 5.2 and 5.3 for the error bound: one first computes an
approximation w, and then evaluates the residual w e^w  x. There is an
expression for the error in terms of the residual and the derivative W'(t),
where the derivative can be bounded by piecewise simple functions,
something like min(1, 1/t) when t >= 0.
See https://arxiv.org/abs/1705.03266 for rigorous error bounds.
+ Trigamma Function psi'(x).
and Polygamma Function: psi^{(m)}(x) for m >= 0, x > 0.
 functions from ISO/IEC 24747:2009 (Extensions to the C Library,
to Support Mathematical Special Functions).
Standard: http://www.iso.org/iso/catalogue_detail.htm?csnumber=38857
Draft: http://www.openstd.org/jtc1/sc22/wg14/www/docs/n1292.pdf
Rationale: http://www.openstd.org/jtc1/sc22/wg14/www/docs/n1244.pdf
See also: http://www.openstd.org/jtc1/sc22/wg21/docs/papers/2010/n3060.pdf
(similar, for C++).
Also check whether the functions that are already implemented in MPFR
match this standard.
 from gnumeric (www.gnome.org/projects/gnumeric/doc/functionreference.html):
 incomplete beta function, see message from Martin Maechler
<maechler@stat.math.ethz.ch> on 18 Jan 2016, and Section 6.6 in
Abramowitz & Stegun
 betaln
 degrees
 radians
 sqrtpi
 mpfr_inp_raw, mpfr_out_raw (cf mail "Serialization of mpfr_t" from Alexey
and answer from Granlund on mpfr list, May 2007)
 [maybe useful for SAGE] implement companion frac_* functions to the rint_*
functions. For example mpfr_frac_floor(x) = x  floor(x). (The current
mpfr_frac function corresponds to mpfr_rint_trunc.)
 scaled erfc (https://sympa.inria.fr/sympa/arc/mpfr/200905/msg00054.html)
 asec, acsc, acot, asech, acsch and acoth (mail from Björn Terelius on mpfr
list, 18 June 2009)
##############################################################################
5. Efficiency
##############################################################################
 Fredrik Johansson reports that mpfr_ai is slow for large arguments: an
asymptotic expansion should be used (once done, remove REDUCE_EMAX from
tests/tai.c and update the description in mpfr.texi).
 for exp(x), Fredrik Johansson reports a 20% speed improvement starting from
4000 bits, and up to a 75% memory improvement in his Arb implementation, by
using recursive instead of iterative binary splitting:
https://github.com/fredrikjohansson/arb/blob/master/elefun/exp_sum_bs_powtab.c
 improve mpfr_grandom using the algorithm in http://arxiv.org/abs/1303.6257
 use the src/x86_64/corei5/mparam.h file once GMP recognizes correctly the
Core i5 processors (note that gcc mtune=native gives __tune_corei7__
and not __tune_corei5__ on those processors)
 implement a mpfr_sqrthigh algorithm based on Mulders' algorithm, with a
basecase variant
 use mpn_div_q to speed up mpfr_div. However mpn_div_q, which is new in
GMP 5, is not documented in the GMP manual, thus we are not sure it
guarantees to return the same quotient as mpn_tdiv_qr.
Also mpfr_div uses the remainder computed by mpn_divrem. A workaround would
be to first try with mpn_div_q, and if we cannot (easily) compute the
rounding, then use the current code with mpn_divrem.
 improve atanh(x) for small x by using atanh(x) = log1p(2x/(1x)),
and log1p should also be improved for small arguments.
 compute exp by using the series for cosh or sinh, which has half the terms
(see Exercise 4.11 from Modern Computer Arithmetic, version 0.3)
The same method can be used for log, using the series for atanh, i.e.,
atanh(x) = 1/2*log((1+x)/(1x)).
 improve mpfr_gamma (see https://code.google.com/p/fastfunlib/). A possible
idea is to implement a fast algorithm for the argument reconstruction
gamma(x+k): instead of performing k products by x+i, we could precompute
x^2, ..., x^m for m ~ sqrt(k), and perform only sqrt(k) products.
One could also use the series for 1/gamma(x), see for example
http://dlmf.nist.gov/5/7/ or formula (36) from
http://mathworld.wolfram.com/GammaFunction.html
 improve the computation of Bernoulli numbers: instead of computing just one
B[2n] at a time in mpfr_bernoulli_internal, we could compute several at a
time, sharing the expensive computation of the 1/p^(2n) series.
 fix regression with mpfr_mpz_root (from Keith Briggs, 5 July 2006), for
example on 3Ghz P4 with gmp4.2, x=12.345:
prec=50000 k=2 k=3 k=10 k=100
mpz_root 0.036 0.072 0.476 7.628
mpfr_mpz_root 0.004 0.004 0.036 12.20
See also mail from Carl Witty on mpfr list, 09 Oct 2007.
 for sparse input (say x=1 with 2 bits), mpfr_exp is not faster than for
full precision when precision <= MPFR_EXP_THRESHOLD. The reason is
that argument reduction kills sparsity. Maybe avoid argument reduction
for sparse input?
 speed up mpfr_atan for large arguments (to speed up mpc_log) see FR #6198
 improve mpfr_sin on values like ~pi (do not compute sin from cos, because
of the cancellation). For instance, reduce the input modulo pi/2 in
[pi/4,pi/4], and define auxiliary functions for which the argument is
assumed to be already reduced (so that the sin function can avoid
unnecessary computations by calling the auxiliary cos function instead of
the full cos function). This will require a native code for sin, for
example using the reduction sin(3x)=3sin(x)4sin(x)^3.
See https://sympa.inria.fr/sympa/arc/mpfr/200708/msg00001.html and
the following messages.
 improve generic.c to work for number of terms <> 2^k
 rewrite mpfr_greater_p... as native code.
 mpf_t uses a scheme where the number of limbs actually present can
be less than the selected precision, thereby allowing low precision
values (for instance small integers) to be stored and manipulated in
an mpf_t efficiently.
Perhaps mpfr should get something similar, especially if looking to
replace mpf with mpfr, though it'd be a major change. Alternately
perhaps those mpfr routines like mpfr_mul where optimizations are
possible through stripping low zero bits or limbs could check for
that (this would be less efficient but easier).
 try the idea of the paper "Reduced Cancellation in the Evaluation of Entire
Functions and Applications to the Error Function" by W. Gawronski, J. Mueller
and M. Reinhard, to be published in SIAM Journal on Numerical Analysis: to
avoid cancellation in say erfc(x) for x large, they compute the Taylor
expansion of erfc(x)*exp(x^2/2) instead (which has less cancellation),
and then divide by exp(x^2/2) (which is simpler to compute).
 replace the *_THRESHOLD macros by global (TLS) variables that can be
changed at run time (via a function, like other variables)? One benefit
is that users could use a single MPFR binary on several machines (e.g.,
a library provided by binary packages or shared via NFS) with different
thresholds. On the default values, this would be a bit less efficient
than the current code, but this isn't probably noticeable (this should
be tested). Something like:
long *mpfr_tune_get(void) to get the current values (the first value
is the size of the array).
int mpfr_tune_set(long *array) to set the tune values.
int mpfr_tune_run(long level) to find the best values (the support
for this feature is optional, this can also be done with an
external function).
 better distinguish different processors (for example Opteron and Core 2)
and use corresponding default tuning parameters (as in GMP). This could be
done in configure.ac to avoid hacking config.guess, for example define
MPFR_HAVE_CORE2.
Note (VL): the effect on crosscompilation (that can be a processor
with the same architecture, e.g. compilation on a Core 2 for an
Opteron) is not clear. The choice should be consistent with the
build target (e.g. march or mtune value with gcc).
Also choose better default values. For instance, the default value of
MPFR_MUL_THRESHOLD is 40, while the best values that have been found
are between 11 and 19 for 32 bits and between 4 and 10 for 64 bits!
 during the Many Digits competition, we noticed that (our implantation of)
Mulders short product was slower than a full product for large sizes.
This should be precisely analyzed and fixed if needed.
 for various functions, check the timings as a function of the magnitude
of the input (and the input and/or output precisions?), and use better
thresholds for asymptotic expansions.
 improve the special case of mpfr_{add,sub} (x, x, y, ...) when x > y
to do the addition inplace and have a complexity of O(prec(y)) in most
cases. The mpfr_{add,sub}_{d,ui} functions should automatically benefit
from this change.
 in gmp_op.c, for functions with mpz_srcptr, check whether mpz_fits_slong_p
is really useful in all cases (see TODO in this file).
 optimize code that uses a test based on the fact that x >> s is
undefined in C for s == width of x but the result is expected to
be 0. ARM and PowerPC could benefit from such an optimization,
but not x86. This needs support from the compiler.
For PowerPC: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79233
 deal with MPFR_RNDF in mpfr_round_near_x (replaced by MPFR_RNDZ).
 instead of a fixed mparam.h, optionally use function multiversioning
(FMV), currently only available with the GNU C++ front end:
https://gcc.gnu.org/wiki/FunctionMultiVersioning
According to https://lwn.net/Articles/691932/ the dispatch resolution
is now done by the dynamic loader, so that this should be fast enough
(the cost would be the reading of a static variable, initialized at
load time, instead of a constant).
In particular, binary package distributions would benefit from FMV as
only one binary is generated for different processor families.
##############################################################################
6. Miscellaneous
##############################################################################
 [suggested by Tobias Burnus <burnus(at)netb.de> and
Asher Langton <langton(at)gcc.gnu.org>, Wed, 01 Aug 2007]
support quiet and signaling NaNs in mpfr:
* functions to set/test a quiet/signaling NaN: mpfr_set_snan, mpfr_snan_p,
mpfr_set_qnan, mpfr_qnan_p
* correctly convert to/from double (if encoding of s/qNaN is fixed in 754R)
Note: Signaling NaNs are not specified by the ISO C standard and may
not be supported by the implementation. GCC needs the fsignalingnans
option (but this does not affect the C library, which may or may not
accept signaling NaNs).
 check again coverage: on 20070727, Patrick Pelissier reports that the
following files are not tested at 100%: add1.c, atan.c, atan2.c,
cache.c, cmp2.c, const_catalan.c, const_euler.c, const_log2.c, cos.c,
gen_inverse.h, div_ui.c, eint.c, exp3.c, exp_2.c, expm1.c, fma.c, fms.c,
lngamma.c, gamma.c, get_d.c, get_f.c, get_ld.c, get_str.c, get_z.c,
inp_str.c, jn.c, jyn_asympt.c, lngamma.c, mpfrgmp.c, mul.c, mul_ui.c,
mulders.c, out_str.c, pow.c, print_raw.c, rint.c, root.c, round_near_x.c,
round_raw_generic.c, set_d.c, set_ld.c, set_q.c, set_uj.c, set_z.c, sin.c,
sin_cos.c, sinh.c, sqr.c, stack_interface.c, sub1.c, sub1sp.c, subnormal.c,
uceil_exp2.c, uceil_log2.c, ui_pow_ui.c, urandomb.c, yn.c, zeta.c, zeta_ui.c.
 check the constants mpfr_set_emin (1638263) and mpfr_set_emax (16383) in
get_ld.c and the other constants, and provide a testcase for large and
small numbers.
 from Kevin Ryde <user42@zip.com.au>:
Also for pi.c, a precalculated compiledin pi to a few thousand
digits would be good value I think. After all, say 10000 bits using
1250 bytes would still be small compared to the code size!
Store pi in round to zero mode (to recover other modes).
 add other prototypes for round to nearestaway (mpfr_round_nearest_away
only deals with the prototypes of say mpfr_sin) or implement it as a native
rounding mode
 add a new roundind mode: round to odd. If the result is not exactly
representable, then round to the odd mantissa. This rounding
has the nice property that for k > 1, if:
y = round(x, p+k, TO_ODD)
z = round(y, p, TO_NEAREST_EVEN), then
z = round(x, p, TO_NEAREST_EVEN)
so it avoids the doublerounding problem.
VL: I prefer the (original?) term "sticky rounding", as used in
J Strother Moore, Tom Lynch, Matt Kaufmann. A Mechanically Checked
Proof of the Correctness of the Kernel of the AMD5K86 FloatingPoint
Division Algorithm. IEEE Transactions on Computers, 1996.
and
http://www.russinoff.com/libman/text/node26.html
 new rounding mode MPFR_RNDE when the result is known to be exact?
* In normal mode, this would allow MPFR to optimize using
this information.
* In debug mode, MPFR would check that the result is exact
(i.e. that the ternary value is 0).
 new "rounding mode" MPFR_RNDF (faithful rounding)?
This is not really a rounding mode since it is nondeterministic. The
goal is to avoid the Table Maker's Dilemma in internal computations.
The definition of faithful rounding of a real number x is: return either
RNDD(x) or RNDU(x). This means that if x is exactly representable, one
returns x exactly. In MPFR, the ternary value should be unspecified for
efficiency reasons.
Note: One typically implements faithful rounding by computing an
approximation to the result with some adequately chosen error bound,
then by rounding this approximation to nearest.
Concerning the choice of the error bound, if the result x is equal to
1 + t, where t is a very small positive number, then the error bound
needs to be at most ulp(x)/4 + t. Since t can be arbitrarily small,
the error bound needs to be at most ulp(x)/4. And this error bound
is sufficient in all cases. Note that with the even rounding rule or
rounding away from zero, it is not needed to relax the condition when
x is exactly representable.
 add tests of the ternary value for constants
 When doing Extensive Check (enableassert=full), since all the
functions use a similar use of MACROS (ZivLoop, ROUND_P), it should
be possible to do such a scheme:
For the first call to ROUND_P when we can round.
Mark it as such and save the approximated rounding value in
a temporary variable.
Then after, if the mark is set, check if:
 we still can round.
 The rounded value is the same.
It should be a complement to tgeneric tests.
 in div.c, try to find a case for which cy != 0 after the line
cy = mpn_sub_1 (sp + k, sp + k, qsize, cy);
(which should be added to the tests), e.g. by having {vp, k} = 0, or
prove that this cannot happen.
 add a configure test for enablelogging to ignore the option if
it cannot be supported. Modify the "configure help" description
to say "on systems that support it".
 add generic bad cases for functions that don't have an inverse
function that is implemented (use a single Newton iteration).
 add bad cases for the internal error bound (by using a dichotomy
between a bad case for the correct rounding and some input value
with fewer Ziv iterations?).
 add an option to use a 32bit exponent type (int) on LP64 machines,
mainly for developers, in order to be able to test the case where the
extended exponent range is the same as the default exponent range, on
such platforms.
Tests can be done with the expint branch (added on 20101217, and
many tests fail at this time).
 test underflow/overflow detection of various functions (in particular
mpfr_exp) in reduced exponent ranges, including ranges that do not
contain 0.
 add an internal macro that does the equivalent of the following?
MPFR_IS_ZERO(x)  MPFR_GET_EXP(x) <= value
 check whether __gmpfr_emin and __gmpfr_emax could be replaced by
a constant (see README.dev). Also check the use of MPFR_EMIN_MIN
and MPFR_EMAX_MAX.
 add a test checking that no mpfr.h macros depend on mpfrimpl.h
(the current tests cannot check that since mpfrimpl.h is always
included).
 move some macro definitions from acinclude.m4 to the m4 directory
as suggested by the Automake manual? The reason is that the
acinclude.m4 file is big and a bit difficult to read.
 use symbol versioning.
 check whether mpz_t caching (pool) is necessary. Timings with static
with details about the C / C library implementation should be put
somewhere as a comment in the source or in the doc. Using static
is important because otherwise the cache saves the dynamic call to
mpz_init and mpz_clear; so, what we're measuring is not clear.
See thread:
https://gmplib.org/listarchives/gmpdevel/2015September/004147.html
Summary: It will not be integrated in GMP because 1) This yields
problems with threading (in MPFR, we have TLS variables, but this is
not the case of GMP). 2) The gain (if confirmed with static) would
be due to a poor malloc implementation (timings would depend on the
platform). 3) Applications would use more RAM.
Additional notes [VL]: the major differences in the timings given
by Patrick in 201401 under Linux were:
Before:
arccos(x) took 0.054689 ms (32767 eval in 1792 ms)
arctan(x) took 0.042116 ms (32767 eval in 1380 ms)
After:
arccos(x) took 0.043580 ms (32767 eval in 1428 ms)
arctan(x) took 0.035401 ms (32767 eval in 1160 ms)
mpfr_acos doesn't use mpz, but calls mpfr_atan, so that the issue comes
from mpfr_atan, which uses mpz a lot. The problem mainly comes from the
reallocations in GMP because mpz_init is used instead of mpz_init2 with
the estimated maximum size. Other places in the code that uses mpz_init
may be concerned.
Issues with mpz_t caching:
* The pool can take much memory, which may no longer be useful.
For instance:
mpfr_init2 (x, 10000000);
mpfr_log_ui (x, 17, MPFR_RNDN);
/* ... */
mpfr_clear (x);
/* followed by code using only small precision */
while contrary to real caches, they contain no data. This is not
valuable memory: freeing/allocating a large block of memory is
much faster than the actual computations, so that mpz_t caching
has no impact on the performance in such cases. A pool with large
blocks also potentially destroys the data locality.
* It assumes that the real GMP functions are __gmpz_init and
__gmpz_clear, which are not part of the official GMP API, thus
is based on GMP internals, which may change in the future or
may be different in forks / compatible libraries / etc. This
can be solved if MPFR code calls mpfr_mpz_init / mpfr_mpz_clear
directly, avoiding the #define's.
Questions that need to be answered:
* What about the comparisons with other memory allocators?
* Shouldn't the pool be part of the memory allocator?
For the default memory allocator (malloc): RFE?
If it is decided to keep some form of mpz_t caching, a possible solution
for both issues: define mpfr_mpz_init2 and mpfr_mpz_clear2, which both
take 2 arguments like mpz_init2, where mpfr_mpz_init2 behaves in a way
similar to mpz_init2, and mpfr_mpz_clear2 behaves in a way similar to
mpz_clear but where the size argument is a hint for the pool; if it is
too large, then the mpz_t should not be pushed back to the pool. The
size argument of mpfr_mpz_init2 could also be a hint to decide which
element to pull from the pool.
 in tsum, add testcases for mpfr_sum triggering the bug fixed in r9722,
that is, with a large error during the computation of the secondary term
(when the TMD occurs).
 add internal or public variants of some basic functions (+, , *) with
mpz_t as the exponent for correctly rounded polynomials (like fmma),
in order to avoid internal overflow and underflow in extreme cases?
Alternatively, for fmma, modify add*.c and sub*.c code to define
mpfr_add_raw, which takes arrays of limbs and their precision, and the
positive exponent delta (as mpfr_uexp_t to be always representable).
The exponent delta should be sufficient for now since it can be bounded
by MPFR_PREC_MAX+1 if need be.
 use the keyword "static" in array indices of parameter declarations with
C99 compilers (6.7.5.3p7) when the pointer is expected not to be null?
For instance, if mpfr.h is changed to have:
__MPFR_DECLSPEC void mpfr_dump (const __mpfr_struct [static 1]);
and one calls
mpfr_dump (NULL);
one gets a warning with Clang. This is just an example; this needs to be
done in a clean way.
See:
http://stackoverflow.com/a/3430353/3782797
https://hamberg.no/erlend/posts/20130218staticarrayindices.html
 change most mpfr_urandomb occurrences to mpfr_urandom in the tests?
(The one done in r10573 allowed us to find a bug even without
assertion checking.)
 tzeta has been much slower since r9848 (which increases the precision
of the input for the low output precisions), at least with the x86
32bit ABI. This seems to come from the fact that the working precision
in the mpfr_zeta implementation depends on the precision of the input.
Once mpfr_zeta has improved, change the last argument of test_generic
in tzeta.c back to 5 (as it was before r10667).
 check the smallprecision tables in the tests?
This may require to export some pointer to the tables, but this could
be done only if some debug macro is defined.
 optionally use malloc() for the caches? See mpfr_mp_memory_cleanup.
Note: This can be implemented by adding a TLS flag saying whether we
are under cache generation or not, and by making the MPFR allocation
functions consider this flag. Moreover, this can only work for mpfr_t
caching (floatingpoint constants), not for mpz_t caching (Bernoulli
constants) because we do not have the control of memory allocation for
mpz_init.
##############################################################################
7. Portability
##############################################################################
 add a web page with results of builds on different architectures
 [Kevin about texp.c long strings]
For strings longer than c99 guarantees, it might be cleaner to
introduce a "tests_strdupcat" or something to concatenate literal
strings into newly allocated memory. I thought I'd done that in a
couple of places already. Arrays of chars are not much fun.
 use https://gcc.gnu.org/viewcvs/gcc/trunk/config/stdint.m4 for mpfrgmp.h
 with MinGW, build with D__USE_MINGW_ANSI_STDIO by default?
