1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197

 Testing requirements after changes:
* Test all functions return either native or bigints. Functions that
return raw MPU::GMP results will return strings, which isn't right.
* Valgrind, coverage
* use: O2 g Wall Wextra Wdeclarationafterstatement fsignedchar
* Test on 32bit Perl, Cygwin, Win32.
* Test on gcc70 (NetBSD), gcc119 (AIX/Power8), gcc22 (MIPS64), gcc115 (aarch)
* prove b I../MathPrimeUtilGMP/blib/lib I../MathPrimeUtilGMP/blib/arch
 For new functions:
XS, .h, .c, PP, PPFE, export, t exports, lib/ntheory.pm, doc, test
 Move .c / .h files into separate directory.
version does it in a painful way. Something simpler to be had?
 finish test suite for bignum. Work on making it faster.
 An assembler version of mulmod for i386.
 More efficient Mertens. The current version has poor growth.
For 32bit, consider doing XS followed by sum for remainder.
 It may be possible to have a more efficient ranged totient. We're using
the sieve up to n/2, which is better than most people seem to use, but I'm
not completely convinced we can't do better. The method at:
http://codegolf.stackexchange.com/a/26747/30069 ends up very similar. For
the monolithic results the main bottleneck seems to be the array return.
 Big features:
 QS factoring
 Figure out a way to make the internal FOR_EACH_PRIME macros use a segmented
sieve.
 Rewrite 23primalityproofs.t for new format (keep some of the old tests?).
 Factoring in PP code is really wasteful  we're calling _isprime7 before
we've done enough trial division, and later we're calling it on known
composites. Note how the XS code splits the factor code into the public
API (small factors, isprime, then call main code) and main code (just the
algorithm). The PP code isn't doing that, which means we're doing lots of
extra primality checks, which aren't cheap in PP.
 Consider using Test::Number::Delta for many tests
 More tweaking of LMO prime count.
 OpenMP. The step 7 inner loop is available.
 Convert to 32bit+GMP to support large inputs, add to MPU::GMP.
 Try __int128.
 Variable sieve size
 look at sieve.c style prime walking
 Fenwick trees for prefix sums
 Iterators speedup:
1) config option for sieved next_prime. Very general, would make
next_prime run fast when called many times sequentially. Nasty
mixing with threads.
2) iterator, PrimeIterator, or PrimeArray in XS using segment sieve.
 Perhaps have main segment know the filled in range. That would allow
a sieved next_prime, and might speed up some counts and the like.
 Benchmark simple SoEs, SoA. Include Sisyphus SoE hidden in Math::GMPz.
 Try using malloc/free for win32 cache memory. #define NO_XSLOCKS
 Investigate optree constant folding in PP compilation for performance.
Use B::Deparse to check.
 Ensure a fast path for Math::GMP from MPU > MPU:GMP > GMP, and back.
 We don't use legendre_phi for other functions any more, but it'd be nice
to speed it up using some ideas from the Ohana 2011 SAGE branch. For example
(10**13,10**5) takes 2.5x longer, albeit with 6x less memory.
 More Pari: parforprime
 znlog:
= GMP BSGS for znlog.
= Clean up znlog (PH, BSGS, Rho).
= Experiment with Wang/Zhang 2012 Rho cycle finding
 consider using Ramanujan Li for PP code.
 xt/paricompare: add chinese, factorial, vecmin, vecmax,
bernfrac, bernreal, LambertW.
 Proth test using LLR. Change mersenne test file to test both.
Note: what does this mean? Both LLR and Proth are in GMP now.
 harmreal and harmfrac for general $k
 For PP, do something like the fibprime examples. At load time, look for
the best library (GMPz, GMP, Pari, BigInt) and set $BICLASS. Then we
should use that class for everything. Go ahead and return that type.
Make a config variable to allow get/set.
 Support FH for print_primes. PerlIO_write is giving me fits.
 Test for print_primes. Not as easy with filenos.
 sum primes better than current method. Especially using less memory.
 divsum and divsummult as block functions.
The latter does sum = vecprod(1 + f(p_i) + f(p_i^2) + ... f(p_i^e) for all p.
 Consider LimLee random prime generation, optionally with proof.
https://pdfs.semanticscholar.org/fd1d/864a95d7231eaf133b00a1757ee5d0bf0e07.pdf
libgcrypt/cipher/primegen.c
 More formal random prime generation for pedantic FIPS etc. users, with
guarantee of specific algorithm.
 surround_primes
 More Montgomery: znlog, catalan
 polymul, polyadd, polydiv, polyneg, polyeval, polyorder, polygcd, polylcm, polyroots, ...
A lot of our ops do these mod n, we could make ..mod versions of each.
 poly_is_reducible
 use wordbased forsieve for nonsegment.
 remove start/end partial word tests from inner loop in forsieve.
 sieve.h and util.h should get along better.
 compare wheel_t with primes separated and possibly cached.
 urandomm with bigints could be faster.
7.7s my $f=factorial(144); urandomm($f) for 1..5e5;
6.2s my $f=factorial(144); urandomm("$f") for 1..5e5;
4.8s my $f="".factorial(144); urandomm($f) for 1..5e5;
5.3s use Math::GMP qw/:constant/; my $f=factorial(144); urandomm($f) for 1..5e5;
1.7s my $f=Math::Prime::Util::GMP::factorial(144); Math::Prime::Util::GMP::urandomm($f) for 1..5e5;
In the first case, we're calling "">bstr>_str once for validation in MPU
and and once for use in MPU::GMP.
The last case is all strings with no read/write bigint objects anywhere.
 Destroy csprng context on thread destruction.
 submit bug report for Perl error in 30b8ab3
 localized a/b in vecreduce, see:
https://metacpan.org/diff/file?target=REHSACK/ListMoreUtilsXS0.428/&source=HERMES%2FListMoreUtilsXS0.427_001#XS.xs
perl #92264 (sort in 5.27.7)
 consider #define PERL_REENTRANT
 add back formultiperm optimization if we can get around lastfor issue.
 make a uint128_t version of montmath. Needs to handle 64bit.
 sieve_range does trial division
 srand with no args should be calling GMP's srand with the selected seed
value. This is all a hacky artifact of having the two codebases.
 Look at using Ramanujan series for PP Li.
 _reftyped as XS call
 update prime count lower/upper from https://arxiv.org/pdf/1703.08032.pdf
 urandomr
 circular primes ... just use repdigits after 1M? https://oeis.org/A068652
 perhaps squarefree flag for factor for early stop. Use in moebius etc.
 make a NVCONST, define log, sqrt, etc. for quadmath vs. long double
 move most of our long double routines to NVCONST (see above).
 Change from Kalai to Bach's algorithm for random factored integers
https://mathspeople.anu.edu.au/~brent/pd/multiplicationHK.pdf
 Adjust crossover in random_factored_integer PP code for Kalai vs. naive
 Pari/GP 2.12 beta has rewritten (much faster) Bernoulli. Check it out.
 semiprime_count PP just walk if small range.
 limit for semiprime and ramanujan prime
 consider adding multifactorial. See MPU::GMP.
 multicall in forpart/forcomp.
 check memory use for nonmulticall. We need enter/leave which were removed.
 consider the various *mod, two arg or 0 for last arg means no mod.
we could use this for e.g. numseqs.pl powerflip to do powmod.
 Add aliquot sum
