File: algorithms-elemfunc.chapt.txt

		Powers

*A powers

There are two computational tasks:
to compute the power $x^n$ where $n$ is an integer (but $x$ may be a real or a complex number),
and to compute $x^y$ for arbitrary (real or complex) $x$, $y$.
We assume that $x$, $y$, $n$ are "big" numbers with $P$ significant digits.

We also assume that the exponent is positive; otherwise we need to perform an additional division to obtain $x^(-y)=1/x^y$.

If $x!=0$ is known to a relative precision $epsilon$, then $x^y$ has the relative precision $Abs(y)*epsilon$.
This means a loss of precision if $Abs(y)>1$ and an improvement of precision otherwise.

	    Integer powers

*A powers!by repeated squaring

Integer powers $x^n$ with integer $n$ are computed by a fast algorithm of "repeated squaring".
This algorithm is well known (see, for example,
the famous book, <i>The art of computer programming</i> [Knuth 1973]).

The
algorithm is based on the following trick: if $n$ is even, say $n=2*k$, then
$x^n=(x^k)^2$; and if $n$ is odd, $n=2*k+1$, then $x^n=x*(x^k)^2$. Thus we can
reduce the calculation of $x^n$ to the calculation of $x^k$ with $k<=n/2$,
using at most two long multiplications.
This reduction is one step of the algorithm; at each step $n$ is reduced to at most half.
This algorithm stops when $n$ becomes 1,
which happens after $m$ steps where $m$ is the number of bits in $n$.
So the total number of long multiplications is at most $2*m=(2*Ln(n))/Ln(2)$.
More precisely, it is equal to $m$ plus the number of nonzero bits in the binary representation of $n$.
On the average, we shall have $3/2*Ln(n)/Ln(2)$ long multiplications.
The computational cost of the algorithm is therefore $O(M(P)*Ln(n))$.
This should be compared with e.g. the cost of the best method for $Ln(x)$ which is $O(P*M(P))$.

The outlined procedure is most easily implemented using recursive calls.
The depth of recursion is of order $Ln(n)$ and should be manageable for most real-life applications.
The Yacas code would look like this:
	10# power(_x,1)<--x;
	20# power(_x,n_IsEven)<-- power(x,n>>1)^2;
	30# power(_x,n_IsOdd)<--x*power(x,n>>1)^2;
The function {power(m,n)} calculates the result of $ m^n $ for $ n>0 $, $ m>0 $,
integer $ n $ and integer $ m $.
The bit shifts and the check for an odd number are very fast operations if the internal representation of big numbers uses base $2$.

If we wanted to avoid recursion with its overhead,
we would have to obtain the bits of the number $n$ in reverse order.
This is possible but is somewhat cumbersome unless we store the bits in an array.

*A powers!by repeated squaring!non-recursive

It is easier to implement the non-recursive version of the squaring algorithm in a slightly different form.
Suppose we obtain the bits $b[i]$ of the number $n$ in the usual order, so that $n=b[0]+2*b[1]+...+b[m]*2^m$.
Then we can express the power $x^n$ as
$$ x^n = x^b[0] * (x^2)^b[1] * ... * (x^(2^m))^b[m] $$.
In other words, we evaluate $x^2$, $x^4$, ... by repeated squaring, select those $x^(2^k)$ for which the $k$-th bit $b[k]$ of the number $n$ is nonzero, and multiply all selected powers together.

In the Yacas script form, the algorithm looks like this:

	power(x_IsPositiveInteger,n_IsPositiveInteger)<--
	[
	  Local(result, p);
	  result:=1;
	  p := x;
	  While(n != 0)
	  [ // at step k, p = x^(2^k)
	    if (IsOdd(n))
	      result := result*p;
	    p := p*p;
	    n := n>>1;
	  ];
	  result;
	];

*A powers!modular

The same algorithm can be used to obtain a power of an integer modulo another integer, $Mod(x^n,M)$, if we replace the multiplication {p*p} by a modular multiplication, such as {p:=Mod(p*p,M)}.
Since the remainder modulo $M$ is computed at each step, the intermediate results never grow beyond $M$.
This makes it possible to compute even extremely large modular powers of integers efficiently.
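
For illustration, a minimal Yacas sketch of such a modular powering routine might look as follows (the function name {ModPower} is chosen only for this example and is not a library routine):

	ModPower(x_IsPositiveInteger,n_IsPositiveInteger,M_IsPositiveInteger)<--
	[
	  Local(result, p);
	  result := 1;
	  p := Mod(x, M);
	  While(n != 0)
	  [ // at step k, p = Mod(x^(2^k), M)
	    if (IsOdd(n))
	      result := Mod(result*p, M);
	    p := Mod(p*p, M);
	    n := n>>1;
	  ];
	  result;
	];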

*A {IntPowerNum}

Matrix multiplication, or, more generally, multiplication in any given ring, can be substituted into the algorithm instead of the normal multiplication.
The function {IntPowerNum} encapsulates the computation of the $n$-th power of an expression using the binary squaring algorithm.


*A powers!by repeated squaring!improvements

The squaring algorithm can be improved a little bit if we are willing to use recursion or to obtain the bits of $n$ in the reverse order.
(This was suggested in the exercise 4.21 in the book
[von zur Gathen <i>et al.</i> 1999].)
Let us represent the power $n$ in base $4$ instead of base $2$.
If $q[k]$ are the digits of $n$ in base $4$, then we can express
$$ x^n = x^q[0] * (x^4)^q[1] * ... * (x^(4^m))^q[m] $$.
We shall compute this expression from right to left: first we compute $x^q[m]$.
This is a small power because $q[m]$ is a digit in base $4$, an integer between $0$ and $3$.
Then we raise it to the $4$th power and multiply by $x^q[m-1]$.
We repeat this process until we reach the $0$th digit of $n$.
At each step we would need to multiply at most three times.
Since each of the $q[k]$ is between $0$ and $3$, we would need to precompute $x^2$ and $x^3$ which requires one extra multiplication ($x^2$ would be computed anyway).
Therefore the total number of long multiplications is in the worst case $3*Ln(n)/Ln(4)+1$.
This is about 25{%} better than the previous worst-case result, $2*Ln(n)/Ln(2)$.
However, the average-case improvement is only about 8{%} because the average number of multiplications in the base-$4$ method is $11/4*Ln(n)/Ln(4)$.
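
As an illustration, a recursive Yacas sketch of the base-$4$ scheme might look as follows (the name {power4} is ours; for brevity the small powers $x^2$ and $x^3$ are recomputed instead of being precomputed as described above, and the digits of $n$ are processed from the most significant one down):

	10 # power4(_x, 0) <-- 1;
	10 # power4(_x, 1) <-- x;
	10 # power4(_x, 2) <-- x*x;
	10 # power4(_x, 3) <-- x*x*x;
	20 # power4(_x, n_IsPositiveInteger) <--
	[
	  Local(y);
	  y := power4(x, n>>2);    // the power given by the higher base-4 digits
	  y := y*y;
	  y := y*y;                // raise it to the 4th power
	  y*power4(x, Mod(n, 4));  // multiply by x^(n mod 4)
	];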

We might then use the base $8$ instead of $4$ and obtain a further small improvement.
(Using bases other than powers of $2$ is less efficient.)
But the small gain in speed probably does not justify the increased complexity of the algorithm.

	    Real powers

*A powers!real numbers

The squaring algorithm can be used to obtain integer powers $x^n$ in any ring---as long as $n$ is an integer, $x$ can be anything from a complex number to a matrix. But for a general real number $n$, there is no such trick and the power $x^n$ has to be computed through the logarithm and the exponential function, $x^n = Exp(n*Ln(x))$.
(This also covers the case when $x$ is negative and the result is a complex number.)
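
In Yacas notation this is a one-line rule such as the following (the name {RealPower} is ours and serves only to illustrate the formula):

	RealPower(_x, _y) <-- Exp(y*Ln(x));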

An exceptional case is when $n$ is a rational number with a very small numerator and denominator, for example, $n=2/3$.
In this case it is faster to take the square of the cube root of $x$.
(See the section on the computation of roots below.)
Then the case of negative $x$ should be handled separately.
This speedup is not implemented in Yacas.

Note that the relative precision changes when taking powers.
If $x$ is known to relative precision $epsilon$, i.e. $x$ represents a real number that could be $x*(1+epsilon)$, then $x^2<=>x^2*(1+2*epsilon)$ has relative precision $2*epsilon$, while $Sqrt(x)$ has relative precision $epsilon/2$.
So if we square a number $x$, we lose one significant bit of $x$, and when we take a square root of $x$, we gain one significant bit.

		Roots

Computation of roots $r=x^(1/n)$ is efficient when $n$ is a small integer.
The basic approach is to numerically solve the equation $r^n=x$.

Note that the relative precision is improved after taking a root with $n>1$.


	    Method 1: bisection

*A roots!by bisection algorithm

The square root can be computed by using the bisection method, which
works well for integers (if only the integer part of the square root is needed).
The algorithm is described in [Johnson 1987].
The general approach is to scan
each bit of the input number and to see if a certain bit should be set in the resulting integer.
The time is linear in the number of decimals, or logarithmic in
the input number. The method is very similar in approach to the
repeated squaring method described above for raising numbers to a
power.

For integer $ N $, the following steps are performed:

*	0. Find the highest bit set, $ l2 $, in the number $ N $.
*	0. $ 1<<(l2/2) $ is definitely a bit that is set in the result.
Start by setting that bit in the result, $ u = 1<<(l2/2) $.
It is also the highest bit set in the result.
*	0. Now, traverse all the lower bits, one by one. For each lower
bit, starting at $ lnext = l2/2-1 $, set $ v = 1<<lnext $. Now,
$ (u+v)^2 = (u^2+2*u*v+v^2) $. If $ (u+v)^2 <= N $, then the bit set
in $ v $ should also be set in the result $ u $; otherwise that bit should
be cleared.
*	0. Decrement $ lnext $ and repeat until all bits have been tested,
i.e. until the bit $ lnext = 0 $ has been processed; then return the result found.

The intermediate results, $ u^2 $, $ v^2 $ and $ 2*u*v $ can be 
maintained easily too, due to the nature of the numbers 
involved ($ v $ having only one bit set, and it being known which bit that
is).

For floating point numbers, first the required number of decimals $p$ after
the decimal point is determined. Then the input number $ N $ is
multiplied by an even power of 10 until it has $ 2*p $ decimal digits. Then the integer square root calculation
is performed, and the resulting number has $ p $ digits of precision.

Below is some {Yacas} script code to perform the calculation for integers.

	//sqrt(1) = 1, sqrt(0) = 0
	10 # BisectSqrt(0) <-- 0;
	10 # BisectSqrt(1) <-- 1;
	
	20 # BisectSqrt(N_IsPositiveInteger) <--
	[
	  Local(l2,u,v,u2,v2,uv2,n);
	
	  // Find highest set bit, l2
	  u  := N;
	  l2 := 0;
	  While (u!=0)
	  [
	    u:=u>>1;
	    l2++;
	  ];
	  l2--;
	
	  // 1<<(l2/2) now would be a good under estimate 
	  // for the square root. 1<<(l2/2) is definitely 
	  // set in the result. Also it is the highest
	  // set bit.
	  l2 := l2>>1;
	
	  // initialize u and u2 (u2==u^2).
	  u  := 1 << l2;
	  u2 := u << l2;
	
	  // Now for each lower bit:
	  While( l2 != 0 )
	  [
	    l2--;
	
	    // Get that bit in v, and v2 == v^2.
	    v  := 1<<l2;
	    v2 := v<<l2;
	
	    // uv2 == 2*u*v, where 2==1<<1, and
	    // v==1<<l2, thus 2*u*v ==
	    // (1<<1)*u*(1<<l2) == u<<(l2+1)
	    uv2 := u<<(l2 + 1);
	
	    // n = (u+v)^2  = u^2 + 2*u*v + v^2
	    //   = u2+uv2+v2
	    n := u2 + uv2 + v2;
	
	    // If n (the square of the candidate new
	    // estimate for Sqrt(N)) does not exceed N,
	    // then the bit l2 is set in the result:
	    // add v to u.
	    if( n <= N )
	    [
	      u  := u+v;  // u <- u+v
	      u2 := n;    // u^2 <- u^2 + 2*u*v + v^2
	    ];
	  ];
	  u; // return result, accumulated in u.
	];
{BisectSqrt(N)} computes the integer part of $ Sqrt(N) $ for integer $N$.
(If we need to obtain more digits, we should first multiply $N$ by a suitable power of $2$.)
The algorithm works for floats as well as for integers.

The bisection algorithm uses only additions and bit shifting operations.
Suppose the integer $N$ has $P$ decimal digits, then it has $n=P*Ln(10)/Ln(2)$ bits.
For each bit, the number of additions is about $4$.
Since the cost of an addition is linear in the number of bits, the total complexity of the bisection method is roughly $4*n^2=O(P^2)$.

	    Method 2: Newton's iteration

*A roots!by Newton's method

An efficient method for computing the square root is found by using Newton's iteration for the equation $r^2-x=0$.
The initial value of $r$ can be obtained by bit counting and shifting, as in the bisection method.
The iteration formula is
$$ r'=r/2+x/(2*r) $$.
The convergence is quadratic, so we double the number of correct digits at each step.
Therefore, if the initial guess is accurate to one bit, the number of steps $n$ needed to obtain $P$ decimal digits is
$$ n = Ln(P*Ln(10)/Ln(2))/Ln(2) = O(Ln(P))$$.
We need to perform one long division at each step; a long division costs $O(M(P))$.
Therefore the total complexity of this algorithm is $O(M(P)*Ln(P))$.
This is better than the $O(P^2)$ algorithm if the cost of multiplication is below $O(P^2)$.

In most implementations of arbitrary-precision arithmetic, the time to perform a long division is several times that of a long multiplication.
Therefore it makes sense to use a method that avoids divisions.
One variant of Newton's method is to solve the equation
$1/r^2=x$.
The solution of this equation $r=1/Sqrt(x)$ is the limit of the iteration $$r'=r+r*(1-r^2*x)/2$$
that does not require any divisions (but instead requires three multiplications).
The final multiplication $r*x$ completes the calculation of the square root.
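
A minimal Yacas sketch of this scheme is shown below (the name {NewtonSqrt}, the fixed iteration count, and the low-precision initial guess taken from the built-in {Sqrt} are ours for brevity; a real implementation finds the initial value by bit counting and raises the working precision gradually, as explained next):

	NewtonSqrt(_x) <--
	[
	  Local(r, k);
	  r := N(1/Sqrt(x), 10);   // initial guess, about 10 correct digits
	  For(k:=1, k<=5, k++)     // each iteration doubles the number of digits
	  [
	    r := N(r + r*(1 - r*r*x)/2);
	  ];
	  N(r*x);                  // final multiplication gives Sqrt(x)
	];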

As usual with Newton's method, all errors are automatically corrected, so the working precision can be gradually increased until the last iteration.
The full precision of $P$ digits is used only at the last iteration; the last-but-one iteration uses $P/2$ digits and so on.

An optimization trick is to combine the multiplication by $x$ with the last iteration.
Then computations can be organized in a special way to avoid the last full-precision multiplication.
(This is described in [Karp <i>et al.</i> 1997] where the same trick is also applied to Newton's iteration for division.)

The idea is the following: let $r$ be the $P$-digit approximation to $1/Sqrt(x)$ at the beginning of the last iteration.
(In this notation, $2*P$ is the precision of the final result, so $x$ is also known to about $2*P$ digits.)
The unmodified procedure would have run as follows:
$$ r' = r+r*(1-r^2*x)/2 $$,
$$ s = x*r' $$.
Then $s$ would have been the final result, $Sqrt(x)$ to $2*P$ digits.
We would need one multiplication $M(P)$ with $2*P$-digit result to compute $r^2$, then one $M(2*P)$ to compute $r^2*x$ (the product of a $P$-digit $r^2$ and a $2*P$-digit $x$).
Then we subtract this from $1$ and lose $P$ digits since $r$ was already a $P$-digit approximation to $1/Sqrt(x)$.
The value $y:=(1-r^2*x)$ is of order $10^(-P)$ and has $P$ significant digits.
So the third multiplication, $r*y$, is only $M(P)$.
The fourth multiplication, $x*r'$, is again $M(2*P)$.
The total cost is then $2*M(P)+2*M(2*P)$.

Now consider Newton's iteration for $s<=>Sqrt(x)$,
$$ s' = s+1/s*(x-s^2)/2 $$.
The only reason we are trying to avoid it is the division by $s$.
However, after all but the last iteration for $r$ we already have a $P$-digit approximation for $1/s$, which is $r$.
Therefore we can simply define $s=r*x$ and perform the last iteration for $s$, taking $1/s<=>r$.
This is slightly inexact, but the error is of higher order than the precision of the final result, because Newton's method erases any accumulated errors.
So this will give us $2*P$ digits of $s$ without divisions, and lower the total computational cost.

Consider the cost of the last iteration of this combined method.
First, we compute $s=x*r$, but since we only need $P$ correct digits of $s$, we can use only $P$ digits of $x$, so this costs us $M(P)$.
Then we compute $s^2$ to $2*P$ digits, which costs another $M(P)$, and subtract it from the full $2*P$-digit value of $x$; the difference $x-s^2$ is of order $10^(-P)*x$ and carries only about $P$ significant digits, so the subtraction itself is cheap.
Finally, we compute $r*(x-s^2)$, which costs only $M(P)$, and add half of it to $s$.
The total cost is therefore $3*M(P)$, so we have traded the two multiplications with $2*P$ digits for one additional multiplication with $P$ digits.
Since the time of the last iteration dominates the total computing time, this is a significant cost savings.
For example, if the multiplication is quadratic, $M(P)=O(P^2)$, then this makes the last iteration about 70{%} cheaper; for linear multiplication, the savings is about 50{%}.

These optimizations do not change the asymptotic complexity of the method, although they do reduce the constant in front of $O()$.

	    Method 3: argument reduction and interpolation

*A roots!by argument reduction

Before using the bisection or Newton's method, we might apply some argument reduction to speed up the convergence of the iterations and to simplify finding the first approximation.

Suppose we need to find $Sqrt(x)$.
Choose an integer $n$ such that $1/4<x':=4^(-n)*x<=1$.
The value of $n$ is easily found from bit counting: if $b$ is the bit count of $x$, then $$n=Floor((b+1)/2)$$.
We find
$$Sqrt(x)=2^n*Sqrt(x')$$.
The precision of $x'$ is the same as that of $x$ since $2^n$ is an exact number.

To compute $Sqrt(x')$, we use Newton's method with the initial value $x'[0]$ obtained by interpolation of the function $Sqrt(x)$ on the interval [$1/4$,$1$].
A suitable interpolation function might be taken as simply $(2*x+1)/3$ or more precisely
$$Sqrt(x) <=> 1/90*(-28*x^2+95*x+23)$$.
By using a particular interpolation function, we can guarantee a certain number of precise bits at every iteration.

This may save a few iterations, at the small expense of evaluating the interpolation function once at the beginning.
However, in computing with high precision the initial iterations are very fast and this argument reduction does not give a significant speed gain.
But the gain may be important at low precisions, and this technique is sometimes used in microprocessors.

	    Method 4: Halley's iteration

*A {IntNthRoot}
*A roots!by Halley's method

A separate function {IntNthRoot} is provided to compute the integer part of $n^(1/s)$ for integer $n$ and $s$.
For a given $s$, it evaluates the integer part of $n^(1/s)$ using only integer arithmetic with integers of size $n^(1+1/s)$. This can be done by Halley's iteration method, solving the equation $x^s=n$. For this function, the Halley iteration sequence is monotonic.
The initial guess is $x[0]=2^(b(n)/s)$ where $b(n)$ is the number of bits in $n$ obtained by bit counting or using the integer logarithm function. It is clear that the initial guess is accurate to within a factor of 2. Since the relative error is squared at every iteration, we need as many iteration steps as bits in $n^(1/s)$.

Since we only need the integer part of the root, it is enough to use integer division in the Halley iteration. The sequence $x[k]$ will monotonically approximate the number $n^(1/s)$ from below if we start from an initial guess that is less than the exact value. (We start from below so that we have to deal with smaller integers rather than with larger integers.) If $n=p^s$, then after enough iterations the floating-point value of $x[k]$ would be slightly less than $p$; our value is the integer part of $x[k]$. Therefore, at each step we check whether $1+x[k]$ is a solution of $x^s=n$, in which case we are done; and we also check whether $(1+x[k])^s>n$, in which case the integer part of the root is $x[k]$.
To speed up the Halley iteration in the worst case when $s^s>n$, it is combined with bisection. The root bracket interval $x1<x<x2$ is maintained and the next iteration $x[k+1]$ is assigned to the midpoint of the interval if Halley's formula does not give sufficiently rapid convergence. The initial root bracket interval can be taken as $x[0]$, $2*x[0]$.
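
To illustrate the core of the method, here is a minimal integer-only Yacas sketch (the name {HalleyIntRoot} is ours; unlike the library {IntNthRoot}, it starts from a guess just above the root, which also converges monotonically, and it omits the bisection fallback and the $1+x[k]$ check):

	// Halley's formula for x^s = n:
	//   x' = x*((s-1)*x^s+(s+1)*n) / ((s+1)*x^s+(s-1)*n)
	HalleyIntRoot(n_IsPositiveInteger, s_IsPositiveInteger) <--
	[
	  Local(x, xnew);
	  x := 2^(Div(IntLog(n, 2), s) + 1);   // initial guess, above the root
	  xnew := Div(x*((s-1)*x^s + (s+1)*n), (s+1)*x^s + (s-1)*n);
	  While(xnew < x)   // the sequence decreases monotonically to the root
	  [
	    x := xnew;
	    xnew := Div(x*((s-1)*x^s + (s+1)*n), (s+1)*x^s + (s-1)*n);
	  ];
	  x;   // the integer part of n^(1/s)
	];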

If $s$ is very large ($s^s>n$), the convergence of both Newton's and Halley's iterations is almost linear until the final few iterations.
Therefore it is faster to evaluate the floating-point power for large $s$ using the exponential and the logarithm.

	    Method 5: higher-order iterations

*A roots!by higher-order methods

A higher-order generalization of Newton's iteration for inverse square root $1/Sqrt(x)$ is:
$$ r' = r + r/2*(1-r^2*x) + 3*r/8*(1-r^2*x)^2 + ... $$
The more terms of the series we add, the higher is the convergence rate.
This is the Taylor series for $(1-y)^(-1/2)$ where $y:=1-r^2*x$.
If we take the terms up to $y^(n-1)$, the precision at the next iteration will be multiplied by $n$.
The usual second-order iteration (our "method 2") corresponds to $n=2$.

The trick of combining the last iteration with the final multiplication by $x$ can be also used with all higher-order schemes.

Consider the cost of one iteration of $n$-th order.
Let the initial precision of $r$ be $P$; then the final precision is $n*P$ and we use up to $n*P$ digits of $x$.
First we compute $y:=1-r^2*x$ to $P*(n-1)$ digits, this costs $M(P)$ for $r^2$ and then $M(P*n)$ for $r^2*x$.
The value of $y$ is of order $10^(-P)$ and it has $P*(n-1)$ digits, so we only need to use that many digits to multiply it by $r$, and $r*y$ now costs us $M(P*(n-1))$.
To compute $y^k$ (here $2<=k<=n-1$), we need only $P*(n-k)$ digits of $y$; since we need all consecutive powers of $y$, it is best to compute the powers one after another, lowering the precision on the way.
The cost of computing $r*(y^k)*y$ after having computed $r*y^k$ is therefore $M(P*(n-k-1))$.
The total cost of the iteration comes to
$$ 2*M(P)+M(2*P)+...+M((n-1)*P)+M(n*P) $$.

From the general considerations in the previous chapter (see the section on Newton's method)
it follows that the optimal order is $n=2$ and that higher-order schemes are slower in this case.

	    Which method to use

The bisection method (1) for square roots is probably the fastest for small integers or low-precision floats.
Argument reduction and/or interpolation (3) can be used to simplify the iterative algorithm or to make it more robust.

Newton's method (2) is best for all other cases: large precision and/or roots other than square roots.

		Logarithm

The basic computational task is to obtain the logarithm of a real number.
However, sometimes only the integer part of the logarithm is needed and the logarithm is taken with respect to an integer base.
For example, we might need to evaluate the integer part of $Ln(n)/Ln(2)$ where $n$ is a large integer, to find how many bits are needed to hold $n$.
Computing this "integer logarithm" is a much easier task than computing the logarithm in floating-point.

Logarithms of complex numbers can be reduced to elementary functions of real numbers, for example:
$$ Ln(a+I*b)=1/2*Ln(a^2+b^2)+I*ArcTan(b/a) $$.
For a negative real number $x<0$, we have
$$ Ln(x) = Ln(Abs(x))+I*Pi $$.
This assumes, of course, an appropriate branch cut for the complex logarithm.
A natural choice is to cut along the negative real semiaxis, $Im(z)=0$, $Re(z)<0$.

	    Integer logarithm

*A logarithm!on integers

The "integer logarithm", defined as the integer part of $Ln(x)/Ln(b)$, where $x$ and $b$ are integers, is computed using a special routine {IntLog(x,b)} with purely integer math.
When both arguments are integers and only the integer part of the logarithm is needed, the integer logarithm is much faster than evaluating the full floating-point logarithm and truncating the result.

The basic algorithm consists of (integer-) dividing $x$ by $b$ repeatedly until $x$ becomes $0$ and counting the necessary number of divisions.
If $x$ has $P$ digits and $b$ is a small number, then each division takes time linear in $P$ and the total number of divisions is $O(P)$.
Therefore this algorithm costs $O(P^2)$ operations.
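
A minimal Yacas sketch of this basic scheme (assuming $b>=2$) is given below; the name {SimpleIntLog} is ours, and the library {IntLog} uses the faster squaring-and-comparison scheme described next:

	SimpleIntLog(x_IsPositiveInteger, b_IsPositiveInteger) <--
	[
	  Local(count);
	  count := 0;
	  While(x >= b)   // divide by b until the quotient drops below b
	  [
	    x := Div(x, b);
	    count++;
	  ];
	  count;          // the integer part of Ln(x)/Ln(b)
	];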

A speed-up for large $x$ is achieved by first comparing $x$ with $b$, then with $b^2$, $b^4$, etc., without performing any divisions.
We perform $n$ such steps until the factor $b^(2^n)$ is larger than $x$.
At this point, $x$ is divided by the previous power of $b$ and the remaining value is iteratively compared with and divided by successively smaller powers of $b$.
The number of squarings needed to compute $b^(2^n)$ is logarithmic in $P$. However, the last few of these multiplications are long multiplications with numbers of length $P/4$, $P/2$, $P$ digits.
These multiplications take the time $O(M(P))$.
Then we need to perform another long division and a series of progressively shorter divisions.
The total cost is still $O(M(P))$.
For large $P$, the cost of multiplication $M(P)$ is less than $O(P^2)$ and therefore this method is preferable. 

There is one special case, the binary (base $2$) logarithm.
Since the internal representation of floating-point numbers is usually in binary, the integer part of the binary logarithm can be usually implemented as a constant-time operation.

	    Real logarithms

There are many methods to compute the logarithm of a real number.
Here we collect these methods and analyze them.

The logarithm satisfies
$Ln(1/x)= -Ln(x)$.
Therefore we need to consider only $x>1$, or alternatively, only $0<x<1$.


*A logarithm!precision

Note that the relative precision for $x$ translates into <i>absolute</i> precision for $Ln(x)$.
This is because $Ln(x*(1+epsilon)) <=> Ln(x)+epsilon$ for small $epsilon$.
Therefore, the relative precision of the result is at best $epsilon/Ln(x)$.
So to obtain $P$ decimal digits of $Ln(x)$, we need to know $P-Ln(Abs(Ln(x)))/Ln(10)$ digits of $x$.
This is better than the relative precision of $x$ if $x>e$ but worse if $x<=>1$.

	    Method 1: Taylor series

*A logarithm!by Taylor series

The logarithm function $Ln(x)$ for general (real or complex) $x$ such that $Abs(x-1)<1$ can be computed using the Taylor series,
$$Ln(1+z) = z - z^2/2 + z^3/3 -...$$
The series converges quite slowly unless $Abs(z)$ is small.
For real $x<1$, the series is monotonic,
$$Ln(1-z) = - z - z^2/2 - z^3/3 -...$$,
and the round-off error is somewhat smaller in that case (but not very much smaller, because the Taylor series method is normally used only for $x$ very close to $1$).

If $x>1$, then we can compute $-Ln(1/x)$ instead of $Ln(x)$. However, the series converges very slowly if $x$ is close to $0$ or to $2$.

Here is an estimate of the necessary number of terms to achieve a (relative) precision of $P$ decimal digits when computing $Ln(1+x)$ for small real $x$.
Suppose that $x$ is of order $10^(-N)$, where $N>=1$.
The error after keeping $n$ terms is not greater than the first discarded term, $x^(n+1)/(n+1)$.
The magnitude of the sum is approximately $x$, so the relative error is $x^n/(n+1)$ and this should be smaller than $10^(-P)$.
We obtain a sufficient condition $n>P/N$.
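
For illustration, a straightforward term-by-term Yacas summation might look as follows (the name {LnTaylor} and the explicit term count are ours; by the estimate above, about $P/N$ terms suffice):

	LnTaylor(_x, nterms_IsPositiveInteger) <--
	[
	  Local(sum, term, k);
	  sum := 0;
	  term := x;            // holds (-1)^(k+1)*x^k at step k
	  For(k:=1, k<=nterms, k++)
	  [
	    sum := N(sum + term/k);
	    term := N(-term*x);
	  ];
	  sum;                  // approximates Ln(1+x)
	];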

All calculations need to be performed with $P$ digits of precision.
The "rectangular" scheme for evaluating $n$ terms of the Taylor series needs about $2*Sqrt(n)$ long multiplications.
Therefore the cost of this calculation is $2*Sqrt(P/N)*M(P)$.

When $P$ is very large (so that a fast multiplication can be used) and $x$ is a small rational number, then the binary splitting technique can be used to compute the Taylor series.
In this case the cost is $O(M(P)*Ln(P))$.

Note that we need to know $P+N$ digits of $1+x$ to be able to extract $P$ digits of $Ln(1+x)$.
The $N$ extra digits will be lost when we subtract $1$ from $1+x$.

	    Method 2: square roots + Taylor series

*A logarithm!by square roots

The Taylor series method allows us to compute $Ln(x)$ efficiently when $x$ is very close to $1$, i.e. when $x-1=10^(-N)$ with large $N$.
For other values of $x$ the series converges very slowly.
We can transform the argument to improve the performance of the Taylor series.

One way is to take several square roots, reducing $x$ to $x^(2^(-k))$ until $x$ becomes close to $1$.
Then we can compute $Ln(x^(2^(-k)))$ using the Taylor series and use the identity $Ln(x) = 2^k*Ln(x^(2^(-k)))$.

The number of times to take the square root can be chosen to minimize the total computational cost.
Each square root operation takes the time equivalent to a fixed number $c$ of long multiplications.
(According to the estimate of [Brent 1975], $c<=>13/2$.)
Suppose $x$ is initially of order $10^L$ where $L>0$.
Then we can take the square root $k[1]$ times and reduce $x$ to about $1.33$.
Here we can take $k[1]<=>Ln(L)/Ln(2)+3$.
After that, we can take the square root $k[2]$ times and reduce $x$ to $1+10^(-N)$ with $N>=1$.
For this we need $k[2]<=>1+N*Ln(10)/Ln(2)$ square roots.
The cost of all square roots is $c*(k[1]+k[2])$ long multiplications.
Now we can use the Taylor series and obtain $Ln(x^(2^(-k[1]-k[2])))$ in $2*Sqrt(P/N)$ multiplications.
We can choose $N$ to minimize the total cost for a given $L$.

*REM FIXME: estimate the round-off error resulting from this and the number of operations.

	    Method 3: inverse exponential

*A logarithm!by inverse exponential

The method is to solve the equation $Exp(x)-a=0$ to find $x=Ln(a)$.
We can use either the quadratically convergent Newton iteration,
$$ x' = x-1+a/Exp(x) $$,
or the cubically convergent Halley iteration,
$$ x' = x - 2*(Exp(x)-a)/(Exp(x)+a) $$.
Each iteration requires one evaluation of $Exp(x)$ and one long division.
Newton's iteration can be rewritten through $Exp(-x)$
but this does not really avoid a long division:
$Exp(-x)$ for positive $x$ is usually computed as $1/Exp(x)$ because other methods are much less efficient.
Therefore the Halley iteration is preferable.

The initial value for $x$ can be found by bit counting on the number $a$.
If $m$ is the "bit count" of $a$, i.e. $m$ is an integer such that $1/2<=a*2^(-m)<1$,
then the first approximation to $Ln(a)$ is $m*Ln(2)$.
(Here we can use a very rough approximation to $Ln(2)$, for example, $2/3$.)

The initial value found in this fashion will be correct to about one bit.
The number of digits triples at each Halley iteration, so the result will have about $3*k$ correct bits after $k$ iterations (this disregards round-off error).
Therefore the required number of iterations for $P$ decimal digits is $1/Ln(3)*Ln(P*Ln(10)/Ln(2))$.
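
A minimal Yacas sketch of the Halley iteration at fixed working precision might look as follows (the name {LnByExp}, the fixed iteration count, and the restriction to $a>1$ are ours; the library routine {Internal'LnNum} also adjusts the working precision at every step):

	LnByExp(_a) <--
	[
	  Local(x, e, k);
	  // rough initial guess m*Ln(2), with Ln(2) replaced by 2/3
	  x := N(IntLog(Floor(a), 2)*2/3);
	  For(k:=1, k<=10, k++)   // cubic convergence; 10 steps are ample here
	  [
	    e := N(Exp(x));
	    x := N(x - 2*(e - a)/(e + a));
	  ];
	  x;                      // approximates Ln(a)
	];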

This method is currently faster than other methods (with internal math) and so it is implemented in the routine {Internal'LnNum}.

This method can be generalized to higher orders.
Let $y:=1-a*Exp(-x[0])$, where $x[0]$ is a good approximation to $Ln(a)$ so $y$ is small.
Then
$ Ln(a) = x[0]+Ln(1-y)$ and we can expand in $y$ to obtain
$$ Ln(a) = x[0] - y -y^2/2 - y^3/3 - ... $$
By truncating this sum after the $k$-th term we obtain a ($k+1$)-th order method that multiplies the number of correct digits by $k+1$ after each iteration.

The optimal number of terms to take depends on the speed of the implementation of $Exp(x)$.

	    Method 4: AGM

*A logarithm!by AGM sequence

A fast algorithm based on the AGM sequence was given by Salamin (see [Brent 1975]).
The formula is based on an asymptotic relation,
$$Ln(x)=Pi*x*(1+4*x^(-2)*(1-1/Ln(x))+O(x^(-4)))/(2*AGM(x,4))$$.
If $x$ is large enough, the numerator can be replaced by 1.
"Large enough" for a desired precision of $P$ decimal digits means that $4*x^(-2)<10^(-P)$. The AGM algorithm gives $P$ digits only for such large values of $x$, unlike the Taylor series which is only good for $x$ close to 1.

The required number of AGM iterations is approximately $2*Ln(P)/Ln(2)$. For
smaller values of $x$ (but $x>1$),
one can either raise $x$ to a large integer 
power $r$ and then compute $1/r*Ln(x^r)$
(this is quick only if $x$ is itself an integer or a rational),
or multiply $x$ by a large integer power of $2$
and compute $Ln(2^s*x)-s*Ln(2)$ (this is better for floating-point $x$).
Here the required powers are
$$r=Ln(10^P*4)/(2*Ln(x))$$, $$s=P*Ln(10)/(2*Ln(2))+1-Ln(x)/Ln(2)$$. 
The values of these parameters can be found quickly by using the integer logarithm procedure {IntLog},
while constant values such as $Ln(10)/Ln(2)$ can be simply approximated by rational numbers because $r$ and $s$ do not need to be very precise (but they do need to be large enough).
For the second calculation, $Ln(2^s*x)-s*Ln(2)$, we must precompute $Ln(2)$ to the same precision of $P$ digits.
Also, the subtraction of a large number $s*Ln(2)$ leads to a certain loss of precision, namely, about $Ln(s)/Ln(10)$ decimal digits are lost, therefore the operating precision must be increased by this number of digits.
(The quantity $Ln(s)/Ln(10)$ is computed, of course, by the integer logarithm procedure.)

*REM FIXME: the cost estimate of the algorithms with r and s

*A logarithm!choice of AGM vs. Taylor

If $x<1$, then ($-Ln(1/x)$) is computed.

Finally, there is a special case when $x$ is very close to 1, where the Taylor series converges quickly but the AGM algorithm requires to multiply $x$ by a large power of 2 and then subtract two almost equal numbers, leading to a great waste of precision. Suppose $1<x<1+10^(-M)$, where $M$ is large (say of order $P$). The Taylor series for $Ln(1+epsilon)$ needs about $N= -P*Ln(10)/Ln(epsilon)$=$P/M$ terms.
If we evaluate the Taylor series using the rectangular scheme, we need
$2*Sqrt(N)$ multiplications and $Sqrt(N)$ units of storage. On the other
hand, the main slow operation for the AGM sequence is the geometric mean $Sqrt(a*b)$. If $Sqrt(a*b)$ takes an equivalent of $c$ multiplications (Brent's estimate is $c=13/2$ but it may be more in practice), then the AGM sequence requires $2*c*Ln(P)/Ln(2)$ multiplications. Therefore the Taylor series method is more efficient for
$$ M > 1/c^2*P*(Ln(2)/Ln(P))^2 $$.
In this case it requires at most $c*Ln(P)/Ln(2)$ units of storage and $2*c*Ln(P)/Ln(2)$ multiplications.

For larger $x>1+10^(-M)$, the AGM method is more efficient. It is necessary to increase the working precision to $P+M*Ln(2)/Ln(10)$ but this does not decrease the asymptotic speed of the algorithm. To compute $Ln(x)$ with $P$ digits of precision for any $x$, only $O(Ln(P))$ long multiplications are required.

	    Method 5: argument reduction + Taylor series

*A logarithm!by argument reduction

Here is a straightforward method that reduces $Ln(x)$ for large $x>2$ to $Ln(1+delta)$ with a small $delta$;
now the logarithm can be quickly computed using the Taylor series.

The simplest version is this: for integer $m$, we have the identity $Ln(x)=m+Ln(x*e^(-m))$.
Assuming that $e:=Exp(1)$ is precomputed, we can find the smallest integer $m$ for which $x<=e^m$ by computing the integer powers of $e$ and comparing with $x$.
(If $x$ is large, we do not really have to go through all integer $m$: instead we can estimate $m$ by bit counting on $x$ and start from $e^m$.)
Once we found $m$, we can use the Taylor series on $1-delta:=x*e^(-m)$ since we have found the smallest possible $m$, so $0<=delta<1-1/e$.

A refinement of this method requires to
precompute $b=Exp(2^(-k))$ for some fixed integer $k>=1$.
(This can be done efficiently using the squaring trick for the exponentials.)
First we find the smallest power $m$ of $b$ which is above $x$.
To do this, we compute successive powers of $b$ and find the first integer $m$ such that
$ x <= b^m=Exp(m*2^(-k)) $.
When we find such $m$, we define $1-delta := x*b^(-m)$ and then $delta$ will be small, because $0<delta<1-1/b<=>2^(-k)$
(the latter approximation is good if $k$ is large).
We compute $Ln(1-delta)$ using the Taylor series and finally find $Ln(x)=m*2^(-k)+Ln(1-delta)$.

For smaller $delta$, the Taylor series of $Ln(1-delta)$ is more efficient.
Therefore, we have a trade-off between having to perform more multiplications to find $m$, and having a faster convergence of the Taylor series.

*REM FIXME: fill in the details
The number of multiplications needed for the argument reduction is ...

	    Method 6: transformed Taylor series

*A logarithm!by transformed series

We can use an alternative Taylor series for the logarithm that converges for all $x$,
$$ Ln(a+z)= Ln(a)+2*Sum(k,0,Infinity, 1/(2*k+1)*(z/(2*a+z))^(2*k+1)) $$.
This series is obtained from the series for $ArcTanh(x)$ and the identity
$$ 2*ArcTanh(x) = Ln((1+x)/(1-x))$$.

This series converges for all $z$ such that $Re(a+z)>0$ if $a>0$.
The convergence rate is, however, the same as for the original Taylor series.
In other words, it converges slowly unless $z/(2*a+z)$ is small.
The parameter $a$ can be chosen to optimize the convergence; however, $Ln(a)$ should be either precomputed or easily computable for this method to be efficient.

For instance, if $x>1$, we can choose $a=2^k$ for an integer $k>=1$, such that $2^(k-1)<=x<2^k=a$.
(In other words, $k$ is the bit count of $x$.)
In that case, we represent $x=a-z$ and we find that the expansion parameter $z/(2*a-z)<1/3$.
So a certain rate of convergence is guaranteed, and it is enough to take a fixed number of terms, about $P*Ln(10)/Ln(3)$, to obtain $P$ decimal digits of $Ln(x)$ for any $x$.
(We should also precompute $Ln(2)$ for this scheme to work.)
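
A minimal Yacas sketch of this scheme for $x>=1$ is shown below (the name {LnTransformed} and the explicit number of terms are ours; note that $(x-a)/(x+a)$ is the same expansion parameter as $z/(2*a-z)$ above, up to sign):

	LnTransformed(_x, nterms_IsPositiveInteger) <--
	[
	  Local(k, a, w, w2, term, sum, j);
	  k := IntLog(Floor(x), 2) + 1;   // bit count of x, so 2^(k-1) <= x < 2^k
	  a := 2^k;
	  w := N((x - a)/(x + a));        // expansion parameter, Abs(w) <= 1/3
	  w2 := N(w*w);
	  sum := 0;
	  term := w;
	  j := 1;
	  While(j <= 2*nterms)            // sum over the odd powers of w
	  [
	    sum := N(sum + term/j);
	    term := N(term*w2);
	    j := j + 2;
	  ];
	  N(k*Ln(2) + 2*sum);             // Ln(x) = Ln(a) + 2*ArcTanh(w)
	];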

If $0<x<1$, we can compute $-Ln(1/x)$.

This method works robustly but is slower than
the Taylor series with some kind of argument reduction.
With the "rectangular" method of summation, the total cost is $O(Sqrt(P)*M(P))$.

	    Method 7: binary reduction

*A logarithm!by binary reduction

This method is based on the binary splitting technique and is described in [Haible <i>et al.</i> 1998] with a reference to [Brent 1976].

The method shall compute $Ln(1+x)$ for real $x$ such that $Abs(x)<1/2$.
For other $x$, some sort of argument reduction needs to be applied.
(So this method is a replacement for the Taylor series that is asymptotically faster at very high precision.)

The main idea is to use the property
$$ Ln(1+z*2^(-k)) = z*2^(-k) + O(2^(-2*k)) $$
for integer $k>=1$ and real $z$ such that $Abs(z)<=1$.
This property allows us to find the first $2*k$ binary digits of $Ln(1+z*2^(-k))$ by inspection:
these digits are the first $k$ nonzero digits of $z$.
Then we can perform a very quick computation of $Exp(-m*2^(-k))$ for integer $k$, $m$ (evaluated using the binary splitting of the Taylor series)
and reduce $z$ by at least the factor $2^k$.

More formally, we can write the method as a loop over $k$, starting with $k=1$ and stopping when $2^(-k)<10^(-P)$ is below the required precision.
At the beginning of the loop we have $y=0$, $z=x$, $k=1$ and $Abs(z)<1/2$.
The loop invariants are $(1+z)*Exp(y)$ which is always equal to the original number $1+x$, and the condition $Abs(z)<2^(-k)$.
If we construct this loop, then it is clear that at the end of the loop $1+z$ will become $1$ to required precision and therefore $y$ will be equal to $Ln(1+x)$.

The body of the loop consists of the following steps:
*	0. Separate the first $k$ significant digits of $z$:
$$ f=2^(-2*k)*Floor(2^(2*k)*z) $$.
Now $f$ is a good approximation for $Ln(1+z)$.
*	0. Compute $Exp(-f)$ using the binary splitting technique ($f$ is a rational number with the denominator $2^(2*k)$ and numerator at most $2^k$).
It is in fact sufficient to compute $1-Exp(-f)$ which does not need all digits.
*	0. Set $y=y+f$ and $z=(1+z)*Exp(-f)-1$.

The total number of steps in the loop is at most $Ln(P*Ln(10)/Ln(2))/Ln(2)$.
Each step requires $O(M(P)*Ln(P))$ operations because the exponential $Exp(-f)$ is taken at a rational argument $f$ and can be computed using the binary splitting technique.
(Toward the end of the loop, the number of significant digits of $f$ grows, but the number of digits we need to obtain is decreased.
At the last iteration, $f$ contains about half of the digits of $x$ but computing $Exp(-f)$ requires only one term of the Taylor series.)
Therefore the total cost is $O(M(P)*Ln(P)^2)$.

*REM FIXME: fill in the details

Essentially the same method can be used to evaluate a complex logarithm, $Ln(a+I*b)$.
It is slower but the asymptotic cost is the same.

	    Method 8: continued fraction

*A logarithm!by continued fraction
*A continued fraction approximation!of $Ln(x)$

There is a continued fraction representation of the logarithm:
$$ Ln(1+x) = x/(1+x/(2+x/(3+(4*x)/(4+(4*x)/(5+(9*x)/(6+ ...)))))) $$.
This fraction converges for all $x$, although the speed of convergence varies with the magnitude of $x$.

This method does not seem to provide a computational advantage compared with the other methods.

	    Method 9: bisection

A simple bisection algorithm for $Ln(x)/Ln(2)$ (the base 2 logarithm) with real $x$ is described in [Johnson 1987].

First, we need to divide $x$ by a certain power of $2$ to reduce $x$ to  $y$ in the interval $1<=y<2$.
We can use the bit count $m=BitCount(x)$ to find an integer $m$ such that $1/2<=x*2^(-m)<1$ and take $y=x*2^(1-m)$.
Then $Ln(x)/Ln(2) = Ln(y)/Ln(2)+m-1$.

Now we shall find the bits in the binary representation of $Ln(y)/Ln(2)$, one by one.
Given a real $y$ such that $1<=y<2$, the value $Ln(y)/Ln(2)$ is between 0 and 1.
Now,
$$ Ln(y)/Ln(2) = 2^(-1)*Ln(y^2)/Ln(2) $$.
The leading bit of this value is $1$ if $y^2>=2$ and 0 otherwise.
Therefore we need to compute $y' = y^2$ using a long $P$-digit multiplication and compare it with $2$.
If $y' >= 2$ we set $y=y'/2$, otherwise we set $y=y'$; then we obtain $1<=y<2$ again and repeat the process to extract the next bit of $Ln(y)/Ln(2)$.
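
A minimal Yacas sketch of this bit-by-bit procedure is given below (the name {Log2Bisect} and the explicit number of bits are ours; the argument $y$ is assumed to satisfy $1<=y<2$):

	Log2Bisect(_y, nbits_IsPositiveInteger) <--
	[
	  Local(result, bit, k);
	  result := 0;
	  bit := 1/2;           // weight of the bit being extracted
	  For(k:=1, k<=nbits, k++)
	  [
	    y := N(y*y);        // one long squaring per bit
	    if (y >= 2)
	    [
	      y := y/2;
	      result := result + bit;
	    ];
	    bit := bit/2;
	  ];
	  result;               // a dyadic approximation to Ln(y)/Ln(2)
	];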

The process is finished either when the required number of bits of $Ln(y)/Ln(2)$ is found, or when the precision of the argument is exhausted, whichever occurs first.
Note that each iteration requires a long multiplication (squaring) of a number, and each squaring loses 1 bit of relative precision, so after $k$ iterations the number of precise bits of $y$ would be $P-k$.
Therefore we cannot have more iterations than $P$ (the number of precise bits in the original value of $x$).
The total cost is $O(P*M(P))$.

The squaring at each iteration needs to be performed not with all digits, but with the number of precise digits left in the current value of $y$.
This does not reduce the asymptotic complexity; it remains $O(P*M(P))$.

Comparing this method with the Taylor series, we find that the only advantage of this method is simplicity.
The Taylor series requires about $P$ terms, with one long multiplication and one short division per term, while the bisection method does not need any short divisions.
However, the rectangular method of Taylor summation cuts the time down to $O(Sqrt(P))$ long multiplications, at a cost of some storage and bookkeeping overhead.
Therefore, the bisection method may give an advantage only at very low precisions.
(This is why it is sometimes used in microprocessors.)
The similar method for the exponential function requires a square root at every iteration and is never competitive with the Taylor series.

	    Which method to use

This remains to be seen.

		Exponential

There are many methods to compute the exponential of a real number.
Here we collect these methods and analyze them.

The exponential function satisfies
$Exp(-x)=1/Exp(x)$.
Therefore we need to consider only $x>0$.

*A exponential function $Exp(x)$!precision

Note that the <i>absolute</i> precision for $x$ translates into <i>relative</i> precision for $Exp(x)$.
This is because $Exp(x+epsilon) <=> Exp(x)*(1+epsilon)$ for small $epsilon$.
Therefore, to obtain $P$ decimal digits of $Exp(x)$ we need to know $x$ with absolute precision of at least $10^(-P)$,
that is, we need to know $P+Ln(Abs(x))/Ln(10)$ digits of $x$.
Thus, the relative precision becomes worse after taking the exponential if $x>1$ but improves if $x$ is very small.

	    Method 1: Taylor series

*A exponential function $Exp(x)$!by Taylor series

The exponential function is computed using its Taylor series,
$$ Exp(x) = 1 + x + x^2/2! + ...$$
This series converges for all (complex) $x$, but if $Abs(x)$ is large, or if $x$ is negative, then the series converges slowly and/or gives a large round-off error.
So one should use this Taylor series only when $x$ is small.

If $x$ is sufficiently small, e.g. $Abs(x)<10^(-M)$ and $M>Ln(P)/Ln(10)$, then it is enough to take about $P/M$ terms in the Taylor series. If $x$ is of order 1, one needs about $P*Ln(10)/Ln(P)$ terms.

*REM FIXME: fill in the details

If $x=p/q$ is a small rational number, and if a fast multiplication is available, then the binary splitting technique should be used to evaluate the Taylor series.
The computational cost of that is $O(M(P*Ln(P))*Ln(P))$.

	    Method 2: squaring + Taylor series

*A exponential function $Exp(x)$!squaring speed-up

A speed-up trick used for large $x$ is to divide the argument by some power of 2 and then square the result several times, i.e.
$$Exp(x) = (Exp(2^(-k)*x))^(2^k)$$,
where $k$ is chosen sufficiently large so that the Taylor series converges quickly at $2^(-k)*x$ [Smith 1985]. The threshold value for $x$ can be obtained through {MathExpThreshold()}, and set through {SetMathExpThreshold(threshold)} in {stdfuncs}.
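
As an illustration, a minimal Yacas sketch of this reduction is shown below (the name {ExpBySquaring}, the choice of $k$, and the explicit number of Taylor terms are ours; the library code chooses these parameters automatically):

	ExpBySquaring(_x, k_IsPositiveInteger, nterms_IsPositiveInteger) <--
	[
	  Local(y, sum, term, j);
	  y := N(x/2^k);              // reduced argument
	  sum := 1;
	  term := 1;
	  For(j:=1, j<=nterms, j++)   // Taylor series for Exp(y)
	  [
	    term := N(term*y/j);
	    sum := N(sum + term);
	  ];
	  For(j:=1, j<=k, j++)        // square k times to undo the reduction
	  [
	    sum := N(sum*sum);
	  ];
	  sum;
	];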

*REM FIXME: round-off error for squaring. How many times to square. How many terms to take.

A modification of the squaring reduction makes it possible to significantly reduce the round-off error [Brent 1978].
Instead of $Exp(x)=Exp(x/2)^2$, we use the identity
$$ Exp(x)-1 = (Exp(x/2)-1)*(Exp(x/2)+1) $$
and reduce $Exp(x)-1$ directly to $Exp(x/2)-1$.
If $y=Exp(x/2)-1$, then $Exp(x)-1=2*y+y^2$.


*REM FIXME: analyze the round-off error and show why this is better

	    Method 3: inverse logarithm

*A exponential function $Exp(x)$!by inverse logarithm

An alternative way to compute $x=Exp(a)$ if a fast logarithm routine is available would be to solve the equation $Ln(x)=a$.
(This might be better at very large precision where the AGM method for the logarithm is asymptotically the fastest.)

Newton's method gives the iteration
$$ x'=x*(a+1-Ln(x)) $$.
The iteration converges quadratically to $Exp(a)$ if the initial value of $x$ is $0<x<Exp(a+1)$.

A cubically convergent formula is obtained if we replace $Ln(x)=a$ by an equivalent equation
$$ (Ln(x)-a)/(Ln(x)-a-2) = 0$$.
For this equation, Newton's method gives the iteration
$$ x' = x*(1+(a+1-Ln(x))^2)/2 $$.
This iteration converges for initial values $0<x<Exp(a+2)$ with a cubic convergence rate and requires only one more multiplication, compared with Newton's method for $Ln(x)=a$. A good initial guess can be found by raising 2 to the integer part of $a/Ln(2)$ (the value $Ln(2)$ can be approximated from above by a suitable rational number, e.g. $7050/10171$).
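
A minimal Yacas sketch of this cubically convergent scheme at fixed precision might look as follows (the name {ExpByLn} and the fixed iteration count are ours; the built-in {Ln} is used for the logarithm):

	ExpByLn(_a) <--
	[
	  Local(x, k);
	  x := 2^Floor(a*10171/7050);   // initial guess 2^Floor(a/Ln(2))
	  For(k:=1, k<=10, k++)         // cubic convergence; 10 steps are ample here
	  [
	    x := N(x*(1 + (a + 1 - Ln(x))^2)/2);
	  ];
	  x;                            // approximates Exp(a)
	];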

This cubically convergent iteration seems to follow from a good equivalent equation that we guessed.
But it turns out that it can be generalized to higher orders.
Let $y:=a-Ln(x[0])$ where $x[0]$ is an approximation to $Exp(a)$; if it is a good approximation, then $y$ is small.
Then $Exp(a)=x[0]*Exp(y)$.
Expanding in $y$, we obtain
$$ Exp(a) = x[0]*(1+y+y^2/2! +y^3/3! + ...) $$,
and if we truncate the series after $k$-th term, the new approximation will have $k$ times the number of correct digits in $x[0]$.
It is easy to see that the above cubic iteration is a particular case with $k=3$.


The optimal number of terms to take depends on the speed of the implementation of $Ln(x)$.


	    Method 4: linear reduction + Taylor series

*A exponential function $Exp(x)$!by linear reduction

In this method we reduce the argument $x$ by subtracting an integer.
Suppose $x>1$, then take $n=Floor(x)$ where $n$ is an integer, so that $0<=x-n<1$.
Then we can compute $Exp(x)=Exp(n)*Exp(x-n)$ by using the Taylor series on the small number $x-n$.
The integer power $e^n$ is found from a precomputed value of $e$.

A refinement of this method is to subtract not only the integer part of $x$, but also the first few binary digits.
We fix an integer $k>=1$ and precompute $b:=Exp(2^(-k))$.
Then we find the integer $m$ such that $0<=x-m*2^(-k)<2^(-k)$.
(The rational number $m*2^(-k)$ contains the integer part of $x$ and the first $k$ bits of $x$ after the binary point.)
Then we compute $Exp(x-m*2^(-k))$ using the Taylor series and $Exp(m*2^(-k))=b^m$ by the integer powering algorithm from the precomputed value of $b$.

The parameter $k$ should be chosen to minimize the computational effort.

*REM FIXME: fill in the details 

	    Method 5: binary reduction

*A exponential function $Exp(x)$!by binary reduction

This method is based on the binary splitting technique and is described in [Haible <i>et al.</i> 1998] with a reference to [Brent 1976].
The idea is to reduce the computation of $Exp(x)$ to the computation of $Exp(r[k])$ for some rational numbers $r[0]$, $r[1]$, $r[2]$, ...

Take the binary decomposition of $x$ of the following form,
$$ x = x[0] +Sum(k,0,N,u[k]*2^(-2^k)) $$,
where $x[0]$ and $u[k]$ are integers such that $Abs(u[k])<2^(2^(k-1))$.
Then define $r[k]=u[k]*2^(-2^k)$.
Note that all $r[k]$ are rational numbers such that $Abs(r[k])<2^(-2^(k-1))$.
The exponentials $Exp(r[k])$ are computed using the binary splitting on the Taylor series.
Finally,
$$Exp(x) = Exp(x[0])*Exp(r[0])*Exp(r[1])*...$$.

The cost of this method is $O(M(P*Ln(P))*Ln(P))$ operations.

Essentially the same method can be used to compute the complex exponential, $Exp(a+I*b)$.
This is slower but the asymptotic cost is the same.

	    Method 6: continued fraction

*A exponential function $Exp(x)$!by continued fraction
*A continued fraction approximation!of $Exp(x)$

There is a continued fraction representation of the exponential function:
$$ Exp(-x) = 1-x/(1+x/(2-x/(3+x/(2-x/(5+x/(2- ...)))))) $$.
This fraction converges for all $x$, although the speed of convergence varies with the magnitude of $x$.

This method does not seem to provide a computational advantage compared with the other methods.

	    Which method to use

This remains to be seen.


		Calculation of $Pi$

*A computation of $pi$

In Yacas, the constant $pi$ is computed by the library routine {Internal'Pi()} which
uses the internal routine {MathPi} to compute the value to current
precision {Builtin'Precision'Set()}. The result is stored in a global variable as a
list of the form {{precision, value}} where {precision} is the number of
digits of $pi$ that have already been found and {value} is the
multiple-precision value. This is done to avoid recalculating $pi$ if a
precise enough value for it has already been found.

Efficient iterative algorithms for computing $pi$ with arbitrary precision have been recently developed by Brent, Salamin, Borwein and others. However, limitations of the current multiple-precision implementation in Yacas (compiled with the "internal" math option) make these advanced algorithms run slower because they require many more arbitrary-precision multiplications at each iteration.

*A computation of $pi$!by Newton's method

The file {examples/pi.ys} implements several different algorithms that
duplicate the functionality of {Internal'Pi()}. See
[Gourdon <i>et al.</i> 2001] for more details of computations of $pi$ and
generalizations of Newton-Raphson iteration.

	    Method 1: solve $Sin(x)=0$

{PiMethod0()}, {PiMethod1()}, {PiMethod2()} are all based on a generalized Newton-Raphson method of solving equations.

Since $pi$ is a solution of $Sin(x)=0$, one may start sufficiently close, e.g. at $x0 = 3.14159265$ and iterate $x'=x-Tan(x)$. In fact it is faster to iterate
$x'=x+Sin(x)$ which solves a different equation for $pi$. {PiMethod0()} is the straightforward implementation of the latter iteration. A significant speed improvement is achieved by doing calculations at each iteration only with the precision of the root that we expect to get from that iteration. Any imprecision introduced by round-off will be automatically corrected at the next iteration.

If at some iteration $x=pi+epsilon$ for small $epsilon$, then from the Taylor expansion of $Sin(x)$ it follows that the value $x'$ at the next iteration will differ from $pi$ by $O(epsilon^3)$. Therefore, the number of correct digits triples at each iteration. If we know the number of correct digits of $pi$ in the initial approximation, we can decide in advance how many iterations to compute and what precision to use at each iteration.

The final speed-up in {PiMethod0()} is to avoid computing at unnecessarily high precision. This may happen if, for example, we need to evaluate 200 digits of $pi$ starting with 20 correct digits. After 2 iterations we would be calculating with 180 digits; the next iteration would have given us 540 digits but we only need 200, so the third iteration would be wasteful. This can be avoided by first computing $pi$ to just over 1/3 of the required precision, i.e. to 67 digits, and then executing the last iteration at full 200 digits. There is still a wasteful step when we would go from 60 digits to 67, but much less time would be wasted than in the calculation with 200 digits of precision.
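
The core of this scheme, stripped of the precision bookkeeping, fits in a few lines of Yacas (the name {PiSketch} and the explicit iteration count are ours; {PiMethod0()} in {examples/pi.ys} is the full version):

	PiSketch(niter_IsPositiveInteger) <--
	[
	  Local(x, k);
	  x := 3.14159265;        // about 9 correct digits to start with
	  For(k:=1, k<=niter, k++)
	  [
	    x := N(x + Sin(x));   // triples the number of correct digits
	  ];
	  x;
	];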

Newton's method is based on approximating the function $f(x)$ by a straight line. One can achieve better approximation and therefore faster convergence to the root if one approximates the function with a polynomial curve of higher order. The routine {PiMethod1()} uses the iteration 
$$ x'=x+Sin(x)+1/6*Sin(x)^3 + 3/40*Sin(x)^5 + 5/112*Sin(x)^7$$
which has a faster convergence, giving 9 times as many digits at every iteration. (The series is the Taylor series for $ArcSin(y)$ cut at $O(y^9)$.) The same speed-up tricks are used as in {PiMethod0()}. In addition, the last iteration, which must be done at full precision, is performed with the simpler iteration $x'=x+Sin(x)$ to reduce the number of high-precision multiplications.

Both {PiMethod0()} and {PiMethod1()} require a computation of $Sin(x)$ at every iteration. An industrial-strength arbitrary precision library such as {gmp} can multiply numbers much faster than it can evaluate a trigonometric function. Therefore, it would be good to have a method which does not require trigonometrics. {PiMethod2()} is a simple attempt to remedy the problem. It computes the Taylor series for $ArcTan(x)$,
$$ ArcTan(x) = x - x^3/3 + x^5/5 - x^7/7 + ... $$,
for the value of $x$ obtained as the tangent of the initial guess for $pi$; in other words, if $x=pi+epsilon$ where $epsilon$ is small, then $Tan(x)=Tan(epsilon)$, therefore $epsilon = ArcTan(Tan(x))$ and $pi$ is found as $pi = x - epsilon$. If the initial guess is good (i.e. $epsilon$ is very small), then the Taylor series for $ArcTan(x)$ converges very quickly (although linearly, i.e. it gives a fixed number of digits of $pi$ per term). Only a single full-precision evaluation of $Tan(x)$ is necessary at the beginning of the algorithm.
The complexity of this algorithm is proportional to the number of digits and to the time of a long multiplication.
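
A minimal sketch of this approach in Python, relying on the third-party {mpmath} library for the single full-precision tangent (the function name is ours):

	from mpmath import mp, mpf, tan
	
	def pi_from_arctan_of_tan(digits):
	    """pi = x - ArcTan(Tan(x)) for a good initial guess x, as described above."""
	    mp.dps = digits + 10                   # working precision with guard digits
	    x = mpf("3.14159265358979")            # initial guess, about 15 correct digits
	    t = tan(x)                             # the only full-precision tangent; t = Tan(eps)
	    # Taylor series ArcTan(t) = t - t^3/3 + t^5/5 - ... converges quickly since t is tiny
	    eps, term, n = mpf(0), t, 1
	    while abs(term) > mpf(10) ** (-(digits + 5)):
	        eps += term / n
	        term *= -t * t
	        n += 2
	    return x - eps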

	    Method 2: Borwein's iteration

The routines {PiBrentSalamin()} and {PiBorwein()} are based on much more advanced mathematics.
(See papers by P. Borwein for review and explanations of the methods.)
These methods do not require evaluations of trigonometric functions, but they do require taking a few square roots at each iteration, and all calculations must be done using full precision. Using modern algorithms, one can compute a square root roughly in the same time as a division; but Yacas's internal math is not yet up to it. Therefore, these two routines perform poorly compared to the more simple-minded {PiMethod0()}.

	    Method 3: AGM sequence (Brent-Salamin)

*A computation of $pi$!by Brent-Salamin method

The algorithm of Brent and Salamin uses the AGM sequence.
The calculation can be summarized as follows:

$$ [a=1; b=1/Sqrt(2); c=1/2; k=1;] $$
	While(Not enough precision) [
$$ List(a, b) = List((a+b)/2, Sqrt(a*b)) $$
$$ c = c - 2^k*(a^2-b^2) $$
$$ pi = 2*a^2/c $$
$$ k=k+1 $$
	];

At each iteration, the variable $pi$ will have twice as many correct digits as it had at the previous iteration.
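
A direct transcription of this iteration into Python with the standard {decimal} module (an illustrative sketch, not the Yacas routine {PiBrentSalamin()}; the function name is ours):

	from decimal import Decimal, getcontext
	
	def pi_agm(digits):
	    """Brent-Salamin (AGM) iteration; correct digits double at every step."""
	    getcontext().prec = digits + 10        # guard digits
	    a, b, c = Decimal(1), 1 / Decimal(2).sqrt(), Decimal(1) / 2
	    k = 1
	    pi_old, pi_new = Decimal(0), Decimal(3)
	    while abs(pi_new - pi_old) > Decimal(10) ** (-digits):
	        pi_old = pi_new
	        a, b = (a + b) / 2, (a * b).sqrt()
	        c -= 2 ** k * (a * a - b * b)
	        pi_new = 2 * a * a / c
	        k += 1
	    getcontext().prec = digits
	    return +pi_new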

	    Method 4: Ramanujan's series

*A computation of $pi$!by Ramanujan's series

Another method for fast computation of $Pi$ is based on the following mysterious series,
$$ 1/Pi = 12/(C*Sqrt(C))*Sum(n,0,Infinity, ((-1)^n)*(6*n)! *(A+n*B)/((3*n)! * (n!)^3 * C^(3*n))) $$,
where $A=13591409$, $B=545140134$, and $C=640320$.
This formula was found by the Chudnovsky brothers, but it traces back to Ramanujan's notebooks.

To obtain the value of $Pi$ with $P$ decimal digits, one needs to take 
$$ n <=> P*Ln(10)/(3*Ln(C/12))<479/6793*P $$
terms of the series.

If this series is evaluated using Horner's scheme (the routine {PiChudnovsky}), then about $Ln(n)/Ln(10)$ extra digits are needed to compensate for round-off error while adding $n$ terms.
This method does not require any long multiplications and costs $O(P^2)$ operations.
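
As an illustration, here is a straightforward Python summation of this series with the standard {decimal} module (a plain term-by-term sum, not the Horner scheme used by {PiChudnovsky} and not the binary splitting mentioned below; the function name is ours). The integer factor $(6*n)! /((3*n)! *(n!)^3)$ is carried exactly and updated by its ratio between consecutive terms:

	from decimal import Decimal, getcontext
	
	def pi_chudnovsky(digits):
	    """Sum the Chudnovsky series; each term adds roughly 14 decimal digits."""
	    getcontext().prec = digits + 15
	    A, B, C = 13591409, 545140134, 640320
	    C3 = Decimal(C) ** 3
	    K = 1                                  # (6n)!/((3n)! (n!)^3) as an exact integer
	    power = Decimal(1)                     # C^(3n)
	    sign = 1
	    s = Decimal(A)                         # the n = 0 term
	    for n in range(1, digits // 14 + 3):
	        K = (K * (6*n - 5) * (6*n - 4) * (6*n - 3) * (6*n - 2) * (6*n - 1) * (6*n)
	             // ((3*n - 2) * (3*n - 1) * (3*n) * n**3))
	        power *= C3
	        sign = -sign
	        s += sign * K * (A + n * B) / power
	    result = Decimal(C) * Decimal(C).sqrt() / (12 * s)
	    getcontext().prec = digits
	    return +result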

A potentially much faster way to evaluate this series at high precision is by using the binary splitting technique.
This would give the asymptotic cost $O(M(P*Ln(P))*Ln(P))$.

	    Which method to use

This remains to be seen.

		Trigonometric functions

Trigonometric functions $Sin(x)$, $Cos(x)$ are computed by subtracting $2*Pi$ from $x$ until it is in the range $0<x<2*Pi$ and then using the Taylor series.
(The value of $Pi$ is precomputed.)

The tangent is computed either as the ratio $Sin(x)/Cos(x)$ or from $Sin(x)$ alone (for arguments where $Cos(x)>=0$) using the identity
$$ Tan(x) = Sin(x)/Sqrt(1-Sin(x)^2) $$.

	    Method 1: Taylor series

The Taylor series for the basic trigonometric functions are
$$ Sin(x) = x - x^3/3! + x^5/5! - x^7/7! +...$$,
$$ Cos(x) = 1 - x^2/2! + x^4/4! - x^6/6! +...$$.
These series converge for all $x$ but are optimal for multiple-precision calculations only for small $x$.
The convergence rate and possible optimizations are the same as those of the Taylor series for $Exp(x)$.

	    Method 2: argument reduction

Basic argument reduction requires a precomputed value for $Pi/2$.
The identities $Sin(x+Pi/2)= Cos(x)$, $Cos(x+Pi/2)= -Sin(x)$ can be used to reduce the argument to the range between $0$ and $Pi/2$.
Then the bisection for $Cos(x)$ and the trisection for $Sin(x)$ are used.

For $Cos(x)$, the bisection identity can be used more efficiently if it is written as
$$ 1-Cos(2*x) = 4*(1-Cos(x))-2*(1-Cos(x))^2 $$.
If $1-Cos(x)$ is very small, then this decomposition allows one to use a shorter multiplication and reduces the round-off error.
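
The following Python sketch shows the bisection at work in double precision (names are ours; this is an illustration, not the library code): the Taylor series is evaluated for the reduced argument $x/2^k$ and the identity above is then applied $k$ times.

	def one_minus_cos(x, k=5):
	    """1 - Cos(x): Taylor series at x/2^k, then k applications of the bisection identity."""
	    y = x / 2**k
	    # 1 - Cos(y) = y^2/2! - y^4/4! + ...
	    c, term, n = 0.0, y * y / 2, 1
	    while c + term != c:
	        c += term
	        n += 1
	        term *= -y * y / ((2*n - 1) * (2*n))
	    for _ in range(k):
	        c = 4*c - 2*c*c                    # 1 - Cos(2*y) = 4*(1-Cos(y)) - 2*(1-Cos(y))^2
	    return c
	
	# sanity check: 1 - one_minus_cos(1.2) equals math.cos(1.2) up to round-off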

For $Sin(x)$, the trisection identity is
$$ Sin(3*x) = 3*Sin(x) - 4*Sin(x)^3 $$.

The optimal number of bisections or trisections should be estimated to reduce the total computational cost.
The resulting number will depend on the magnitude of the argument $x$, on the required precision $P$, and on the speed of the available multiplication $M(P)$.

	    Method 3: inverse $ArcTan(x)$

The function $ArcTan(x)$ can be found from its Taylor series, or from the complex AGM method, or by another method.
Then the function can be inverted by Newton's iteration to obtain $Tan(x)$ and from it also $Sin(x)$, $Cos(x)$ using the trigonometric identities.

Alternatively, $ArcSin(x)$ may be found from the Taylor series and inverted to obtain $Sin(x)$.

This method seems to be of marginal value since efficient direct methods for $Cos(x)$, $Sin(x)$ are available.


	    Which method to use

This remains to be seen.

		Inverse trigonometric functions

*A computation of $ArcTan(x)$
*A continued fraction approximation!of $ArcTan(x)$

Inverse trigonometric functions are computed by various methods.
To compute $y=ArcSin(x)$, Newton's method is used to invert $x=Sin(y)$.
The inverse tangent $ArcTan(x)$ can be computed by its Taylor series,
$$ ArcTan(x) = x - x^3/3 + x^5/5 - ...$$,
or by the continued fraction expansion,
$$ArcTan(x) = x/(1+x^2/(3+(2*x)^2/(5+(3*x)^2/(7+...))))$$.
The convergence of this expansion for large $Abs(x)$ is improved by using the identities
$$ ArcTan(x) = Pi/2*Sign(x) - ArcTan(1/x)$$,
$$ ArcTan(x) = 2*ArcTan(x/(1+Sqrt(1+x^2))) $$.
Thus, any value of $x$ is reduced to $Abs(x)<0.42$. This is implemented in the standard library scripts.
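
A compact double-precision illustration in Python (a sketch, not the library script; the function name is ours) that applies the two identities until $Abs(x)<0.42$ and then sums the Taylor series:

	import math
	
	def arctan_reduced(x):
	    """ArcTan(x): reduce with the two identities above, then use the Taylor series."""
	    if abs(x) > 1:
	        return math.copysign(math.pi / 2, x) - arctan_reduced(1 / x)
	    if abs(x) > 0.42:
	        return 2 * arctan_reduced(x / (1 + math.sqrt(1 + x*x)))
	    s, term, n = 0.0, x, 1                 # x - x^3/3 + x^5/5 - ...
	    while s + term / n != s:
	        s += term / n
	        term *= -x * x
	        n += 2
	    return s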

By the identity $ArcCos(x) := Pi/2 - ArcSin(x)$, the inverse cosine is reduced to the inverse sine. Newton's method for $ArcSin(x)$ consists of solving the equation $Sin(y)=x$ for $y$. Implementation is similar to the calculation of $pi$ in {PiMethod0()}. 

For $x$ close to 1, Newton's method for $ArcSin(x)$ converges very slowly. An identity $$ArcSin(x)=Sign(x)*(Pi/2-ArcSin(Sqrt(1-x^2)))$$ can be used in this case. Another potentially useful identity is
$$ ArcSin(x) = 2*ArcSin(x/(Sqrt(2)*Sqrt(1+Sqrt(1-x^2)))) $$.

Inverse tangent can also be related to inverse sine by
$$ ArcTan(x) = ArcSin(x/Sqrt(1+x^2))$$,
$$ ArcTan(1/x) = ArcSin(1/Sqrt(1+x^2))$$.

Alternatively, the Taylor series can be used for the inverse sine:
$$ ArcSin(x) = x+1/2*x^3/3+(1*3)/(2*4)*x^5/5+(1*3*5)/(2*4*6)*x^7/7 +...$$.

*A continued fraction approximation!of $Tan(x)$

An everywhere convergent continued fraction can be used for the tangent:
$$ Tan(x) = x/(1-x^2/(3-x^2/(5-x^2/(7- ...))))$$.

Hyperbolic and inverse hyperbolic functions are reduced to exponentials and logarithms:
$Cosh(x) = 1/2*(Exp(x)+Exp(-x))$, $Sinh(x) = 1/2*(Exp(x)-Exp(-x))$, $Tanh(x) = Sinh(x)/Cosh(x)$,
$$ArcCosh(x) = Ln(x+Sqrt(x^2-1))$$,
$$ArcSinh(x) = Ln(x+Sqrt(x^2+1))$$,
$$ArcTanh(x) = 1/2*Ln((1+x)/(1-x))$$.

	    Continued fraction for $ArcTan(x)$

*A computation of $ArcTan(x)$!by continued fraction
*A continued fraction approximation!of $ArcTan(x)$

The idea to use continued fraction expansions for {ArcTan} comes from the book [Crenshaw 2000].
In that book the author explains how he got the idea to use continued fraction expansions to approximate $ArcTan(x)$: the Taylor series converges slowly, and he had a hunch that the continued fraction expansion of the same function might converge much faster.
He then proceeds to show that in the case of $ArcTan(x)$ this advantage is indeed very significant.
However, this need not be true for every slowly converging series.

*REM
One disadvantage of both continued fraction expansions and approximation by 
rational functions, compared to a simple series, is that it is in general not
easy to do the calculation with one step more precision, due to the nature
of the form of the expressions, and the way in which they change when expressions
with one order better precision are considered. The coefficients of the 
terms in the polynomials defining the numerator and the denominator of the
rational function change. This contrasts with a Taylor series expansion, where
each additional term improves the accuracy of the result, and the calculation
can be terminated when sufficient accuracy is achieved.

The convergence of the continued fraction expansion of $ArcTan(x)$ is indeed better than convergence of the Taylor series. Namely, the Taylor series converges only for $Abs(x)<1$ while the continued fraction converges for all $x$. However, the speed of its convergence is not uniform in $x$; the larger the value of $x$, the slower the convergence. The necessary number of terms of the continued fraction is in any case proportional to the required number of digits of precision, but the constant of proportionality depends on $x$.

*A computation of $ArcTan(x)$!by continued fraction!convergence rate

This can be understood by the following argument.
The difference between two partial continued fractions that differ only by one extra last term can be estimated as
$$ Abs(delta) := Abs(b[0]/(a[1]+b[1]/(...+b[n-1]/a[n])) - b[0]/(a[1]+b[1]/(...+b[n]/a[n+1])) ) < (b[0]*b[1]*...*b[n])/ ((a[1]*...*a[n])^2 * a[n+1]) $$.
(This is a conservative estimate that could be improved with more careful analysis.
See also the section on numerical continued fractions.)
For the above continued fraction for $ArcTan(x)$, this directly gives the following estimate,
$$ Abs(delta) < (x^(2*n+1)*(n!)^2)/((2*n+1)*((2*n-1)!!)^2) <=> Pi*(x/2)^(2*n+1) $$.
This formula gives a meaningful bound only if $x<2$, but it is clear that the precision generally becomes worse as $x$ grows. If we need $P$ digits of precision, then, for a given $x$, the number of terms $n$ has to be large enough so that the relative precision is sufficient, i.e. $$delta/ArcTan(x) < 10^(-P)$$. This gives $n > (P*Ln(10))/(Ln(4)-2*Ln(x))$ and, for $x=1$, $n>5/3*P$. This estimate is very close for small $x$ and only slightly suboptimal for larger $x$: numerical experimentation shows that for $x<=1$ the required number of terms for $P$ decimal digits is only about $4/3*P$, and for $x<=0.42$, $n$ must be about $3/4*P$. If $x$ is much smaller than 1, then a much smaller number of terms $n > P*Ln(10)/(Ln(4)-2*Ln(x))$ suffices.
Round-off errors may actually make the result less precise if we use many more terms than needed.
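
For reference, here is a bottom-up evaluation of this continued fraction in Python (double precision; the function name is ours), which can be used to check the term-count estimates above at low precision:

	def arctan_cf(x, n):
	    """ArcTan(x) from the continued fraction with n levels, evaluated bottom-up."""
	    f = float(2*n - 1)                     # last kept partial denominator
	    for k in range(n - 1, 0, -1):
	        f = (2*k - 1) + (k * x)**2 / f
	    return x / f
	
	# e.g. arctan_cf(1.0, 30) matches math.atan(1.0) to about double precision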

*A computation of $ArcTan(x)$!by Taylor series

If we compare the rate of convergence of the continued fraction for $ArcTan(x)$  with the Taylor series
$$ ArcTan(x) = Sum(n,0,Infinity, ((-1)^n*x^(2*n+1))/(2*n+1)) $$,
we find that the number of terms of the Taylor series needed for $P$ digits is about
$n > (P*Ln(10))/(2*Ln(1/x))$.
Since the range of $x$ can be reduced to about [0, 0.42] by trigonometric identities, the difference between this and $P*Ln(10)/(Ln(4)-2*Ln(x))$ is never very large.
At most twice as many terms $n$ are needed in the Taylor series as in the continued fraction.
However, a Taylor series can be evaluated efficiently using $O(Sqrt(n))$ long multiplications, while a continued fraction with $n$ terms always requires $n$ divisions.
Therefore, at high enough precision the continued fraction method will be much less efficient than the Taylor series.


	    Which method to use

This remains to be seen.


		Factorials and binomial coefficients

*A factorial

The factorial is defined by $n! := n*(n-1)*...*1$ for integer $n>=1$ and the binomial coefficient is defined by
$$Bin(n,m) := n! / (m! * (n-m)!)$$.

*A double factorial

The "double factorial" $n!! := n*(n-2)*(n-4)*...$ is also useful for some calculations.
For convenience, one defines $0! :=1$, $0!! := 1$, and $(-1)!! := 1$;
with these definitions, the recurrence relations
$$n! *(n+1) = (n+1)!$$,
$$n!! *(n+2) = (n+2)!!$$
are valid also for $n=0$ and $n= -1$.

There are two tasks related to the factorial: the exact integer calculation and an approximate calculation to some floating-point precision. The factorial of $n$ has approximately $n*Ln(n)/Ln(10)$ decimal digits, so an exact calculation is practical only for relatively small $n$. In the current implementation, exact factorials for $n>65535$ are not computed; instead, an error message is printed advising the user to avoid exact computation of such large factorials.
Approximate computations remain possible: for example, {Internal'LnGammaNum(n+1)} can compute $Ln(n!)$ for very large $n$ to any desired floating-point precision.

	    Exact factorials

To compute factorials exactly, we use two direct methods. The first method is
to multiply the numbers $1$, $2$, ..., $n$ in a loop. This method requires $n$
multiplications of short numbers with $P$-digit numbers, where $P=O(n*Ln(n))$
is the number of digits in $n!$. Therefore its complexity is $O(n^2*Ln(n))$.
This factorial routine is implemented in the Yacas core with a small speedup:
consecutive pairs of integers are first multiplied together using platform math
and then multiplied by the accumulator product.

*A factorial!by bisection method

A second method uses a binary tree arrangement of the numbers $1$, $2$, ..., $n$ similar to the recursive sorting routine ("merge-sort"). If we denote by {a *** b} the "partial factorial" product $a*(a+1)*...*(b-1)*b$, then the tree-factorial algorithm consists of replacing $n!$ by $1 *** n$ and recursively evaluating
$(1 *** m) * ((m+1) *** n)$ for some integer $m$ near $n/2$. The partial factorials of nearby numbers such as $m***(m+2)$ are evaluated explicitly. The binary tree algorithm requires one multiplication of $P/2$ digit integers at the last step, two $P/4$ digit multiplications at the last-but-one step and so on. There are $O(Ln(n))$ total steps of the recursion. If the cost of multiplication is $M(P) = P^(1+a)*Ln(P)^b$, then one can show that the total cost of the binary tree algorithm is $O(M(P))$ if $a>0$ and $O(M(P)*Ln(n))$ if $a=0$ (which is the best asymptotic multiplication algorithm).

Therefore, the tree method wins over the simple method if the cost of multiplication is lower than quadratic.
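
A recursive Python sketch of the binary-tree method (Python's built-in long integers stand in for Yacas's big numbers; the function names are ours):

	def partial_factorial(a, b):
	    """The product a*(a+1)*...*b (i.e. "a *** b"), split recursively in half."""
	    if b - a < 4:                          # short products are done directly
	        p = a
	        for i in range(a + 1, b + 1):
	            p *= i
	        return p
	    m = (a + b) // 2
	    return partial_factorial(a, m) * partial_factorial(m + 1, b)
	
	def factorial(n):
	    return 1 if n == 0 else partial_factorial(1, n)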

The tree method can also be used to compute "double factorials" ($n!!$). This is faster than using the identities
$$ (2*n)!! = 2^n*n! $$ and
$$ (2*n-1)!! = (2*n)! / (2^n*n!)$$.
Double factorials are used, for instance, in the exact calculation of the Gamma function of half-integer arguments.

*A binomial coefficients



Binomial coefficients $Bin(n,m)$ are found by first selecting the smaller of $m$, $n-m$ and using the identity $Bin(n,m) = Bin(n,n-m)$. Then a partial factorial is used to compute
$$ Bin(n,m)= ((n-m+1)***n) / m! $$.
This is always much faster than computing the three factorials in the definition of $Bin(n,m)$.
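
With the {partial_factorial} routine from the sketch above, this becomes (again an illustrative sketch):

	def binomial(n, m):
	    """Bin(n,m) from one partial factorial and one small factorial."""
	    m = min(m, n - m)                      # Bin(n,m) = Bin(n,n-m)
	    if m == 0:
	        return 1
	    return partial_factorial(n - m + 1, n) // partial_factorial(1, m)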

	    Approximate factorials

*A Gamma function

A floating-point computation of the factorial may proceed either via Euler's Gamma function, $n! = Gamma(n+1)$,
or by a direct method (multiplying the integers and maintaining a certain floating-point precision).
If the required precision is much less than the number of digits in the exact factorial, then almost all multiplications will be truncated to the precision $P$ and the tree method $O(n*M(P))$ is always slower than the simple method $O(n*P)$.

	    Which method to use

This remains to be seen.

		Classical orthogonal polynomials: general case

*A orthogonal polynomials


A family of orthogonal polynomials is a sequence of polynomials $q_n(x)$, $n=0$, $1$, ... that satisfy the orthogonality condition on some interval [$a$, $b$] with respect to some weight function $rho(x)$:
$$ (Integrate(x,a,b) q_m(x)*q_n(x)*rho(x) ) = 0 $$
for $m!=n$.
The interval [$a$, $b$] can be finite or infinite and the weight function must be real and non-negative on this interval.



In principle, one could choose any (non-negative) weight function $rho(x)$ and any interval [$a$, $b$] and construct the corresponding family of orthogonal polynomials $q_n(x)$.
For example, take $q_0=1$, then take $q_1=x+c$ with unknown $c$ and find such $c$ that $q_0$ and $q_1$ satisfy the orthogonality condition;
this requires solving a linear equation.
Then we can similarly find two unknown coefficients of $q_2$ and so on.
(This is called the Gram-Schmidt orthogonalization procedure.)

But of course not all weight functions $rho(x)$ and not all intervals [$a$, $b$] are equally interesting.
There are several "classical" families of orthogonal polynomials that have been of use to theoretical and applied science.
The "classical" polynomials are always solutions of a simple second-order differential equation and are always a specific case of some hypergeometric function.

The construction of "classical" polynomials can be described by the following scheme.
The function $rho(x)$ must satisfy the so-called Pearson's equation,
$$ 1/rho(x)*(D(x)rho(x)) = alpha(x)/beta(x) $$,
where the functions $alpha$, $beta$ are of the form
$$ alpha(x)=alpha_0+alpha_1*x $$, 
$$ beta(x)=beta_0+beta_1*x+beta_2*x^2 $$.
Also, the following boundary conditions must be satisfied at both ends $a$, $b$ of the interval,
$$ rho(a)*beta(a)=rho(b)*beta(b)=0 $$.



*A orthogonal polynomials!Rodrigues formula

If the function $rho(x)$ and the interval [$a$, $b$] are chosen in this way, then the corresponding orthogonal polynomials $q_n(x)$ are solutions of the differential equation
$$(Deriv(x)(beta(x)*rho(x)*(Deriv(x)q_n(x))))-n*(alpha_1+(n+1)*beta_2)*rho(x)*q_n(x)=0$$.
The polynomials $q_n(x)$ are also given by the Rodrigues formula,
$$q_n(x) = A[n]/rho(x)*D(x,n)rho(x)*beta(x)^n$$,
where $A[n]$ is a normalization constant.
It is usual to normalize the polynomials so that
$$(Integrate(x,a,b)(q_n(x))^2*rho(x))=1 $$.
The normalization constant $A[n]$ must be chosen accordingly.

*A orthogonal polynomials!generating function

Finally, there is a formula for the generating function of the polynomials,
$$ G(x,w) = 1/rho(x)*rho(t(x,w))/Abs(1-w*(beta_1+2*beta_2*t(x,w))) $$,
where $t(x,w)$ is the root of $t-x-w*beta(t)=0$ which is nearest to $t=x$ at small $w$.
This function $G(x,w)$ gives the unnormalized polynomials,
$$ G(x,w) = Sum(n,0,Infinity, q_n(x)/n! *w^n) $$,
where
$$ q_n(x) = 1/rho(x)*D(x,n)rho(x)*beta(x)^n $$.

The classical families of (normalized) orthogonal polynomials are obtained in this framework with the following definitions:

*A orthogonal polynomials!classical polynomials

*	The Legendre polynomials $OrthoP(n,x)$:
$ a= -1 $, $b= 1$, $rho(x)=1$,
$ alpha(x) = 0 $,
$ beta(x) = 1-x^2 $,
$ A[n] = (-1)^n/(2^n*n!) $.

*	The Laguerre polynomials $(L^m)_n(x)$:
$ a= 0 $, $b= +Infinity$, $rho(x)=x^m*Exp(-x)$, (here $m> -1$ or else the weight function is not integrable on the interval),
$ alpha(x) = m-x $,
$ beta(x) = x $,
$ A[n] = 1 $.

*	The Hermite polynomials $OrthoH(n,x)$:
$ a= -Infinity $, $b= +Infinity$, $rho(x)=Exp(-x^2)$,
$ alpha(x) = -2*x $,
$ beta(x) = 1 $,
$ A[n] = (-1)^n $.

*	The Chebyshev polynomials of the first kind $OrthoT(n,x)$:
$ a= -1 $, $b= 1$, $rho(x)=1/Sqrt(1-x^2)$,
$ alpha(x) = x $,
$ beta(x) = 1-x^2 $,
$ A[n] = (-1)^n/(2*n)!! $.


Neither the Rodrigues formula nor the generating function is an efficient way to calculate the polynomials.
A better way is to use linear recurrence relations connecting $q[n+1]$ with $q[n]$ and $q[n-1]$.
(These recurrence relations can also be written out in full generality through $alpha(x)$ and $beta(x)$, but we omit them here to save space.)

There are three computational tasks related to orthogonal polynomials:
*	1. Compute all coefficients of a given polynomial $q_n(x)$ exactly.
The coefficients are rational numbers, but their numerators and denominators usually grow exponentially quickly with $n$, so that at least some coefficients of the $n$-th polynomial are $n$-digit numbers.
The array of coefficients can be obtained using recurrence relations $n$ times.
The required number of operations is proportional to $n^2$
(because we need $n$ coefficients and we have $n$ recurrence relations for each of them)
and to the multiplication time of $n$-digit integers, i.e. $O(n^2*M(n))$.
Sometimes an exact formula for the coefficients is available
(this is the case for the four "classical" families of polynomials above; see the next section).
Then the computation time dramatically drops down to $O(n^2)$ because no recurrences are needed.
*	2. Compute the numerical value of a given polynomial $q_n(x)$ at given $x$, either exactly (at rational $x$) or approximately in floating point.
This requires $O(n*M(n))$ operations for exact computation and $O(n*M(P))$ operations in $P$-digit floating point.
*	3. Compute a series of orthogonal polynomials with given coefficients $f[n]$, i.e. $f(x):=Sum(n,0,N, f[n]*q_n(x))$, at a given $x$.
This task does not actually require computing the polynomials first,
if we use the so-called Clenshaw-Smith procedure
which gives the value of $f(x)$ directly in $n$ iterations.
(See <*below|yacasdoc://Algo/4/10/*> for more explanations.)
The number of operations is again $O(n*M(P))$.

In the next section we give some formulae that allow particular polynomials to be calculated more efficiently.

		Classical orthogonal polynomials: special cases

*A orthogonal polynomials!by specific formulae

The fastest algorithm available is for Chebyshev (sometimes spelled Tschebyscheff) polynomials $OrthoT(n,x)$, $OrthoU(n,x)$.
The following recurrence relations can be used:
$$ OrthoT(2*n,x) = 2*OrthoT(n,x)^2-1 $$,
$$ OrthoT(2*n+1,x) = 2*OrthoT(n+1,x)*OrthoT(n,x)-x $$,
$$ OrthoU(2*n,x) = 2*OrthoT(n,x)*OrthoU(n,x)-1 $$,
$$ OrthoU(2*n+1,x) = 2*OrthoT(n+1,x)*OrthoU(n,x) $$.
This makes it possible to compute $OrthoT(n,x)$ and $OrthoU(n,x)$ in time logarithmic in $n$.

There is a way to implement this method without recursion.
The idea is to build the sequence of numbers $n[1]$, $n[2]$, ... that are needed to compute $OrthoT(n,x)$.

For example, to compute $OrthoT(19,x)$ using the second recurrence relation, we need $OrthoT(10,x)$ and $OrthoT(9,x)$.
We can write this chain symbolically as $19 <> c(9,10)$.
For $OrthoT(10,x)$ we need only $OrthoT(5,x)$.
This we can write as $10 <> c(5)$.
Similarly we find: $9 <> c(4,5)$.
Therefore, we can find both $OrthoT(9,x)$ and $OrthoT(10,x)$ if we know $OrthoT(4,x)$ and $OrthoT(5,x)$.
Eventually we find the following chain of pairs:
$$ 19 <> c(9,10) <> c(4,5) <> c(2,3) <> c(1,2) <> c(1) $$.
Therefore, we find that computing $OrthoT(19,x)$ requires computing $OrthoT(k,x)$ sequentially for all $k$ that appear in this chain (1, 2, 3, 4, 5, 9, 10).

There are about $2*Ln(n)/Ln(2)$ elements in the chain that leads to the number $n$.
We can generate this chain in a straightforward way by examining the bits in the binary representation of $n$.
Therefore, we find that this method requires no storage and time logarithmic in $n$.
A recursive routine would also take logarithmic time but require logarithmic storage space.
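
In Python, the bit-scanning version looks like this (double precision; not Yacas code, and the name is ours). Reading the bits of $n$ from the most significant end and keeping the pair ($OrthoT(k,x)$, $OrthoT(k+1,x)$), starting from $k=0$, reproduces essentially the chain described above:

	def cheb_T(n, x):
	    """OrthoT(n,x) in O(Ln(n)) steps from the doubling recurrences."""
	    t0, t1 = 1.0, x                        # the pair (T(k), T(k+1)), starting with k = 0
	    for bit in bin(n)[2:]:                 # bits of n, most significant first
	        if bit == '0':                     # k -> 2k
	            t0, t1 = 2*t0*t0 - 1, 2*t1*t0 - x
	        else:                              # k -> 2k+1
	            t0, t1 = 2*t1*t0 - x, 2*t1*t1 - 1
	    return t0
	
	# e.g. cheb_T(19, 0.3) agrees with math.cos(19*math.acos(0.3)) up to round-off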

Note that using these recurrence relations we do not obtain any individual coefficients of the Chebyshev polynomials.
This method does not seem very useful for symbolic calculations (with symbolic $x$), because the resulting expressions are rather complicated combinations of nested products.
It is difficult to expand such an expression into powers of $x$ or manipulate it in any other way, except to compute a numerical value.
However, these fast recurrences are numerically unstable, so numerical values need to be evaluated with extended working precision.
Currently this method is not used in Yacas, despite its speed.

An alternative method for very large $n$ would be to use the identities
$$ OrthoT(n,x)=Cos(n*ArcCos(x))$$,
$$ OrthoU(n,x)=Sin((n+1)*ArcCos(x))/Sqrt(1-x^2) $$.
The computation will require an extended-precision evaluation of $ArcCos(x)$.

Coefficients for Legendre, Hermite, Laguerre, Chebyshev polynomials can be obtained by explicit formulae. This is faster than using recurrences if we need the entire polynomial symbolically, but still slower than the recurrences for numerical calculations.

*	Chebyshev polynomials of the first and of the second kind:
If $k=Floor(n/2)$, then
$$ OrthoT(n,x) = n/2*Sum(i,0,k, (-1)^i*(2*x)^(n-2*i)/(n-i)*Bin(n-i,i)) $$,
$$ OrthoU(n,x) = Sum(i,0,k, (-1)^i*(2*x)^(n-2*i)*Bin(n-i,i)) $$.
Here it is assumed that $n>0$ (the case $n=0$ must be done separately). The summation is over integer values of $i$ such that $0<=2*i<=n$, regardless of whether $n$ is even or odd.

*	Hermite polynomials:
For even $n=2*k$ where $k>=0$, 
$$ OrthoH(n,x) = (-2)^(k)*(n-1)!! *Sum(i,0,k, (x^(2*i)*(-4)^i*k!) / ((2*i)! * (k-i)!)) $$,
and for odd $n=2*k+1$ where $k>=0$,
$$ OrthoH(n,x) = 2*(-2)^(k)*n!! *Sum(i,0,k, (x^(2*i+1)*(-4)^i*k!) / ((2*i+1)! * (k-i)!)) $$.


*	Legendre polynomials:
If $k=Floor(n/2)$, then
$$ OrthoP(n,x) = 2^(-n)*Sum(i,0,k, (-1)^i*x^(n-2*i)*Bin(n,i)*Bin(2*n-2*i,n)) $$.
The summation is over integer values of $i$ such that $0<=2*i<=n$, regardless of whether $n$ is even or odd.

*	Laguerre polynomials:
$$ OrthoL(n,a,x) = Sum(i,0,n,(-x)^i/i! *Bin(n+a,n-i)) $$.
Here the parameter $a$ might be non-integer.
So the binomial coefficient must be defined for non-integer $a$ through the Gamma function instead of factorials, which gives
$$ Bin(n+a,n-i) = ( (n+a)*...*(i+1+a) )/(n-i)! $$.
The result is a rational function of $a$.

In all formulae for the coefficients, there is no need to compute factorials every time:
the next coefficient can be obtained from the previous one by a few short multiplications and divisions.
Therefore this computation costs $O(n^2)$ short operations.
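
For example, for the formula for $OrthoT(n,x)$ given above, the ratio of consecutive coefficients is a short rational factor. The following Python sketch uses it with exact rational arithmetic from the standard {fractions} module (the function name is ours):

	from fractions import Fraction
	
	def cheb_T_coefficients(n):
	    """Coefficients of x^n, x^(n-2), ... in OrthoT(n,x), for n > 0,
	    each obtained from the previous one by a short multiplication and division."""
	    coeffs = []
	    c = Fraction(2) ** (n - 1)             # leading coefficient of OrthoT(n,x)
	    for i in range(n // 2 + 1):
	        coeffs.append(c)                   # coefficient of x^(n-2i)
	        if i < n // 2:
	            c *= Fraction(-(n - 2*i) * (n - 2*i - 1), 4 * (i + 1) * (n - i - 1))
	    return coeffs
	
	# cheb_T_coefficients(4) corresponds to 8*x^4 - 8*x^2 + 1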

		Series of orthogonal polynomials

*A orthogonal polynomials!Clenshaw-Smith recurrence

If we need to compute a series of orthogonal polynomials with given coefficients $f[n]$, i.e.
$$ f(x):=Sum(n,0,N, f[n]*q_n(x)) $$
at a given $x$, we do not need to compute the orthogonal polynomials separately.
The Clenshaw-Smith recurrence procedure allows one to compute the value of the sum directly.

Suppose a family of functions $q_n(x)$,
$n=0$, 1, ... satisfies known recurrence relations of the form
$$ q[n]=A_n(x)* q[n-1]+B_n(x)* q[n-2] $$,
where $A_n(x)$  and $B_n(x)$  are some known functions and $q_0(x)$ and
$q_1(x)$ are known.

The procedure goes as follows [Luke 1975].
First, for convenience, we define $q[-1]:= 0$ and the coefficient $A_1(x)$  so
that $q[1]=A[1]*q[0]$.
This allows us to use the above recurrence relation formally also at $n=1$. 
Then, we take the array of coefficients $f[n]$ and define a backward recurrence
relation
$$ X[N+1]=X[N+2]=0 $$,
$$ X[n]=A[n+1]*X[n+1]+B[n+2]*X[n+2]+f[n] $$,
where $n=N$, $N-1$, ..., 0.
(Note that here we have used the artificially defined coefficient $A[1]$.)
Magically, the value we are looking for is given by 
$$ f(x) = X[0]*q[0] $$.
This happens because we can express
$$ f[n]=X[n]-A[n+1]*X[n+1]-B[n+2]*X[n+2] $$,
for $n=0$, 1, ..., $N$,
regroup the terms in the sum 
$$ f(x) := Sum(n,0,N, f[n]*q_n(x)) $$
to collect $X[n]$, obtain
$$ f(x) = Sum(n,0, N, X[n]*q[n]) - Sum(n,1,N, X[n]*A[n]*q[n-1]) -Sum(n,2,N, X[n]*B[n]*q[n-2]) $$,
and finally
$$ f(x) = X[0]*q[0]+X[1]*(q[1]-A[1]*q[0]) $$
$$ +Sum(n,2,N, X[n]*(q[n]-A[n]*q[n-1]-B[n]*q[n-2])) =X[0]*q[0] $$.
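
As an illustration, here is the procedure in Python for a Chebyshev series, where the recurrence is $q[n]=2*x*q[n-1]-q[n-2]$, so that $A_n(x) =2*x$, $B_n(x)= -1$, and the artificial coefficient is $A[1]=x$ (a sketch; the function name is ours):

	def clenshaw_chebyshev(f, x):
	    """Evaluate Sum(n,0,N, f[n]*OrthoT(n,x)) by the backward recurrence above."""
	    N = len(f) - 1
	    X1, X2 = 0.0, 0.0                      # X[n+1] and X[n+2]
	    for n in range(N, 0, -1):
	        X1, X2 = f[n] + 2*x*X1 - X2, X1
	    X0 = f[0] + x*X1 - X2                  # the last step uses A[1] = x
	    return X0                              # f(x) = X[0]*q[0], and OrthoT(0,x) = 1
	
	# e.g. clenshaw_chebyshev([0, 0, 1], 0.3) equals OrthoT(2, 0.3) = 2*0.3**2 - 1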

The book [Luke 1975] warns that the recurrence relation for $X[n]$ is not always numerically stable.

Note that in the book there seems to be some confusion as 
to how the coefficient $A[1]$ is defined.
(It is not defined explicitly there.)
Our final formula differs from the formula in [Luke 1975] for this reason.

The Clenshaw-Smith procedure is analogous to the Horner scheme of calculating polynomials.
This procedure can also be generalized for linear recurrence relations having
more than two terms.
The functions $q_0(x)$, $q_1(x)$, $A_n(x)$, and $B_n(x)$
do not actually have to be polynomials for this to work.