1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884
|
// Copyright 2009 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
/*
Multi-precision division. Here be dragons.
Given u and v, where u is n+m digits, and v is n digits (with no leading zeros),
the goal is to return quo, rem such that u = quo*v + rem, where 0 ≤ rem < v.
That is, quo = ⌊u/v⌋ where ⌊x⌋ denotes the floor (truncation to integer) of x,
and rem = u - quo·v.
Long Division
Division in a computer proceeds the same as long division in elementary school,
but computers are not as good as schoolchildren at following vague directions,
so we have to be much more precise about the actual steps and what can happen.
We work from most to least significant digit of the quotient, doing:
• Guess a digit q, the number of v to subtract from the current
section of u to zero out the topmost digit.
• Check the guess by multiplying q·v and comparing it against
the current section of u, adjusting the guess as needed.
• Subtract q·v from the current section of u.
• Add q to the corresponding section of the result quo.
When all digits have been processed, the final remainder is left in u
and returned as rem.
For example, here is a sketch of dividing 5 digits by 3 digits (n=3, m=2).
q₂ q₁ q₀
_________________
v₂ v₁ v₀ ) u₄ u₃ u₂ u₁ u₀
↓ ↓ ↓ | |
[u₄ u₃ u₂]| |
- [ q₂·v ]| |
----------- ↓ |
[ rem | u₁]|
- [ q₁·v ]|
----------- ↓
[ rem | u₀]
- [ q₀·v ]
------------
[ rem ]
Instead of creating new storage for the remainders and copying digits from u
as indicated by the arrows, we use u's storage directly as both the source
and destination of the subtractions, so that the remainders overwrite
successive overlapping sections of u as the division proceeds, using a slice
of u to identify the current section. This avoids all the copying as well as
shifting of remainders.
Division of u with n+m digits by v with n digits (in base B) can in general
produce at most m+1 digits, because:
• u < B^(n+m) [B^(n+m) has n+m+1 digits]
• v ≥ B^(n-1) [B^(n-1) is the smallest n-digit number]
• u/v < B^(n+m) / B^(n-1) [divide bounds for u, v]
• u/v < B^(m+1) [simplify]
The first step is special: it takes the top n digits of u and divides them by
the n digits of v, producing the first quotient digit and an n-digit remainder.
In the example, q₂ = ⌊u₄u₃u₂ / v⌋.
The first step divides n digits by n digits to ensure that it produces only a
single digit.
Each subsequent step appends the next digit from u to the remainder and divides
those n+1 digits by the n digits of v, producing another quotient digit and a
new n-digit remainder.
Subsequent steps divide n+1 digits by n digits, an operation that in general
might produce two digits. However, as used in the algorithm, that division is
guaranteed to produce only a single digit. The dividend is of the form
rem·B + d, where rem is a remainder from the previous step and d is a single
digit, so:
• rem ≤ v - 1 [rem is a remainder from dividing by v]
• rem·B ≤ v·B - B [multiply by B]
• d ≤ B - 1 [d is a single digit]
• rem·B + d ≤ v·B - 1 [add]
• rem·B + d < v·B [change ≤ to <]
• (rem·B + d)/v < B [divide by v]
Guess and Check
At each step we need to divide n+1 digits by n digits, but this is for the
implementation of division by n digits, so we can't just invoke a division
routine: we _are_ the division routine. Instead, we guess at the answer and
then check it using multiplication. If the guess is wrong, we correct it.
How can this guessing possibly be efficient? It turns out that the following
statement (let's call it the Good Guess Guarantee) is true.
If
• q = ⌊u/v⌋ where u is n+1 digits and v is n digits,
• q < B, and
• the topmost digit of v = vₙ₋₁ ≥ B/2,
then q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ satisfies q ≤ q̂ ≤ q+2. (Proof below.)
That is, if we know the answer has only a single digit and we guess an answer
by ignoring the bottom n-1 digits of u and v, using a 2-by-1-digit division,
then that guess is at least as large as the correct answer. It is also not
too much larger: it is off by at most two from the correct answer.
Note that in the first step of the overall division, which is an n-by-n-digit
division, the 2-by-1 guess uses an implicit uₙ = 0.
Note that using a 2-by-1-digit division here does not mean calling ourselves
recursively. Instead, we use an efficient direct hardware implementation of
that operation.
Note that because q is u/v rounded down, q·v must not exceed u: u ≥ q·v.
If a guess q̂ is too big, it will not satisfy this test. Viewed a different way,
the remainder r̂ for a given q̂ is u - q̂·v, which must be positive. If it is
negative, then the guess q̂ is too big.
This gives us a way to compute q. First compute q̂ with 2-by-1-digit division.
Then, while u < q̂·v, decrement q̂; this loop executes at most twice, because
q̂ ≤ q+2.
Scaling Inputs
The Good Guess Guarantee requires that the top digit of v (vₙ₋₁) be at least B/2.
For example in base 10, ⌊172/19⌋ = 9, but ⌊18/1⌋ = 18: the guess is wildly off
because the first digit 1 is smaller than B/2 = 5.
We can ensure that v has a large top digit by multiplying both u and v by the
right amount. Continuing the example, if we multiply both 172 and 19 by 3, we
now have ⌊516/57⌋, the leading digit of v is now ≥ 5, and sure enough
⌊51/5⌋ = 10 is much closer to the correct answer 9. It would be easier here
to multiply by 4, because that can be done with a shift. Specifically, we can
always count the number of leading zeros i in the first digit of v and then
shift both u and v left by i bits.
Having scaled u and v, the value ⌊u/v⌋ is unchanged, but the remainder will
be scaled: 172 mod 19 is 1, but 516 mod 57 is 3. We have to divide the remainder
by the scaling factor (shifting right i bits) when we finish.
Note that these shifts happen before and after the entire division algorithm,
not at each step in the per-digit iteration.
Note the effect of scaling inputs on the size of the possible quotient.
In the scaled u/v, u can gain a digit from scaling; v never does, because we
pick the scaling factor to make v's top digit larger but without overflowing.
If u and v have n+m and n digits after scaling, then:
• u < B^(n+m) [B^(n+m) has n+m+1 digits]
• v ≥ B^n / 2 [vₙ₋₁ ≥ B/2, so vₙ₋₁·B^(n-1) ≥ B^n/2]
• u/v < B^(n+m) / (B^n / 2) [divide bounds for u, v]
• u/v < 2 B^m [simplify]
The quotient can still have m+1 significant digits, but if so the top digit
must be a 1. This provides a different way to handle the first digit of the
result: compare the top n digits of u against v and fill in either a 0 or a 1.
Refining Guesses
Before we check whether u < q̂·v, we can adjust our guess to change it from
q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ into the refined guess ⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋.
Although not mentioned above, the Good Guess Guarantee also promises that this
3-by-2-digit division guess is more precise and at most one away from the real
answer q. The improvement from the 2-by-1 to the 3-by-2 guess can also be done
without n-digit math.
If we have a guess q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ and we want to see if it also equal to
⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋, we can use the same check we would for the full division:
if uₙuₙ₋₁uₙ₋₂ < q̂·vₙ₋₁vₙ₋₂, then the guess is too large and should be reduced.
Checking uₙuₙ₋₁uₙ₋₂ < q̂·vₙ₋₁vₙ₋₂ is the same as uₙuₙ₋₁uₙ₋₂ - q̂·vₙ₋₁vₙ₋₂ < 0,
and
uₙuₙ₋₁uₙ₋₂ - q̂·vₙ₋₁vₙ₋₂ = (uₙuₙ₋₁·B + uₙ₋₂) - q̂·(vₙ₋₁·B + vₙ₋₂)
[splitting off the bottom digit]
= (uₙuₙ₋₁ - q̂·vₙ₋₁)·B + uₙ₋₂ - q̂·vₙ₋₂
[regrouping]
The expression (uₙuₙ₋₁ - q̂·vₙ₋₁) is the remainder of uₙuₙ₋₁ / vₙ₋₁.
If the initial guess returns both q̂ and its remainder r̂, then checking
whether uₙuₙ₋₁uₙ₋₂ < q̂·vₙ₋₁vₙ₋₂ is the same as checking r̂·B + uₙ₋₂ < q̂·vₙ₋₂.
If we find that r̂·B + uₙ₋₂ < q̂·vₙ₋₂, then we can adjust the guess by
decrementing q̂ and adding vₙ₋₁ to r̂. We repeat until r̂·B + uₙ₋₂ ≥ q̂·vₙ₋₂.
(As before, this fixup is only needed at most twice.)
Now that q̂ = ⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋, as mentioned above it is at most one
away from the correct q, and we've avoided doing any n-digit math.
(If we need the new remainder, it can be computed as r̂·B + uₙ₋₂ - q̂·vₙ₋₂.)
The final check u < q̂·v and the possible fixup must be done at full precision.
For random inputs, a fixup at this step is exceedingly rare: the 3-by-2 guess
is not often wrong at all. But still we must do the check. Note that since the
3-by-2 guess is off by at most 1, it can be convenient to perform the final
u < q̂·v as part of the computation of the remainder r = u - q̂·v. If the
subtraction underflows, decremeting q̂ and adding one v back to r is enough to
arrive at the final q, r.
That's the entirety of long division: scale the inputs, and then loop over
each output position, guessing, checking, and correcting the next output digit.
For a 2n-digit number divided by an n-digit number (the worst size-n case for
division complexity), this algorithm uses n+1 iterations, each of which must do
at least the 1-by-n-digit multiplication q̂·v. That's O(n) iterations of
O(n) time each, so O(n²) time overall.
Recursive Division
For very large inputs, it is possible to improve on the O(n²) algorithm.
Let's call a group of n/2 real digits a (very) “wide digit”. We can run the
standard long division algorithm explained above over the wide digits instead of
the actual digits. This will result in many fewer steps, but the math involved in
each step is more work.
Where basic long division uses a 2-by-1-digit division to guess the initial q̂,
the new algorithm must use a 2-by-1-wide-digit division, which is of course
really an n-by-n/2-digit division. That's OK: if we implement n-digit division
in terms of n/2-digit division, the recursion will terminate when the divisor
becomes small enough to handle with standard long division or even with the
2-by-1 hardware instruction.
For example, here is a sketch of dividing 10 digits by 4, proceeding with
wide digits corresponding to two regular digits. The first step, still special,
must leave off a (regular) digit, dividing 5 by 4 and producing a 4-digit
remainder less than v. The middle steps divide 6 digits by 4, guaranteed to
produce two output digits each (one wide digit) with 4-digit remainders.
The final step must use what it has: the 4-digit remainder plus one more,
5 digits to divide by 4.
q₆ q₅ q₄ q₃ q₂ q₁ q₀
_______________________________
v₃ v₂ v₁ v₀ ) u₉ u₈ u₇ u₆ u₅ u₄ u₃ u₂ u₁ u₀
↓ ↓ ↓ ↓ ↓ | | | | |
[u₉ u₈ u₇ u₆ u₅]| | | | |
- [ q₆q₅·v ]| | | | |
----------------- ↓ ↓ | | |
[ rem |u₄ u₃]| | |
- [ q₄q₃·v ]| | |
-------------------- ↓ ↓ |
[ rem |u₂ u₁]|
- [ q₂q₁·v ]|
-------------------- ↓
[ rem |u₀]
- [ q₀·v ]
------------------
[ rem ]
An alternative would be to look ahead to how well n/2 divides into n+m and
adjust the first step to use fewer digits as needed, making the first step
more special to make the last step not special at all. For example, using the
same input, we could choose to use only 4 digits in the first step, leaving
a full wide digit for the last step:
q₆ q₅ q₄ q₃ q₂ q₁ q₀
_______________________________
v₃ v₂ v₁ v₀ ) u₉ u₈ u₇ u₆ u₅ u₄ u₃ u₂ u₁ u₀
↓ ↓ ↓ ↓ | | | | | |
[u₉ u₈ u₇ u₆]| | | | | |
- [ q₆·v ]| | | | | |
-------------- ↓ ↓ | | | |
[ rem |u₅ u₄]| | | |
- [ q₅q₄·v ]| | | |
-------------------- ↓ ↓ | |
[ rem |u₃ u₂]| |
- [ q₃q₂·v ]| |
-------------------- ↓ ↓
[ rem |u₁ u₀]
- [ q₁q₀·v ]
---------------------
[ rem ]
Today, the code in divRecursiveStep works like the first example. Perhaps in
the future we will make it work like the alternative, to avoid a special case
in the final iteration.
Either way, each step is a 3-by-2-wide-digit division approximated first by
a 2-by-1-wide-digit division, just as we did for regular digits in long division.
Because the actual answer we want is a 3-by-2-wide-digit division, instead of
multiplying q̂·v directly during the fixup, we can use the quick refinement
from long division (an n/2-by-n/2 multiply) to correct q to its actual value
and also compute the remainder (as mentioned above), and then stop after that,
never doing a full n-by-n multiply.
Instead of using an n-by-n/2-digit division to produce n/2 digits, we can add
(not discard) one more real digit, doing an (n+1)-by-(n/2+1)-digit division that
produces n/2+1 digits. That single extra digit tightens the Good Guess Guarantee
to q ≤ q̂ ≤ q+1 and lets us drop long division's special treatment of the first
digit. These benefits are discussed more after the Good Guess Guarantee proof
below.
How Fast is Recursive Division?
For a 2n-by-n-digit division, this algorithm runs a 4-by-2 long division over
wide digits, producing two wide digits plus a possible leading regular digit 1,
which can be handled without a recursive call. That is, the algorithm uses two
full iterations, each using an n-by-n/2-digit division and an n/2-by-n/2-digit
multiplication, along with a few n-digit additions and subtractions. The standard
n-by-n-digit multiplication algorithm requires O(n²) time, making the overall
algorithm require time T(n) where
T(n) = 2T(n/2) + O(n) + O(n²)
which, by the Bentley-Haken-Saxe theorem, ends up reducing to T(n) = O(n²).
This is not an improvement over regular long division.
When the number of digits n becomes large enough, Karatsuba's algorithm for
multiplication can be used instead, which takes O(n^log₂3) = O(n^1.6) time.
(Karatsuba multiplication is implemented in func karatsuba in nat.go.)
That makes the overall recursive division algorithm take O(n^1.6) time as well,
which is an improvement, but again only for large enough numbers.
It is not critical to make sure that every recursion does only two recursive
calls. While in general the number of recursive calls can change the time
analysis, in this case doing three calls does not change the analysis:
T(n) = 3T(n/2) + O(n) + O(n^log₂3)
ends up being T(n) = O(n^log₂3). Because the Karatsuba multiplication taking
time O(n^log₂3) is itself doing 3 half-sized recursions, doing three for the
division does not hurt the asymptotic performance. Of course, it is likely
still faster in practice to do two.
Proof of the Good Guess Guarantee
Given numbers x, y, let us break them into the quotients and remainders when
divided by some scaling factor S, with the added constraints that the quotient
x/y and the high part of y are both less than some limit T, and that the high
part of y is at least half as big as T.
x₁ = ⌊x/S⌋ y₁ = ⌊y/S⌋
x₀ = x mod S y₀ = y mod S
x = x₁·S + x₀ 0 ≤ x₀ < S x/y < T
y = y₁·S + y₀ 0 ≤ y₀ < S T/2 ≤ y₁ < T
And consider the two truncated quotients:
q = ⌊x/y⌋
q̂ = ⌊x₁/y₁⌋
We will prove that q ≤ q̂ ≤ q+2.
The guarantee makes no real demands on the scaling factor S: it is simply the
magnitude of the digits cut from both x and y to produce x₁ and y₁.
The guarantee makes only limited demands on T: it must be large enough to hold
the quotient x/y, and y₁ must have roughly the same size.
To apply to the earlier discussion of 2-by-1 guesses in long division,
we would choose:
S = Bⁿ⁻¹
T = B
x = u
x₁ = uₙuₙ₋₁
x₀ = uₙ₋₂...u₀
y = v
y₁ = vₙ₋₁
y₀ = vₙ₋₂...u₀
These simpler variables avoid repeating those longer expressions in the proof.
Note also that, by definition, truncating division ⌊x/y⌋ satisfies
x/y - 1 < ⌊x/y⌋ ≤ x/y.
This fact will be used a few times in the proofs.
Proof that q ≤ q̂:
q̂·y₁ = ⌊x₁/y₁⌋·y₁ [by definition, q̂ = ⌊x₁/y₁⌋]
> (x₁/y₁ - 1)·y₁ [x₁/y₁ - 1 < ⌊x₁/y₁⌋]
= x₁ - y₁ [distribute y₁]
So q̂·y₁ > x₁ - y₁.
Since q̂·y₁ is an integer, q̂·y₁ ≥ x₁ - y₁ + 1.
q̂ - q = q̂ - ⌊x/y⌋ [by definition, q = ⌊x/y⌋]
≥ q̂ - x/y [⌊x/y⌋ < x/y]
= (1/y)·(q̂·y - x) [factor out 1/y]
≥ (1/y)·(q̂·y₁·S - x) [y = y₁·S + y₀ ≥ y₁·S]
≥ (1/y)·((x₁ - y₁ + 1)·S - x) [above: q̂·y₁ ≥ x₁ - y₁ + 1]
= (1/y)·(x₁·S - y₁·S + S - x) [distribute S]
= (1/y)·(S - x₀ - y₁·S) [-x = -x₁·S - x₀]
> -y₁·S / y [x₀ < S, so S - x₀ < 0; drop it]
≥ -1 [y₁·S ≤ y]
So q̂ - q > -1.
Since q̂ - q is an integer, q̂ - q ≥ 0, or equivalently q ≤ q̂.
Proof that q̂ ≤ q+2:
x₁/y₁ - x/y = x₁·S/y₁·S - x/y [multiply left term by S/S]
≤ x/y₁·S - x/y [x₁S ≤ x]
= (x/y)·(y/y₁·S - 1) [factor out x/y]
= (x/y)·((y - y₁·S)/y₁·S) [move -1 into y/y₁·S fraction]
= (x/y)·(y₀/y₁·S) [y - y₁·S = y₀]
= (x/y)·(1/y₁)·(y₀/S) [factor out 1/y₁]
< (x/y)·(1/y₁) [y₀ < S, so y₀/S < 1]
≤ (x/y)·(2/T) [y₁ ≥ T/2, so 1/y₁ ≤ 2/T]
< T·(2/T) [x/y < T]
= 2 [T·(2/T) = 2]
So x₁/y₁ - x/y < 2.
q̂ - q = ⌊x₁/y₁⌋ - q [by definition, q̂ = ⌊x₁/y₁⌋]
= ⌊x₁/y₁⌋ - ⌊x/y⌋ [by definition, q = ⌊x/y⌋]
≤ x₁/y₁ - ⌊x/y⌋ [⌊x₁/y₁⌋ ≤ x₁/y₁]
< x₁/y₁ - (x/y - 1) [⌊x/y⌋ > x/y - 1]
= (x₁/y₁ - x/y) + 1 [regrouping]
< 2 + 1 [above: x₁/y₁ - x/y < 2]
= 3
So q̂ - q < 3.
Since q̂ - q is an integer, q̂ - q ≤ 2.
Note that when x/y < T/2, the bounds tighten to x₁/y₁ - x/y < 1 and therefore
q̂ - q ≤ 1.
Note also that in the general case 2n-by-n division where we don't know that
x/y < T, we do know that x/y < 2T, yielding the bound q̂ - q ≤ 4. So we could
remove the special case first step of long division as long as we allow the
first fixup loop to run up to four times. (Using a simple comparison to decide
whether the first digit is 0 or 1 is still more efficient, though.)
Finally, note that when dividing three leading base-B digits by two (scaled),
we have T = B² and x/y < B = T/B, a much tighter bound than x/y < T.
This in turn yields the much tighter bound x₁/y₁ - x/y < 2/B. This means that
⌊x₁/y₁⌋ and ⌊x/y⌋ can only differ when x/y is less than 2/B greater than an
integer. For random x and y, the chance of this is 2/B, or, for large B,
approximately zero. This means that after we produce the 3-by-2 guess in the
long division algorithm, the fixup loop essentially never runs.
In the recursive algorithm, the extra digit in (2·⌊n/2⌋+1)-by-(⌊n/2⌋+1)-digit
division has exactly the same effect: the probability of needing a fixup is the
same 2/B. Even better, we can allow the general case x/y < 2T and the fixup
probability only grows to 4/B, still essentially zero.
References
There are no great references for implementing long division; thus this comment.
Here are some notes about what to expect from the obvious references.
Knuth Volume 2 (Seminumerical Algorithms) section 4.3.1 is the usual canonical
reference for long division, but that entire series is highly compressed, never
repeating a necessary fact and leaving important insights to the exercises.
For example, no rationale whatsoever is given for the calculation that extends
q̂ from a 2-by-1 to a 3-by-2 guess, nor why it reduces the error bound.
The proof that the calculation even has the desired effect is left to exercises.
The solutions to those exercises provided at the back of the book are entirely
calculations, still with no explanation as to what is going on or how you would
arrive at the idea of doing those exact calculations. Nowhere is it mentioned
that this test extends the 2-by-1 guess into a 3-by-2 guess. The proof of the
Good Guess Guarantee is only for the 2-by-1 guess and argues by contradiction,
making it difficult to understand how modifications like adding another digit
or adjusting the quotient range affects the overall bound.
All that said, Knuth remains the canonical reference. It is dense but packed
full of information and references, and the proofs are simpler than many other
presentations. The proofs above are reworkings of Knuth's to remove the
arguments by contradiction and add explanations or steps that Knuth omitted.
But beware of errors in older printings. Take the published errata with you.
Brinch Hansen's “Multiple-length Division Revisited: a Tour of the Minefield”
starts with a blunt critique of Knuth's presentation (among others) and then
presents a more detailed and easier to follow treatment of long division,
including an implementation in Pascal. But the algorithm and implementation
work entirely in terms of 3-by-2 division, which is much less useful on modern
hardware than an algorithm using 2-by-1 division. The proofs are a bit too
focused on digit counting and seem needlessly complex, especially compared to
the ones given above.
Burnikel and Ziegler's “Fast Recursive Division” introduced the key insight of
implementing division by an n-digit divisor using recursive calls to division
by an n/2-digit divisor, relying on Karatsuba multiplication to yield a
sub-quadratic run time. However, the presentation decisions are made almost
entirely for the purpose of simplifying the run-time analysis, rather than
simplifying the presentation. Instead of a single algorithm that loops over
quotient digits, the paper presents two mutually-recursive algorithms, for
2n-by-n and 3n-by-2n. The paper also does not present any general (n+m)-by-n
algorithm.
The proofs in the paper are remarkably complex, especially considering that
the algorithm is at its core just long division on wide digits, so that the
usual long division proofs apply essentially unaltered.
*/
package big
import "math/bits"
// div returns q, r such that q = ⌊u/v⌋ and r = u%v = u - q·v.
// It uses z and z2 as the storage for q and r.
func (z nat) div(z2, u, v nat) (q, r nat) {
if len(v) == 0 {
panic("division by zero")
}
if u.cmp(v) < 0 {
q = z[:0]
r = z2.set(u)
return
}
if len(v) == 1 {
// Short division: long optimized for a single-word divisor.
// In that case, the 2-by-1 guess is all we need at each step.
var r2 Word
q, r2 = z.divW(u, v[0])
r = z2.setWord(r2)
return
}
q, r = z.divLarge(z2, u, v)
return
}
// divW returns q, r such that q = ⌊x/y⌋ and r = x%y = x - q·y.
// It uses z as the storage for q.
// Note that y is a single digit (Word), not a big number.
func (z nat) divW(x nat, y Word) (q nat, r Word) {
m := len(x)
switch {
case y == 0:
panic("division by zero")
case y == 1:
q = z.set(x) // result is x
return
case m == 0:
q = z[:0] // result is 0
return
}
// m > 0
z = z.make(m)
r = divWVW(z, 0, x, y)
q = z.norm()
return
}
// modW returns x % d.
func (x nat) modW(d Word) (r Word) {
// TODO(agl): we don't actually need to store the q value.
var q nat
q = q.make(len(x))
return divWVW(q, 0, x, d)
}
// divWVW overwrites z with ⌊x/y⌋, returning the remainder r.
// The caller must ensure that len(z) = len(x).
func divWVW(z []Word, xn Word, x []Word, y Word) (r Word) {
r = xn
if len(x) == 1 {
qq, rr := bits.Div(uint(r), uint(x[0]), uint(y))
z[0] = Word(qq)
return Word(rr)
}
rec := reciprocalWord(y)
for i := len(z) - 1; i >= 0; i-- {
z[i], r = divWW(r, x[i], y, rec)
}
return r
}
// div returns q, r such that q = ⌊uIn/vIn⌋ and r = uIn%vIn = uIn - q·vIn.
// It uses z and u as the storage for q and r.
// The caller must ensure that len(vIn) ≥ 2 (use divW otherwise)
// and that len(uIn) ≥ len(vIn) (the answer is 0, uIn otherwise).
func (z nat) divLarge(u, uIn, vIn nat) (q, r nat) {
n := len(vIn)
m := len(uIn) - n
// Scale the inputs so vIn's top bit is 1 (see “Scaling Inputs” above).
// vIn is treated as a read-only input (it may be in use by another
// goroutine), so we must make a copy.
// uIn is copied to u.
shift := nlz(vIn[n-1])
vp := getNat(n)
v := *vp
shlVU(v, vIn, shift)
u = u.make(len(uIn) + 1)
u[len(uIn)] = shlVU(u[0:len(uIn)], uIn, shift)
// The caller should not pass aliased z and u, since those are
// the two different outputs, but correct just in case.
if alias(z, u) {
z = nil
}
q = z.make(m + 1)
// Use basic or recursive long division depending on size.
if n < divRecursiveThreshold {
q.divBasic(u, v)
} else {
q.divRecursive(u, v)
}
putNat(vp)
q = q.norm()
// Undo scaling of remainder.
shrVU(u, u, shift)
r = u.norm()
return q, r
}
// divBasic implements long division as described above.
// It overwrites q with ⌊u/v⌋ and overwrites u with the remainder r.
// q must be large enough to hold ⌊u/v⌋.
func (q nat) divBasic(u, v nat) {
n := len(v)
m := len(u) - n
qhatvp := getNat(n + 1)
qhatv := *qhatvp
// Set up for divWW below, precomputing reciprocal argument.
vn1 := v[n-1]
rec := reciprocalWord(vn1)
// Compute each digit of quotient.
for j := m; j >= 0; j-- {
// Compute the 2-by-1 guess q̂.
// The first iteration must invent a leading 0 for u.
qhat := Word(_M)
var ujn Word
if j+n < len(u) {
ujn = u[j+n]
}
// ujn ≤ vn1, or else q̂ would be more than one digit.
// For ujn == vn1, we set q̂ to the max digit M above.
// Otherwise, we compute the 2-by-1 guess.
if ujn != vn1 {
var rhat Word
qhat, rhat = divWW(ujn, u[j+n-1], vn1, rec)
// Refine q̂ to a 3-by-2 guess. See “Refining Guesses” above.
vn2 := v[n-2]
x1, x2 := mulWW(qhat, vn2)
ujn2 := u[j+n-2]
for greaterThan(x1, x2, rhat, ujn2) { // x1x2 > r̂ u[j+n-2]
qhat--
prevRhat := rhat
rhat += vn1
// If r̂ overflows, then
// r̂ u[j+n-2]v[n-1] is now definitely > x1 x2.
if rhat < prevRhat {
break
}
// TODO(rsc): No need for a full mulWW.
// x2 += vn2; if x2 overflows, x1++
x1, x2 = mulWW(qhat, vn2)
}
}
// Compute q̂·v.
qhatv[n] = mulAddVWW(qhatv[0:n], v, qhat, 0)
qhl := len(qhatv)
if j+qhl > len(u) && qhatv[n] == 0 {
qhl--
}
// Subtract q̂·v from the current section of u.
// If it underflows, q̂·v > u, which we fix up
// by decrementing q̂ and adding v back.
c := subVV(u[j:j+qhl], u[j:], qhatv)
if c != 0 {
c := addVV(u[j:j+n], u[j:], v)
// If n == qhl, the carry from subVV and the carry from addVV
// cancel out and don't affect u[j+n].
if n < qhl {
u[j+n] += c
}
qhat--
}
// Save quotient digit.
// Caller may know the top digit is zero and not leave room for it.
if j == m && m == len(q) && qhat == 0 {
continue
}
q[j] = qhat
}
putNat(qhatvp)
}
// greaterThan reports whether the two digit numbers x1 x2 > y1 y2.
// TODO(rsc): In contradiction to most of this file, x1 is the high
// digit and x2 is the low digit. This should be fixed.
func greaterThan(x1, x2, y1, y2 Word) bool {
return x1 > y1 || x1 == y1 && x2 > y2
}
// divRecursiveThreshold is the number of divisor digits
// at which point divRecursive is faster than divBasic.
const divRecursiveThreshold = 100
// divRecursive implements recursive division as described above.
// It overwrites z with ⌊u/v⌋ and overwrites u with the remainder r.
// z must be large enough to hold ⌊u/v⌋.
// This function is just for allocating and freeing temporaries
// around divRecursiveStep, the real implementation.
func (z nat) divRecursive(u, v nat) {
// Recursion depth is (much) less than 2 log₂(len(v)).
// Allocate a slice of temporaries to be reused across recursion,
// plus one extra temporary not live across the recursion.
recDepth := 2 * bits.Len(uint(len(v)))
tmp := getNat(3 * len(v))
temps := make([]*nat, recDepth)
z.clear()
z.divRecursiveStep(u, v, 0, tmp, temps)
// Free temporaries.
for _, n := range temps {
if n != nil {
putNat(n)
}
}
putNat(tmp)
}
// divRecursiveStep is the actual implementation of recursive division.
// It adds ⌊u/v⌋ to z and overwrites u with the remainder r.
// z must be large enough to hold ⌊u/v⌋.
// It uses temps[depth] (allocating if needed) as a temporary live across
// the recursive call. It also uses tmp, but not live across the recursion.
func (z nat) divRecursiveStep(u, v nat, depth int, tmp *nat, temps []*nat) {
// u is a subsection of the original and may have leading zeros.
// TODO(rsc): The v = v.norm() is useless and should be removed.
// We know (and require) that v's top digit is ≥ B/2.
u = u.norm()
v = v.norm()
if len(u) == 0 {
z.clear()
return
}
// Fall back to basic division if the problem is now small enough.
n := len(v)
if n < divRecursiveThreshold {
z.divBasic(u, v)
return
}
// Nothing to do if u is shorter than v (implies u < v).
m := len(u) - n
if m < 0 {
return
}
// We consider B digits in a row as a single wide digit.
// (See “Recursive Division” above.)
//
// TODO(rsc): rename B to Wide, to avoid confusion with _B,
// which is something entirely different.
// TODO(rsc): Look into whether using ⌈n/2⌉ is better than ⌊n/2⌋.
B := n / 2
// Allocate a nat for qhat below.
if temps[depth] == nil {
temps[depth] = getNat(n) // TODO(rsc): Can be just B+1.
} else {
*temps[depth] = temps[depth].make(B + 1)
}
// Compute each wide digit of the quotient.
//
// TODO(rsc): Change the loop to be
// for j := (m+B-1)/B*B; j > 0; j -= B {
// which will make the final step a regular step, letting us
// delete what amounts to an extra copy of the loop body below.
j := m
for j > B {
// Divide u[j-B:j+n] (3 wide digits) by v (2 wide digits).
// First make the 2-by-1-wide-digit guess using a recursive call.
// Then extend the guess to the full 3-by-2 (see “Refining Guesses”).
//
// For the 2-by-1-wide-digit guess, instead of doing 2B-by-B-digit,
// we use a (2B+1)-by-(B+1) digit, which handles the possibility that
// the result has an extra leading 1 digit as well as guaranteeing
// that the computed q̂ will be off by at most 1 instead of 2.
// s is the number of digits to drop from the 3B- and 2B-digit chunks.
// We drop B-1 to be left with 2B+1 and B+1.
s := (B - 1)
// uu is the up-to-3B-digit section of u we are working on.
uu := u[j-B:]
// Compute the 2-by-1 guess q̂, leaving r̂ in uu[s:B+n].
qhat := *temps[depth]
qhat.clear()
qhat.divRecursiveStep(uu[s:B+n], v[s:], depth+1, tmp, temps)
qhat = qhat.norm()
// Extend to a 3-by-2 quotient and remainder.
// Because divRecursiveStep overwrote the top part of uu with
// the remainder r̂, the full uu already contains the equivalent
// of r̂·B + uₙ₋₂ from the “Refining Guesses” discussion.
// Subtracting q̂·vₙ₋₂ from it will compute the full-length remainder.
// If that subtraction underflows, q̂·v > u, which we fix up
// by decrementing q̂ and adding v back, same as in long division.
// TODO(rsc): Instead of subtract and fix-up, this code is computing
// q̂·vₙ₋₂ and decrementing q̂ until that product is ≤ u.
// But we can do the subtraction directly, as in the comment above
// and in long division, because we know that q̂ is wrong by at most one.
qhatv := tmp.make(3 * n)
qhatv.clear()
qhatv = qhatv.mul(qhat, v[:s])
for i := 0; i < 2; i++ {
e := qhatv.cmp(uu.norm())
if e <= 0 {
break
}
subVW(qhat, qhat, 1)
c := subVV(qhatv[:s], qhatv[:s], v[:s])
if len(qhatv) > s {
subVW(qhatv[s:], qhatv[s:], c)
}
addAt(uu[s:], v[s:], 0)
}
if qhatv.cmp(uu.norm()) > 0 {
panic("impossible")
}
c := subVV(uu[:len(qhatv)], uu[:len(qhatv)], qhatv)
if c > 0 {
subVW(uu[len(qhatv):], uu[len(qhatv):], c)
}
addAt(z, qhat, j-B)
j -= B
}
// TODO(rsc): Rewrite loop as described above and delete all this code.
// Now u < (v<<B), compute lower bits in the same way.
// Choose shift = B-1 again.
s := B - 1
qhat := *temps[depth]
qhat.clear()
qhat.divRecursiveStep(u[s:].norm(), v[s:], depth+1, tmp, temps)
qhat = qhat.norm()
qhatv := tmp.make(3 * n)
qhatv.clear()
qhatv = qhatv.mul(qhat, v[:s])
// Set the correct remainder as before.
for i := 0; i < 2; i++ {
if e := qhatv.cmp(u.norm()); e > 0 {
subVW(qhat, qhat, 1)
c := subVV(qhatv[:s], qhatv[:s], v[:s])
if len(qhatv) > s {
subVW(qhatv[s:], qhatv[s:], c)
}
addAt(u[s:], v[s:], 0)
}
}
if qhatv.cmp(u.norm()) > 0 {
panic("impossible")
}
c := subVV(u[0:len(qhatv)], u[0:len(qhatv)], qhatv)
if c > 0 {
c = subVW(u[len(qhatv):], u[len(qhatv):], c)
}
if c > 0 {
panic("impossible")
}
// Done!
addAt(z, qhat.norm(), 0)
}
|