#
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
# $Id: mpi_x86_os2.s,v 1.2 2012/04/25 14:49:50 gerv%gerv.net Exp $
#
.data
.align 4
#
# -1 means call _s_mpi_is_sse2 to determine whether SSE2 instructions
# are supported (the answer is then cached here).
# 0 means use the plain x86 instructions.
# 1 means use the SSE2 instructions.
.type is_sse,@object
.size is_sse,4
is_sse: .long -1
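#
# The dispatch at the top of each entry point below behaves like this
# C sketch (illustrative only: use_sse2 is a hypothetical helper, and
# the int return type of _s_mpi_is_sse2 is an assumption):
#
#   extern int _s_mpi_is_sse2(void);      /* assumed prototype */
#
#   static int use_sse2(void)
#   {
#       if (is_sse < 0)                   /* not probed yet */
#           is_sse = _s_mpi_is_sse2();    /* cache the answer */
#       return is_sse > 0;                /* otherwise take the x86 path */
#   }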
#
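# Handle the difference between -fPIC and non-PIC code. The PIC
# variants of GET/PUT (which address is_sse relative to %ebx via
# @GOTOFF) are commented out in this OS/2 copy, so the plain
# absolute-address forms below are always used. (Solaris uses
# mpi_i86pc.s and Windows uses mpi_x86_asm.c.)
#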
#.ifndef NO_PIC
#.macro GET var,reg
# movl \var@GOTOFF(%ebx),\reg
#.endm
#.macro PUT reg,var
# movl \reg,\var@GOTOFF(%ebx)
#.endm
#.else
.macro GET var,reg
movl \var,\reg
.endm
.macro PUT reg,var
movl \reg,\var
.endm
#.endif
.text
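#
# _s_mpv_mul_d: c[0..a_len] = a[0..a_len-1] * b. A C sketch of the
# computation (illustrative only: ref_mul_d is a hypothetical name, and
# uint32_t/uint64_t stand in for the 32-bit digit and 64-bit product):
#
#   void ref_mul_d(const uint32_t *a, unsigned a_len, uint32_t b,
#                  uint32_t *c)
#   {
#       uint32_t carry = 0;
#       while (a_len--) {
#           uint64_t p = (uint64_t)*a++ * b + carry;  /* fits in 64 bits */
#           *c++ = (uint32_t)p;                       /* low word out */
#           carry = (uint32_t)(p >> 32);              /* high word carries */
#       }
#       *c = carry;                                   /* c[a_len] */
#   }
#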
# ebp - 40: caller's ebx
# ebp - 36: caller's esi
# ebp - 32: caller's edi
# ebp - 28:
# ebp - 24:
# ebp - 20:
# ebp - 16:
# ebp - 12:
# ebp - 8:
# ebp - 4:
# ebp + 0: caller's ebp
# ebp + 4: return address
# ebp + 8: a argument
# ebp + 12: a_len argument
# ebp + 16: b argument
# ebp + 20: c argument
# registers:
# eax: a[i], then low word of a[i] * b + carry
# ebx: carry
# ecx: a_len
# edx: b, then high word of the product
# esi: a ptr
# edi: c ptr
.globl _s_mpv_mul_d
.type _s_mpv_mul_d,@function
_s_mpv_mul_d:
GET is_sse,%eax
cmp $0,%eax
je _s_mpv_mul_d_x86
jg _s_mpv_mul_d_sse2
call _s_mpi_is_sse2
PUT %eax,is_sse
cmp $0,%eax
jg _s_mpv_mul_d_sse2
_s_mpv_mul_d_x86:
push %ebp
mov %esp,%ebp
sub $28,%esp
push %edi
push %esi
push %ebx
movl $0,%ebx # carry = 0
mov 12(%ebp),%ecx # ecx = a_len
mov 20(%ebp),%edi # edi = c
cmp $0,%ecx
je 2f # jmp if a_len == 0
mov 8(%ebp),%esi # esi = a
cld
1:
lodsl # eax = [ds:esi]; esi += 4
mov 16(%ebp),%edx # edx = b
mull %edx # edx:eax = Phi:Plo = a_i * b
add %ebx,%eax # add carry (%ebx) to edx:eax
adc $0,%edx
mov %edx,%ebx # high half of product becomes next carry
stosl # [es:edi] = eax; edi += 4
dec %ecx # --a_len
jnz 1b # jmp if a_len != 0
2:
mov %ebx,0(%edi) # *c = carry
pop %ebx
pop %esi
pop %edi
leave
ret
nop
_s_mpv_mul_d_sse2:
push %ebp
mov %esp,%ebp
push %edi
push %esi
psubq %mm2,%mm2 # carry = 0
mov 12(%ebp),%ecx # ecx = a_len
movd 16(%ebp),%mm1 # mm1 = b
mov 20(%ebp),%edi # edi = c
cmp $0,%ecx
je 6f # jmp if a_len == 0
mov 8(%ebp),%esi # esi = a
cld
5:
movd 0(%esi),%mm0 # mm0 = *a++
add $4,%esi
pmuludq %mm1,%mm0 # mm0 = b * *a++
paddq %mm0,%mm2 # add the carry
movd %mm2,0(%edi) # store the 32bit result
add $4,%edi
psrlq $32, %mm2 # save the carry
dec %ecx # --a_len
jnz 5b # jmp if a_len != 0
6:
movd %mm2,0(%edi) # *c = carry
emms
pop %esi
pop %edi
leave
ret
nop
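#
# _s_mpv_mul_d_add: c[0..a_len] = c[0..a_len-1] + a[0..a_len-1] * b.
# The final carry is stored into c[a_len], overwriting it. C sketch
# (illustrative only: ref_mul_d_add is a hypothetical name):
#
#   void ref_mul_d_add(const uint32_t *a, unsigned a_len, uint32_t b,
#                      uint32_t *c)
#   {
#       uint32_t carry = 0;
#       while (a_len--) {
#           /* at most (2^32-1)^2 + 2*(2^32-1) = 2^64-1: no overflow */
#           uint64_t p = (uint64_t)*a++ * b + carry + *c;
#           *c++ = (uint32_t)p;
#           carry = (uint32_t)(p >> 32);
#       }
#       *c = carry;
#   }
#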
# ebp - 40: caller's ebx
# ebp - 36: caller's esi
# ebp - 32: caller's edi
# ebp - 28:
# ebp - 24:
# ebp - 20:
# ebp - 16:
# ebp - 12:
# ebp - 8:
# ebp - 4:
# ebp + 0: caller's ebp
# ebp + 4: return address
# ebp + 8: a argument
# ebp + 12: a_len argument
# ebp + 16: b argument
# ebp + 20: c argument
# registers:
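# eax: a[i], then low word of the running sum
# ebx: carry (briefly also the current word of *c)
# ecx: a_len
# edx: b, then high word of the running sum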
# esi: a ptr
# edi: c ptr
.globl _s_mpv_mul_d_add
.type _s_mpv_mul_d_add,@function
_s_mpv_mul_d_add:
GET is_sse,%eax
cmp $0,%eax
je _s_mpv_mul_d_add_x86
jg _s_mpv_mul_d_add_sse2
call _s_mpi_is_sse2
PUT %eax,is_sse
cmp $0,%eax
jg _s_mpv_mul_d_add_sse2
_s_mpv_mul_d_add_x86:
push %ebp
mov %esp,%ebp
sub $28,%esp
push %edi
push %esi
push %ebx
movl $0,%ebx # carry = 0
mov 12(%ebp),%ecx # ecx = a_len
mov 20(%ebp),%edi # edi = c
cmp $0,%ecx
je 11f # jmp if a_len == 0
mov 8(%ebp),%esi # esi = a
cld
10:
lodsl # eax = [ds:esi]; esi += 4
mov 16(%ebp),%edx # edx = b
mull %edx # edx:eax = Phi:Plo = a_i * b
add %ebx,%eax # add carry (%ebx) to edx:eax
adc $0,%edx
mov 0(%edi),%ebx # add in current word from *c
add %ebx,%eax
adc $0,%edx
mov %edx,%ebx # high half of product becomes next carry
stosl # [es:edi] = eax; edi += 4
dec %ecx # --a_len
jnz 10b # jmp if a_len != 0
11:
mov %ebx,0(%edi) # *c = carry
pop %ebx
pop %esi
pop %edi
leave
ret
nop
_s_mpv_mul_d_add_sse2:
push %ebp
mov %esp,%ebp
push %edi
push %esi
psubq %mm2,%mm2 # carry = 0
mov 12(%ebp),%ecx # ecx = a_len
movd 16(%ebp),%mm1 # mm1 = b
mov 20(%ebp),%edi # edi = c
cmp $0,%ecx
je 16f # jmp if a_len == 0
mov 8(%ebp),%esi # esi = a
cld
15:
movd 0(%esi),%mm0 # mm0 = *a++
add $4,%esi
pmuludq %mm1,%mm0 # mm0 = b * *a++
paddq %mm0,%mm2 # add the carry
movd 0(%edi),%mm0 # mm0 = *c
paddq %mm0,%mm2 # add *c
movd %mm2,0(%edi) # store the 32bit result
add $4,%edi
psrlq $32, %mm2 # save the carry
dec %ecx # --a_len
jnz 15b # jmp if a_len != 0
16:
movd %mm2,0(%edi) # *c = carry
emms
pop %esi
pop %edi
leave
ret
nop
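#
# _s_mpv_mul_d_add_prop: like _s_mpv_mul_d_add, but the final carry is
# added into c[a_len] and rippled upward for as long as it carries. No
# length check is made, so the caller must guarantee c is long enough.
# C sketch (illustrative only: ref_mul_d_add_prop is a hypothetical name):
#
#   void ref_mul_d_add_prop(const uint32_t *a, unsigned a_len, uint32_t b,
#                           uint32_t *c)
#   {
#       uint32_t carry = 0;
#       while (a_len--) {
#           uint64_t p = (uint64_t)*a++ * b + carry + *c;
#           *c++ = (uint32_t)p;
#           carry = (uint32_t)(p >> 32);
#       }
#       while (carry) {                       /* ripple into c[a_len]... */
#           uint64_t s = (uint64_t)*c + carry;
#           *c++ = (uint32_t)s;
#           carry = (uint32_t)(s >> 32);      /* now 0 or 1 */
#       }
#   }
#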
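# ebp - 40: caller's ebx
# ebp - 36: caller's esi
# ebp - 32: caller's edi
# ebp - 28 .. ebp - 4: unused
# (offsets above are for the x86 path; the SSE2 path reserves no
# locals, so there edi, esi and ebx sit at ebp - 4, ebp - 8, ebp - 12)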
# ebp + 0: caller's ebp
# ebp + 4: return address
# ebp + 8: a argument
# ebp + 12: a_len argument
# ebp + 16: b argument
# ebp + 20: c argument
# registers:
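# eax: a[i], then low word of the running sum
# ebx: carry (briefly also the current word of *c)
# ecx: a_len
# edx: b, then high word of the running sum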
# esi: a ptr
# edi: c ptr
.globl _s_mpv_mul_d_add_prop
.type _s_mpv_mul_d_add_prop,@function
_s_mpv_mul_d_add_prop:
GET is_sse,%eax
cmp $0,%eax
je _s_mpv_mul_d_add_prop_x86
jg _s_mpv_mul_d_add_prop_sse2
call _s_mpi_is_sse2
PUT %eax,is_sse
cmp $0,%eax
jg _s_mpv_mul_d_add_prop_sse2
_s_mpv_mul_d_add_prop_x86:
push %ebp
mov %esp,%ebp
sub $28,%esp
push %edi
push %esi
push %ebx
movl $0,%ebx # carry = 0
mov 12(%ebp),%ecx # ecx = a_len
mov 20(%ebp),%edi # edi = c
cmp $0,%ecx
je 21f # jmp if a_len == 0
cld
mov 8(%ebp),%esi # esi = a
20:
lodsl # eax = [ds:esi]; esi += 4
mov 16(%ebp),%edx # edx = b
mull %edx # edx:eax = Phi:Plo = a_i * b
add %ebx,%eax # add carry (%ebx) to edx:eax
adc $0,%edx
mov 0(%edi),%ebx # add in current word from *c
add %ebx,%eax
adc $0,%edx
mov %edx,%ebx # high half of product becomes next carry
stosl # [es:edi] = eax; edi += 4
dec %ecx # --a_len
jnz 20b # jmp if a_len != 0
21:
cmp $0,%ebx # is carry zero?
jz 23f
mov 0(%edi),%eax # add in current word from *c
add %ebx,%eax # add the carry
stosl # [es:edi] = eax; edi += 4
jnc 23f
22:
mov 0(%edi),%eax # add in current word from *c
adc $0,%eax
stosl # [es:edi] = eax; edi += 4
jc 22b
23:
pop %ebx
pop %esi
pop %edi
leave
ret
nop
_s_mpv_mul_d_add_prop_sse2:
push %ebp
mov %esp,%ebp
push %edi
push %esi
push %ebx
psubq %mm2,%mm2 # carry = 0
mov 12(%ebp),%ecx # ecx = a_len
movd 16(%ebp),%mm1 # mm1 = b
mov 20(%ebp),%edi # edi = c
cmp $0,%ecx
je 26f # jmp if a_len == 0
mov 8(%ebp),%esi # esi = a
cld
25:
movd 0(%esi),%mm0 # mm0 = *a++
movd 0(%edi),%mm3 # fetch the sum
add $4,%esi
pmuludq %mm1,%mm0 # mm0 = b * *a++
paddq %mm0,%mm2 # add the carry
paddq %mm3,%mm2 # add *c++
movd %mm2,0(%edi) # store the 32bit result
add $4,%edi
psrlq $32, %mm2 # save the carry
dec %ecx # --a_len
jnz 25b # jmp if a_len != 0
26:
movd %mm2,%ebx # ebx = carry
cmp $0,%ebx # is carry zero?
jz 28f
mov 0(%edi),%eax # add in current word from *c
add %ebx, %eax # add the carry
stosl # [es:edi] = eax; edi += 4
jnc 28f
27:
mov 0(%edi),%eax # add in current word from *c
adc $0,%eax
stosl # [es:edi] = eax; edi += 4
jc 27b
28:
emms
pop %ebx
pop %esi
pop %edi
leave
ret
nop
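#
# _s_mpv_sqr_add_prop: for each digit pa[i], adds pa[i]^2 into the
# two-word slot ps[2i], ps[2i+1], carrying between slots, then ripples
# the last carry upward with no length check. C sketch mirroring the
# SSE2 path (illustrative only: ref_sqr_add_prop is a hypothetical name):
#
#   void ref_sqr_add_prop(const uint32_t *pa, unsigned a_len, uint32_t *ps)
#   {
#       uint32_t carry = 0;                   /* always 0 or 1 */
#       while (a_len--) {
#           /* < 2^64 because carry <= 1 */
#           uint64_t t = (uint64_t)*pa * *pa + carry + ps[0];
#           pa++;
#           ps[0] = (uint32_t)t;              /* low word */
#           t = (t >> 32) + ps[1];
#           ps[1] = (uint32_t)t;              /* high word */
#           carry = (uint32_t)(t >> 32);
#           ps += 2;
#       }
#       while (carry) {
#           uint64_t s = (uint64_t)*ps + carry;
#           *ps++ = (uint32_t)s;
#           carry = (uint32_t)(s >> 32);
#       }
#   }
#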
# ebp - 24: caller's ebx
# ebp - 20: caller's esi
# ebp - 16: caller's edi
# ebp - 12 .. ebp - 4: reserved but unused (carry is kept in ebx
# and a_len in ecx)
# ebp + 0: caller's ebp
# ebp + 4: return address
# ebp + 8: pa argument
# ebp + 12: a_len argument
# ebp + 16: ps argument
# ebp + 20:
# registers:
# eax: pa[i], then low word of pa[i]^2
# ebx: carry (briefly also the current words of *ps)
# ecx: a_len
# edx: high word of pa[i]^2
# esi: pa ptr
# edi: ps ptr
.globl _s_mpv_sqr_add_prop
.type _s_mpv_sqr_add_prop,@function
_s_mpv_sqr_add_prop:
GET is_sse,%eax
cmp $0,%eax
je _s_mpv_sqr_add_prop_x86
jg _s_mpv_sqr_add_prop_sse2
call _s_mpi_is_sse2
PUT %eax,is_sse
cmp $0,%eax
jg _s_mpv_sqr_add_prop_sse2
_s_mpv_sqr_add_prop_x86:
push %ebp
mov %esp,%ebp
sub $12,%esp
push %edi
push %esi
push %ebx
movl $0,%ebx # carry = 0
mov 12(%ebp),%ecx # a_len
mov 16(%ebp),%edi # edi = ps
cmp $0,%ecx
je 31f # jump if a_len == 0
cld
mov 8(%ebp),%esi # esi = pa
30:
lodsl # %eax = [ds:esi]; esi += 4
mull %eax
add %ebx,%eax # add "carry"
adc $0,%edx
mov 0(%edi),%ebx
add %ebx,%eax # add low word from result
mov 4(%edi),%ebx
stosl # [es:edi] = %eax; edi += 4
adc %ebx,%edx # add high word from result
movl $0,%ebx
mov %edx,%eax
adc $0,%ebx
stosl # [es:edi] = %eax; edi += 4
dec %ecx # --a_len
jnz 30b # jmp if a_len != 0
31:
cmp $0,%ebx # is carry zero?
jz 34f
mov 0(%edi),%eax # add in current word from *c
add %ebx,%eax
stosl # [es:edi] = eax; edi += 4
jnc 34f
32:
mov 0(%edi),%eax # add in current word from *c
adc $0,%eax
stosl # [es:edi] = eax; edi += 4
jc 32b
34:
pop %ebx
pop %esi
pop %edi
leave
ret
nop
_s_mpv_sqr_add_prop_sse2:
push %ebp
mov %esp,%ebp
push %edi
push %esi
push %ebx
psubq %mm2,%mm2 # carry = 0
mov 12(%ebp),%ecx # ecx = a_len
mov 16(%ebp),%edi # edi = ps
cmp $0,%ecx
je 36f # jmp if a_len == 0
mov 8(%ebp),%esi # esi = pa
cld
35:
movd 0(%esi),%mm0 # mm0 = *a
movd 0(%edi),%mm3 # fetch the sum
add $4,%esi
pmuludq %mm0,%mm0 # mm0 = sqr(a)
paddq %mm0,%mm2 # add the carry
paddq %mm3,%mm2 # add the low word
movd 4(%edi),%mm3
movd %mm2,0(%edi) # store the 32bit result
psrlq $32, %mm2
paddq %mm3,%mm2 # add the high word
movd %mm2,4(%edi) # store the 32bit result
psrlq $32, %mm2 # save the carry.
add $8,%edi
dec %ecx # --a_len
jnz 35b # jmp if a_len != 0
36:
movd %mm2,%ebx # ebx = carry
cmp $0,%ebx # is carry zero?
jz 38f
mov 0(%edi),%eax # add in current word from *c
add %ebx, %eax # add the carry
stosl # [es:edi] = eax; edi += 4
jnc 38f
37:
mov 0(%edi),%eax # add in current word from *c
adc $0,%eax
stosl # [es:edi] = eax; edi += 4
jc 37b
38:
emms
pop %ebx
pop %esi
pop %edi
leave
ret
nop
#
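# Divide the 64-bit value (Nhi,Nlo) by a 32-bit divisor, which must be
# normalized so its high bit is 1. Nhi must also be less than divisor;
# otherwise the quotient does not fit in 32 bits and div raises a
# divide-error exception. This code is from NSPR.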
#
# mp_err _s_mpv_div_2dx1d(mp_digit Nhi, mp_digit Nlo, mp_digit divisor,
# mp_digit *qp, mp_digit *rp)
# esp + 0: Caller's ebx
# esp + 4: return address
# esp + 8: Nhi argument
# esp + 12: Nlo argument
# esp + 16: divisor argument
# esp + 20: qp argument
# esp + 24: rp argument
# registers:
# eax: Nlo, then quotient (then the zero return value)
# ebx: divisor, then qp, then rp
# edx: Nhi, then remainder
#
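# C sketch of the same computation (illustrative only: ref_div_2dx1d is
# a hypothetical name, and it assumes the preconditions stated above):
#
#   int ref_div_2dx1d(uint32_t Nhi, uint32_t Nlo, uint32_t divisor,
#                     uint32_t *qp, uint32_t *rp)
#   {
#       uint64_t n = ((uint64_t)Nhi << 32) | Nlo;  /* edx:eax */
#       *qp = (uint32_t)(n / divisor);
#       *rp = (uint32_t)(n % divisor);
#       return 0;
#   }
#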
.globl _s_mpv_div_2dx1d
.type _s_mpv_div_2dx1d,@function
_s_mpv_div_2dx1d:
push %ebx
mov 8(%esp),%edx
mov 12(%esp),%eax
mov 16(%esp),%ebx
div %ebx
mov 20(%esp),%ebx
mov %eax,0(%ebx)
mov 24(%esp),%ebx
mov %edx,0(%ebx)
xor %eax,%eax # return zero
pop %ebx
ret
nop