1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137
|
/* memset/bzero -- set memory area to CH/0
Optimized version for x86-64.
Copyright (C) 2002, 2003, 2004, 2005 Free Software Foundation, Inc.
This file is part of the GNU C Library.
Contributed by Andreas Jaeger <aj@suse.de>.
The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, see
<http://www.gnu.org/licenses/>. */
#include "_glibc_inc.h"
/* BEWARE: `#ifdef memset' means that memset is redefined as `bzero' */
#define BZERO_P (defined memset)
/* This is somehow experimental and could made dependend on the cache
size. */
#define LARGE $120000
.text
ENTRY (memset)
#if BZERO_P
mov %rsi,%rdx /* Adjust parameter. */
xorl %esi,%esi /* Fill with 0s. */
#endif
cmp $0x7,%rdx /* Check for small length. */
mov %rdi,%rcx /* Save ptr as return value. */
jbe 7f
#if BZERO_P
mov %rsi,%r8 /* Just copy 0. */
#else
/* Populate 8 bit data to full 64-bit. */
movabs $0x0101010101010101,%r8
movzbl %sil,%eax
imul %rax,%r8
#endif
test $0x7,%edi /* Check for alignment. */
jz 2f
/* Next 3 insns are 9 bytes total, make sure we decode them in one go */
.p2align 4,,9
1:
/* Align ptr to 8 byte. */
mov %sil,(%rcx)
dec %rdx
inc %rcx
test $0x7,%cl
jnz 1b
2: /* Check for really large regions. */
mov %rdx,%rax
shr $0x6,%rax
je 4f
cmp LARGE, %rdx
jae 11f
/* Next 3 insns are 11 bytes total, make sure we decode them in one go */
.p2align 4,,11
3:
/* Fill 64 bytes. */
mov %r8,(%rcx)
mov %r8,0x8(%rcx)
mov %r8,0x10(%rcx)
mov %r8,0x18(%rcx)
mov %r8,0x20(%rcx)
mov %r8,0x28(%rcx)
mov %r8,0x30(%rcx)
mov %r8,0x38(%rcx)
add $0x40,%rcx
dec %rax
jne 3b
4: /* Fill final bytes. */
and $0x3f,%edx
mov %rdx,%rax
shr $0x3,%rax
je 6f
5: /* First in chunks of 8 bytes. */
mov %r8,(%rcx)
add $0x8,%rcx
dec %rax
jne 5b
6:
and $0x7,%edx
7:
test %rdx,%rdx
je 9f
8: /* And finally as bytes (up to 7). */
mov %sil,(%rcx)
inc %rcx
dec %rdx
jne 8b
9:
#if BZERO_P
/* nothing */
#else
/* Load result (only if used as memset). */
mov %rdi,%rax /* start address of destination is result */
#endif
retq
/* Next 3 insns are 14 bytes total, make sure we decode them in one go */
.p2align 4,,14
11:
/* Fill 64 bytes without polluting the cache. */
/* We could use movntdq %xmm0,(%rcx) here to further
speed up for large cases but let's not use XMM registers. */
movnti %r8,(%rcx)
movnti %r8,0x8(%rcx)
movnti %r8,0x10(%rcx)
movnti %r8,0x18(%rcx)
movnti %r8,0x20(%rcx)
movnti %r8,0x28(%rcx)
movnti %r8,0x30(%rcx)
movnti %r8,0x38(%rcx)
add $0x40,%rcx
dec %rax
jne 11b
jmp 4b
END (memset)
#if !BZERO_P
libc_hidden_def(memset)
#endif
|