; RUN: llc -amdgpu-scalarize-global-loads=false -march=amdgcn -show-mc-encoding -verify-machineinstrs < %s | FileCheck %s
;;;==========================================================================;;;
;;; MUBUF LOAD TESTS
;;;==========================================================================;;;
; MUBUF load with an immediate byte offset that fits in 12 bits (1 * 4 = 4)
; CHECK-LABEL: {{^}}mubuf_load0:
; CHECK: buffer_load_dword v{{[0-9]}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:4 ; encoding: [0x04,0x00,0x30,0xe0
define amdgpu_kernel void @mubuf_load0(ptr addrspace(1) %out, ptr addrspace(1) %in) {
entry:
%0 = getelementptr i32, ptr addrspace(1) %in, i64 1
%1 = load i32, ptr addrspace(1) %0
store i32 %1, ptr addrspace(1) %out
ret void
}
; MUBUF load with the largest possible immediate offset (4095, the 12-bit maximum)
; CHECK-LABEL: {{^}}mubuf_load1:
; CHECK: buffer_load_ubyte v{{[0-9]}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:4095 ; encoding: [0xff,0x0f,0x20,0xe0
define amdgpu_kernel void @mubuf_load1(ptr addrspace(1) %out, ptr addrspace(1) %in) {
entry:
%0 = getelementptr i8, ptr addrspace(1) %in, i64 4095
%1 = load i8, ptr addrspace(1) %0
store i8 %1, ptr addrspace(1) %out
ret void
}
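; Illustrative sketch, not part of the original test: a dword load at the
; largest 4-byte-aligned offset that should still fit in the 12-bit immediate
; field (1023 * 4 = 4092). The function name is hypothetical, and the expected
; output is inferred from mubuf_load0 and mubuf_load1 rather than from an llc
; run, so the encoding bytes are deliberately left unchecked.
; CHECK-LABEL: {{^}}mubuf_load_max_dword_offset:
; CHECK: buffer_load_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:4092
define amdgpu_kernel void @mubuf_load_max_dword_offset(ptr addrspace(1) %out, ptr addrspace(1) %in) {
entry:
%0 = getelementptr i32, ptr addrspace(1) %in, i64 1023
%1 = load i32, ptr addrspace(1) %0
store i32 %1, ptr addrspace(1) %out
ret void
}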
; MUBUF load with an immediate byte offset that does not fit in 12 bits:
; 1024 * 4 = 4096 (0x1000), so the offset is materialized in an SGPR and
; passed as soffset.
; CHECK-LABEL: {{^}}mubuf_load2:
; CHECK: s_movk_i32 [[SOFFSET:s[0-9]+]], 0x1000
; CHECK: buffer_load_dword v{{[0-9]}}, off, s[{{[0-9]+:[0-9]+}}], [[SOFFSET]] ; encoding: [0x00,0x00,0x30,0xe0
define amdgpu_kernel void @mubuf_load2(ptr addrspace(1) %out, ptr addrspace(1) %in) {
entry:
%0 = getelementptr i32, ptr addrspace(1) %in, i64 1024
%1 = load i32, ptr addrspace(1) %0
store i32 %1, ptr addrspace(1) %out
ret void
}
; MUBUF load with a 12-bit immediate offset and a register offset
; CHECK-LABEL: {{^}}mubuf_load3:
; CHECK-NOT: ADD
; CHECK: buffer_load_dword v{{[0-9]}}, v[{{[0-9]+:[0-9]+}}], s[{{[0-9]+:[0-9]+}}], 0 addr64 offset:4 ; encoding: [0x04,0x80,0x30,0xe0
define amdgpu_kernel void @mubuf_load3(ptr addrspace(1) %out, ptr addrspace(1) %in, i64 %offset) {
entry:
%0 = getelementptr i32, ptr addrspace(1) %in, i64 %offset
%1 = getelementptr i32, ptr addrspace(1) %0, i64 1
%2 = load i32, ptr addrspace(1) %1
store i32 %2, ptr addrspace(1) %out
ret void
}
; 64 is the largest integer inline constant, so it can be used directly as
; the soffset operand.
; CHECK-LABEL: {{^}}soffset_max_imm:
; CHECK: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], 64 offen glc
define amdgpu_gs void @soffset_max_imm(ptr addrspace(4) inreg, ptr addrspace(4) inreg, ptr addrspace(4) inreg, ptr addrspace(4) inreg, i32 inreg, i32 inreg, i32, i32, i32, i32, i32, i32, i32, i32) {
main_body:
%tmp1 = load ptr addrspace(8), ptr addrspace(4) %0
%tmp2 = shl i32 %6, 2
%tmp3 = call i32 @llvm.amdgcn.raw.ptr.buffer.load.i32(ptr addrspace(8) %tmp1, i32 %tmp2, i32 64, i32 1)
%tmp4 = add i32 %6, 16
%tmp1.4xi32 = bitcast ptr addrspace(8) %tmp1 to ptr addrspace(8)
call void @llvm.amdgcn.raw.ptr.tbuffer.store.i32(i32 %tmp3, ptr addrspace(8) %tmp1.4xi32, i32 %tmp4, i32 %4, i32 68, i32 3)
ret void
}
; Make sure immediates that aren't inline constants (here 65, one past the
; largest integer inline constant) don't get folded into the soffset operand.
; FIXME: For this test we should be smart enough to shift the immediate into
; the offset field.
; CHECK-LABEL: {{^}}soffset_no_fold:
; CHECK: s_movk_i32 [[SOFFSET:s[0-9]+]], 0x41
; CHECK: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], [[SOFFSET]] offen glc
define amdgpu_gs void @soffset_no_fold(ptr addrspace(4) inreg, ptr addrspace(4) inreg, ptr addrspace(4) inreg, ptr addrspace(4) inreg, i32 inreg, i32 inreg, i32, i32, i32, i32, i32, i32, i32, i32) {
main_body:
%tmp1 = load ptr addrspace(8), ptr addrspace(4) %0
%tmp2 = shl i32 %6, 2
%tmp3 = call i32 @llvm.amdgcn.raw.ptr.buffer.load.i32(ptr addrspace(8) %tmp1, i32 %tmp2, i32 65, i32 1)
%tmp4 = add i32 %6, 16
%tmp1.4xi32 = bitcast ptr addrspace(8) %tmp1 to ptr addrspace(8)
call void @llvm.amdgcn.raw.ptr.tbuffer.store.i32(i32 %tmp3, ptr addrspace(8) %tmp1.4xi32, i32 %tmp4, i32 %4, i32 68, i32 3)
ret void
}
;;;==========================================================================;;;
;;; MUBUF STORE TESTS
;;;==========================================================================;;;
; MUBUF store with an immediate byte offset that fits in 12 bits (1 * 4 = 4)
; CHECK-LABEL: {{^}}mubuf_store0:
; CHECK: buffer_store_dword v{{[0-9]}}, off, s[{{[0-9]:[0-9]}}], 0 offset:4 ; encoding: [0x04,0x00,0x70,0xe0
define amdgpu_kernel void @mubuf_store0(ptr addrspace(1) %out) {
entry:
%0 = getelementptr i32, ptr addrspace(1) %out, i64 1
store i32 0, ptr addrspace(1) %0
ret void
}
; MUBUF store with the largest possible immediate offset (4095, the 12-bit maximum)
; CHECK-LABEL: {{^}}mubuf_store1:
; CHECK: buffer_store_byte v{{[0-9]}}, off, s[{{[0-9]:[0-9]}}], 0 offset:4095 ; encoding: [0xff,0x0f,0x60,0xe0
define amdgpu_kernel void @mubuf_store1(ptr addrspace(1) %out) {
entry:
%0 = getelementptr i8, ptr addrspace(1) %out, i64 4095
store i8 0, ptr addrspace(1) %0
ret void
}
; MUBUF store with an immediate byte offset that does not fit in 12 bits:
; 1024 * 4 = 4096 (0x1000), so the offset is materialized in an SGPR and
; passed as soffset.
; CHECK-LABEL: {{^}}mubuf_store2:
; CHECK: s_movk_i32 [[SOFFSET:s[0-9]+]], 0x1000
; CHECK: buffer_store_dword v{{[0-9]}}, off, s[{{[0-9]:[0-9]}}], [[SOFFSET]] ; encoding: [0x00,0x00,0x70,0xe0
define amdgpu_kernel void @mubuf_store2(ptr addrspace(1) %out) {
entry:
%0 = getelementptr i32, ptr addrspace(1) %out, i64 1024
store i32 0, ptr addrspace(1) %0
ret void
}
; MUBUF store with a 12-bit immediate offset and a register offset
; CHECK-LABEL: {{^}}mubuf_store3:
; CHECK-NOT: ADD
; CHECK: buffer_store_dword v{{[0-9]}}, v[{{[0-9]:[0-9]}}], s[{{[0-9]:[0-9]}}], 0 addr64 offset:4 ; encoding: [0x04,0x80,0x70,0xe0
define amdgpu_kernel void @mubuf_store3(ptr addrspace(1) %out, i64 %offset) {
entry:
%0 = getelementptr i32, ptr addrspace(1) %out, i64 %offset
%1 = getelementptr i32, ptr addrspace(1) %0, i64 1
store i32 0, ptr addrspace(1) %1
ret void
}
; CHECK-LABEL: {{^}}store_sgpr_ptr:
; CHECK: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0
define amdgpu_kernel void @store_sgpr_ptr(ptr addrspace(1) %out) {
store i32 99, ptr addrspace(1) %out, align 4
ret void
}
; CHECK-LABEL: {{^}}store_sgpr_ptr_offset:
; CHECK: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:40
define amdgpu_kernel void @store_sgpr_ptr_offset(ptr addrspace(1) %out) {
%out.gep = getelementptr i32, ptr addrspace(1) %out, i32 10
store i32 99, ptr addrspace(1) %out.gep, align 4
ret void
}
; 32768 * 4 = 131072 (0x20000) bytes fits neither the 12-bit immediate offset
; nor the 16-bit immediate of s_movk_i32, so the offset is materialized with
; s_mov_b32.
; CHECK-LABEL: {{^}}store_sgpr_ptr_large_offset:
; CHECK: s_mov_b32 [[SOFFSET:s[0-9]+]], 0x20000
; CHECK: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, [[SOFFSET]]
define amdgpu_kernel void @store_sgpr_ptr_large_offset(ptr addrspace(1) %out) {
%out.gep = getelementptr i32, ptr addrspace(1) %out, i32 32768
store i32 99, ptr addrspace(1) %out.gep, align 4
ret void
}
; CHECK-LABEL: {{^}}store_sgpr_ptr_large_offset_atomic:
; CHECK: s_mov_b32 [[SOFFSET:s[0-9]+]], 0x20000
; CHECK: buffer_atomic_add v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, [[SOFFSET]]
define amdgpu_kernel void @store_sgpr_ptr_large_offset_atomic(ptr addrspace(1) %out) {
%gep = getelementptr i32, ptr addrspace(1) %out, i32 32768
%val = atomicrmw volatile add ptr addrspace(1) %gep, i32 5 seq_cst
ret void
}
; CHECK-LABEL: {{^}}store_vgpr_ptr:
; CHECK: buffer_store_dword v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0 addr64
define amdgpu_kernel void @store_vgpr_ptr(ptr addrspace(1) %out) {
%tid = call i32 @llvm.amdgcn.workitem.id.x() readnone
%out.gep = getelementptr i32, ptr addrspace(1) %out, i32 %tid
store i32 99, ptr addrspace(1) %out.gep, align 4
ret void
}
declare i32 @llvm.amdgcn.workitem.id.x() #1
declare void @llvm.amdgcn.raw.ptr.tbuffer.store.i32(i32, ptr addrspace(8), i32, i32, i32 immarg, i32 immarg) #2
declare i32 @llvm.amdgcn.raw.ptr.buffer.load.i32(ptr addrspace(8), i32, i32, i32 immarg) #3
attributes #0 = { nounwind readonly }
attributes #1 = { nounwind readnone speculatable willreturn }
attributes #2 = { nounwind willreturn writeonly }
attributes #3 = { nounwind readonly willreturn }
attributes #4 = { readnone }