--------------------------------
LAMMPS ACCELERATOR LIBRARY
--------------------------------
W. Michael Brown (ORNL)
Trung Dac Nguyen (ORNL/Northwestern)
Nitin Dhamankar (Intel)
Axel Kohlmeyer (Temple)
Peng Wang (NVIDIA)
Anders Hafreager (UiO)
V. Nikolskiy (HSE)
Maurice de Koning (Unicamp/Brazil)
Rodolfo Paula Leite (Unicamp/Brazil)
Steve Plimpton (SNL)
Inderaj Bains (NVIDIA)
------------------------------------------------------------------------------
This directory has source files to build a library that LAMMPS links against
when using the GPU package.
This library must be built with a C++ compiler and one of CUDA, HIP, or
OpenCL before LAMMPS is built, so that LAMMPS can link against it.
This library, libgpu.a, provides routines for acceleration of certain
LAMMPS styles and neighbor list builds using CUDA, OpenCL, or ROCm HIP.
Pair styles supported by this library are marked with a "g" in the list of
pair style potentials. See the online version at:
https://docs.lammps.org/Commands_pair.html
In addition, the (plain) pppm kspace style is supported.
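As a quick sanity check, one can list which /gpu styles ended up in a given
LAMMPS binary. This is only a sketch; it assumes the executable is named lmp
and uses the standard LAMMPS -h command-line switch, which prints the
installed packages and styles:

  # List all styles compiled into the binary and keep the GPU variants;
  # entries such as lj/cut/gpu or pppm/gpu only appear if the GPU
  # package and this library were part of the build.
  ./lmp -h | grep gpu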
------------------------------------------------------------------------------
Installing oneAPI, OpenCL, CUDA, or ROCm
------------------------------------------------------------------------------
The easiest approach is to use the Linux package manager to perform the
installation from the Intel, NVIDIA, etc. repositories. All are available for
free. The oneAPI installation includes Intel-optimized MPI and C++ compilers,
along with many libraries. Alternatively, Intel OpenCL can also be installed
separately from the Intel repository.
NOTE: Installation of the CUDA SDK is not required, only the CUDA toolkit.
See:
https://software.intel.com/content/www/us/en/develop/tools/oneapi/hpc-toolkit.html
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
https://github.com/RadeonOpenCompute/ROCm
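After installing one of these toolkits, it can be worth confirming that the
compiler and runtime are visible before building the library. A minimal
sketch, assuming the standard utilities shipped with the respective packages
(nvcc, hipcc, clinfo):

  # CUDA: print the toolkit version
  nvcc --version
  # ROCm HIP: print the HIP compiler version
  hipcc --version
  # OpenCL: list the detected platforms and devices
  clinfo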
------------------------------------------------------------------------------
Build Intro
------------------------------------------------------------------------------
See the LAMMPS manual:
https://docs.lammps.org/Build_extras.html#gpu
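For orientation, a typical CMake build of LAMMPS with the GPU package looks
roughly as follows. This is a sketch only; PKG_GPU, GPU_API, GPU_ARCH, and
GPU_PREC are the CMake variables documented at the link above, and the values
shown (CUDA with sm_80, mixed precision) are examples rather than
recommendations:

  # From the top-level LAMMPS source directory
  mkdir build && cd build
  cmake ../cmake -D PKG_GPU=on -D GPU_API=cuda -D GPU_ARCH=sm_80 -D GPU_PREC=mixed
  make -j 8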
------------------------------------------------------------------------------
ALL PREPROCESSOR OPTIONS (For Advanced Users)
------------------------------------------------------------------------------
The following preprocessor options are available, some of which can be set
with the CMake build (see the example after this list).
_SINGLE_SINGLE Build library for single precision mode (-DGPU_PREC=single)
_SINGLE_DOUBLE Build library for mixed precision mode (-DGPU_PREC=mixed)
_DOUBLE_DOUBLE Build library for double precision mode (-DGPU_PREC=double)
GERYON_NUMA_FISSION Accelerators with main memory NUMA are split into
multiple virtual accelerators for each NUMA node
GPU_CAST Casting performed on GPU, untested recently.
LAL_DISABLE_PREFETCH Disable prefetch in kernels
LAL_NO_BLOCK_REDUCE Use host for energy/virial accumulation
LAL_SERIALIZE_INIT Force serialization of initialization and compilation
for multiple MPI tasks sharing the same accelerator.
Some accelerator API implementations have had issues
with temporary file conflicts in the past.
LAL_USE_OMP=0 Disable OpenMP in lib, regardless of compiler setting
LAL_USE_OMP_SIMD=0 Disable OpenMP SIMD in lib, regardless of compiler setting
LAL_USE_OLD_NEIGHBOR Use old neighbor list algorithm
MPI_GERYON Library should use MPI_Abort for unhandled errors
UCL_NO_EXIT LAMMPS should handle errors instead of Geryon lib
UCL_DEBUG Debug build for Geryon (-DGPU_DEBUG=on)
USE_CUDPP Enable GPU binning in neighbor builds (not recommended)
THREE_CONCURRENT Concurrent 3-body kernels in separate queues, untested
For CUDA builds:
CUDA_MPS_SUPPORT Do not generate errors for CUDA devices in exclusive mode,
effectively supporting the CUDA Multi-Process Service (MPS)
(-DCUDA_MPS_SUPPORT=on)
For OpenCL builds:
GERYON_OCL_FLUSH For OpenCL, flush queue after every enqueue
GERYON_KERNEL_DUMP Dump all compiled OpenCL programs with compiler
flags and build logs (-DGPU_DEBUG=on)
GERYON_FORCE_SHARED_MAIN_MEM_ON Should only be used for builds where the
accelerator is guaranteed to share physical
main memory with the host (e.g. integrated
GPU or CPU device). Default behavior is to
auto-detect. Impacts OpenCL only.
GERYON_FORCE_SHARED_MAIN_MEM_OFF Should only be used for builds where the
accelerator is guaranteed to have discrete
physical main memory vs the host (discrete
GPU card). Default behavior is to
auto-detect. Impacts OpenCL only.
LAL_NO_OCL_EV_JIT Turn off JIT specialization for kernels in OpenCL
LAL_OCL_EXTRA_ARGS Supply extra args for OpenCL compiler delimited with :
For HIP builds:
USE_HIP_DEVICE_SORT Enable GPU binning for HIP builds
(-DHIP_USE_DEVICE_SORT=yes)
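The options that list a CMake variable in parentheses above can be set
directly on the cmake command line; the remaining preprocessor symbols can be
passed as ordinary compiler defines. A hedged sketch (the exact flags to use
for extra defines may depend on the chosen GPU_API and toolchain):

  # Options with dedicated CMake variables
  cmake ../cmake -D PKG_GPU=on -D GPU_API=cuda -D GPU_PREC=mixed \
        -D GPU_DEBUG=on -D CUDA_MPS_SUPPORT=on
  # Other symbols passed as plain preprocessor defines via the C++ flags
  cmake ../cmake -D PKG_GPU=on -D GPU_API=opencl \
        -D CMAKE_CXX_FLAGS="-DLAL_SERIALIZE_INIT -DLAL_USE_OMP=0"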
------------------------------------------------------------------------------
DEVICE QUERY
------------------------------------------------------------------------------
The GPU library includes binaries to check for available GPUs and their
properties. It is a good idea to run these on first use to make sure the
system and build are set up properly. Additionally, the GPU numbering for
specific selection of devices should be taken from this output (see the
sketch below). The GPU
library may split some accelerators into separate virtual accelerators for
efficient use with MPI.
After the LAMMPS build succeeds, a binary is generated in the build folder
with which one can query the devices. For OpenCL:
./ocl_get_devices
for CUDA:
./nvc_get_devices
and for ROCm HIP:
./hip_get_devices
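The device numbering reported by these binaries is the numbering that the
package gpu command and the -pk command-line switch act on. A minimal usage
sketch, assuming a CUDA build, the standard -sf/-pk LAMMPS switches, and a
hypothetical input script named in.lj:

  # Inspect the devices (and virtual devices) visible to the library
  ./nvc_get_devices
  # Run on 4 MPI ranks, appending the /gpu suffix to supported styles
  # and sharing 2 of the devices reported above across the ranks
  mpirun -np 4 ./lmp -sf gpu -pk gpu 2 -in in.lj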
------------------------------------------------------------------------------
References for Details
------------------------------------------------------------------------------
Brown, W.M., Wang, P., Plimpton, S.J., Tharrington, A.N. Implementing
Molecular Dynamics on Hybrid High Performance Computers - Short Range
Forces. Computer Physics Communications. 2011. 182: p. 898-911.
and
Brown, W.M., Kohlmeyer, A., Plimpton, S.J., Tharrington, A.N. Implementing
Molecular Dynamics on Hybrid High Performance Computers - Particle-Particle
Particle-Mesh. Computer Physics Communications. 2012. 183: p. 449-459.
and
Brown, W.M., Yamada, M. Implementing Molecular Dynamics on Hybrid High
Performance Computers - Three-Body Potentials. Computer Physics
Communications. 2013. 184: p. 2785-2793.