--------------------------------
LAMMPS ACCELERATOR LIBRARY
--------------------------------
W. Michael Brown (ORNL)
Trung Dac Nguyen (ORNL/Northwestern)
Nitin Dhamankar (Intel)
Axel Kohlmeyer (Temple)
Peng Wang (NVIDIA)
Anders Hafreager (UiO)
V. Nikolskiy (HSE)
Maurice de Koning (Unicamp/Brazil)
Rodolfo Paula Leite (Unicamp/Brazil)
Steve Plimpton (SNL)
Inderaj Bains (NVIDIA)
------------------------------------------------------------------------------
This directory has source files to build a library that LAMMPS links against
when using the GPU package.
This library must be built with a C++ compiler and one of CUDA, HIP, or
OpenCL before LAMMPS is built, so that LAMMPS can link against it.
This library, libgpu.a, provides routines for acceleration of certain
LAMMPS styles and neighbor list builds using CUDA, OpenCL, or ROCm HIP.
Pair styles supported by this library are marked with a "g" in the list of
pair style potentials. See the online version at:
https://docs.lammps.org/Commands_pair.html
In addition, the (plain) pppm kspace style is supported.
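As a quick sanity check, one can list which /gpu styles ended up in a given
LAMMPS binary. This is only a sketch; it assumes the executable is named lmp
and uses the standard LAMMPS -h command-line switch, which prints the
installed packages and styles:

  # List all styles compiled into the binary and keep the GPU variants;
  # entries such as lj/cut/gpu or pppm/gpu only appear if the GPU
  # package and this library were part of the build.
  ./lmp -h | grep gpu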
------------------------------------------------------------------------------
Installing oneAPI, OpenCL, CUDA, or ROCm
------------------------------------------------------------------------------
The easiest approach is to use the Linux package manager to perform the
installation from the Intel, NVIDIA, etc. repositories. All are available for
free. The oneAPI installation includes Intel-optimized MPI and C++ compilers,
along with many libraries. Alternatively, Intel OpenCL can also be installed
separately from the Intel repository.
NOTE: Installation of the CUDA SDK is not required, only the CUDA toolkit.
See:
https://software.intel.com/content/www/us/en/develop/tools/oneapi/hpc-toolkit.html
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
https://github.com/RadeonOpenCompute/ROCm
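After installing one of these toolkits, it can be worth confirming that the
compiler and runtime are visible before building the library. A minimal
sketch, assuming the standard utilities shipped with the respective packages
(nvcc, hipcc, clinfo):

  # CUDA: print the toolkit version
  nvcc --version
  # ROCm HIP: print the HIP compiler version
  hipcc --version
  # OpenCL: list the detected platforms and devices
  clinfo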
------------------------------------------------------------------------------
Build Intro
------------------------------------------------------------------------------
See the LAMMPS manual:
https://docs.lammps.org/Build_extras.html#gpu
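For orientation, a typical CMake build of LAMMPS with the GPU package looks
roughly as follows. This is a sketch only; PKG_GPU, GPU_API, GPU_ARCH, and
GPU_PREC are the CMake variables documented at the link above, and the values
shown (CUDA with sm_80, mixed precision) are examples rather than
recommendations:

  # From the top-level LAMMPS source directory
  mkdir build && cd build
  cmake ../cmake -D PKG_GPU=on -D GPU_API=cuda -D GPU_ARCH=sm_80 -D GPU_PREC=mixed
  make -j 8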
------------------------------------------------------------------------------
ALL PREPROCESSOR OPTIONS (For Advanced Users)
------------------------------------------------------------------------------
The following preprocessor options are available, some of which can be set
with the CMake build (see the example after this list).
_SINGLE_SINGLE Build library for single precision mode (-DGPU_PREC=single)
_SINGLE_DOUBLE Build library for mixed precision mode (-DGPU_PREC=mixed)
_DOUBLE_DOUBLE Build library for double precision mode (-DGPU_PREC=double)
GERYON_NUMA_FISSION Accelerators with main memory NUMA are split into
multiple virtual accelerators for each NUMA node
GPU_CAST Casting performed on GPU, untested recently.
LAL_DISABLE_PREFETCH Disable prefetch in kernels
LAL_NO_BLOCK_REDUCE Use host for energy/virial accumulation
LAL_SERIALIZE_INIT Force serialization of initialization and compilation
for multiple MPI tasks sharing the same accelerator.
Some accelerator API implementations have had issues
with temporary file conflicts in the past.
LAL_USE_OMP=0 Disable OpenMP in lib, regardless of compiler setting
LAL_USE_OMP_SIMD=0 Disable OpenMP SIMD in lib, regardless of compiler setting
LAL_USE_OLD_NEIGHBOR Use old neighbor list algorithm
MPI_GERYON Library should use MPI_Abort for unhandled errors
UCL_NO_EXIT LAMMPS should handle errors instead of Geryon lib
UCL_DEBUG Debug build for Geryon (-DGPU_DEBUG=on)
USE_CUDPP Enable GPU binning in neighbor builds (not recommended)
THREE_CONCURRENT Concurrent 3-body kernels in separate queues, untested
For CUDA builds:
CUDA_MPS_SUPPORT Do not generate errors for CUDA devices in exclusive mode,
effectively supporting the CUDA Multi-Process Service (MPS)
(-DCUDA_MPS_SUPPORT=on)
For OpenCL builds:
GERYON_OCL_FLUSH For OpenCL, flush queue after every enqueue
GERYON_KERNEL_DUMP Dump all compiled OpenCL programs with compiler
flags and build logs (-DGPU_DEBUG=on)
GERYON_FORCE_SHARED_MAIN_MEM_ON Should only be used for builds where the
accelerator is guaranteed to share physical
main memory with the host (e.g. integrated
GPU or CPU device). Default behavior is to
auto-detect. Impacts OpenCL only.
GERYON_FORCE_SHARED_MAIN_MEM_OFF Should only be used for builds where the
accelerator is guaranteed to have discrete
physical main memory vs the host (discrete
GPU card). Default behavior is to
auto-detect. Impacts OpenCL only.
LAL_NO_OCL_EV_JIT Turn off JIT specialization for kernels in OpenCL
LAL_OCL_EXTRA_ARGS Supply extra args for OpenCL compiler delimited with :
For HIP builds:
USE_HIP_DEVICE_SORT Enable GPU binning for HIP builds
(-DHIP_USE_DEVICE_SORT=yes)
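The options that list a CMake variable in parentheses above can be set
directly on the cmake command line; the remaining preprocessor symbols can be
passed as ordinary compiler defines. A hedged sketch (the exact flags to use
for extra defines may depend on the chosen GPU_API and toolchain):

  # Options with dedicated CMake variables
  cmake ../cmake -D PKG_GPU=on -D GPU_API=cuda -D GPU_PREC=mixed \
        -D GPU_DEBUG=on -D CUDA_MPS_SUPPORT=on
  # Other symbols passed as plain preprocessor defines via the C++ flags
  cmake ../cmake -D PKG_GPU=on -D GPU_API=opencl \
        -D CMAKE_CXX_FLAGS="-DLAL_SERIALIZE_INIT -DLAL_USE_OMP=0"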
------------------------------------------------------------------------------
DEVICE QUERY
------------------------------------------------------------------------------
The GPU library includes binaries to check for available GPUs and their
properties. It is a good idea to run these on first use to make sure the
system and build are set up properly. Additionally, the GPU numbering for
specific selection of devices should be taken from this output (see the
sketch below). The GPU
library may split some accelerators into separate virtual accelerators for
efficient use with MPI.
After the LAMMPS build succeeds, a binary is generated in the build folder
with which one can query the devices. For OpenCL:
./ocl_get_devices
for CUDA:
./nvc_get_devices
and for ROCm HIP:
./hip_get_devices
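The device numbering reported by these binaries is the numbering that the
package gpu command and the -pk command-line switch act on. A minimal usage
sketch, assuming a CUDA build, the standard -sf/-pk LAMMPS switches, and a
hypothetical input script named in.lj:

  # Inspect the devices (and virtual devices) visible to the library
  ./nvc_get_devices
  # Run on 4 MPI ranks, appending the /gpu suffix to supported styles
  # and sharing 2 of the devices reported above across the ranks
  mpirun -np 4 ./lmp -sf gpu -pk gpu 2 -in in.lj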
------------------------------------------------------------------------------
References for Details
------------------------------------------------------------------------------
Brown, W.M., Wang, P., Plimpton, S.J., Tharrington, A.N. Implementing
Molecular Dynamics on Hybrid High Performance Computers - Short Range
Forces. Computer Physics Communications. 2011. 182: p. 898-911.
and
Brown, W.M., Kohlmeyer, A., Plimpton, S.J., Tharrington, A.N. Implementing
Molecular Dynamics on Hybrid High Performance Computers - Particle-Particle
Particle-Mesh. Computer Physics Communications. 2012. 183: p. 449-459.
and
Brown, W.M., Yamada, M. Implementing Molecular Dynamics on Hybrid High
Performance Computers - Three-Body Potentials. Computer Physics
Communications. 2013. 184: p. 2785-2793.