# SPECFEM Sample
This sample contains a dummy example based on a spectral-element stiffness kernel taken from [SPECFEM3D_GLOBE](https://github.com/geodynamics/specfem3d_globe).
The kernel is a 4th-order spectral-element stiffness kernel used for simulations of elastic wave propagation through the Earth. The matrix sizes used are (25,5), (5,25), and (5,5). They are determined by different cut-planes through a three-dimensional (5,5,5)-element with a total of 125 Gauss-Lobatto-Legendre (GLL) points.
## Usage Step-by-Step
This example requires LIBXSMM to be built with static kernels using MNK="5 25", which covers the matrix sizes (5,25), (25,5), and (5,5).
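As a rough illustration, assume that LIBXSMM expands a single `MNK` list into every (M,N,K) combination of the listed values. Please check the LIBXSMM documentation for the exact semantics of your version. Under that assumption, `MNK="5 25"` requests 2³ = 8 kernels. The sketch below only enumerates those triplets. The function name `expand_mnk` is hypothetical, and nothing here calls LIBXSMM.

```bash
# Hypothetical helper: list the (M,N,K) triplets that an MNK list is ASSUMED
# to expand to, i.e. the Cartesian product of the listed values.
expand_mnk() {
  # $1 is intentionally unquoted so that "5 25" splits into 5 and 25.
  for m in $1; do
    for n in $1; do
      for k in $1; do
        echo "($m,$n,$k)"
      done
    done
  done
}

expand_mnk "5 25"
```

Assuming each product is an (M×K)·(K×N) multiplication, the shapes used by this sample correspond to the triplets (25,5,5), (5,25,5), and (5,5,5), all of which appear in the enumerated list.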
### Build LIBXSMM
#### General Default Compilation
In the LIBXSMM root directory, compile the library with:
```bash
make MNK="5 25" ALPHA=1 BETA=0
```
#### Additional Compilation Examples
Compiling only the single-precision version, with aggressive optimization:
```bash
make MNK="5 25" ALPHA=1 BETA=0 PRECISION=1 OPT=3
```
For Sandy Bridge CPUs:
```bash
make MNK="5 25" ALPHA=1 BETA=0 PRECISION=1 OPT=3 AVX=1
```
For Haswell CPUs:
```bash
make MNK="5 25" ALPHA=1 BETA=0 PRECISION=1 OPT=3 AVX=2
```
For Knights Corner (KNC), which also builds a Sandy Bridge version for the host:
```bash
make MNK="5 25" ALPHA=1 BETA=0 PRECISION=1 OPT=3 AVX=1 \
OFFLOAD=1 KNC=1
```
Installing the libraries into the sub-directory `workstation/`:
```bash
make MNK="5 25" ALPHA=1 BETA=0 PRECISION=1 OPT=3 AVX=1 \
OFFLOAD=1 KNC=1 \
PREFIX=workstation/ install-minimal
```
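The flag combinations above can be collected in a small helper. This is only a sketch: the function name `libxsmm_make_cmd` and the target names (`default`, `sp`, `snb`, `hsw`, `knc`) are hypothetical, and the function merely prints the `make` command from this README instead of running it.

```bash
# Hypothetical helper: print the LIBXSMM make command used in this README
# for a given target. Nothing is built; pipe the output to `sh` to run it.
libxsmm_make_cmd() {
  base='make MNK="5 25" ALPHA=1 BETA=0'
  sp="$base PRECISION=1 OPT=3"      # single precision, aggressive optimization
  case "$1" in
    default) echo "$base" ;;
    sp)      echo "$sp" ;;
    snb)     echo "$sp AVX=1" ;;                     # Sandy Bridge
    hsw)     echo "$sp AVX=2" ;;                     # Haswell
    knc)     echo "$sp AVX=1 OFFLOAD=1 KNC=1" ;;     # KNC plus Sandy Bridge host
    *)       echo "unknown target: $1" >&2; return 1 ;;
  esac
}

libxsmm_make_cmd knc
```

From the LIBXSMM root directory, `libxsmm_make_cmd hsw | sh` would run the Haswell build shown above.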
### Build the SPECFEM Example Code
For default CPU host:
```bash
cd sample/specfem
make
```
For Knights Corner (KNC):
```bash
cd sample/specfem
make KNC=1
```
Specific Fortran compiler flags can be added as well, for example:
```bash
cd sample/specfem
make FCFLAGS="-O3 -fopenmp" [...]
```
Note that building LIBXSMM and building the example can be combined into a single step by using the `specfem` make target in the LIBXSMM root directory:
```bash
make MNK="5 25" ALPHA=1 BETA=0 PRECISION=1 OPT=3 AVX=1 specfem
```
For Knights Corner, two steps are still needed:
```bash
make MNK="5 25" ALPHA=1 BETA=0 PRECISION=1 OPT=3 AVX=1 OFFLOAD=1 KNC=1
make OPT=3 specfem_mic
```
## Run the Performance Test
For default CPU host:
```bash
./specfem.sh
```
For Knights Corner (KNC):
```bash
./specfem.sh -mic
```
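The results below control the number of OpenMP threads through `OMP_NUM_THREADS`. A minimal sweep driver for the host benchmark might look like the following. The names `run_sweep` and `DRY_RUN` are hypothetical; by default the commands are only printed, and the script is meant to be run from `sample/specfem`.

```bash
# Hypothetical driver: run the host benchmark for several OpenMP thread counts.
# DRY_RUN defaults to 1, in which case the commands are only printed.
DRY_RUN=${DRY_RUN:-1}

run_sweep() {
  for t in "$@"; do
    if [ "$DRY_RUN" = 1 ]; then
      echo "OMP_NUM_THREADS=$t ./specfem.sh"
    else
      echo "=== OMP_NUM_THREADS=$t ==="
      OMP_NUM_THREADS=$t ./specfem.sh || return 1   # stop at the first failure
    fi
  done
}

run_sweep 1 12 24
```

If the script is saved as, say, `sweep.sh` (a hypothetical file name), `DRY_RUN=0 sh sweep.sh` would actually execute the benchmark for 1, 12, and 24 threads.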
## Results
The results below were obtained with the Intel Compiler suite (icpc 15.0.2, icc 15.0.2, and ifort 15.0.2).
### Sandy Bridge - Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Library compilation (in the root directory):
```bash
make MNK="5 25" ALPHA=1 BETA=0 PRECISION=1 OPT=3 AVX=1
```
Single-threaded example run:
```bash
cd sample/specfem
make; OMP_NUM_THREADS=1 ./specfem.sh
```
Output:
```
===============================================================
average over 15 repetitions
timing with Deville loops = 0.1269
timing with unrolled loops = 0.1737 / speedup = -36.87 %
timing with LIBXSMM dispatch = 0.1697 / speedup = -33.77 %
timing with LIBXSMM prefetch = 0.1611 / speedup = -26.98 %
timing with LIBXSMM static = 0.1392 / speedup = -9.70 %
===============================================================
```
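The `speedup` column appears to be the relative reduction in run time against the Deville-loop baseline, (T_Deville - T) / T_Deville × 100 %, so negative values mean that a variant is slower than the Deville loops. For example, (0.1269 - 0.1737) / 0.1269 ≈ -36.88 %. The awk sketch below recomputes the column from the benchmark output under that assumption; `recompute_speedup` is a hypothetical name. Because the printed timings are rounded to four digits, the recomputed values can differ from the printed ones in the last digit.

```bash
# Hypothetical helper: recompute the "speedup" column from specfem.sh output,
# ASSUMING speedup = (T_deville - T) / T_deville * 100 (negative = slower).
recompute_speedup() {
  awk -F'=' '
    /timing with Deville loops/ { base = $2 + 0; next }   # baseline time
    /timing with/ && base > 0 {
      label = $1
      sub(/^ *timing with /, "", label)   # drop the common prefix
      sub(/ +$/, "", label)               # drop trailing blanks
      t = $2 + 0                          # " 0.1737 / speedup " -> 0.1737
      printf "%-16s %7.2f %%\n", label, (base - t) / base * 100
    }'
}

# Example with two lines from the Sandy Bridge output above.
printf '%s\n' \
  'timing with Deville loops = 0.1269' \
  'timing with LIBXSMM static = 0.1392 / speedup = -9.70 %' |
  recompute_speedup
```

In practice the output can be piped directly, for example `OMP_NUM_THREADS=1 ./specfem.sh | recompute_speedup`.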
### Haswell - Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Library compilation (in the root directory):
```bash
make MNK="5 25" ALPHA=1 BETA=0 PRECISION=1 OPT=3 AVX=2
```
Single-threaded example run:
```bash
cd sample/specfem
make; OMP_NUM_THREADS=1 ./specfem.sh
```
Output:
```
===============================================================
average over 15 repetitions
timing with Deville loops = 0.1028
timing with unrolled loops = 0.1385 / speedup = -34.73 %
timing with LIBXSMM dispatch = 0.1408 / speedup = -37.02 %
timing with LIBXSMM prefetch = 0.1327 / speedup = -29.07 %
timing with LIBXSMM static = 0.1151 / speedup = -11.93 %
===============================================================
```
Multi-threaded example run:
```bash
cd sample/specfem
make OPT=3; OMP_NUM_THREADS=24 ./specfem.sh
```
Output:
```
OpenMP information:
number of threads = 24
[...]
===============================================================
average over 15 repetitions
timing with Deville loops = 0.0064
timing with unrolled loops = 0.0349 / speedup = -446.71 %
timing with LIBXSMM dispatch = 0.0082 / speedup = -28.34 %
timing with LIBXSMM prefetch = 0.0076 / speedup = -19.59 %
timing with LIBXSMM static = 0.0068 / speedup = -5.78 %
===============================================================
```
### Knights Corner - Intel Xeon Phi B1PRQ-5110P/5120D
Library compilation (in the root directory):
```bash
make MNK="5 25" ALPHA=1 BETA=0 PRECISION=1 OPT=3 OFFLOAD=1 KNC=1
```
Multi-threaded example run:
```bash
cd sample/specfem
make FCFLAGS="-O3 -fopenmp -warn" OPT=3 KNC=1; ./specfem.sh -mic
```
Output:
```
OpenMP information:
number of threads = 236
[...]
===============================================================
average over 15 repetitions
timing with Deville loops = 0.0164
timing with unrolled loops = 0.6982 / speedup = -4162.10 %
timing with LIBXSMM dispatch = 0.0170 / speedup = -3.89 %
timing with LIBXSMM static = 0.0149 / speedup = 9.22 %
===============================================================
```