File: README.amdahl

For large fits across multiple nodes you may find that the proposal step is a bottleneck.
The compiled DE stepper can run about twice as fast as the numba version that is
normally used. This might help on large allocations, but the gain is not large enough
to justify supporting the compiled version in the automatic build infrastructure.

Update: the MSVC build is 6x faster than numba on one machine. Performance with and
without the compiled version still needs to be checked on HPC hardware to know whether
the compiled version is required.
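
One way to make that comparison is a small timing harness. The following is a minimal
sketch, not part of bumps: best_time and speedup are hypothetical helpers, and the two
callables passed to speedup stand in for whatever code runs one proposal step with the
numba stepper and with the compiled stepper.

```python
import time


def best_time(fn, repeats=5):
    """Return the fastest wall-clock time in seconds over `repeats` calls of fn()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def speedup(baseline, candidate, repeats=5):
    """Return baseline time / candidate time; a ratio above 1 means candidate is faster."""
    t_candidate = best_time(candidate, repeats)
    if t_candidate == 0.0:
        # Timer resolution too coarse to measure the candidate at all.
        return float("inf")
    return best_time(baseline, repeats) / t_candidate
```

Taking the minimum over several repeats reduces the effect of one-off costs such as numba
JIT compilation on the first call. Run the comparison on the target HPC nodes, since the
ratio seen on a desktop machine may not carry over.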

To use the compiled DE stepper and bounds checks, first make sure the "random123" library submodule has been checked out:

    git clone --branch v1.14.0 https://github.com/DEShawResearch/random123.git bumps/dream/random123

Then, to compile on Unix, use:

    (cd bumps/dream && cc compiled.c -I ./random123/include/ -O2 -fopenmp -shared -lm -o _compiled.so -fPIC -DMAX_THREADS=64)

On macOS the default clang does not support OpenMP, so compile without -fopenmp:

    (cd bumps/dream && cc compiled.c -I ./random123/include/ -O2 -shared -lm -o _compiled.so -fPIC -DMAX_THREADS=64)

With MSVC on Windows, using the Visual Studio Build Tools (2022):

    rem set up compiler environment
    "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build\vcvarsall.bat" x86_amd64
    cd bumps\dream
    cl compiled.c -I .\random123\include /O2 /openmp /LD /GL /Fe_compiled.so

This only works when _compiled.so is in the bumps/dream directory.  If running
from a pip-installed version, you will need to fetch the bumps repository:

    $ git clone https://github.com/bumps/bumps.git
    $ cd bumps

Compile as above, then find the bumps install path using the following:

    $ python -c "import bumps.dream; print(bumps.dream.__file__)"
    #dream/path/__init__.py

Copy the compiled module into the install, replacing #dream/path with the directory printed above:

    $ cp bumps/dream/_compiled.so #dream/path
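
The lookup and copy can also be done in one step from Python. This is a minimal sketch
under stated assumptions: package_dir and install_compiled are hypothetical helper names,
not part of bumps, and the sketch assumes the library has already been compiled as above.

```python
import importlib.util
import shutil
from pathlib import Path


def package_dir(name):
    """Return the directory that holds the importable package `name`."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        raise ImportError(f"cannot locate package {name!r}")
    # spec.origin is the path to the package's __init__.py
    return Path(spec.origin).parent


def install_compiled(built, dest_dir):
    """Copy the built library into dest_dir and return the path of the copy."""
    built = Path(built)
    if not built.is_file():
        raise FileNotFoundError(f"{built} not found; compile it first")
    return Path(shutil.copy2(built, dest_dir))
```

For example, install_compiled("bumps/bumps/dream/_compiled.so", package_dir("bumps.dream"))
where the first argument is the path to the library built in the clone. Run it from outside
the cloned repository; otherwise "bumps.dream" may resolve to the clone rather than to the
pip-installed copy.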

There is no provision for using _compiled.so in a frozen application.

Run with no more than 64 OpenMP threads.  If the machine has more than 64 processors, use:

    OMP_NUM_THREADS=64 ./run.py ...

I don't know how OMP_NUM_THREADS behaves if it is larger than the number of processors.
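
The 64-thread cap can also be applied from a launcher script so that it does not have to
be typed on every run. This is a minimal sketch: capped_threads and capped_env are
hypothetical helpers, and the limit of 64 is assumed to match the -DMAX_THREADS=64 value
used in the compile commands above.

```python
import os

MAX_THREADS = 64  # assumption: same value as -DMAX_THREADS at compile time


def capped_threads(requested=None, limit=MAX_THREADS):
    """Return a thread count between 1 and `limit`.

    `requested` defaults to the number of CPUs reported by the OS.
    """
    if requested is None:
        requested = os.cpu_count() or 1
    return max(1, min(int(requested), limit))


def capped_env(limit=MAX_THREADS):
    """Return a copy of the environment with OMP_NUM_THREADS capped at `limit`."""
    env = dict(os.environ)
    current = env.get("OMP_NUM_THREADS", "")
    # Honor an existing plain integer setting; otherwise fall back to the CPU count.
    requested = int(current) if current.isdigit() else None
    env["OMP_NUM_THREADS"] = str(capped_threads(requested, limit))
    return env
```

The result can be passed to a child process, for example
subprocess.run(["./run.py", ...], env=capped_env()). Since the requested count defaults to
the CPU count, this sketch never asks for more threads than processors unless
OMP_NUM_THREADS was already set higher by hand.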