darktable-bench is a simple script that applies standardized processing
to a standard image in order to compare performance between systems.
It reports the average processing time over multiple runs, as well as
a throughput rating that is useful for comparison (a system with a
rating twice as high as another will let you export twice as many
images in a given time).

Performance numbers will only be directly comparable when using the
same version of darktable, the same image, and the same sidecar file.
Unless you override the image or sidecar, they will remain the same.
Comparisons between different darktable versions will also reflect
performance changes between the versions (see the "Comparative
Performance" section at the end of this file).

Note that on a slow machine, it could easily take three to five
minutes to run the benchmark.

Usage
-----

In the simplest invocation, you can run the program directly from the
top-level directory of your darktable source hierarchy:

   src/tests/benchmark/darktable-bench

This applies the default sidecar (v3.6) to the default image
(mire1.cr2 from the integration test suite) using the darktable-cli in
the build directory or the one on your search path.


The following command-line options are available:

   -i / --image FILE
                specify the image to use instead of mire1.cr2

   -v / --version VER
                use alternate sidecar darktable-bench-VER.xmp

   -p / --program PATH
                specify the program to execute

   -x / --xmp NAME
                override the base name of the sidecar file and use
                NAME-VER.xmp instead

   -r / --reps N
                run the development N times instead of 3

   -t / --threads N
                tell darktable-cli to run with N threads (default is
                the number of hardware threads)

   -C / --cpuonly
                disable OpenCL GPU acceleration and run using the CPU
                only

   -T / --tempdir PATH
                store temporary files in a scratch directory under
                PATH (default /tmp)

   -I / --iopstats FILE
                store per-IOP average run time to FILE

   --verbose
                run verbosely
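
As an illustration of how the options above combine, the following
sketch assembles a darktable-bench command line in Python.  The helper
name build_command and its parameters are hypothetical; only the flags
themselves are taken from the list above.

```python
def build_command(script="src/tests/benchmark/darktable-bench",
                  image=None, version=None, reps=None,
                  threads=None, cpu_only=False, tempdir=None):
    """Return an argument list for darktable-bench (hypothetical helper)."""
    cmd = [script]
    if image is not None:
        cmd += ["--image", str(image)]      # replace mire1.cr2
    if version is not None:
        cmd += ["--version", str(version)]  # darktable-bench-VER.xmp
    if reps is not None:
        cmd += ["--reps", str(reps)]        # number of developments
    if threads is not None:
        cmd += ["--threads", str(threads)]  # threads for darktable-cli
    if cpu_only:
        cmd.append("--cpuonly")             # disable OpenCL
    if tempdir is not None:
        cmd += ["--tempdir", str(tempdir)]  # scratch directory location
    return cmd


# Example: five CPU-only runs restricted to 16 threads.
print(" ".join(build_command(reps=5, threads=16, cpu_only=True)))
```

The resulting list can be passed to subprocess.run() from the
top-level directory of the darktable source hierarchy.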

Report
------

darktable-bench prints the time each development took, followed by the
average times and the throughput rating, which is the approximate
number of images of this size with this processing that could be
exported per hour.  (You will likely see greater throughput on your own
images, as the benchmark processing is deliberately very
compute-intensive.)  Two times are reported for each run: the total
pixelpipe processing time reported by darktable-cli, and that time
plus the load and save times.  As darktable-cli currently does not
report the time needed to write the final result, darktable-bench
assumes that the save time is the same as the load time.
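
The save-time assumption can be sketched as follows.  This is a
minimal illustration, not the script's actual code: the function name
overall_time is hypothetical, and the numbers are illustrative values
rather than measurements.

```python
def overall_time(pixelpipe_seconds, load_seconds):
    """Estimate the overall time for one run (hypothetical helper)."""
    # Assumption taken from the text above: the save time is not
    # reported, so it is taken to be equal to the load time.
    save_seconds = load_seconds
    return pixelpipe_seconds + load_seconds + save_seconds


# Illustrative values only, not measured output.
print(f"{overall_time(8.5, 0.1):.3f} seconds")
```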

Sample output:

      Preparing...done
           run # 1:   8.595 pixpipe,    8.817 total
           run # 2:   8.572 pixpipe,    8.796 total
           run # 3:   8.548 pixpipe,    8.770 total

      darktable 3.2.1 ::: benchmark v3.4 ::: image mire1.cr2
      Average pixelpipe processing time:      8.572 seconds
      Average overall processing time:        8.794 seconds
      Throughput rating (higher is better):   409.4 (CPU only)
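
The summary figures in this sample can be reproduced from the per-run
times.  The sketch below assumes that the throughput rating is simply
3600 seconds divided by the average overall time; this matches the
"images per hour" description and the sample numbers, but it is an
inference, not a quote from the script.  The function name is
hypothetical.

```python
from statistics import mean

# Per-run times in seconds, copied from the sample output above.
pixpipe_times = [8.595, 8.572, 8.548]
total_times = [8.817, 8.796, 8.770]


def throughput_per_hour(average_total_seconds):
    """Images per hour for a given average overall time (assumed formula)."""
    return 3600.0 / average_total_seconds


avg_pixpipe = mean(pixpipe_times)
avg_total = mean(total_times)
rating = throughput_per_hour(avg_total)

print(f"Average pixelpipe processing time: {avg_pixpipe:.3f} seconds")
print(f"Average overall processing time:   {avg_total:.3f} seconds")
print(f"Throughput rating:                 {rating:.1f}")
```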

If you specified the number of threads to use (for example, to check
whether hyperthreading helps or hinders performance), that number will
be included in the report:

      darktable 3.5.0+2252~g0fffe6150 ::: benchmark v3.6 ::: image mire1.cr2
      Number of threads used:                    32
      Average pixelpipe processing time:      5.381 seconds
      Average overall processing time:        5.599 seconds
      Throughput rating (higher is better):   642.9 (CPU only)


Structure
---------

darktable-bench		 : the benchmarking script (Python 3)

darktable-bench-null.xmp : a sidecar file with minimal processing,
			   used to warm up disk caches

darktable-bench-3.6.xmp  : the default benchmarking sidecar
darktable-bench-3.4.xmp  : alternate sidecar for older versions

../integration/images/mire1.cr2 : the default benchmarking image


How to add a new benchmark
--------------------------

1. open an image in darktable and apply whatever processing you desire

2. copy the generated .xmp sidecar into src/tests/benchmark under the
   name 'darktable-bench-XYZ.xmp'

3. run

   darktable-bench -v XYZ

   to apply your new sidecar to the standard image from the
   integration test suite (src/tests/integration/images/mire1.cr2).
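
The naming convention in step 2 can be expressed as a small helper.
This is a sketch: the function name sidecar_path and its parameters
are hypothetical, and the base argument mirrors the NAME-VER.xmp
pattern described for the -x option.

```python
from pathlib import Path


def sidecar_path(version, base="darktable-bench",
                 directory="src/tests/benchmark"):
    """Return the expected sidecar location (hypothetical helper)."""
    # The file name follows the BASE-VER.xmp pattern from the text.
    return Path(directory) / f"{base}-{version}.xmp"


print(sidecar_path("XYZ").as_posix())
```

A sidecar exported from darktable could then be copied into place with
shutil.copy(exported_xmp, sidecar_path("XYZ")), where exported_xmp is
whatever path darktable wrote the .xmp file to.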


Comparative Performance
-----------------------

Reported performance numbers depend on the hardware, darktable
version, image, and sidecar file used.  The following are some example
throughput ratings using the standard image.

Throughput  Sidecar  darktable    Hardware
~410*       3.4      3.2.1        32-core AMD Threadripper 3970X, 64GB PC3600, no GPU
~645        3.4      3.4.0        32-core AMD Threadripper 3970X, 64GB PC3600, no GPU
~690        3.4      3.6.0        32-core AMD Threadripper 3970X, 64GB PC3600, no GPU
720         3.4      3.7.0+440    32-core AMD Threadripper 3970X, 64GB PC3600, no GPU
713         3.4      3.9.0+1630   32-core AMD Threadripper 3970X, 64GB PC3600, no GPU

659         3.6      3.7.0+440    32-core AMD Threadripper 3970X, 64GB PC3600, no GPU
661         3.6      3.9.0+1630   32-core AMD Threadripper 3970X, 64GB PC3600, no GPU

644         3.8      3.7.0+1370   32-core AMD Threadripper 3970X, 64GB PC3600, no GPU
666         3.8      3.9.0+1630   32-core AMD Threadripper 3970X, 64GB PC3600, no GPU
640         3.8      4.3.0+923    32-core AMD Threadripper 3970X, 64GB PC3600, no GPU

640         4.2      4.3.0+923    32-core AMD Threadripper 3970X, 64GB PC3600, no GPU

[*] darktable 3.2.1 using the v3.4 sidecar skips two modules which
  did not yet exist in that version, so this number overstates its
  comparative performance.
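
Because the rating is proportional to the number of images exported in
a given time, two ratings obtained with the same image and sidecar can
be compared by taking their ratio.  The sketch below uses the
approximate v3.4-sidecar ratings for darktable 3.4.0 and 3.6.0 from
the table; the function name is hypothetical.

```python
def relative_throughput(rating, baseline):
    """How many times more images 'rating' exports than 'baseline'."""
    return rating / baseline


# Approximate ratings from the table (v3.4 sidecar, same hardware).
rating_3_4_0 = 645
rating_3_6_0 = 690

ratio = relative_throughput(rating_3_6_0, rating_3_4_0)
print(f"darktable 3.6.0 vs 3.4.0: {ratio:.2f}x")
```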