# RocksDB Get Performance Benchmarks

Results associated with [Improve Java API `get()` performance by reducing copies](https://github.com/facebook/rocksdb/pull/10970)

## Build/Run

Mac
```
make clean jclean
DEBUG_LEVEL=0 make -j12 rocksdbjava
(cd java/target; cp rocksdbjni-7.10.0-osx.jar rocksdbjni-7.10.0-SNAPSHOT-osx.jar)
mvn install:install-file -Dfile=./java/target/rocksdbjni-7.10.0-SNAPSHOT-osx.jar -DgroupId=org.rocksdb -DartifactId=rocksdbjni -Dversion=7.10.0-SNAPSHOT -Dpackaging=jar
```

Linux
```
make clean jclean
DEBUG_LEVEL=0 make -j12 rocksdbjava
(cd java/target; cp rocksdbjni-7.10.0-linux64.jar rocksdbjni-7.10.0-SNAPSHOT-linux64.jar)
mvn install:install-file -Dfile=./java/target/rocksdbjni-7.10.0-SNAPSHOT-linux64.jar -DgroupId=org.rocksdb -DartifactId=rocksdbjni -Dversion=7.10.0-SNAPSHOT -Dpackaging=jar
```

Build the JMH test package (on either platform):
```
pushd java/jmh
mvn clean package
```

A quick test run with a small number of keys, just as a sanity check:
```
java -jar target/rocksdbjni-jmh-1.0-SNAPSHOT-benchmarks.jar -p keyCount=1000 -p keySize=128 -p valueSize=32768 -p columnFamilyTestType="no_column_family" GetBenchmarks
```
The long performance run (as big as we can make it on our Ubuntu box without filling the disk):
```
java -jar target/rocksdbjni-jmh-1.0-SNAPSHOT-benchmarks.jar -p keyCount=1000,50000 -p keySize=128 -p valueSize=1024,16384 -p columnFamilyTestType="1_column_family","20_column_families" GetBenchmarks.get GetBenchmarks.preallocatedByteBufferGet GetBenchmarks.preallocatedGet
```
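
To budget time for the long run, a rough wall-clock estimate can be sketched as below. The iteration settings are assumptions rather than values read from the benchmark source: 5 forks, each with 5 warmup and 5 measurement iterations of 10 seconds. The `Cnt` of 25 in the result tables is consistent with 5 forks of 5 measurement iterations, but the warmup count and iteration length are guesses, and setup/teardown time is ignored. The class name `JmhRunEstimate` is hypothetical.

```java
import java.time.Duration;

/** Rough JMH wall-clock estimate. The iteration settings are assumed, not read from the benchmark. */
final class JmhRunEstimate {
    // Assumed JMH settings: 5 forks, each with 5 warmup + 5 measurement
    // iterations of 10 seconds. Adjust if the benchmark annotations differ.
    static final int FORKS = 5;
    static final int WARMUP_ITERATIONS = 5;
    static final int MEASUREMENT_ITERATIONS = 5;
    static final int SECONDS_PER_ITERATION = 10;

    /** Estimated duration for the given number of benchmark methods and parameter combinations. */
    static Duration estimate(int benchmarkMethods, int parameterCombinations) {
        long secondsPerTrial =
            (long) FORKS * (WARMUP_ITERATIONS + MEASUREMENT_ITERATIONS) * SECONDS_PER_ITERATION;
        return Duration.ofSeconds(secondsPerTrial * benchmarkMethods * parameterCombinations);
    }

    public static void main(String[] args) {
        // Long run: 3 methods x (2 keyCount x 2 valueSize x 2 columnFamilyTestType values).
        Duration longRun = estimate(3, 2 * 2 * 2);
        // Quick run restricted to the same 3 methods with a single parameter combination.
        Duration smallRun = estimate(3, 1);
        System.out.printf("long run:  ~%d minutes%n", longRun.toMinutes());
        System.out.printf("small run: ~%d minutes%n", smallRun.toMinutes());
    }
}
```

Under these assumptions the long run comes to about 200 minutes and a three-method, single-combination run to about 25 minutes, which is in line with the durations quoted in the results section.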

## Results (Ubuntu, big runs)

NB - we have removed some test results we initially observed on Mac which were not later reproducible.

These runs take 3-4 hours:
```
java -jar target/rocksdbjni-jmh-1.0-SNAPSHOT-benchmarks.jar -p keyCount=1000,50000 -p keySize=128 -p valueSize=1024,16384 -p columnFamilyTestType="1_column_family","20_column_families" GetBenchmarks.get GetBenchmarks.preallocatedByteBufferGet GetBenchmarks.preallocatedGet
```
It's clear that all `get()` variants have noticeably improved performance, though the gains are not as spectacular as those we initially observed on the M1 Mac.

### With fixes for all of the `get()` instances

The benchmarks that exercise the improved methods are:
```java
get()
preallocatedGet()
preallocatedByteBufferGet()
```

Benchmark                                (columnFamilyTestType)  (keyCount)  (keySize)  (valueSize)   Mode  Cnt        Score       Error  Units
GetBenchmarks.get                               1_column_family        1000        128         1024  thrpt   25   935648.793 ± 22879.910  ops/s
GetBenchmarks.get                               1_column_family        1000        128        16384  thrpt   25   204366.301 ±  1326.570  ops/s
GetBenchmarks.get                               1_column_family       50000        128         1024  thrpt   25   693451.990 ± 19822.720  ops/s
GetBenchmarks.get                               1_column_family       50000        128        16384  thrpt   25    50473.768 ±   497.335  ops/s
GetBenchmarks.get                            20_column_families        1000        128         1024  thrpt   25   550118.874 ± 14289.009  ops/s
GetBenchmarks.get                            20_column_families        1000        128        16384  thrpt   25   120545.549 ±   648.280  ops/s
GetBenchmarks.get                            20_column_families       50000        128         1024  thrpt   25   235671.353 ±  2231.195  ops/s
GetBenchmarks.get                            20_column_families       50000        128        16384  thrpt   25    12463.887 ±  1950.746  ops/s
GetBenchmarks.preallocatedByteBufferGet         1_column_family        1000        128         1024  thrpt   25  1196026.040 ± 35435.729  ops/s
GetBenchmarks.preallocatedByteBufferGet         1_column_family        1000        128        16384  thrpt   25   403252.655 ±  3287.054  ops/s
GetBenchmarks.preallocatedByteBufferGet         1_column_family       50000        128         1024  thrpt   25   829965.448 ± 16945.452  ops/s
GetBenchmarks.preallocatedByteBufferGet         1_column_family       50000        128        16384  thrpt   25    63798.042 ±  1292.858  ops/s
GetBenchmarks.preallocatedByteBufferGet      20_column_families        1000        128         1024  thrpt   25   724557.253 ± 12710.828  ops/s
GetBenchmarks.preallocatedByteBufferGet      20_column_families        1000        128        16384  thrpt   25   176846.615 ±  1121.644  ops/s
GetBenchmarks.preallocatedByteBufferGet      20_column_families       50000        128         1024  thrpt   25   263553.764 ±  1304.243  ops/s
GetBenchmarks.preallocatedByteBufferGet      20_column_families       50000        128        16384  thrpt   25    14721.693 ±  2574.240  ops/s
GetBenchmarks.preallocatedGet                   1_column_family        1000        128         1024  thrpt   25  1093947.765 ± 42846.276  ops/s
GetBenchmarks.preallocatedGet                   1_column_family        1000        128        16384  thrpt   25   391629.913 ±  4039.965  ops/s
GetBenchmarks.preallocatedGet                   1_column_family       50000        128         1024  thrpt   25   769332.958 ± 24180.749  ops/s
GetBenchmarks.preallocatedGet                   1_column_family       50000        128        16384  thrpt   25    61712.038 ±   423.494  ops/s
GetBenchmarks.preallocatedGet                20_column_families        1000        128         1024  thrpt   25   694684.465 ±  5484.205  ops/s
GetBenchmarks.preallocatedGet                20_column_families        1000        128        16384  thrpt   25   172383.593 ±   841.679  ops/s
GetBenchmarks.preallocatedGet                20_column_families       50000        128         1024  thrpt   25   257447.351 ±  1388.667  ops/s
GetBenchmarks.preallocatedGet                20_column_families       50000        128        16384  thrpt   25    13418.522 ±  2418.619  ops/s

### Baseline (no fixes)

Benchmark                                (columnFamilyTestType)  (keyCount)  (keySize)  (valueSize)   Mode  Cnt        Score       Error  Units
GetBenchmarks.get                               1_column_family        1000        128         1024  thrpt   25   866745.224 ±  8834.629  ops/s
GetBenchmarks.get                               1_column_family        1000        128        16384  thrpt   25   184332.195 ±  2304.217  ops/s
GetBenchmarks.get                               1_column_family       50000        128         1024  thrpt   25   666794.288 ± 16150.684  ops/s
GetBenchmarks.get                               1_column_family       50000        128        16384  thrpt   25    47221.788 ±   433.165  ops/s
GetBenchmarks.get                            20_column_families        1000        128         1024  thrpt   25   551513.636 ±  7763.681  ops/s
GetBenchmarks.get                            20_column_families        1000        128        16384  thrpt   25   113117.720 ±   580.738  ops/s
GetBenchmarks.get                            20_column_families       50000        128         1024  thrpt   25   238675.555 ±  1758.978  ops/s
GetBenchmarks.get                            20_column_families       50000        128        16384  thrpt   25    11639.390 ±  1459.765  ops/s
GetBenchmarks.preallocatedByteBufferGet         1_column_family        1000        128         1024  thrpt   25  1153617.917 ± 26350.028  ops/s
GetBenchmarks.preallocatedByteBufferGet         1_column_family        1000        128        16384  thrpt   25   401710.334 ±  4324.539  ops/s
GetBenchmarks.preallocatedByteBufferGet         1_column_family       50000        128         1024  thrpt   25   809384.073 ± 13833.871  ops/s
GetBenchmarks.preallocatedByteBufferGet         1_column_family       50000        128        16384  thrpt   25    59279.005 ±   443.207  ops/s
GetBenchmarks.preallocatedByteBufferGet      20_column_families        1000        128         1024  thrpt   25   715466.403 ±  6591.375  ops/s
GetBenchmarks.preallocatedByteBufferGet      20_column_families        1000        128        16384  thrpt   25   175279.163 ±   910.923  ops/s
GetBenchmarks.preallocatedByteBufferGet      20_column_families       50000        128         1024  thrpt   25   263295.180 ±   856.456  ops/s
GetBenchmarks.preallocatedByteBufferGet      20_column_families       50000        128        16384  thrpt   25    14001.928 ±  2462.067  ops/s
GetBenchmarks.preallocatedGet                   1_column_family        1000        128         1024  thrpt   25  1072866.854 ± 27030.592  ops/s
GetBenchmarks.preallocatedGet                   1_column_family        1000        128        16384  thrpt   25   383950.853 ±  4510.654  ops/s
GetBenchmarks.preallocatedGet                   1_column_family       50000        128         1024  thrpt   25   764395.469 ± 10097.417  ops/s
GetBenchmarks.preallocatedGet                   1_column_family       50000        128        16384  thrpt   25    56851.330 ±   388.029  ops/s
GetBenchmarks.preallocatedGet                20_column_families        1000        128         1024  thrpt   25   668518.593 ±  9764.117  ops/s
GetBenchmarks.preallocatedGet                20_column_families        1000        128        16384  thrpt   25   171309.695 ±   875.895  ops/s
GetBenchmarks.preallocatedGet                20_column_families       50000        128         1024  thrpt   25   256057.801 ±   954.621  ops/s
GetBenchmarks.preallocatedGet                20_column_families       50000        128        16384  thrpt   25    13319.380 ±  2126.654  ops/s

### Comparison

The improvement looks largest when the data is cached, that is, with the fewest column families and the fewest keys. In each pair below, the first line is the result with the fixes and the second is the baseline.

GetBenchmarks.get                               1_column_family        1000        128        16384  thrpt   25   204366.301 ±  1326.570  ops/s
GetBenchmarks.get                               1_column_family        1000        128        16384  thrpt   25   184332.195 ±  2304.217  ops/s

GetBenchmarks.get                               1_column_family       50000        128        16384  thrpt   25    50473.768 ±   497.335  ops/s
GetBenchmarks.get                               1_column_family       50000        128        16384  thrpt   25    47221.788 ±   433.165  ops/s

GetBenchmarks.get                            20_column_families        1000        128        16384  thrpt   25   120545.549 ±   648.280  ops/s
GetBenchmarks.get                            20_column_families        1000        128        16384  thrpt   25   113117.720 ±   580.738  ops/s

GetBenchmarks.get                            20_column_families       50000        128        16384  thrpt   25    12463.887 ±  1950.746  ops/s
GetBenchmarks.get                            20_column_families       50000        128        16384  thrpt   25    11639.390 ±  1459.765  ops/s
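
The relative change for each pair above can be worked out with a short sketch like the following. The scores are copied from the two tables; the class name `GetComparison` and the helper `percentChange` are hypothetical names introduced only for this illustration.

```java
/** Relative throughput change for the get() pairs listed above (values copied from the tables). */
final class GetComparison {
    /** Percentage change from baseline to optimized throughput. */
    static double percentChange(double baseline, double optimized) {
        return (optimized - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        String[] labels = {
            "1_column_family, 1000 keys",
            "1_column_family, 50000 keys",
            "20_column_families, 1000 keys",
            "20_column_families, 50000 keys",
        };
        // Each row: {score with fixes, baseline score} in ops/s, valueSize=16384.
        double[][] scores = {
            {204366.301, 184332.195},
            {50473.768, 47221.788},
            {120545.549, 113117.720},
            {12463.887, 11639.390},
        };
        for (int i = 0; i < labels.length; i++) {
            double change = percentChange(scores[i][1], scores[i][0]);
            System.out.printf("%-32s %+.2f%%%n", labels[i], change);
        }
    }
}
```

This gives roughly 11% for the single column family with 1000 keys and roughly 7% for the other three cases. For the 20 column families, 50000 keys case the reported error bounds of the two runs overlap, so that figure should be read with caution.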

### Baseline (small run)
A 25-minute run with a small number of keys:
```
java -jar target/rocksdbjni-jmh-1.0-SNAPSHOT-benchmarks.jar -p keyCount=1000 -p keySize=128 -p valueSize=32768 -p columnFamilyTestType="no_column_families" GetBenchmarks.get GetBenchmarks.preallocatedByteBufferGet GetBenchmarks.preallocatedGet
```

Benchmark                                (columnFamilyTestType)  (keyCount)  (keySize)  (valueSize)   Mode  Cnt      Score     Error  Units
GetBenchmarks.get                            no_column_families        1000        128        32768  thrpt   25  32344.908 ± 296.651  ops/s
GetBenchmarks.preallocatedByteBufferGet      no_column_families        1000        128        32768  thrpt   25  45266.968 ± 424.514  ops/s
GetBenchmarks.preallocatedGet                no_column_families        1000        128        32768  thrpt   25  43531.088 ± 291.785  ops/s

### Optimized (small run)

Benchmark                                (columnFamilyTestType)  (keyCount)  (keySize)  (valueSize)   Mode  Cnt      Score     Error  Units
GetBenchmarks.get                            no_column_families        1000        128        32768  thrpt   25  37463.716 ± 235.744  ops/s
GetBenchmarks.preallocatedByteBufferGet      no_column_families        1000        128        32768  thrpt   25  48946.105 ± 466.463  ops/s
GetBenchmarks.preallocatedGet                no_column_families        1000        128        32768  thrpt   25  47143.624 ± 576.763  ops/s

## Conclusion

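The performance improvement is real: in the small run above, `get()` throughput rose by roughly 16% (from about 32,345 to 37,464 ops/s), and the two preallocated variants improved by roughly 8%.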

# Put Performance Benchmarks

Results associated with [Java API consistency between RocksDB.put() , .merge() and Transaction.put() , .merge()](https://github.com/facebook/rocksdb/pull/11019)

This work was not designed specifically as a performance optimization, but we want to confirm that it has not caused a performance regression in the methods it changes, and to provide a baseline for possible future performance work.

## Build/Run

Building is as above. Running is a different invocation of the same JMH jar.
```
java -jar target/rocksdbjni-jmh-1.0-SNAPSHOT-benchmarks.jar -p keyCount=1000,50000 -p keySize=128 -p valueSize=1024,32768 -p columnFamilyTestType="no_column_family" PutBenchmarks
```

## Before Changes

These results were generated in a private branch with the `PutBenchmarks` from the PR backported onto the current *main*.

Benchmark                     (bufferListSize)  (columnFamilyTestType)  (keyCount)  (keySize)  (valueSize)   Mode  Cnt      Score      Error  Units
PutBenchmarks.put                           16        no_column_family        1000        128         1024  thrpt   25  76670.200 ± 2555.248  ops/s
PutBenchmarks.put                           16        no_column_family        1000        128        32768  thrpt   25   3913.692 ±  225.690  ops/s
PutBenchmarks.put                           16        no_column_family       50000        128         1024  thrpt   25  74479.589 ±  988.361  ops/s
PutBenchmarks.put                           16        no_column_family       50000        128        32768  thrpt   25   4070.800 ±  194.838  ops/s
PutBenchmarks.putByteArrays                 16        no_column_family        1000        128         1024  thrpt   25  72150.853 ± 1744.216  ops/s
PutBenchmarks.putByteArrays                 16        no_column_family        1000        128        32768  thrpt   25   3896.646 ±  188.629  ops/s
PutBenchmarks.putByteArrays                 16        no_column_family       50000        128         1024  thrpt   25  71753.287 ± 1053.904  ops/s
PutBenchmarks.putByteArrays                 16        no_column_family       50000        128        32768  thrpt   25   3928.503 ±  264.443  ops/s
PutBenchmarks.putByteBuffers                16        no_column_family        1000        128         1024  thrpt   25  72595.105 ± 1027.258  ops/s
PutBenchmarks.putByteBuffers                16        no_column_family        1000        128        32768  thrpt   25   3890.100 ±  199.131  ops/s
PutBenchmarks.putByteBuffers                16        no_column_family       50000        128         1024  thrpt   25  70878.133 ± 1181.601  ops/s
PutBenchmarks.putByteBuffers                16        no_column_family       50000        128        32768  thrpt   25   3863.181 ±  215.888  ops/s

## After Changes

These results were generated on the PR branch.

Benchmark                     (bufferListSize)  (columnFamilyTestType)  (keyCount)  (keySize)  (valueSize)   Mode  Cnt      Score      Error  Units
PutBenchmarks.put                           16        no_column_family        1000        128         1024  thrpt   25  75178.751 ± 2644.775  ops/s
PutBenchmarks.put                           16        no_column_family        1000        128        32768  thrpt   25   3937.175 ±  257.039  ops/s
PutBenchmarks.put                           16        no_column_family       50000        128         1024  thrpt   25  74375.519 ± 1776.654  ops/s
PutBenchmarks.put                           16        no_column_family       50000        128        32768  thrpt   25   4013.413 ±  257.706  ops/s
PutBenchmarks.putByteArrays                 16        no_column_family        1000        128         1024  thrpt   25  71418.303 ± 1610.977  ops/s
PutBenchmarks.putByteArrays                 16        no_column_family        1000        128        32768  thrpt   25   4027.581 ±  227.900  ops/s
PutBenchmarks.putByteArrays                 16        no_column_family       50000        128         1024  thrpt   25  71229.107 ± 2720.083  ops/s
PutBenchmarks.putByteArrays                 16        no_column_family       50000        128        32768  thrpt   25   4022.635 ±  212.540  ops/s
PutBenchmarks.putByteBuffers                16        no_column_family        1000        128         1024  thrpt   25  71718.501 ±  787.537  ops/s
PutBenchmarks.putByteBuffers                16        no_column_family        1000        128        32768  thrpt   25   4078.050 ±  176.331  ops/s
PutBenchmarks.putByteBuffers                16        no_column_family       50000        128         1024  thrpt   25  72736.754 ±  828.971  ops/s
PutBenchmarks.putByteBuffers                16        no_column_family       50000        128        32768  thrpt   25   3987.232 ±  205.577  ops/s

## Discussion

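The changes don't appear to have had a material effect on performance: for every benchmark and parameter combination, the before and after error intervals overlap. We are happy with this.

 * We would advise running these benchmarks before and after any future changes to confirm they have no adverse effects.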
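
One simple way to make such a before/after check concrete is to test whether the reported intervals (score ± error) overlap. The sketch below does this for the twelve `PutBenchmarks` rows, with values copied from the two tables in table order. Treating interval overlap as "no material change" is an informal criterion chosen for this illustration, not a formal significance test, and the class and method names are hypothetical.

```java
/** Informal before/after check: do the score ± error intervals overlap? */
final class PutComparison {
    /** True if [a - aErr, a + aErr] and [b - bErr, b + bErr] intersect. */
    static boolean intervalsOverlap(double a, double aErr, double b, double bErr) {
        return a - aErr <= b + bErr && b - bErr <= a + aErr;
    }

    // Each row: {before score, before error, after score, after error} in ops/s.
    // Order: put, putByteArrays, putByteBuffers; each with
    // (1000, 1024), (1000, 32768), (50000, 1024), (50000, 32768) for (keyCount, valueSize).
    static final double[][] ROWS = {
        {76670.200, 2555.248, 75178.751, 2644.775},
        {3913.692, 225.690, 3937.175, 257.039},
        {74479.589, 988.361, 74375.519, 1776.654},
        {4070.800, 194.838, 4013.413, 257.706},
        {72150.853, 1744.216, 71418.303, 1610.977},
        {3896.646, 188.629, 4027.581, 227.900},
        {71753.287, 1053.904, 71229.107, 2720.083},
        {3928.503, 264.443, 4022.635, 212.540},
        {72595.105, 1027.258, 71718.501, 787.537},
        {3890.100, 199.131, 4078.050, 176.331},
        {70878.133, 1181.601, 72736.754, 828.971},
        {3863.181, 215.888, 3987.232, 205.577},
    };

    /** True if every before/after pair has overlapping intervals. */
    static boolean allOverlap() {
        for (double[] r : ROWS) {
            if (!intervalsOverlap(r[0], r[1], r[2], r[3])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("all before/after intervals overlap: " + allOverlap());
    }
}
```

The same helper could be reused for future changes by replacing the rows with fresh before and after results from the JMH output.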