1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263
|
# RQRCode Core - Performance Benchmarks
This document describes the benchmarking infrastructure for rqrcode_core and provides baseline performance metrics.
## Running Benchmarks
### Quick Start
```bash
# Quick comparison (10 iterations, ~5 seconds)
rake benchmark
# or
rake benchmark:simple
# Detailed performance analysis with benchmark-ips (~30 seconds)
rake benchmark:performance
# Memory profiling with detailed allocation tracking (~30 seconds)
rake benchmark:memory
# Run all benchmarks
rake benchmark:all
```
### Individual Benchmark Files
You can also run benchmark files directly:
```bash
ruby test/benchmark_simple.rb
ruby test/benchmark_performance.rb
ruby test/benchmark_memory.rb
```
## System Information
Baselines collected on:
- **Date**: December 4, 2025
- **Ruby Version**: 3.3.4
- **Platform**: arm64-darwin24 (Apple Silicon)
- **ARCH_BITS**: 64
## Benchmark Types
### 1. Simple Benchmark (`benchmark_simple.rb`)
Fast comparison across common scenarios using Ruby's standard `Benchmark` module. Good for quick before/after comparisons during development.
**Iterations**: 10 per test (configurable via `ITERATIONS` constant)
**Runtime**: ~5 seconds
### 2. Performance Benchmark (`benchmark_performance.rb`)
Detailed performance analysis using `benchmark-ips` gem. Provides iterations-per-second metrics with statistical analysis and comparisons.
**Configuration**: 2 seconds measurement, 1 second warmup (configurable)
**Runtime**: ~30-45 seconds
**Scenarios tested**:
- Data sizes (small/medium/large)
- Encoding modes (numeric/alphanumeric/byte)
- QR versions (1, 5, 10, 20, 40)
- Error correction levels (:l, :m, :q, :h)
- Creation vs rendering
- Multi-segment encoding
### 3. Memory Benchmark (`benchmark_memory.rb`)
Memory allocation profiling using `memory_profiler` gem. Tracks total allocated/retained memory and object allocations by class.
**Runtime**: ~30 seconds
**Scenarios tested**:
- Single QR codes (various sizes)
- Batch generation (100 small, 10 large)
- Creation vs rendering
- Different encoding modes
- Multi-segment encoding
## Baseline Performance Metrics
### Performance (iterations per second)
| Scenario | ips | ms/iteration | vs Baseline |
|----------|-----|--------------|-------------|
| Small QR (v1) | 144.1 | 6.94 | 1.00x |
| Medium URL (v5) | 44.6 | 22.41 | 3.23x slower |
| Large (v24) | 4.6 | 217.37 | 31.33x slower |
| Numeric mode | 147.1 | 6.80 | fastest |
| Alphanumeric mode | 100.9 | 9.91 | 1.46x slower |
| Byte mode | 100.9 | 9.91 | 1.46x slower |
| Version 1 | 147.3 | 6.79 | 1.00x |
| Version 5 | 44.8 | 22.33 | 3.29x slower |
| Version 10 | 18.8 | 53.08 | 7.82x slower |
| Version 20 | 6.4 | 155.21 | 22.86x slower |
| Version 40 | 1.9 | 521.09 | 76.74x slower |
**Key Findings**:
- **Version impact**: Performance degrades quadratically with version (module_count = version*4 + 17)
- **Encoding modes**: Numeric mode is ~46% faster than alphanumeric/byte modes
- **Error correction**: Minimal impact (~2% variance) across levels :l, :m, :q, :h
- **Rendering**: Adds ~3% overhead to creation time
### Memory Usage
| Scenario | Allocated | Retained | Objects | Notes |
|----------|-----------|----------|---------|-------|
| Single v1 | 0.38 MB | 0.00 MB | 8,740 | Baseline |
| Single v5 | 0.97 MB | 0.00 MB | 21,264 | 2.5x v1 |
| Single v24 | 8.53 MB | 0.00 MB | 179,659 | 22x v1 |
| 100x v1 | 37.91 MB | 0.00 MB | 872,700 | ~380KB each |
| 10x v24 | 85.32 MB | 0.00 MB | 1,796,590 | ~8.5MB each |
| Create only | 37.91 MB | 0.00 MB | 872,700 | 100 iterations |
| Create + render | 40.27 MB | 0.00 MB | 919,300 | +6% for rendering |
**Key Findings**:
- **No memory retention**: All memory is garbage collectable (0 retained)
- **Top allocations**: Integer (70-76%), Array (15-22%), Range (8-10%)
- **Version scaling**: Memory usage grows quadratically with version
- **Rendering overhead**: Adds ~6% to memory allocation
- **Encoding modes**: Minimal difference (~3% variance) across modes
### ARCH_BITS Impact
To test the memory impact of `ARCH_BITS` setting:
```bash
# Default 64-bit (current baseline)
ruby test/benchmark_memory.rb
# Force 32-bit mode (reduced memory)
RQRCODE_CORE_ARCH_BITS=32 ruby test/benchmark_memory.rb
```
**Expected**: 32-bit mode should reduce memory usage during right-shift operations, particularly noticeable with large QR codes and batch generation.
## Understanding the Results
### benchmark-ips Output
```
Calculating -------------------------------------
Small (v1) 144.135 (± 0.7%) i/s (6.94 ms/i)
```
- **144.135 i/s**: 144 iterations per second
- **(± 0.7%)**: Statistical error margin (lower is more consistent)
- **(6.94 ms/i)**: Milliseconds per iteration
- **Comparison section**: Shows relative performance differences
### memory_profiler Output
```
Total allocated: 0.38 MB # Memory allocated during execution
Total retained: 0.00 MB # Memory still held after GC
Objects allocated: 8740 # Number of objects created
Objects retained: 0 # Number of objects not GC'd
```
- **Allocated**: All memory used (includes garbage)
- **Retained**: Memory still referenced (memory leaks if high)
- **By class**: Shows which Ruby types are most allocated
## Performance Characteristics
### Version Size Impact
QR Code module count formula: `module_count = version * 4 + 17`
| Version | Modules | Total Cells | Performance Impact |
|---------|---------|-------------|-------------------|
| 1 | 21x21 | 441 | 1.00x (baseline) |
| 5 | 37x37 | 1,369 | ~3.1x slower |
| 10 | 57x57 | 3,249 | ~7.4x slower |
| 20 | 97x97 | 9,409 | ~21x slower |
| 40 | 177x177 | 31,329 | ~71x slower |
Performance degradation is roughly O(n²) where n is version number.
### Encoding Mode Efficiency
1. **Numeric** (fastest): 3.33 bits per digit
2. **Alphanumeric**: 5.5 bits per character
3. **Byte** (slowest): 8 bits per character
For mixed content, multi-segment encoding can be more efficient than byte mode.
### Error Correction Impact
Error correction levels have minimal performance impact (<3% variance):
- `:l` - 7% restoration
- `:m` - 15% restoration
- `:q` - 25% restoration
- `:h` - 30% restoration (default)
The performance cost is in capacity (less data fits), not speed.
## Known Considerations
### Memory on 64-bit Systems
From `lib/rqrcode_core/qrcode/qr_util.rb`:
> 64 consumes a LOT more memory. In tests it's shown changing it to 32
> on 64 bit systems greatly reduces the memory footprint.
This occurs during right-shift zero-fill operations. Use `RQRCODE_CORE_ARCH_BITS=32` to reduce memory at potential compatibility risk.
### Large QR Codes
Version 24+ QR codes are significantly slower (~200ms+ per code). For batch processing:
- Consider caching generated codes
- Use background jobs for generation
- Consider lower error correction levels if acceptable
## Future Optimization Ideas
Potential areas for performance improvement:
1. **Memory Optimization**
- Investigate ARCH_BITS impact more thoroughly
- Reduce temporary array allocations
- Optimize string concatenation in rendering
- Use more efficient data structures for modules
2. **Speed Optimization**
- Profile hot paths with stackprof
- Cache frequently computed values
- Optimize inner loops in encoding
- Consider memoization for version calculations
3. **Algorithm Improvements**
- Review polynomial math operations
- Optimize mask pattern calculation
- Benchmark alternative implementations
4. **Benchmarking Infrastructure**
- Add CI performance regression tests
- Track performance trends over time
- Add comparison with other QR libraries
- Create performance dashboard
## Contributing
When making performance-related changes:
1. Run benchmarks before changes: `rake benchmark:all > before.txt`
2. Make your changes
3. Run benchmarks after: `rake benchmark:all > after.txt`
4. Compare results and document improvements
5. Update this file with new baseline if significant
## References
- [benchmark-ips gem](https://github.com/evanphx/benchmark-ips)
- [memory_profiler gem](https://github.com/SamSaffron/memory_profiler)
- [Ruby Benchmark module](https://ruby-doc.org/stdlib/libdoc/benchmark/rdoc/Benchmark.html)
- [QR Code specification](https://www.qrcode.com/en/about/standards.html)
---
**Last Updated**: December 4, 2025
**Baseline Version**: rqrcode_core 2.0.1
|