# Tessera Storage Performance
Tessera is designed to scale to meet the needs of most currently envisioned workloads in a cost-effective manner.
All storage backends have been tested to sustain the write throughput of CT-scale loads without issue.
The read API of Tessera-based logs scales extremely well thanks to the immutable-resource-based approach, which allows for:
1. Aggressive caching, e.g. via a CDN (a minimal serving sketch follows the footnote below)
2. Horizontal scaling of the read infrastructure (e.g. object storage)[^1]
[^1]: The MySQL storage backend differs from the others in that reads must be served via the personality rather than directly.
However, due to changes in how MySQL is used compared to Trillian v1, read performance should be far better, and _could_ still
be scaled horizontally with additional MySQL read replicas & read-only personality instances.
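Because almost everything on the read path is an immutable resource, a read front-end can attach very long cache lifetimes and let a CDN absorb the bulk of the traffic. The following minimal sketch illustrates the idea for a POSIX-style deployment, assuming the log's static resources live under a local `./log` directory; the directory, port, and cache lifetimes are illustrative assumptions, not part of Tessera.
```go
// Minimal read front-end sketch: serve the log's static resources with
// aggressive caching. Paths and lifetimes are illustrative assumptions.
package main

import (
	"log"
	"net/http"
	"strings"
)

func main() {
	// Assumed on-disk location of the log's tiles, entry bundles, and checkpoint.
	files := http.FileServer(http.Dir("./log"))

	http.Handle("/", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.HasSuffix(r.URL.Path, "/checkpoint") {
			// The checkpoint changes as the log grows, so keep its TTL short.
			w.Header().Set("Cache-Control", "no-cache")
		} else {
			// Tiles and entry bundles never change once written, so a CDN or
			// browser can cache them for as long as it likes.
			w.Header().Set("Cache-Control", "public, max-age=31536000, immutable")
		}
		files.ServeHTTP(w, r)
	}))

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```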
Below are some indicative figures which show the rough scale of performance we've seen from deploying Tessera conformance
binaries in various environments.
## Performance factors
### Resources
Exact performance numbers are highly dependent on the infrastructure being used (e.g. storage type & locality, host resources
of the machine(s) running the personality binary, network speed and weather, etc.). If in doubt, you should run your own performance
tests on infrastructure which is as close as possible to that which will ultimately be used to run the log in production.
The [conformance binaries](/cmd/conformance) and [hammer tool](/internal/hammer) are designed for this kind of performance testing.
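The hammer drives a representative read/write workload against a running log and renders the live dashboards shown in the sections below, so prefer it where possible. If you only want a very rough first signal from your own infrastructure, a crude probe along the lines of the sketch below can be enough; the `/add` endpoint, port, and plain-text payload are assumptions about a hypothetical personality's write API rather than anything Tessera defines.
```go
// Rough write-throughput probe for a personality's write endpoint.
// The target URL and payload format are assumptions; adjust them to
// match whatever your personality actually accepts.
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"sync"
	"sync/atomic"
	"time"
)

func main() {
	const (
		target   = "http://localhost:2024/add" // assumed write endpoint
		workers  = 256
		duration = 30 * time.Second
	)

	var ok, failed atomic.Int64
	deadline := time.Now().Add(duration)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 0; time.Now().Before(deadline); i++ {
				// Each payload is unique so antispam (if enabled) can't collapse the writes.
				body := fmt.Appendf(nil, "worker-%d-entry-%d", id, i)
				resp, err := http.Post(target, "text/plain", bytes.NewReader(body))
				if err != nil {
					failed.Add(1)
					continue
				}
				resp.Body.Close()
				if resp.StatusCode == http.StatusOK {
					ok.Add(1)
				} else {
					failed.Add(1)
				}
			}
		}(w)
	}
	wg.Wait()

	fmt.Printf("writes: %d ok, %d failed, ~%.0f qps over %v\n",
		ok.Load(), failed.Load(), float64(ok.Load())/duration.Seconds(), duration)
}
```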
### Antispam
Antispam is a feature which does best-effort deduplication of incoming entries. While cheaper than _strong atomic_ deduplication would
be, it is still a somewhat expensive operation in terms of both storage and throughput.
Not all personality designs will require it, so Tessera is built such that you only incur these costs if they are necessary
for your design.
Leaving antispam disabled will greatly increase the throughput of the log, and decrease CPU and storage costs.
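For intuition about where the cost comes from, here is a conceptual sketch of best-effort deduplication; it is not Tessera's implementation. Each incoming entry is hashed and looked up in an index before being assigned a sequence number, and a hit returns the previously assigned index. The per-entry lookup is the throughput cost and the ever-growing index is the storage cost; a real deployment would back the index with a persistent store (e.g. badger, or a database table) rather than the in-memory map used here.
```go
// Conceptual sketch of best-effort deduplication (not Tessera's implementation):
// entries are hashed and looked up in an index before being sequenced, and a
// duplicate gets back the index originally assigned to the identical entry.
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

type dedupAppender struct {
	mu   sync.Mutex
	seen map[[sha256.Size]byte]uint64 // a real deployment would persist this index (e.g. badger)
	next uint64
}

// Add returns the sequence number assigned to entry, reusing the previously
// assigned one if an identical entry has already been seen.
func (a *dedupAppender) Add(entry []byte) (idx uint64, dup bool) {
	h := sha256.Sum256(entry)
	a.mu.Lock()
	defer a.mu.Unlock()
	if prev, found := a.seen[h]; found {
		return prev, true // the extra lookup is antispam's per-write throughput cost
	}
	idx = a.next
	a.next++
	a.seen[h] = idx // the index grows with the log: antispam's storage cost
	return idx, false
}

func main() {
	a := &dedupAppender{seen: make(map[[sha256.Size]byte]uint64)}
	for _, e := range []string{"hello", "world", "hello"} {
		idx, dup := a.Add([]byte(e))
		fmt.Printf("%-5s -> index %d (duplicate: %v)\n", e, idx, dup)
	}
}
```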
## Backends
The currently supported storage backends are listed below, with a rough idea of the expected performance figures.
Individual storage implementations may have more detailed information about performance in their respective directories.
### GCP
The main lever for cost vs. performance on GCP is Spanner, in the form of "Processing Units" (PUs).
PUs can be allocated in blocks of 100, and 1,000 PUs are equivalent to one Spanner node.
The table below shows some rough numbers of measured performance:
| Spanner PUs | Frontends | Write QPS (no antispam) | Write QPS (antispam) |
|-------------|-----------|-------------------------|----------------------|
| 100         | 1         | > 3,000                 | > 800                |
| 200         | 1         | not measured            | > 1,500              |
| 300         | 1         | not measured            | > 3,000              |
| 300         | 2         | not measured            | > 5,000              |
### POSIX
Performance of the POSIX storage backend is highly dependent on the underlying infrastructure; some representative examples
of the performance on different types of infrastructure are given below.
#### Local storage
##### NVMe
The log and hammer were both run in the same VM, with the log using a ZFS subvolume from the NVMe mirror.
With antispam enabled, it was able to sustain around 10,000 write qps, using up to 7 cores for the server.
```
┌───────────────────────────────────────────────────────────────────────────┐
│Read (8 workers): Current max: 20/s. Oversupply in last second: 0 │
│Write (30000 workers): Current max: 10000/s. Oversupply in last second: 0 │
│TreeSize: 5042936 (Δ 10567qps over 30s) │
│Time-in-queue: 1889ms/2990ms/3514ms (min/avg/max) │
│Observed-time-to-integrate: 2255ms/3103ms/3607ms (min/avg/max) │
├───────────────────────────────────────────────────────────────────────────┤
```
##### SAS 12Gb HDD
A single local instance on a 12-core VM with 8GB of RAM, writing to a local filesystem stored on a mirrored pair of SAS disks.
Without antispam, it was able to sustain around 2,900 writes/s.
```
┌────────────────────────────────────────────────────────────────────────────────────┐
│Read (8 workers): Current max: 20/s. Oversupply in last second: 0 │
│Write (3000 workers): Current max: 3000/s. Oversupply in last second: 0 │
│TreeSize: 1470460 (Δ 2927qps over 30s) │
│Time-in-queue: 136ms/1110ms/1356ms (min/avg/max) │
│Observed-time-to-integrate: 583ms/6019ms/6594ms (min/avg/max) │
├────────────────────────────────────────────────────────────────────────────────────┤
```
With antispam enabled (badger), it was able to sustain around 1600 writes/s.
```
┌────────────────────────────────────────────────────────────────────────────────────┐
│Read (8 workers): Current max: 20/s. Oversupply in last second: 0 │
│Write (1800 workers): Current max: 1800/s. Oversupply in last second: 0 │
│TreeSize: 2041087 (Δ 1664qps over 30s) │
│Time-in-queue: 0ms/112ms/448ms (min/avg/max) │
│Observed-time-to-integrate: 593ms/3232ms/5754ms (min/avg/max) │
├────────────────────────────────────────────────────────────────────────────────────┤
```
#### Network storage
A 4-node CephFS cluster (1 admin node, 3 storage nodes) running on E2 instances sustained > 1,000 qps of writes.
#### GCP Free Tier VM Instance
A small `e2-micro` free-tier VM is able to sustain > 1,500 writes/s using a mounted Persistent Disk to store the log.
> [!NOTE]
> Virtual CPUs (vCPUs) in virtualized environments often share physical CPU cores with other vCPUs, which introduces variability
> and can impact performance.
```
┌───────────────────────────────────────────────────────────────────────┐
│Read (184 workers): Current max: 0/s. Oversupply in last second: 0 │
│Write (600 workers): Current max: 1758/s. Oversupply in last second: 0 │
│TreeSize: 1882477 (Δ 1587qps over 30s) │
│Time-in-queue: 149ms/371ms/692ms (min/avg/max) │
│Observed-time-to-integrate: 569ms/1191ms/1878ms (min/avg/max) │
└───────────────────────────────────────────────────────────────────────┘
```
More details on Tessera POSIX performance can be found [here](/storage/posix/PERFORMANCE.md).
### MySQL
The figures below were measured using VMs on GCP in order to give an idea of the size of machine required to
achieve these results.
> [!NOTE]
> For Tessera deployments on GCP, we **strongly recommend** using the Tessera GCP storage implementation instead.
#### GCP free-tier + CloudSQL
Tessera running on an `e2-micro` free-tier VM instance on GCP, using CloudSQL for storage, can sustain around 2,000 writes/s.
```
┌───────────────────────────────────────────────────────────────────────┐
│Read (8 workers): Current max: 0/s. Oversupply in last second: 0 │
│Write (512 workers): Current max: 2571/s. Oversupply in last second: 0 │
│TreeSize: 2530480 (Δ 2047qps over 30s) │
│Time-in-queue: 41ms/120ms/288ms (min/avg/max) │
│Observed-time-to-integrate: 568ms/636ms/782ms (min/avg/max) │
└───────────────────────────────────────────────────────────────────────┘
```
#### GCP free-tier VM only
Tessera + MySQL both running on an `e2-micro` free-tier VM instance on GCP can sustain around 300 writes/s.
```
┌──────────────────────────────────────────────────────────────────────┐
│Read (8 workers): Current max: 0/s. Oversupply in last second: 0 │
│Write (256 workers): Current max: 409/s. Oversupply in last second: 0 │
│TreeSize: 240921 (Δ 307qps over 30s) │
│Time-in-queue: 86ms/566ms/2172ms (min/avg/max) │
│Observed-time-to-integrate: 516ms/1056ms/2531ms (min/avg/max) │
└──────────────────────────────────────────────────────────────────────┘
```
More details on Tessera MySQL performance can be found [here](/storage/mysql/PERFORMANCE.md).
### AWS
Coming soon.