How to use the “jobs” and “clients” parameters in pgbench without going crazy.
pgbench parameters for concurrency control
pgbench offers two parameters for controlling the concurrency in the benchmark. Namely:
- -j for ‘jobs’. The number of pgbench threads to run.
-j, --jobs=NUM number of threads (default: 1)
- -c for ‘clients’. The number of concurrent database sessions to simulate; each one is served by its own “postgres” process.
-c, --client=NUM number of concurrent database clients (default: 1)
Here is the TPS delivered for each combination of -j (1, 10) and -c (1, 10) using a very small (cached) database.
The machine is a GCP instance (e2-standard-8: 8 vCPUs, 32 GB memory). The database size is tiny (Scale Factor 100).
pgbench read-only test.
First I ran pgbench with the -S (“Select Only”) flag so that the disk would not be a bottleneck. In this experiment we are mainly interested in the concurrency options.
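The four runs in this experiment can be generated rather than typed by hand. Below is a minimal sketch that only builds and prints the command lines. The -j, -c, -S and -T flags are real pgbench options, but the 60 second duration and the database name "bench" are assumptions for illustration, since the run length and database name are not given here.

```python
import itertools
import shlex

JOBS = (1, 10)      # values for -j (pgbench threads)
CLIENTS = (1, 10)   # values for -c (concurrent database clients)


def pgbench_command(jobs, clients, select_only=True, seconds=60, dbname="bench"):
    """Build a pgbench argv list.

    seconds and dbname are illustrative assumptions, not values from the test.
    """
    args = ["pgbench", "-j", str(jobs), "-c", str(clients), "-T", str(seconds)]
    if select_only:
        args.append("-S")  # read-only "select only" workload
    args.append(dbname)
    return args


# One command per cell of the -j / -c matrix.
commands = [pgbench_command(j, c) for j, c in itertools.product(JOBS, CLIENTS)]

for cmd in commands:
    # Print only; hand cmd to subprocess.run() to actually execute it.
    print(shlex.join(cmd))
```

Calling pgbench_command(j, c, select_only=False) drops the -S flag and gives the read/write variant of the same matrix.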
TPS | -c=1 | -c=10 |
-j=1 | 11,385 | 61,267 |
-j=10 | 15,416 | 75,106 |
The results show that the number of clients (-c), and with it the number of postgres processes, is clearly the dominant factor. With the tiny DB, a single pgbench thread (-j=1) is almost able to saturate the 8 cores: with -j=1 and -c=10 there was about 20% idle across all the cores.
With 10 pgbench threads and 10 clients (-j=10 -c=10) all 8 cores were 100% saturated.
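To put numbers on "dominant factor", the ratios between the cells of the read-only table can be worked out directly. This is a small sketch using only the TPS figures above; the variable and function names are my own.

```python
# Read-only TPS from the table above, keyed by (jobs, clients).
tps = {
    (1, 1): 11385,
    (1, 10): 61267,
    (10, 1): 15416,
    (10, 10): 75106,
}


def speedup(base, other):
    """Return how many times higher the TPS of `other` is than `base`."""
    return tps[other] / tps[base]


# Effect of going from 1 to 10 clients, at each thread count.
client_gain_j1 = speedup((1, 1), (1, 10))
client_gain_j10 = speedup((10, 1), (10, 10))

# Effect of going from 1 to 10 threads, at each client count.
thread_gain_c1 = speedup((1, 1), (10, 1))
thread_gain_c10 = speedup((1, 10), (10, 10))

print(f"clients 1 -> 10: {client_gain_j1:.2f}x (-j=1), {client_gain_j10:.2f}x (-j=10)")
print(f"threads 1 -> 10: {thread_gain_c1:.2f}x (-c=1), {thread_gain_c10:.2f}x (-c=10)")
```

Raising the client count gives roughly a 5x gain at either thread count, while raising the thread count gives at most about 1.35x.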
pgbench read/write test.
For completeness I re-ran the experiment without the “-S” option. The GCP instance had a single disk and was easily overwhelmed by the amount of IO generated by 8 cores at full blast. Even so, the number of clients (-c) is again the clear dominant factor – albeit at a much lower TPS rate, because so much time is spent waiting on disk.
TPS | -c=1 | -c=10 |
-j=1 | 683 | 3,111 |
-j=10 | 713 | 3,251 |
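How much lower the read/write rate is can be read off by dividing each read-only cell by its read/write counterpart. A quick sketch using only the numbers from the two tables; the names are my own.

```python
# TPS from the two tables, keyed by (jobs, clients).
read_only = {(1, 1): 11385, (1, 10): 61267, (10, 1): 15416, (10, 10): 75106}
read_write = {(1, 1): 683, (1, 10): 3111, (10, 1): 713, (10, 10): 3251}

# How many times higher the read-only TPS is than the read/write TPS, per cell.
slowdown = {cell: read_only[cell] / read_write[cell] for cell in read_only}

for (jobs, clients), factor in slowdown.items():
    print(f"-j={jobs:<2} -c={clients:<2} read-only is {factor:.1f}x the read/write TPS")
```

The read-only runs come out between roughly 17x and 23x faster than the read/write runs on this single-disk instance.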
What’s really interesting here is that most of the cores are showing “idle” rather than IO wait. I believe that the postgres processes must be waiting (via a lock or cv_wait) on a single writer to finish disk IO before they can continue. So, in reality, all the CPUs are blocked on IO, but only indirectly, so the kernel does not know to show that the CPUs could be doing more work if the IO were faster.
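One way to check the idle versus IO wait split on Linux is to compare two samples of the aggregate "cpu" line in /proc/stat. The sketch below parses that line and reports the percentage of elapsed CPU time spent in each state. The two sample lines are synthetic values for illustration, not measurements from this test, and the function names are my own.

```python
def cpu_times(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into named counters."""
    names = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")
    fields = stat_line.split()
    # fields[0] is the label "cpu"; the counters follow in the order above.
    return dict(zip(names, map(int, fields[1:1 + len(names)])))


def idle_vs_iowait(before, after):
    """Return (idle %, iowait %) of the CPU time elapsed between two samples."""
    delta = {name: after[name] - before[name] for name in before}
    total = sum(delta.values())
    return 100 * delta["idle"] / total, 100 * delta["iowait"] / total


# Synthetic samples (made up for illustration only).
sample_a = "cpu  1000 0 500 8000 500 0 0 0 0 0"
sample_b = "cpu  1200 0 600 8600 600 0 0 0 0 0"

idle_pct, iowait_pct = idle_vs_iowait(cpu_times(sample_a), cpu_times(sample_b))
print(f"idle {idle_pct:.1f}%  iowait {iowait_pct:.1f}%")
```

On a live system the two samples would be the first line of /proc/stat read a few seconds apart while pgbench is running. A high idle figure alongside a low iowait figure would match the behaviour described above.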