How to reduce database restore time by 50%
During .Next 2018 in London, Nutanix announced changes to the core datapath said to give up to 2X performance improvements. Here’s a real-world example of that improvement in practice.
I am using X-Ray to simulate a 1TB data restore into an existing database. The IO sizes are large, an even split of 64K, 128K, 256K, 512K and 1MB, and the pattern is 100% random across the entire 1TB dataset.
bssplit=64k/20:128k/20:256k/20:512k/20:1m/20
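To get a feel for what that split means in practice, here is a minimal Python sketch that parses the bssplit string and works out the average IO size. It only handles the simple size/percent form shown above, and it assumes that "k" and "m" are multiples of 1024 and that "1TB" means 2^40 bytes. The helper names are made up for this illustration and are not part of X-Ray or fio.

```python
# Hypothetical helpers for illustration only (not part of X-Ray or fio).
# Assumption: "k" and "m" suffixes are multiples of 1024.
UNIT = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(text):
    """Convert a size such as '64k' or '1m' into bytes."""
    text = text.strip().lower()
    if text[-1] in UNIT:
        return int(text[:-1]) * UNIT[text[-1]]
    return int(text)


def parse_bssplit(spec):
    """Parse 'size/pct:size/pct:...' into a list of (bytes, percent) pairs."""
    pairs = []
    for entry in spec.split(":"):
        size, pct = entry.split("/")
        pairs.append((parse_size(size), int(pct)))
    return pairs


def mean_io_bytes(pairs):
    """Average IO size, weighting each size by its share of IOs issued."""
    total_pct = sum(pct for _, pct in pairs)
    return sum(size * pct for size, pct in pairs) / total_pct


spec = "64k/20:128k/20:256k/20:512k/20:1m/20"
pairs = parse_bssplit(spec)
mean = mean_io_bytes(pairs)

print(f"mean IO size: {mean / 1024:.1f} KiB")
# Assumption: the 1TB dataset is treated as 2**40 bytes.
print(f"approx IOs to write 1 TiB: {2**40 / mean:,.0f}")
```

Under those assumptions the average IO is just under 400 KiB, so the 1TB ingest is roughly 2.7 million IOs. Because the percentages apply to the number of IOs rather than to bytes, a little over half of the data is moved by the 1MB IOs.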
Normally storage benchmarks using large IO sizes are run with a sequential pattern, because it’s easier on the storage back-end. That may be realistic for an initial load, but in this case we want to simulate a restore where the pattern is 100% random.
The time to ingest 1TB drops by half when using Nutanix AOS 5.10 with Autonomous Extent Store (AES) enabled, versus the previous traditional extent store.
This improvement is possible because with AES, inserting directly into the extent store is much faster.
For throughput-sensitive, random workloads, AES can detect that it will be faster to skip the oplog. Skipping the oplog eliminates a network round trip to a remote oplog, so only the RF2 copy for the Extent Store has to be made. By contrast, when sustained, large random IO is funneled into the oplog, the 10Gbit network can become the bottleneck. Even with faster networks, AES will still be a benefit because the CPU and SSD resource usage is also lower. Unfortunately I only have 10Gbit networking in my lab!
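As a back-of-envelope illustration of why the network matters here, the sketch below computes the minimum time needed just to push 1TB of replica traffic over a single 10Gbit link. This is a deliberately simplified model, not a measurement: it assumes 1TB is 10^12 bytes, ignores protocol overhead, CPU and SSD limits, and assumes each written byte crosses the network once when the oplog is skipped and twice when it goes via a remote oplog first. The function name and the copy counts are assumptions made for this example.

```python
def min_transfer_seconds(data_bytes, link_gbit_per_s, network_copies):
    """Lower bound on time to move the data over one link.

    network_copies is how many times each written byte crosses the
    network (an assumption of this simplified model).
    """
    link_bytes_per_s = link_gbit_per_s * 1e9 / 8  # Gbit/s -> bytes/s
    return data_bytes * network_copies / link_bytes_per_s


ONE_TB = 10**12  # assumption: decimal terabyte

# Assumed model: one crossing for the RF2 extent store copy only,
# two crossings when the data is also sent to a remote oplog first.
direct = min_transfer_seconds(ONE_TB, 10, network_copies=1)
via_oplog = min_transfer_seconds(ONE_TB, 10, network_copies=2)

print(f"extent store only: at least {direct:.0f} s")
print(f"via remote oplog:  at least {via_oplog:.0f} s")
```

The absolute numbers are only a floor on a single link, and a real cluster spreads the traffic across nodes, but the sketch shows how removing one network crossing per byte can matter once the link is saturated.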
The X-Ray files needed to run this test are on GitHub.