Yesterday a customer asked me why he had failed to achieve scale with a state-of-the-art "shared-storage" cluster. "It's a scale-out to 4 servers, but with a shared disk. And after tons of work and effort, I got 130% throughput, not even close to the expected 400%," he said.
Well, scale-out cannot be achieved with shared storage, and the word "shared" is the key. Scale-out is done with absolutely nothing shared, a "shared-nothing" architecture. This is what makes it linear and unlimited. Any shared resource creates a tremendous burden on each and every database server in the cluster.
In a previous post, I identified database engine activities such as buffer management, locking, thread locks/semaphores, and recovery tasks as the main bottleneck in an OLTP database handling a mix of reads and writes. No matter which database engine, Oracle, MySQL, all of them, they all have all of the above. "The database engine itself becomes the bottleneck!" I wrote.
With a shared disk there is a single shared copy of my big data on the shared disk, and the database engine still has to maintain "buffer management, locking, thread locks/semaphores, and recovery tasks". But now, with a twist! All of the above need to be done "globally" between all participating servers, through network adapters and cables, introducing latency. Every database server in the cluster needs to update all other nodes on every "buffer management, locking, thread locks/semaphores, and recovery tasks" operation it performs on a block of data. See here the number of "conversation paths" between the 4 nodes:
Blue dashed lines are data access, red lines are communications between nodes. Every node must notify and be notified by all other nodes. It's a complete graph: with 4 nodes there are 6 edges, and with 10 nodes you'll find 45 red lines here (n(n-1)/2, a reminder from Computer Science courses...). Imagine the noise, the latency, for every "buffer management, locking, thread locks/semaphores, and recovery tasks". Shared-storage becomes shared-everything. Node A makes an update to block X, and in a 10-node cluster 9 other machines need to acknowledge. I wanted to scale, but instead I multiplied the initial problem.
And I didn't even mention the fact that the shared disk might become a SPOF and a bottleneck.
And I didn't even mention the limitations when you want to take this to the cloud or to virtualization.
And I didn't mention the tons of money this toy costs. I prefer buying 2 of these, one for me and one for a good friend, and hitting the road together...
Shared-disk solutions give a very limited answer to OLTP scale. If the data is not distributed, the computing resources required to handle it will not be distributed and reduced; on the contrary, they will be multiplied and will consume all available resources on all machines.
Real scale-out is achieved by distributing the data in a shared-nothing architecture: every database node is independent with its own data, with no duplication, no notifications, no ownership negotiation and no acknowledgements over any network. If the data is distributed correctly, concurrent sessions will also be distributed across the servers, and each node runs extremely fast on its own small data and small load, with its own small "buffer management, locking, thread locks/semaphores, and recovery tasks".
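As a rough illustration (a minimal sketch, assuming simple hash-based sharding; the node names and routing function are hypothetical, not any specific product), each key maps to exactly one node, so a single-row operation never needs to lock, notify or wait for anybody else:

```python
# Minimal shared-nothing routing sketch: each key belongs to exactly one node,
# so a single-key read or write involves no cross-node locking or notification.
from hashlib import sha1

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names

def owner(key: str) -> str:
    """Hash the key and return the single node that owns it."""
    h = int(sha1(key.encode("utf-8")).hexdigest(), 16)
    return NODES[h % len(NODES)]

# Every session works only against the node that owns its data:
for customer_id in ("cust-1001", "cust-1002", "cust-1003"):
    print(customer_id, "->", owner(customer_id))
```

Each node then keeps its own small buffer pool, its own locks and its own recovery, the "small everything" described above, and adding a node adds capacity instead of adding chatter.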
My customer responded "Makes perfect sense! Tell me more about Scale-Out and distribution...".