Monday, May 21, 2012

Scaling OLTP is nothing like scaling Analytics

We're in the big data business. OLTP applications and Analytics.

Scaling OLTP applications is nothing like scaling Analytics, like I posted here: OLTP is a mixture of read and writes, heavy session concurrency and also growing amounts of data.

In my previous post,, I mentioned that Analytics can be scaled using: columnar storage, RAM and query parallelism.

Columnar storage cannot be used for OLTP, as while it makes read scans better, it hurts writes, especially INSERTs. Same goes for RAM, the approach of “let’s put everything in memory” is also problematic for writes that should be Durable (the D in ACID). There are databases that reach Durability with writing to memory of at least 2 machines, I'll get to that in a later post, but in the simpler view, RAM is great for reads (Analytics), very limited for writes (OLTP).

Query parallelism that worked for Analytics, is limited for OLTP. Mostly because of high concurrency and writes, OLTP is a mixture of read and writes, ratios today reach 50%-50% and more. Every write operation is eventually at least 5 operations for the database, including table, index(s), rollback segment, transaction log, row-level locking and more. Now multiply with 1000 concurrent transactions, and 1TB of data. The database engine itself becomes the bottleneck! It puts so many resources into buffer management, locking, thread locks/semaphores, and recovery tasks, no resources are left available for handling query data!

3 bullets why na├»ve parallelism is not a magic bullet for OLTP:
  1. Parallel query within the same database server will just turn the hard-to-manage 1000 concurrent transactions into impossible 1000000 concurrent sub-transactions… Good luck with that… 
  2. Parallelizing query on several database servers is a step in a good direction. However it can’t scale: if I have 10 servers and each one my 1000 concurrent transactions needs to gather data from all servers in parallel, how many concurrent transactions I’ll have on each server? That’s right, 1000. What did I solve? Can I scale to 2000 concurrent transactions? All my servers will die together. In that case what if I scale to 20 servers instead of 10? Then I’ll have 20 servers with 2000 concurrent transactions… that will all die together. 
  3. OLTP operations are not good candidates for parallelism:
    1. Scans, Full table/index scans and range scans, are parallelized all the time in Analytics, are seldom in OLTP. In OLTP most accesses are short, pinpointed, index-based small range and unique scans. Oracle’s optimizer mode FIRST_ROWS (OLTP) will almost always prefer index access and ALL_ROWS (Analytics) will have hard time give up its favorable full table scan.  So what exactly do we parallelize in OLTP? An index rebuild once a day (scan...)?
    2. 1000 concurrent 1-row INSERT commands a second - is a valid OLTP scenario. What exactly do I parallelize?
Parallelism cannot be the one and only complete solution. It serves a minor role in the solution, whose key factor is: distribution.

OLTP databases can scale only by a smart distribution of data but also the concurrent sessions among numerous database servers. It’s all in the distribution of the data, if data is distributed in a smart way, concurrent sessions will be also distribute across servers.

Go from 1 big fat database server dealing with 1TB of data and 1000 concurrent transactions, to 10 databases, each deal with easy 100GB and 100 concurrent transactions. I'll hit the jackpot if I'll manage to keep databases isolated, shared nothing, processing-wise, not only cables-wise. Best are transactions that start and finish on a single database.

And if I’m lucky and my business is booming, I can scale:
  1. Data grew from 1TB to 1.5TB? Add more databases servers.
  2. Concurrent sessions grew from 1000 to 1500? Add more databases servers.
  3. Parallel query/update? Sure! If a session does need to scan data from all servers, or need to perform an index rebuild, it can run in parallel on all servers, and will take a fraction of the time.
Ask Facebook (FB, as of today... ). Each of their 10,000s databases is handling a fraction of the data, in a way that only a fraction of all sessions are accessing it in any point in time.

Each of the databases is still doing hard work on every update/insert/delete and on buffer management, locking, thread locks/semaphores, and recovery tasks. What can we do? It's OLTP, it's read/write, it's ACID... It's heavy! I trust every one of my DBs to do what it does best, I just give it the optimal data size and session concurrency to do that.

Let's summarize here:

In my next post I'll dive more into implementations caveats (shared disk, shared memory, sharding) and pitfalls, do's and don't's... 

Stay tuned, join those who subscribed and get automatic updates, get involved!

1 comment:

  1. in all these noises here is a factual article. without db federaion even in memory databases can not scale. SSD storage can but federated db , with SSD will be the way forward. other solutions like Cluster cache synch will be problematic and will again be bogged down. Thank you sir. I thoight what u had mentoned.