Wednesday, April 18, 2012

So how can we scale databases?

There are ways to scale databases, unfortunately some are limited, some introduce complexities, some are do not fit the cloud...

By scaling solution I mean a solutions that help me scale my existing environment, my existing RDBMS. Some magic or technology that will take my existing Oracle or MySQL for example, to the next level, without porting to a new DB engine/vendor and without completely recoding my app.

Let's try to organize things a bit in this very summarized table, just to get the hunch of it. I can't imagine to cover it all in 1 table or even 100 pages, but that should be a start of a meaningful discussion to continue in next posts:

Solution
Scales reads?
Scales writes?
Scales data?
Scales sessions?
Cloud?
Bottom line
Scale-Up: faster HW, CPU, memory, disks, SSD
Y
Y
N
N
N
Costy, limited
Shared disk cluster: Oracle RAC and similar
Y
Y--
N
N
N
Costy, hard to implement, might damage non-read-mostly apps
Replication based
Read/Write splitting
Y
N
N
N
Y
A valid solution, easy to implement, limited
Multi master replication
Y
Y--
N
Y--
Y
Strict data ownership is a MUST to enjoy any advantage
Scale Out (Sharding?)
Y
Y
Y
Y
Y
A valid solution, might introduce major complexities...


I think it'll be safe to say: there are solutions, there are supporting technologies, however database tech alone is not enough. To really get the benefit from a shared-disk cluster we need to really know what we're doing, not to be overthrown by disk latency or inter-db-machines network noise traffic. Multi-master replication is a recipe for disaster, conflicts, split-brain, loss of data - without proper data ownership definition and enforcement. Shared-disk-cluster and multi-master replication (3 versions of it) exist in Oracle for over 10 years. And still, they can't solve all RDBMS scaling limits because those technologies alone are like double edge swords, should be handled gently by experienced craftsmen, and with good integration with additional concepts and tools.

No, having the long time waited multi-master replication available for MySQL, will not solve scale issues for MySQL, it will not bring piece to the force of geo-clustering etc. Without proper data ownership, proper design, it'll just introduce all the same all flaws Oracle DBAs have been dealing with for the last decade...

And for scale-out, it's absolutely great, but it's very hard to do it with data and databases.
Sharding - it's like the database outsourced scaling to the application. Application developers should concentrate in the application logic, strive to making it better, make the business competitive with new features. Every time an app developer spends time on database specific matters is a poor case of efforts waste and a skill mismatch.

It shouldn't be a surprise, that in recent years I was seeking a solution to scale out over to standard databases building blocks. If you want - a solution to that will bring that obvious advantages of sharding, but without the pains of doing it in the application tier... I gather we have been quite successful in introducing that in the MySQL database.

In future posts I'll drill down and elaborate on rows in the table above, feel free to add and comment, I'll address any comment!!

The light in the end of the tunnel is that the basic building blocks for solving database scalability are there! There're still not a safe, well packaged, polished solution like an iPhone4 that can be easily used by my 1 year old youngest daughter. Those building blocks still need to be put well together into a solution by experienced professionals, or with tools and design, 3rd party products and lots of thinking... It's not a simple task...

No comments:

Post a Comment

Post a Comment