Friday, January 4, 2013

Partial partitioning and sharding

I came across this:
I was wondering if sharding is an alternate name for partial replication or not. What I have figured out that --
  • Partial Repl. – each data item has only copies at some but not all of the nodes (‘Sharding’?)
  • Pure Partial Repl. – has only copies of a subset of the data item but no node contains a full copy of the database
  • Hybrid Partial Repl. – a set of nodes are full replicas and another set of nodes are partial replicas

I thought it was a good topic, I write a really nice answer, but there was a problem when I pressed "post this answer", probably some error or mistake on their side. Anyway - this is what I have to say about partial partitioning and sharding:

Partial replication is an interesting way, in which you distribute the data with replication from a master to slaves, each contains a portion of the data. Eventually you get an array of smaller DBs, read only, each contains a portion of the data. Reads can very well be distributed and parallelized. 

But what about the writes? 

Those are still clogged, in 1 big fat lazy master database, tasks as buffer management, locking, thread locks/semaphores, and recovery tasks - are the real bottleneck of the OLTP, they make writes impossible to scale... As I wrote in many previous posts, for example here:

Sharding is where every piece of data lives only in one place, within an array of DBs. Each database is the complete owner of the data: data is read from there, data is written to there. This way, reads and writes are distributed and parallelized, real scale-out can be achieved. 

The idea behind sharding is great, the ultimate scaling solution (ask Facebook, Google, Twitter and all the other big guys) but it's a mess to handle, to maintain. It's hard as hell if done by yourself, ScaleBase enables an automatic transparent scale-out machine - that does all that, so you won't have too...