Friday, September 28, 2012

Being successful like Pinterest without its DB adventures...

I just came across this: "Scaling Pinterest and adventures in database sharding"  (
"Pinterest has learned about scaling the way most popular sites do — the architecture works until one day it doesn’t"
Pinterest found out that "the architecture" is not scalable and they turned to development of a Scale Out mechanism also called Sharding.

I find it amazing that sharding, or in other words, the idea of "scale out by splitting and parallelizing data across shared-nothing commodity-hardware" is not supplied "out of the box" by "the architecture" (such as database, load-balancer, any other IT stuff). I'm wondering who was the one that decided that an IT issue like scale-out should be outsourced from the database to the application developers?...


When was the last time you heard about a PHP or Ruby developer wrote code to enable Scale Out. NEVER! Scale Out in the application layer is enabled easily by a magical box called a load balancer, and you can get one from F5 or wherever for a low 4 digit USD. Commodity! 

But to scale the database? To enjoy the obvious advantages of "scale out by splitting and parallelizing data across shared-nothing commodity-hardware"? - for this the world still thinks developers need to stop investing effort in innovation, better product, competitive business. Instead they need harness their how-databases-really-work skills to write band-aid code to scale the DB. 

Amazing... As you know I took it personally, and have been solving this paradox every day now, by bringing a complete, automatic, out-of-the-box "scale-out machine", that we like to call ScaleBase. I think Pinterest story is great, with a great outcome, but it's not always the case with this complex matter, and a generic, repeatable, IT-level solution for Scale Out can make it much easier for all other "Pinterests" out there to be as successful and make the right choice and enjoy the great benefits - without the tremendous efforts and labor in home-grown sharding.


  1. A aggregation like "group by" in the SQL would cut across all shards, right? how would a load balancer help in this case? I am really interested to find out if there is any means to do sharding without modifying the application code considering an application performs "group by" transaction. Thanks.

    1. You hit it right on the spot! Maybe the main challenge in sharding is to aggregate results with GROUP, JOIN, ORDER, subqueries, aggregate functions and such. If you can't have these - you have database silos which is bad... And it's so hard to implement so people usually don't.
      There are commercial solutions that do that, one is ScaleBase of which I'm founder and CTO, and exactly today I'm conducting a webinar EXACTLY on this subject you asked, it's not too late to register, I'm sure you'll get much more info there.