Tuesday, July 10, 2012

So now Hadoop's days are numbered?

Earlier this week we all read GigaOM's article with this title:
"Why the days are numbered for Hadoop as we know it"
I know GigaOM likes to provoke a scandal now and then (we all remember some other unforgettable pieces), but there is something behind this one...

Hadoop today (like SOA not so long ago) is one of the worst cases of an abused buzzword ever known to man. It's everything, everywhere; it can cure illnesses and do "big data" at the same time! Wow! Actually, Hadoop is a software framework that supports data-intensive distributed applications, derived from Google's MapReduce and Google File System (GFS) papers.

My takeaway from the article is this: Hadoop is a foundational, low-level platform. I use the word "platform" only for lack of a better one. Wait, there is a great word that captures it all!

This word is Assembler

When computers began, 70 years ago or so, Assembly was the mother of all programming languages, and the Assembler made it work on real-world computers, in silicon and copper. In the world of Big Data, map-reduce, massive distribution and parallelism are the mother of all living things (Assembly), and Hadoop enables them to actually run in the real world (Assembler)...

Like Assembler, Hadoop core is far from truly usable. Doing something real, good, working and repeatable with it requires skills that only a few people can really master (like good Assembler programmers back in the 1960s).
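To make the "Assembler" comparison a bit more concrete for readers who have never written a map-reduce job, here is a minimal sketch of the programming model itself, in plain Python and without Hadoop. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the in-memory `shuffle` only imitates what the framework does across a cluster. Even for a toy word count, every phase has to be spelled out by hand.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Emit a (word, 1) pair for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group all intermediate values by key; in a real cluster the
    # framework does this (partitioning and sorting) between phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine all values collected for one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # Chain the three phases: map every record, shuffle, then reduce per key.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big ideas", "data is an asset"]
    print(word_count(docs))
    # prints {'big': 2, 'data': 2, 'ideas': 1, 'is': 1, 'an': 1, 'asset': 1}
```

In a real Hadoop job the same three steps run on many machines, with the framework handling distribution and fault tolerance; higher-level tools in the ecosystem, such as Hive, let you express the same computation as a single SQL-like query instead.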

While I consider myself lucky to have had the chance to actually punch cards with brilliant(?) Assembler code, many of today's brightest minds in Silicon Valleys around the world have never written a single opcode. They all use PHP, Ruby, Java and Node.js, which are great "wrappers" around good old Assembly that bring programming, innovation and disruptiveness to the masses and make the whole world a better place. That's how it should be.

Hadoop will die only if data, and big data, die. Nonsense. Data is by far the most important asset organizations have. Facebook and Bank of America alike would lose a matching fraction of their value within minutes if they lost a fraction of their data. Neither would be able to compete if it couldn't intelligently analyze data that multiplies every few days, weeks or months. Data makes a business intelligent, and that is exactly where Hadoop helps.

Hadoop is the Assembler of all analytical big data processing, ETL and queries. The potential of Hadoop and its ecosystem is practically unlimited: startups and communities all over, such as Splunk, HBase, Cloudera, Hive, Hadapt and many, many more, are pouring in tons of innovation and disruptiveness. And we're just in the "FORTRAN" phase...


  1. Your punch card is not showing assembler code but FORTRAN, and before you ask: yes, it was hard work using a computer while the dinosaurs were roaming the land.

    1. You got me!! I would have attached a pic of me holding my authentic punch cards, but back then I just couldn't find my cell phone with a camera... :)

      Thanks for the interest!

  2. Love the analogy, Doron. When I started in this business, the moldy old guys were bragging about how they used to string wires between boards to make things work. To them, assembler was a namby-pamby tool for wimps who couldn't handle real computing!

    Since I was an economics major and feeling a bit inferior about my technical background, I decided to read the S/370 Principles of Operation in detail. Best thing I ever did. A low-level understanding of how the machine itself actually works is indispensable in this business. It certainly served me well at Intel, where the machine's internal architecture isn't nearly as straightforward as that of the S/370!

    But it isn't about bragging rights and how much techno-detail one can master, is it? The fundamental barrier we all face, in life and technology, is TIME. It's incompressible, and it's inexorable.

    If we can find ways to productively use vast amounts of excess raw computational power to save time, it's worth it. That's what PHP and the rest are all about.

    An observation: all of the computational power in the world amounts to a hill of beans when it comes to bandwidth and latency (at the limit of compressibility). Since we now know that neutrinos actually do follow the laws of general relativity, we're back to the limitations imposed by the speed of light on that score.

    In the end, time still wins.

    1. Mitch - inspiring! Thank you.

      I totally agree and I would add one additional aspect: mass.

      Writing PHP on a MacBook is much simpler and easier to grasp than writing FORTRAN on punch cards (or on wires)! It requires less training, so many more people can actually take part in "computing" and its innovation, from JCL and batch calculations all the way to the PC, iPad, Amazon, Facebook and Angry Birds!!
      If we take the above to the extreme, we can invoke Einstein once again with the famous quote attributed to him: "Two Things Are Infinite: the Universe and Human Stupidity"... :)

      Thanks for the interest.

    2. Believe it or not, I'd never read that quote before. Fantastic!
