I've lived around databases all my life. The 21st century is challenging for them: big data, throughput, complexity, virtualization, global distribution. It all comes down to scalability.
I'm the founder and CTO of ScaleBase. Solving this problem is a workaholic's heaven, so I'm having a great time!
My agenda is to stay technical, with no marketing and sales BS, and to give my summarized views and opinions on urgent topics, events and the latest news in database scalability.
Tuesday, May 15, 2012
Scale differences between OLTP and Analytics
In my previous post, http://database-scalability.blogspot.com/2012/05/oltp-vs-analytics.html, I reviewed the differences between OLTP and Analytics databases.
The scale challenges are different in these two worlds of databases.
Scale challenges in the Analytics world come from the growing amounts of data. Most solutions leverage three main aspects: columnar storage, RAM and parallelism.
Columnar storage makes scans and data filtering more precise and focused, because a query reads only the columns it actually references. After that, it all comes down to the I/O: the faster the I/O is, the faster the query will finish and bring results. Faster disks and SSDs can play a good role, but above all: RAM! Specialized Analytics databases (such as Oracle Exadata and Netezza) have TBs of RAM. Then, in order to bring results for queries, data needs to be scanned and filtered, which is a great fit for parallelism. A big data range is divided into many smaller ranges, each given to a worker thread. The workers perform their tasks in parallel, so the entire scan finishes in a fraction of the time.
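To make the "divide a big range into smaller ranges" idea concrete, here is a minimal Python sketch. It is only an illustration, not how Exadata, Netezza or any other product implements it. The names `split_ranges`, `scan_range` and `parallel_scan` are made up for this post, and a plain Python list stands in for one column held in RAM. Note also that threads in standard CPython do not run pure-Python code on several cores at once, so this shows the structure of the technique rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(n_rows, n_workers):
    """Divide [0, n_rows) into up to n_workers contiguous (start, end) ranges."""
    step = max(1, -(-n_rows // n_workers))  # ceiling division
    return [(start, min(start + step, n_rows))
            for start in range(0, n_rows, step)]


def scan_range(column, start, end, predicate):
    """Scan one sub-range of a single column.

    Returns (number of matching values, sum of matching values).
    """
    count = 0
    total = 0
    for value in column[start:end]:
        if predicate(value):
            count += 1
            total += value
    return count, total


def parallel_scan(column, predicate, n_workers=4):
    """Filter and aggregate one column using parallel range scans."""
    ranges = split_ranges(len(column), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each worker scans only its own sub-range of the column.
        partials = list(pool.map(
            lambda r: scan_range(column, r[0], r[1], predicate), ranges))
    # Combine the partial results into the final answer.
    return (sum(c for c, _ in partials), sum(t for _, t in partials))


if __name__ == "__main__":
    # Hypothetical "amount" column with the values 1..100.
    amounts = list(range(1, 101))
    matches, total = parallel_scan(amounts, lambda v: v > 90)
    print(matches, total)  # 10 955
```

Only the one column that the filter needs is touched, and each worker returns a small partial result that is cheap to merge. That is the combination of columnar access and parallelism described above.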
In OLTP, the scale challenges are the growing transaction concurrency and throughput and… growing amounts of data. Again? Didn't we just say growing data is the problem of Analytics? Well, today’s OLTP apps are required to hold more data in order to provide a larger span of online functionality. In the last couple of years, OLTP data archiving has changed dramatically. OLTP data now covers years, not just days or weeks. Facebook recently launched its “timeline” feature (http://www.facebook.com/about/timeline). Can you imagine your timeline ending after 1 week? Facebook’s database, probably the world’s largest OLTP database, holds years of data for a billion users. Today all data is required anywhere, anytime, right here, right now, online. Many of today’s OLTP databases go well beyond the 1TB line.
And what about transaction concurrency and throughput? Applications today are bombarded by millions of users shooting transactions from browsers, smartphones, tablets… I personally checked my bank account 3 times today. Why? Because I can…
What can be done to solve OLTP scale challenges?
In my next post, let's start answering this question by understanding why the solutions proposed for Analytics are limited in OLTP, and then start reviewing relevant approaches.
Stay tuned, subscribe, get involved!
Labels: analytics, Columnar Storage, Data warehouse, Database scalability, MySQL, OLTP, Parallelism, Scale out
Location: Newton, MA, USA