The Database Scalability Blog - Doron Levari: NoSQL


Friday, September 12, 2014

Differences between NoSQL databases

Just sharing an answer I gave today to a question on Quora: http://www.quora.com/Whats-the-difference-between-the-different-NoSQL-databases

I think the question is relevant and the other answers were very good; this was my humble addition to the thread:

The answers above are very good. In my view, the right NoSQL database for you is the one that best fits your requirements in:

Data representation: as said above, key-value, document, graph, etc.
Data usage pattern: OLTP (high-concurrency throughput, many queries and updates) vs. analytics (low concurrency, few big queries, no updates)
Data availability and consistency: this is the main topic I wish to add

While all relational databases provide ACID guarantees for transactions and data (Atomicity, Consistency, Isolation, Durability), few NoSQL databases provide full ACID. Most instead offer interesting tradeoffs around the CAP theorem (http://en.wikipedia.org/wiki/CAP_theorem). Since you can't have all 3 of consistency, availability and partition tolerance, different databases give different combinations. For example, of 2 NoSQL databases from Apache, HBase provides CP and Cassandra provides AP (http://wiki.apache.org/cassandra/ArchitectureOverview).

Hope that helped.

Saturday, May 3, 2014

Eventual consistency of NoSQL marketing

Yesterday I learnt an important lesson about an important difference between NoSQL and MySQL, at least when it comes to the marketing and hype.

I saw a tweet from the marketing of one of the NoSQL leaders:

Most people would apparently just draw their conclusion from the tweet's text. I, however, actually clicked the link, and couldn't believe my eyes:

I guess that in NoSQL, when it comes to the integrity of data as well as hype - it is eventually consistent...

Thursday, May 1, 2014

Explaining the case for MySQL

My faithful readers, please spare 10 mins of your time, and read Baron's excellent post: https://vividcortex.com/blog/2014/04/30/why-mysql

Nuff said.

Since I can't really shut up, and only if you'd like my (humble) take on this, I could say in short:

Every technology/platform/framework I choose will end up surprising me, limiting me in things that could be done easily, and throwing many painful challenges at me if and when I need to do things that are closer to the platform's "edges". This is true for everything, including Rails, JEE, Hibernate, MongoDB and MySQL.

I've learned that the more mature, generically capable, transparent and ecosystem-rich a solution is, the fewer painful surprises hit me at the worst times, and the more successful I am in my job.

Tuesday, March 26, 2013

They say: "Relational Databases Aren't Dead"

This is a good read, claiming: "Relational Databases Aren't Dead. Heck, They're Not Even Sleeping", http://readwrite.com/2013/03/26/relational-databases-far-from-dead. A key quote:

"While not comprehensive, the uses for NoSQL databases center around the acquisition of fast-growing data or data that does not easily fit within uniform structures."

There are 2 parts to the statement about NoSQL's uses. I'll start with the latter:

"data that does not easily fit within uniform structures" - NoSQL is probably the right choice, although I always encourage thinking and architecting in advance. Also, online structure changes do exist in the RDBMS world, and recently in MySQL too: http://dev.mysql.com/doc/refman/5.6/en/innodb-online-ddl.html...
I would definitely warn about the caveats of NoSQL when it comes to actually using and querying the data that is so easily stored there...

"acquisition of fast-growing data" - this is no longer a no-go for RDBMS and the MySQL database. Distributed RDBMS solutions exist today, and they can draw performance and scalability from good old MySQL itself.

What do you think?

Wednesday, June 20, 2012

The catch-22 of read/write splitting

In my previous post I covered the shared-disk paradigm's pros and cons, but the conclusion is that it cannot really qualify as a scale-out solution when it comes to massive OLTP, big data, a big session count and a mixture of reads and writes.

Read/write splitting is achieved when numerous replicated database servers are used for reads. This way the system can scale to cope with an increase in concurrent load. It qualifies as a scale-out solution because it allows expansion beyond the boundaries of one DB: the DB machines are shared-nothing, and a machine can be added as a slave to the replication "group" when required.
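To make the idea concrete, here is a minimal sketch of the routing decision in Python. The server names and the `ReadWriteRouter` class are made up for illustration, and the "starts with SELECT" test is a simplifying assumption: a real router would also have to send things like SELECT ... FOR UPDATE and reads inside an open transaction to the master.

```python
import itertools


class ReadWriteRouter:
    """Route writes to the master and spread reads over the replicas."""

    def __init__(self, master, replicas):
        self.master = master
        # Round-robin over the replicas; fall back to the master if there are none.
        self._replica_cycle = itertools.cycle(replicas or [master])

    def route(self, sql):
        """Return the name of the server that should run this statement."""
        words = sql.split(None, 1)
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT":
            return next(self._replica_cycle)
        # INSERT, UPDATE, DELETE, DDL and anything unrecognized go to the master.
        return self.master


# Hypothetical server names, used only for this example.
router = ReadWriteRouter("master", ["replica1", "replica2"])
targets = [
    router.route("SELECT * FROM users"),
    router.route("UPDATE users SET name = 'x' WHERE id = 1"),
    router.route("select 1"),
    router.route("SELECT 2"),
]
print(targets)
```

Adding a slave to the "group" is then just one more entry in the replica list, which is what makes this approach so easy to start with.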

And, in fact, read/write splitting is very popular and widely used by lots of high-traffic applications such as popular web sites, blogs, mobile apps, online games and social applications.

However, today's extreme challenges of big data, increased load and advanced requirements expose vulnerabilities and flaws in this solution. Let's summarize them here:

All writes go to the master node = bottleneck: While reading sessions are distributed across several database servers (replication slaves), writing sessions all go to the same primary/master server, which is therefore still a bottleneck. All of them consume the DB's resources for our well-known "buffer management, locking, thread locks/semaphores, and recovery tasks".
Scaled sessions' load, not big data: I can take my X reading sessions and spread them over my 5 replication slaves, giving each only X/5 sessions to handle. However, my giant DB has to be replicated as a whole to all servers. Prepare lots of disks...
Scale? Yes. Query performance? No: Queries on each read replica need to cope with the entire data of the database. No parallelism, no smaller data sets to handle.
Replication lag: Async replication will always introduce lag. Be prepared for a lag between the writes and the reads.
Reads after a write can show missing data: until the transaction is committed and written to the log, propagated to the slave machine and applied at the slave DB, the slave does not have it.
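The last two points can be illustrated with a toy model of asynchronous replication. This is only a sketch: the `Master` and `Replica` classes are hypothetical in-memory stand-ins, not how any real database replicates, and the "lag" is simply the time until `apply_pending()` is called.

```python
class Master:
    def __init__(self):
        self.data = {}
        self.log = []  # committed changes, in commit order

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    def __init__(self, master):
        self.master = master
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def apply_pending(self):
        """Apply everything the master has logged since the last call."""
        for key, value in self.master.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.master.log)

    def read(self, key):
        return self.data.get(key)


master = Master()
replica = Replica(master)

master.write("user:1", "Alice")
stale = replica.read("user:1")   # the replica has not applied the change yet
replica.apply_pending()          # replication catches up some time later
fresh = replica.read("user:1")
print(stale, fresh)
```

An application that writes to the master and immediately reads from a slave sees the `stale` result, which is why sessions that must read their own writes are usually pointed back at the master.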

Above all, databases suffer from writes made by many concurrent sessions. Database engines themselves become the bottleneck because of their *buffer management, locking, thread locks/semaphores, and recovery tasks*. Reads are a secondary target. BTW - read performance and scale can very well be gained by good, smart caching: using a NoSQL such as Memcached in the app, in front of the RDBMS. In modern applications we see more and more reads being avoided this way, while writes, which cannot be avoided or cached, storm the DB.
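A rough sketch of that caching idea (often called cache-aside) follows. To keep it runnable on its own, a plain dict stands in for Memcached and an in-memory SQLite database stands in for the RDBMS; the table, function names and the `stats` counter are all assumptions made for this example.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the RDBMS
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (1, 'Alice')")

cache = {}                 # stand-in for Memcached
stats = {"db_reads": 0}    # counts the reads that actually reach the database


def get_user_name(user_id):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"user:{user_id}"
    if key in cache:
        return cache[key]
    stats["db_reads"] += 1
    row = db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    name = row[0] if row else None
    cache[key] = name
    return name


def rename_user(user_id, name):
    """Writes always hit the database; the cached entry is invalidated."""
    db.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))
    cache.pop(f"user:{user_id}", None)


first = get_user_name(1)
second = get_user_name(1)   # served from the cache, no database read
rename_user(1, "Bob")
third = get_user_name(1)    # the entry was invalidated, so this reads the database
print(first, second, third, stats["db_reads"])
```

Note what the counter shows: the repeated read never reaches the database, but every write still does, which is exactly the point made above.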

R/W splitting is usually implemented today inside the application code. It's easy to start, then becomes hard... I recommend using a specialized COTS product that does it 100 times better and may eliminate some or all of the limitations above (ScaleBase is one solution that gives that, among other things).

This is read/write splitting's catch-22. It's an OK scale-out solution and relatively easy to implement, but improvements in caching systems, changing requirements in online applications, big data and big concurrency are rapidly driving it towards its fate: becoming less and less relevant, and playing only a partial role in a complete scale-out plan.

In a complete scale-out solution, where data is distributed (not replicated) throughout a grid of shared-nothing databases, read/write splitting will play its part, but only a minor one. I'll get to that in the next posts.
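As a small preview of what "distributed, not replicated" means, here is one possible sketch: each key is mapped to exactly one database by a stable hash. The shard names and the `shard_for` function are hypothetical, and hashing on the key is just one of several possible distribution policies.

```python
import zlib

SHARDS = ["db0", "db1", "db2", "db3"]  # hypothetical shard names


def shard_for(key):
    """Map a key to one shard with a stable hash (not Python's salted hash())."""
    return SHARDS[zlib.crc32(key.encode("utf-8")) % len(SHARDS)]


# Count how many of 1000 example keys land on each shard.
placement = {}
for user_id in range(1000):
    shard = shard_for(f"user:{user_id}")
    placement[shard] = placement.get(shard, 0) + 1
print(placement)
```

Each database now holds only a part of the data and takes only a part of the writes, in contrast to the replication slaves above, which each hold a full copy.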
