Tuesday, May 20, 2014

Kudos to RDS's SLA, proving the point of the public cloud

If you go to spin up a new RDS server, you'll see this new page added before the wizard:

My impression over the last few months is that AWS has improved RDS availability with Multi-AZ, and is pushing it more aggressively.

An availability factor of "three and a half nines" (99.95%, roughly 4.4 hours of downtime per year) is very, very good; it usually has a very high price tag attached to it (hardware, software & labor) and is usually a dream for small-to-medium IT organizations.
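To keep the nines honest, here is a quick back-of-the-envelope check in Python (my own illustration, not an AWS figure):

```python
# Convert an availability percentage into allowed downtime per year.
HOURS_PER_YEAR = 24 * 365  # 8,760 hours

def downtime_hours_per_year(availability_pct: float) -> float:
    """Hours of downtime per year permitted by a given availability %."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)

for pct in (99.9, 99.95, 99.99):
    print(f"{pct}% uptime allows {downtime_hours_per_year(pct):.1f} hours down/year")
# 99.9%  ("three nines")            -> ~8.8 hours/year
# 99.95% ("three and a half nines") -> ~4.4 hours/year
# 99.99% ("four nines")             -> ~0.9 hours/year
```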

By enabling it at a low utility price, only 25%-33% higher than the corresponding EC2 machine, RDS is a real bargain for everyone, making it even harder to stay out of the public cloud.

Saturday, May 3, 2014

Eventual consistency of NoSQL marketing

Yesterday I learned an important lesson about a difference between NoSQL and MySQL, at least when it comes to marketing and hype.

I saw a marketing tweet from one of the NoSQL leaders:

Most people would apparently just draw conclusions from the tweet's text; however, I actually clicked the link, and couldn't believe my eyes:

I guess that in NoSQL, when it comes to the integrity of data as well as hype, it is eventually consistent...



Thursday, May 1, 2014

Explaining the case for MySQL

My faithful readers, please spare 10 mins of your time, and read Baron's excellent post: https://vividcortex.com/blog/2014/04/30/why-mysql

Nuff said.


Since I can't really shut up, and in case you'd like my (humble) take on this, here it is in short:

Every technology/platform/framework I choose will end up surprising me, limiting what I can do easily, and throwing painful challenges at me if and when I need to do things that are closer to the platform's "edges". This is true for everything, including Rails, JEE, Hibernate, MongoDB and MySQL.

I've learned that the more mature, generically capable, transparent and ecosystem-rich a solution is, the fewer painful surprises hit me at the worst times, and the more successful I am in my job.

Wednesday, April 9, 2014

Porting from Oracle to MySQL

A potential customer asked me about porting her application from Oracle Database to MySQL.

I always try to start with the "why" (a dear friend bought me this book; recommended: http://www.amazon.com/Start-Why-Leaders-Inspire-Everyone/dp/1591846447).

She said "cloud!". I said "OK!".

I did some short research, found many things scattered all over the place, consolidated them into an email I sent her back, and then thought I'd post it here and make it public, as it might be useful for us all. If you feel I missed something, add comments or send feedback.

These are the leading tools to do the actual migration of the data structure, data export/import, sprocs, triggers, etc.:
  1. MySQL Workbench has a migration feature: http://www.mysql.com/products/workbench/migrate/
  2. MySQLYog can be used to migrate: http://tkurek.blogspot.com/2013/04/migrate-oracle-to-mysql.html  (already in the conversation in the second comment there)
  3. Navicat can be used to migrate: http://www.navicat.com/products/navicat-for-mysql
  4. Tungsten supports Oracle-to-MySQL replication: http://www.continuent.com/downloads/software
  5. Focused data migrators:
    1. http://www.ispirer.com/products/oracle-to-mysql-migration
    2. https://www.youtube.com/watch?v=IW3vKHWJljY
    3. http://www.slideshare.net/Tess98/oracle-to-mysql-migration-presentation
    4. http://www.dbload.com/
    5. http://dbconvert.com/convert-oracle-to-mysql-pro.php
    6. http://www.spectralcore.com/omegasync/


The way I see it, migrating the data is only 15% of a database porting project. The real effort goes into (partial list):

  1. Porting drivers and driver behavior in the app code
  2. Porting SQL commands all around the app code
    1. Converting non-standard SQL flavors (see the sketch below)
    2. Working around restrictions and unsupported commands
  3. Ecosystem, monitoring, tuning, tools, scripts, hardware best practices, ops skills, dev skills

All of this comes way before the migration of the data itself on D-day.
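To make item 2.1 concrete, here is a minimal, hypothetical Python sketch of the kind of mechanical rewriting such a port involves (the mappings below are just a few well-known examples, nowhere near a complete or production-ready converter):

```python
import re

# A few well-known Oracle-to-MySQL rewrites (illustrative only).
REWRITES = [
    (re.compile(r"\bNVL\s*\(", re.IGNORECASE), "IFNULL("),   # NVL(a, b)  -> IFNULL(a, b)
    (re.compile(r"\bSYSDATE\b", re.IGNORECASE), "NOW()"),    # SYSDATE    -> NOW()
    (re.compile(r"\s+FROM\s+DUAL\b", re.IGNORECASE), ""),    # FROM DUAL  -> (dropped)
]

def port_statement(oracle_sql: str) -> str:
    """Apply naive textual rewrites; a real port needs a parser and human review.
    Constructs like ROWNUM, CONNECT BY or PL/SQL blocks cannot be regexed away."""
    mysql_sql = oracle_sql
    for pattern, replacement in REWRITES:
        mysql_sql = pattern.sub(replacement, mysql_sql)
    return mysql_sql

print(port_statement("SELECT NVL(name, 'n/a'), SYSDATE FROM DUAL"))
# -> SELECT IFNULL(name, 'n/a'), NOW()
```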

There are a lot of services and some tools in this space. Services-wise, I see around:

  1. Pythian: http://www.percona.com/live/mysql-conference-2012/sessions/oracle-mysql-migration
  2. Baron (Percona): http://www.xaprb.com/blog/2009/03/13/50-things-to-know-before-migrating-oracle-to-mysql/

I bet the big SIs (Accenture et al.) are strong in this game, as they would be the default go-to service providers for the Oracle shops.


Thursday, March 13, 2014

How is Elasticity dictated by the Data Model?

I talk a lot about "Elasticity" and "Data Model". A prospect asked me today: "What makes you think they are related?"

Not only are they related, the relation between them holds a big part of the substance of ScaleBase, the technology I've been working on for the last 5 years...

Elasticity is the ability to grow or shrink in accordance with demand.
The cloud makes it very easy to spin up more machines on demand and kill them a day later, paying by the hour, only for real usage. This alone offers fantastic elasticity. Remember that AWS's EC2 stands for "Elastic Compute Cloud".

Volatile/transient/stateless servers, i.e. application servers and web servers, are easier to make elastic. Just spinning up another identical server image behind a round-robin load balancer solves 80% of the problem.

Data is harder to "elastify".

  1. Data can be replicated across multiple identical servers behind the same round-robin load balancer, but replication multiplies data size (bad ROI) and cannot scale writes and updates to the data.
  2. The only way to scale data is to distribute it across multiple non-identical servers.

New challenges:

  1. How would all data consumers (apps, tools) know where the data they look for resides? 
  2. If, for every access, consumers need data from several (or all) of the servers, load ends up multiplied rather than distributed, meaning no scalability.
  3. Granted, not all or even most, but a minority of accesses do need data from several (or all) of the servers. How can that data be found quickly on all of them and aggregated?

Challenge 1 is the simplest: keep an index expressing "I want to distribute my data by profile_id" and "put profiles 1-1000 on db1 and 1001-1500 on db2", and then have all data consumers check this index before every data access.
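Here is a minimal Python sketch of such a routing index, using the profile ranges from the example above (my own illustration; a real implementation such as ScaleBase's is, of course, far more elaborate):

```python
# Routing index from the example: which server owns which range of profile_ids.
ROUTING_INDEX = [
    (1, 1000, "db1"),     # profiles 1-1000 live on db1
    (1001, 1500, "db2"),  # profiles 1001-1500 live on db2
]

def route(profile_id: int) -> str:
    """Return the server that owns the given profile_id."""
    for low, high, server in ROUTING_INDEX:
        if low <= profile_id <= high:
            return server
    raise KeyError(f"no server owns profile_id {profile_id}")

assert route(42) == "db1"
assert route(1200) == "db2"
```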

Challenges 2 and 3 are where the data model kicks in. For NoSQL, the data model is a document, complete and self-contained, so challenges 2 and 3 do not exist.
For SQL databases, the relational data model takes challenges 2 and 3 to the extreme.

A carefully crafted data distribution policy and the ability to do real-time data aggregation are crucial for successfully scaling a relational database.

In our profiles distribution example, identifying that "a profile" is actually a chunk of related data from 100 tables in a complex, multi-level, deep hierarchy is a hard task.
ScaleBase Analysis Genie simplifies the authoring of a data distribution policy that makes sure related data is stored together on the same server, solving challenge 2.

ScaleBase Controller employs multi-threaded, massively parallel execution and advanced result aggregation, supporting all SQL aspects, including GROUP BY, ORDER BY, HAVING, UNION, JOIN and subselects, to solve challenge 3.
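To illustrate the concept only (this is not ScaleBase code, just a minimal hypothetical sketch), here is what scatter-gather execution of a COUNT-style GROUP BY across shards looks like: run the query on every shard in parallel, then merge the partial results:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Pretend each shard already ran:
#   SELECT country, COUNT(*) FROM profiles GROUP BY country
def query_shard(shard: str) -> Counter:
    """Stand-in for executing the per-shard query; returns partial group counts."""
    canned = {
        "db1": Counter({"US": 400, "IL": 250}),
        "db2": Counter({"US": 120, "DE": 80}),
    }
    return canned[shard]

with ThreadPoolExecutor() as pool:          # scatter: hit all shards in parallel
    partials = list(pool.map(query_shard, ["db1", "db2"]))

totals = Counter()
for partial in partials:                    # gather: merge the partial aggregates
    totals += partial

print(totals)  # Counter({'US': 520, 'IL': 250, 'DE': 80})
```

Additive aggregates like COUNT and SUM merge trivially; something like AVG has to be shipped as SUM and COUNT per shard and recombined at the gather step, which is where things get interesting.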


Thursday, November 14, 2013

Will AWS plans for PostgreSQL RDS help it finally pick up?

"Amazon to add Postgres to its most-favored database list" says GigaOM:

http://gigaom.com/2013/11/12/amazon-to-add-postgres-to-its-most-favored-database-list/
"To many this is no-brainer. Amazon wants to support the databases that its developer audiences want to use. This is simply a  case of Amazon responding to user demand and oh-by-the-way making its cloud infrastructure more attractive to a specific target audience. Some say Postgres has gained traction since Oracle’s acquisition of MySQL via its Sun buyout a few years back."

Some people I know said "yea, the writing was on the wall...". Well, was it?? Really? 

AWS finally got the time to "plan" for supporting Postgres now? After supporting MySQL, Oracle and SQL Server for almost 3 years?! Writing was on the wall? Where can I find a wall this old?

PostgreSQL has not picked up. 

This is why it is a far 4th on Amazon's list. The writer of the text above also makes clear efforts not to pick a side here... "to many this is a no-brainer" or "some say Postgres has gained traction". 

It has been around for ages, through many "oh! it's happening now!" moments, such as the acquisition of MySQL by Sun, then by Oracle...

Technically, PostgreSQL's few superior capabilities, especially around online schema modifications (which get more important these days!), probably could not change its fate, and it is still held back by too many inferior capabilities around performance, robustness and ecosystem...

So - with plans for RDS, will Postgres now pick up? 

Feel free to share your thoughts...

Wednesday, April 24, 2013

Concurrency is not parallelism

Not so new, but still a good piece of reading: http://blog.golang.org/2013/01/concurrency-is-not-parallelism.html
"Concurrency is the composition of independently executing processes, while Parallelism is the simultaneous execution of (possibly related) computations"
As I wrote several times in the past, in OLTP, throughput is king, and concurrency is the main thing that is put to the test.

Concurrency is where Facebook gets a million "Like"s every second; each "Like" is independent, and they need to be processed concurrently.

Parallelism is where a few concurrent activities, say a few analytic reports, run in Oracle Exadata, Vertica or GreenPlum. Every report is sliced into many related computations that execute simultaneously.

Are these the same?

From 50,000 feet, we see many things running at the same time, in parallel, concurrently, maybe even distributed. But we need to be accurate; there is a huge difference, and it is at the source: how many "original" transactions did we have to process? A million "Like"s vs. a few big analytic reports. In both cases I see a million operations coming out the back, but:
In the "Like"s use case - those are the real transactions, concurrently running, distributed.
In the report use case - those are million pieces of the same initial single job.
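To make the distinction concrete, here is a minimal Python toy of my own (not from the linked post, which uses Go): the same thread pool, used in two very different ways:

```python
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)

# Concurrency: many INDEPENDENT transactions; each "Like" is complete on its own.
def process_like(like_id: int) -> str:
    return f"like {like_id} stored"  # one tiny, self-contained transaction

likes = list(pool.map(process_like, range(1000)))  # imagine a million per second

# Parallelism: ONE report, sliced into related computations, recombined at the end.
def scan_slice(rows: range) -> int:
    return sum(rows)  # stand-in for scanning one slice of the report's data

slices = [range(i, i + 250) for i in range(0, 1000, 250)]
report_total = sum(pool.map(scan_slice, slices))  # one answer from many pieces

print(len(likes), report_total)  # 1000 independent results vs. a single result
pool.shutdown()
```

Adding workers to the first pattern scales throughput, because the units of work are unrelated; adding workers to the second only makes a single answer arrive sooner.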

Important! Not to be confused! Big difference! One is great for throughput scalability and one is not. More in my next post.