So, my company recently (within the past couple years) made a major tactical shift from the world of Microsoft-centered development to a very strict adherence to open-source technologies. Driven primarily by our rapid growth and the inability/difficulty for much of our existing MS infrastructure to grow with us. It has served us well thus far but, we have grown beyond the point where it made sense to keep paying the massive licensing fees and developing creative workarounds for scalability. (I have no doubt many would disagree with this rationale. Just my opinion.) This has caused a lot of confusion and consternation throughout the organization.
For the most part, the development and architecture groups embraced this change. Like myself, we have a crew that is interested in the best solution for the job. Regardless of the fact we have been working, almost exclusively, in the MS .NET world for years and this change means a major commitment and investment by all parties. (Those not interested have moved-on without any hard feelings, hopefully, on either side.) However, this is still a fairly steep hill to climb. It’s not that we just need to switch languages from say, C# to java. That is the easy part. There is a whole new ecosystem to adjust (or re-adjust) to. In addition, we are changing our entire system architecture from traditional, datacenter-deployed, applications to cloud-based, globally-distributed applications. This involves all of us and takes significant learning and knowledge sharing across groups as well as new development practices, QA processes and deployment mechanisms and processes.
However, all of this is far beyond the scope of this blog post. Today I just want to begin by kicking off a series regarding one subject that has come-up a lot and is becoming a pain point for teams across the organization. That is that, now that we are cutting ties with our historical datastore (MS SQL Server), what open source database should I choose for my application?
Part of this decision has been helped by the fact that we have had various members of our development and architecture teams do formal reviews of various options. Currently, we have three solutions that are accepted as zero-barrier options. (By this, I just mean that the use of them has been approved and does not require additional justification. Not that there are not barriers such as a learning-curve.)
The current options are as follows:
Great, polyglot persistence. No trouble there. Just use the right store for the job. Right? Well, remember, this is a whole new world for most of us here. (And, arguably, for everybody that might read this post.) We are used to one option, MS SQL Server. We would model our data using RDBMS standards. Normalizing data as best we could while balancing relative performance. Then model our application domain objects to stored procedures that did various joins, subqueries, etc. to get the data in the exact “shape” we needed to work with.
So, now the directive comes along to change your data store with a general preference being the use of the approved NoSQL options. So, the question, predictably, arises “Why do I need to change the way I have always worked?” Well, this is a complicated question to answer. And, really, the answer is you don’t. However, as with every decision you will ever make, you have to be willing to make trade-offs if you want to stay the course. The trade-offs in the new “cloudified” world of sticking with a traditional RDBMS like MySQL vs making the shift to something like Cassandra or MongoDb, are pretty steep. But there are still use cases where it would make sense.
So, in the next few posts I plan on tackling this question as well as some others to the best of my ability. In addition, I would like to lay-out what I feel are good guidelines for choosing among the various types of databases available given a knowledge of the following:
- The type or shape of the data you plan to store or use.
- The volume of the data. (How big is the data? Either individual items or number of items.)
- How you plan to use the data.
Other ideas will likely creep-in as I tend to stray off-topic from time-to-time. But, I will attempt to stay focused and do my best to add value to this overall discussion both for my organization and to those out there that might be going through similar transitions at their work.
I hope this series is informative. Please add responses or ask questions as I go. I have a thick skin and can handle criticism. I tend to learn the most from those that don’t agree with me.