The Cloud Game Is For Grown-Ups

Listening to a recent episode of Software Engineering Radio featuring Adrian Cockcroft speaking about the modern cloud-based platform, I was left bursting with thoughts. He covered a lot of topics that I hope to write about in the near future. However, one in particular hit me like a ton of bricks. He mentioned that he has seen many retailers and manufacturers of goods choosing to build their own private clouds rather than trust Amazon, someone they see as a rival, to provide their cloud infrastructure. To this, Adrian had many great points. I’d like to outline some of them as best I can and add some of my own thoughts. (All of these are in my own words, inspired by Adrian’s.) I’d highly recommend you check out the podcast yourself. I’ll provide a link at the end.

Don’t Be A Child

By this I mean that refusing to give Amazon your cloud-hosting business because you see their online retail arm as a competitor is analogous to a child refusing a friend’s help with their math homework because that friend beat them at basketball earlier. The only person you will be hurting is yourself.

Amazon has already figured out all of the ins and outs of building, maintaining and scaling a cloud infrastructure. For you to do the same without a really, really good reason is a fool’s errand. (I acknowledge that there are cases that warrant private clouds, but I find them to be few and far between.) Even with the most skilled, most experienced (read: highly paid) staff, you will lose many months, if not years, getting your private cloud to the state of reliability, scalability and usability of Amazon’s.

Leverage Your Competitor’s Strength

By choosing Amazon to host your cloud infrastructure, you are starting two steps ahead. You don’t have to worry about building and maintaining that infrastructure. To do it properly you will still need to build or buy some additional automation and monitoring, but the vast bulk of the complications of building and maintaining a cloud infrastructure will be someone else’s problem. This will free you to focus on your core competencies. If your business happens to be software, it allows your teams to focus on building new features, rapidly. And this, ultimately, is the greatest competitive advantage of the cloud.

By embracing Amazon’s strength as the best (I know, arguably) cloud platform provider, you are turning their strength against them. Because you are free to concentrate on making better products, whatever they may be, you are letting your competitor help you. Bringing back the childhood analogy, it would be like letting the friend who is the superior basketball player teach you to be better at math, then using the time saved on your math homework to get better at basketball. (OK, a stretch of an analogy, but I hope you follow me.)

In Short, Get Your Head Straight

Seriously, if you want to compete in today’s ultra-fast-moving world you need to make smart decisions. You need to take every single advantage you can get. Allowing somebody else to take on such a tremendous burden as providing your cloud infrastructure is a gift. You can’t base such decisions on emotion. As Adrian Cockcroft pointed out, Netflix still uses Amazon to host its entire streaming infrastructure despite the fact that Amazon competes directly with them for the same business through its Prime video offerings. (And Netflix is beating them handily in the process.)

I am, however, not naive to the fact that there are still good reasons to build and maintain a private cloud. Ultimately, there are some applications that likely wouldn’t benefit from a cloud infrastructure at all. But you should think long and hard about your reasoning before deciding to build your own. Look inward to make sure personal feelings are not influencing your decision-making. As I mentioned, there are valid reasons for building your own cloud (storing critically sensitive data, data-localization laws, etc.), but refusing to feed the competitor is not a good one.

Pertinent Links

Adrian Cockcroft on the Modern Cloud-based Platform

Adrian Cockcroft’s Blog

Software Engineering Radio


Push the boundaries! Responsibly.

Recently, a colleague of mine said, “I took a lower overall pay scale to come to <my company> simply because I wanted to be part of something that matters.” This was very refreshing to hear, as my company is going through a phase of great attrition due to many factors, not the least of which is a tight-fisted lock on what can be developed and deployed. Many people have become disillusioned with executive leadership decisions because, in the name of ensuring stability, those decisions appear to block progress toward delivering real value.

This has been sad to see, as I joined my current company for the exact same reason, and with the exact same sacrifice in compensation, in order to make the change I believe this company is fantastically positioned to effect. Many of the individuals who have been most influential in making real, substantive progress toward our end goal have left for other opportunities over the past 6-12 months as a result. Most went to other companies in the same or related industries, companies they believe have a clear, concise vision and the will and motivation to take risks in order to push forward.

Where Do I Stand?

My feeling, while it may seem like a cheat, is somewhere on the fence. I do believe you must have stability first in order to make clients confident in your product. Confidence drives adoption and adoption can drive the change that you are looking to create. However, in order to succeed in making change, you must stretch the boundaries of the known and take risks. This is the only true path to innovation. It’s impossible to innovate while staying in your comfort zone. In my experience, true innovation happens when you have one foot in the known and the other in the unknown.

Stability and Innovation Do Not Need to Be Mutually Exclusive

So, how can you push your boundaries while also maintaining a stable product that customers will want to use and recommend? It has never been more possible than it is today. The availability of cloud infrastructure alone has had a tremendous impact on this.

Continuous Delivery

This is a pretty controversial subject these days, and I believe that is because too many people are concentrating on the wrong parts of the practice and/or conflating it with Continuous Deployment. Continuous Delivery is, in fact, about reducing risk. The following are its central tenets. If followed properly, they let you deliver software changes and bug fixes more quickly and reliably than you would with traditional processes.

  1. Test driven – All development should be done against a given, pre-determined outcome. You cannot move on until the tests pass. AND YOU DON’T FIX THE TEST! All too often, if a test is not passing, a developer will change the test to match the behavior of the code. This should not happen if your test expectations are well understood ahead of time. (Which they should be. See the sketch after this list.)
  2. Automate the build – Every single check-in to source control should trigger a build of the entire system’s code. If the build does not succeed, further commits should be blocked until the breaking commit is fixed or reverted. And I am a believer in the old adage “If you break the build, you don’t go home until it is resolved.”
  3. Automated testing – In addition, each check-in should trigger a full suite of tests (preferably integration tests) to guarantee that no changes have been made that affect other parts of the system.
  4. Automated acceptance testing – Once a feature or bug fix is ready for deployment, you can automate the acceptance testing to verify that all pre-determined acceptance criteria have been met.
  5. Automated deployments – There are a number of options for automating deployments, such as Puppet, Chef, Fabric, etc. With any of these it is easy to deploy, verify the deployment, and roll back if necessary. This greatly reduces deployment risk, particularly when coupled with options such as A/B deploys.
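
To make the first tenet concrete, here is a minimal test-first sketch in Java using JUnit 4. The Discount class and its business rule are hypothetical; the point is that the expected outcomes are fixed before the implementation exists, and the code changes to satisfy the tests, never the reverse.

```java
import org.junit.Test;
import static org.junit.Assert.assertEquals;

// Test-first: these expectations are written down before any
// implementation. The Discount class below is a hypothetical example.
public class DiscountTest {

    // Pre-determined business rule: orders of $100 or more get 10% off.
    @Test
    public void ordersOfOneHundredOrMoreGetTenPercentOff() {
        assertEquals(90.0, Discount.apply(100.0), 0.001);
    }

    @Test
    public void smallerOrdersPayFullPrice() {
        assertEquals(99.0, Discount.apply(99.0), 0.001);
    }
}

// The implementation is written (and rewritten) to make the tests pass.
class Discount {
    static double apply(double total) {
        return total >= 100.0 ? total * 0.9 : total;
    }
}
```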
With the above pieces in place, the door opens to many options that reduce the friction of getting code to production while ensuring greater overall stability. Without them, I do not recommend trying to go further.

Another point I made earlier is that people often conflate Continuous Deployment with Continuous Delivery. While very similar, and sharing many of the same tenets, they have some significant differences that make each fairly unique and that, in my mind at least, make Continuous Deployment an option for only a few organizations, such as the UX group at Facebook. The key difference between the two, in my mind, is that Continuous Delivery focuses on the fact that all code must be deployable at any time. The steps outlined above can help you achieve that. Continuous Deployment, on the other hand, means that every change goes through the entire pipeline and is immediately deployed to production. (See Martin Fowler’s excellent write-up here.)

While Continuous Delivery is not about constantly pushing code changes, it is important to keep your deployments fairly frequent so the scope of the changes being deployed stays small. This is remarkably effective in isolating issues when they do occur. If you wait too long to deploy, you end up releasing a lot of small changes all at once. Then, if an issue arises, you need to decompose the release to find the problem. If you release one small change at a time, you can clearly see when an issue arises and track it back.



Continuous Delivery processes open up the floodgates to all sorts of possibilities that facilitate innovation alongside stability.



A/B Deploys

Today you can deploy changes to a whole new instance, side by side with an existing, stable instance, and run a comprehensive suite of tests against it. If it proves stable, you can switch off the existing instance and direct traffic to the new one. If you keep the initial instance available, cutting back in the event of a failure is quick and can even be automated.
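
As a rough illustration, here is a sketch of that verify-then-switch flow in Java. The Router, smokeTestsPass and example URLs are all hypothetical stand-ins; a real setup would cut traffic over at a load balancer or DNS layer rather than in application code.

```java
// A sketch of the verify-then-switch flow described above.
public class AbDeploy {

    /** Tracks which instance URL is currently receiving traffic. */
    static class Router {
        private volatile String liveUrl;
        Router(String initialUrl) { this.liveUrl = initialUrl; }
        String liveUrl() { return liveUrl; }
        void switchTo(String url) { liveUrl = url; }
    }

    /** Stand-in for a real smoke/integration test suite. */
    static boolean smokeTestsPass(String url) {
        // In practice: hit health endpoints, run key user journeys, etc.
        return true;
    }

    public static void main(String[] args) {
        Router router = new Router("https://stable.example.com");
        String candidate = "https://candidate.example.com";

        if (smokeTestsPass(candidate)) {
            String previous = router.liveUrl();
            router.switchTo(candidate); // direct traffic to the new instance
            System.out.println("Live: " + router.liveUrl()
                    + " (keeping " + previous + " warm for rollback)");
        } else {
            System.out.println("Candidate failed verification; no cutover.");
        }
    }
}
```

The design point is that the previous instance stays available, so rolling back is just another switchTo call.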

Canary Deploys

You can also take this a step further. If you choose, you can selectively drive a slice of live production traffic to the new instance while the majority of your traffic still goes to the main, stable instance.
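
A canary can be as simple as a weighted routing decision. Below is a minimal sketch, again with hypothetical instance URLs; real canary setups usually also pin a given user to one instance (sticky routing) and watch error rates before widening the percentage.

```java
import java.util.concurrent.ThreadLocalRandom;

// A minimal percentage-based canary router.
public class CanaryRouter {
    private final String stableUrl;
    private final String canaryUrl;
    private final double canaryFraction; // e.g. 0.05 = 5% of traffic

    public CanaryRouter(String stableUrl, String canaryUrl, double canaryFraction) {
        this.stableUrl = stableUrl;
        this.canaryUrl = canaryUrl;
        this.canaryFraction = canaryFraction;
    }

    /** Picks a target for one request; most go to the stable instance. */
    public String route() {
        return ThreadLocalRandom.current().nextDouble() < canaryFraction
                ? canaryUrl
                : stableUrl;
    }

    public static void main(String[] args) {
        CanaryRouter router = new CanaryRouter(
                "https://stable.example.com",
                "https://canary.example.com", 0.05);
        for (int i = 0; i < 10; i++) {
            System.out.println("request " + i + " -> " + router.route());
        }
    }
}
```

Random routing like this sends roughly the configured fraction of requests to the canary; production setups usually hash on a user ID instead, so a given user sees a consistent experience.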

There is an interesting video on the above subject from Etsy.

What Can Be Done By Us In The Trenches?

If you happen to be one of the many stuck in the trenches feeling powerless to effect change in your organization, have faith. In most cases you can still take the steps necessary to prepare for Continuous Delivery. Put all of the pieces in place and use them, even while still being required to follow traditional release processes. If you do so, and meticulously track your successes, you can build your argument to management. Nothing speaks better than data. If you can present irrefutable proof that your process works consistently, results in fewer bugs, and makes releases quicker and rollbacks less frequent, it will get through. This builds trust, which leads management to let you take more risks, knowing that you have diligently created processes and checkpoints (automated in most cases to reduce friction) that greatly reduce or even eliminate client impact. Small wins lead to great achievements.

In Summary

I do what I do because I truly believe that technology can, and will, make the world a better place. Technologists wield great power, probably more today than ever in history. However, great responsibility comes along with that power. So, when somebody asks to take large risks in order to make innovation happen, they must be willing and able to back those requests up with not only why they think the innovation is warranted (many people want to innovate just for innovation’s sake) but also a game plan for taking it to market in a responsible manner.

Looking at the current culture in my organization, I see that a change needs to occur to stop the hemorrhaging of talent. Not coincidentally, we are losing most of our top talent to companies that are much smaller, mostly start-ups. Smaller companies, in many ways, have much more leeway to fail and recover than large organizations. But the key for large organizations is to realize they can do the same: focus on allowing small failures, learning from them, and moving forward. If we don’t allow this to happen, we will not only keep losing top talent, we will fail in our honorable mission, which will really be a loss for everybody involved.

Tackling a new world order

So, my company recently (within the past couple of years) made a major tactical shift from the world of Microsoft-centered development to a strict adherence to open-source technologies, driven primarily by our rapid growth and the difficulty of scaling much of our existing MS infrastructure with us. That infrastructure served us well, but we grew beyond the point where it made sense to keep paying the massive licensing fees and developing creative workarounds for scalability. (I have no doubt many would disagree with this rationale; it’s just my opinion.) The shift has caused a lot of confusion and consternation throughout the organization.

For the most part, the development and architecture groups embraced this change. Like me, most of the crew is interested in the best solution for the job, regardless of the fact that we have been working almost exclusively in the MS .NET world for years and this change means a major commitment and investment by all parties. (Those not interested have moved on, without any hard feelings, hopefully, on either side.) However, this is still a fairly steep hill to climb. It’s not just that we need to switch languages from, say, C# to Java. That is the easy part. There is a whole new ecosystem to adjust (or re-adjust) to. In addition, we are changing our entire system architecture from traditional, datacenter-deployed applications to cloud-based, globally distributed applications. This involves all of us and takes significant learning and knowledge sharing across groups, as well as new development practices, QA processes, and deployment mechanisms.

However, all of this is far beyond the scope of this blog post. Today I just want to kick off a series on one subject that has come up a lot and is becoming a pain point for teams across the organization: now that we are cutting ties with our historical datastore (MS SQL Server), which open-source database should I choose for my application?

Part of this decision has been made easier by the fact that various members of our development and architecture teams have done formal reviews of the options. Currently, we have three solutions that are accepted as zero-barrier options. (By this I just mean that their use has been approved and does not require additional justification, not that there are no barriers, such as a learning curve.)

The current options are as follows:

  • Cassandra
  • MongoDB
  • MySQL

Great, polyglot persistence. No trouble there. Just use the right store for the job. Right? Well, remember, this is a whole new world for most of us here. (And, arguably, for everybody who might read this post.) We are used to one option: MS SQL Server. We would model our data using RDBMS standards, normalizing as best we could while balancing performance, then map our application domain objects to stored procedures that did various joins, subqueries, etc. to get the data in the exact “shape” we needed to work with.

So, now the directive comes along to change your data store, with a general preference for the approved NoSQL options. The question, predictably, arises: “Why do I need to change the way I have always worked?” This is a complicated question to answer. And, really, the answer is you don’t. However, as with every decision you will ever make, you have to be willing to make trade-offs if you want to stay the course. In the new “cloudified” world, the trade-offs of sticking with a traditional RDBMS like MySQL versus making the shift to something like Cassandra or MongoDB are pretty steep. But there are still use cases where it makes sense.

So, in the next few posts I plan on tackling this question, as well as some others, to the best of my ability. In addition, I would like to lay out what I feel are good guidelines for choosing among the various types of databases available given a knowledge of the following:

  • The type or shape of the data you plan to store or use.
  • The volume of the data. (How big is the data, in terms of both individual items and the number of items?)
  • How you plan to use the data.

Other ideas will likely creep in, as I tend to stray off topic from time to time. But I will attempt to stay focused and do my best to add value to this overall discussion, both for my organization and for those out there who might be going through similar transitions at work.

I hope this series is informative. Please add responses or ask questions as I go. I have a thick skin and can handle criticism. I tend to learn the most from those that don’t agree with me.

Thoughts on Gluecon 2014

Just trying to digest all of the great conversation from this year’s Gluecon in Denver. Overall, I’d say it was a great success. Lots of interesting topics were covered, not surprisingly centered on the cloud, big data, DevOps and APIs. However, the discussions went well beyond high-level concepts or “use this cool new tool, it will make you the hit of the party!”

There was certainly a broad range of tools and products on display, but what I found (maybe naively?) was that, for the most part, the talks were very well vetted by the hosts to limit marketing spiel and offer really pertinent content to help us practitioners do our jobs in the best, most open manner possible.

Some particularly strong takeaways for me were the following:

API design is not something you can fudge any longer. It takes serious thought. Real top-down thought. Ahead of time. You must think of how you want your API to be shaped. Meaning, start with your clients. Architects and developers have to start by thinking “If I were a client, with no knowledge of the implementation, how would I expect to interact with it?”

For example, let’s say we have some content, say books, that we want to offer an API for. The first thing you should think is “Who are the clients of my API likely to be?” Depending on your product, your clients may be web and app developers in your company, developers at some other company that wants to use your API to offer content to their customers, or both. Either way, your client is likely to be a web or app developer. So, now that you know this, start designing your API to their needs. If you are lucky enough to know your potential clients, USE THAT ADVANTAGE!
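
To illustrate, here is a sketch of what “starting from the client” might look like in Java. Everything here (the Book shape, the operation names, the example data) is hypothetical; the point is that the interface mirrors what a web or app developer would ask for, not how the data happens to be stored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// A client-first sketch: the API shape is driven by what a client
// developer expects to call, not by the backing storage model.
public class BooksApiSketch {

    /** The representation a client sees, not the internal schema. */
    static class Book {
        final String id;
        final String title;
        final List<String> authors;
        Book(String id, String title, List<String> authors) {
            this.id = id; this.title = title; this.authors = authors;
        }
        @Override public String toString() { return title + " " + authors; }
    }

    /** Operations named for client intent: find books, fetch one book. */
    interface BooksApi {
        List<Book> search(String query); // would back GET /books?q=...
        Book getById(String id);         // would back GET /books/{id}
    }

    public static void main(String[] args) {
        // Throwaway in-memory implementation, just to exercise the shape.
        final List<Book> shelf = Arrays.asList(
                new Book("1", "Moby-Dick", Collections.singletonList("Melville")));
        BooksApi api = new BooksApi() {
            public List<Book> search(String query) {
                List<Book> hits = new ArrayList<Book>();
                for (Book b : shelf) {
                    if (b.title.toLowerCase().contains(query.toLowerCase())) {
                        hits.add(b);
                    }
                }
                return hits;
            }
            public Book getById(String id) {
                for (Book b : shelf) {
                    if (b.id.equals(id)) return b;
                }
                return null;
            }
        };
        System.out.println(api.search("moby"));
    }
}
```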

More on this topic in future posts…

Building for scale has never been easier. Or more challenging. Sound like I’m confused? Well, maybe. But what I mean is that the tools to build highly scalable systems have never been so available to us developers and architects. There was a time, not too long ago, when to build an application to handle the type of load that APIs like Twitter’s, Facebook’s, and many others see these days, you typically had to overbuild up front: over-provision hardware (even if just reserving rackspace and creating standard hardware specs to hasten delivery time), shard your database of choice from the get-go (or at least think through how you might), build complicated threading and synchronization logic into your code, etc.

Now, while you still need to consider these things up front, you have choices to ease the burden. Obviously, choosing a hosted cloud solution like AWS, Rackspace or Azure is, at least in my humble opinion, a no-brainer, at least for most organizations that don’t have the resources of a Google or a Microsoft. With this decision made, you can start focusing on your app. And in that realm there are more choices than ever as well: from brilliant auto-scaling, sharding, replicating databases like Cassandra and Riak (among others), right down to the languages you use. Java 8 comes with new, improved features like CompletableFuture and the new Stream API. Then you have options like Scala, Node.js, etc. Take your pick.
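
As a quick taste of those Java 8 features, here is a small, self-contained example that fans out several asynchronous calls with CompletableFuture and aggregates the results with the Stream API. The fetchCount call is a stand-in for a real remote request.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Compose async work with CompletableFuture, aggregate with streams.
public class Java8ScaleDemo {

    // Stand-in for a remote call (e.g., to another service or datastore).
    static CompletableFuture<Integer> fetchCount(String shard) {
        return CompletableFuture.supplyAsync(() -> shard.length() * 10);
    }

    public static void main(String[] args) {
        List<String> shards = Arrays.asList("us-east", "us-west", "eu-west");

        // Kick off all calls concurrently...
        List<CompletableFuture<Integer>> futures = shards.stream()
                .map(Java8ScaleDemo::fetchCount)
                .collect(Collectors.toList());

        // ...then combine the results once they complete.
        int total = futures.stream()
                .map(CompletableFuture::join) // wait for each result
                .mapToInt(Integer::intValue)
                .sum();

        System.out.println("Total across shards: " + total);
    }
}
```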

But this plethora of options also leads to the second part of my statement: this has also never been a more challenging time to build scalable apps. First, you have more to evaluate, and thus learn. Don’t get me wrong. The constant change of this field is the reason I got into it in the first place. I thrive on change. But not everybody does. Even on a given, hand-selected team you are likely to have dissenters and individuals digging their heels in the dirt. Imagine how THIS concept scales to large teams and organizations.

That said, I see this as an exciting time of change and progress for our industry. And I/we can’t convince everybody of this. So, get on board or get out of the way!

Deploying applications to the cloud must be quick, repeatable and predictable. Containers are the future. Learn the concepts. Pick a tool or tools and learn them. Then, when (not if) things change, you’ll be better prepared for it. That’s it. (Partially because this is an area I’m admittedly weak in myself.)

API SDKs suck! OK, so I actually do not buy into this, but it was a very common theme (with both sides well represented) at this year’s Gluecon, thanks mostly to an excellent day-one keynote by John Sheehan of Runscope, “API SDKs Will Ruin Your Life”.

Like I said, I don’t completely agree with this assertion. But, to be honest, I don’t think John does either. However, he and others made some good points. One that hit particularly close to home for me was the double-edged sword of how an SDK abstracts developers away from the actual interface of an API. This abstraction eases adoption by your API’s clients. That is a VERY GOOD thing! However, as John stated, the vast majority of issues that occur with API integrations are “on the wire”, meaning, more or less, that something is wrong with the request or response. If you abstract this interaction from your clients, all they know is “my request did not succeed”. More API-savvy developers may take the next step of inspecting the request and response before contacting you. But if they do, barring an obvious issue like a malformed request or forgetting to pass auth, they will likely just be faced with an unintelligible error message of some sort.

So, my counter to this argument is three-fold. First, document your APIs well, be it the old-fashioned way by manually producing help docs or with something, in my opinion, infinitely better like Swagger. Just do it! It will save you many headaches in the future. Secondly, back to my first point, design your APIs intelligently with your clients in mind first. If your API is easy to navigate for an average person (test it out on somebody!), the interaction will be less painful to begin with, so your API may need less abstraction by the SDK. Lastly, strive to make the errors your API returns just as comprehensible as the non-errors. By this I mean things like returning proper error codes and human-readable descriptions, not just a generic 400 with “Bad Request” or what have you. I know all too well that this is hard to do up front. You can’t predict all the ways requests may fail. But if you try, you can think of the more common ones and handle them elegantly. You are likely coding defensively against them to prevent failures on your end anyway. For those that arise after the fact, adapt. That is why you have the rapid, repeatable deploy process mentioned above.
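
To make that third point concrete, here is a sketch of a structured error body in Java. The field and code names are hypothetical, and the hand-rolled JSON is only for illustration; the idea is to pair the HTTP status with a specific machine-readable code and a human-readable message rather than a bare “Bad Request”.

```java
// A structured API error: machine-readable code plus human-readable text.
public class ApiError {
    final int httpStatus;       // e.g. 400
    final String errorCode;     // e.g. "INVALID_ISBN", not just "Bad Request"
    final String message;       // human-readable explanation
    final String documentation; // where the client can read more

    ApiError(int httpStatus, String errorCode, String message, String documentation) {
        this.httpStatus = httpStatus;
        this.errorCode = errorCode;
        this.message = message;
        this.documentation = documentation;
    }

    /** Hand-rolled JSON for illustration; a real API would use a serializer. */
    String toJson() {
        return String.format(
            "{\"status\":%d,\"code\":\"%s\",\"message\":\"%s\",\"docs\":\"%s\"}",
            httpStatus, errorCode, message, documentation);
    }

    public static void main(String[] args) {
        ApiError error = new ApiError(400, "INVALID_ISBN",
                "The 'isbn' parameter must be a 10- or 13-digit ISBN.",
                "https://example.com/docs/errors#INVALID_ISBN");
        System.out.println(error.toJson());
    }
}
```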

Summary

OK, so I have rambled on waaaayyyy too long and have not even come close to covering the above topics, let alone concepts that piqued my interest but need more research on my part to speak to, like cluster management with YARN or Mesos. But suffice it to say, this is one of the most relevant, content-packed conferences going for the more technical audience. If you missed it this year, I highly recommend searching for the content and discussions sure to be posted in the coming days. And see if you can make it next year. It will pay off in spades.

Links

An excellent list of links to this year’s notes and presentations, provided by James Higginbotham:

http://launchany.com/gluecon-2014/