Languages to learn in 2016

This is a blog post in response to a Quora question. I started a response and it grew well beyond what I thought it would. I guess that means I have an opinion here? 😉

Question: “Which backend programming language should I learn in 2016?”

Answer:

The answer depends on your objectives. If you are looking for something to base your career on, you should go for one of the more popular, multi-purpose OO languages:

  • Java – This would be my top choice for this realm. It has the largest community and the most job opportunities. And, if you want to work in cutting-edge OSS, it (or at least the JVM) is the most common language of choice.
  • C# – This, like Java, is a great choice for employability and stability. If you are most comfortable in Windows (as opposed to Linux or OS X), you may want to start here. The jobs you will find on the market will lean more toward the enterprise than the commercial world. But that is still a very interesting world and one that will likely be around and in great demand for the foreseeable future. And the recent foray into open source and cross-platform viability could open it to a new world of applications.

If you are looking to add to a toolbelt that already has a foundation in something like the above, and still add value to your career, I would say to add a growing or established language that brings a particular value.

  • go – This is, in my humble opinion, one of the key languages of the next 10 years. Partially because Google is behind it, and what Google does, so do any companies that want to attempt to recreate their success. But it is genuinely a great language. In my opinion, it is like C re-imagined. It is crazy-fast and compiles cross-platform. It has its warts like any language and is still developing a bit. But what is there is very good and well thought-out. And, you are able to go to your next meetup and say “Yeah, I know go.” 😉
  • node.js – Notice I didn’t say javascript? Yeah, that is because node, while the syntax is definitely js, is a whole new paradigm. Even for experienced js developers. But, due to its async nature, it fits the new world of small, light, service-based architectures very well. I, personally, see it as an excellent choice for microservices. (As is go.) And it is being used by many large companies for just this reason. See the sketch after this list.
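
To make that async point a bit more concrete, here is a minimal sketch using only node’s built-in http module (the port and payload are arbitrary). One event loop juggles many concurrent requests without any explicit threading.

```javascript
const http = require('http');

const server = http.createServer((req, res) => {
  // Pretend this is a non-blocking call to a database or another service.
  setTimeout(() => {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ ok: true, path: req.url }));
  }, 10);
});

// While one request waits on I/O, the event loop is free to accept others.
server.listen(3000);
```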

Now, if you just want something cool that will teach you something new, you have a lot of options. I am a huge proponent of learning a new language just to see a problem from a whole different angle. While there are many options in this world, I will mention a few I have played with.

  • Elixir – A Ruby-ish language that is built on top of the Erlang VM (BEAM). Elixir hides much of what makes Erlang difficult for people to tackle while still leveraging the immense goodness of the BEAM. I’m an Erlang guy and would love to propose that everyone learn Erlang. But I’m also a pragmatist. It will be much easier for someone to tackle Elixir, have fun doing so, and still be able to build ridiculously parallel systems easily.
  • Haskell – Basically, you can choose any functional language you like here. I feel that Haskell is probably the best combination of pure functional ideologies (no side effects, immutability, etc.) and a pragmatic, practical implementation. Plus, I feel that every professional engineer should experience FP at some point in their career. Being a 15-year development veteran, I always thought I understood parallel and concurrent programming (simplistically, think multi-threading). But then I began to learn Erlang, and a whole new world of how systems “should” be built was presented to me. And, if all the descriptions of FP in terms of maths discourage you, I would suggest this post about just that.
  • Rust – Coming out of Mozilla, Rust is similar in many ways to go. It has an excellent memory management model/implementation. I could see Rust being to go what C# is to Java. Either way, it could be worth a look.
  • Julia – Kind of going out on a limb with this one. Especially since I have done little more than ‘Hello World’ in Julia. But there are many things to like about this language and its fresh take on many different approaches that have become de rigueur in software engineering these days. (Plus, it has a REPL!) If nothing else, it could be worth exploring just to force your head out of the standard box.

I hope this helps. I realize I have left out many languages that may be just as worthy. But I had to limit it somehow and I did so by only listing those I have, at least, a modicum of experience with.

 

So, you chose to go NoSQL?

So, you have moved past the decision of whether to choose a traditional RDBMS (SQL) store and have decided NoSQL is the way to go. You may have even read my previous post To Relate Or Not when making this decision. Now what do you do?

First Of All, Why NoSQL?

Much of the chatter I hear these days is around NoSQL. “My boss/architect says we should be using NoSQL for this project. I don’t understand why.” Or, “I want to use NoSQL for xyz, but I don’t even know where to start!” This is primarily because, although this is changing rapidly, it is still a fairly foreign concept to most developers and architects. To someone like me, fortunate (in my opinion) enough to work for a company and on a team that is constantly looking forward to what advancements are around the corner that can improve our systems and increase the efficacy of our product, it seems like it’s been around long enough that most people should be very familiar. But, honestly, while the concept has been around more or less since the dawn of computing, the NoSQL buzzword came along and lit up the minds of developers and technologists just a few short years ago.

“So, if the RDBMSs that we have all grown to know so well work so well, why do we need to introduce something new?” Well, first let’s address that question. Yes, we have all become very familiar, if not intimate, with an RDBMS over the years. We are very comfortable and know how to make them store data and how to get that data back out when we need it. However, I do take issue with the “work so well” part of that question. We have, over the years, learned all kinds of tricks to make an RDBMS fit our needs. But it is often convoluted, complicated, and comes at some expense. Either literal, due to the cost of scaling up hardware to meet performance demands, or mental and emotional, as in the gymnastics you often need to perform to understand it and implement a solution against it. And often it is both.

RDBMSs come from a time when the thought of storing terabytes of data was unheard of. Today, that is often just the entry point for many data-driven applications. Then, layer on top of that the fact that we now need to develop systems with an eye toward a global audience, meaning global distribution, replication, and reliability. We are now well out of bounds of the original purview of the RDBMS.

Now, don’t get me wrong, I definitely feel that relational data has great value in the business world. And, to my last point of global distribution, there have been great strides made to make that less of an issue for RDBMSs. (See MariaDB for a nice example.) However, the hoops we have been jumping through to make them work for us in all situations are just no longer necessary. We live in a world of persistence choices. Choose the one that fits your needs best and run with it.

So Many Choices

The NoSQL world has exploded in recent years. And you have many, many choices. There are options geared toward gargantuan write speeds, lightning-fast reads, scalability, reliability, just about anything. And that, in my humble opinion, is both the bane and the beauty of the NoSQL world.

Which One Should I Choose?

As I mentioned previously, you should evaluate your needs and choose the solution that fits best. Easier said than done, right? Yeah, well, you’re right. Especially if you are new to the arena. So, let me share a bit of my experience and hopefully that will help.

First, let me say that this post is already going to be too long. So, I am going to narrow the scope to the two front-runners of the NoSQL world at the time of this writing: Cassandra and MongoDB. Between them, they can fit most business needs. (They also happen to be the two on my company’s “approved technologies” list!)

The first question you need to ask yourself is: what does my data look like? Or, if you are working on a greenfield project: what, at the very minimum, do you expect your data to look like? Here are the questions I like to ask:

  • What is the “shape” of the data? (Contact info? Sales transactions? User activity?)
  • How many different types of data? (See first bullet-point.)
  • What volume do you anticipate? (It is usually best to overestimate here. You’ll be surprised.)
  • Do you anticipate the load to be read-intensive, write-intensive or mixed?
  • What size is the data you are storing? (By this I mean the individual bits of data.)

The answers to the above questions can get you most of the way to your chosen solution. So, let’s examine that more closely.

What kind of data are you storing?

Ok, so there are many, many types of data out there. But they all tend to boil down to a few types. These are just my own buckets.
  • Reference data: Contact information, billing information, etc.
  • Transactional data: Banking, sales, tests, etc.
  • Activity: User behavior

Reference data tends to be written infrequently but read often. It can be fairly complex, with many different relationships (borrowing a term from the RDBMS world). You will also often need to look it up by various means. (Say, in the case of contact information, first name or telephone number, or…)

Transactional data, on the other hand, tends to be written frequently and read less frequently. It may be complex in the sense that several operations make up one transaction. But it is fairly flat data that is, most often, retrieved via some key like a transaction number or order id.

Activity data is the new kid on the block. This type of data is often what constitutes everybody’s favorite buzzword, “Big Data”. You are collecting massive amounts of data to attempt to mine it for trends. Trends that may help you present a better user experience. Or, to be completely honest, trends you hope will ultimately make you more money. This data is, almost without exception in my experience, unstructured. High volume and velocity. So, write-intensive. And, for the most part, read in batches (I’ll mention exceptions later).

Now I Know My Data. Can I Just Pick A Solution Already?

Once you have identified the type or types of data you expect to be working with, you can begin to understand what kind of NoSQL solution will best fit your needs.

Below is a checklist that I often use to help narrow this even further.

Feature                  Yes   No
-----------------------  ---   --
High write volume        [ ]   [ ]
Large number of reads    [ ]   [ ]
Complex queries          [ ]   [ ]
Large data objects       [ ]   [ ]
ACID transactions        [ ]   [ ]

Go ahead and fill out the above checklist to the best of your ability. The combination of this and your previous analysis of the type of data you expect will get you most of the way to your decision.

Evaluating The Checklist

If you are expecting your system to have a large number of writes (this is obviously relative, but I like to think first about whether I expect it to be primarily recording data and reading infrequently), then you would likely be steered to Cassandra. This is really Cassandra’s historical “sweet spot”. You probably already know this.
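
As a rough illustration of that write-heavy sweet spot, here is a sketch using the DataStax Node.js driver (cassandra-driver); the keyspace, table, and column names are hypothetical.

```javascript
const cassandra = require('cassandra-driver');

// Contact points and data center name are deployment-specific.
const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'datacenter1',
  keyspace: 'tracking',
});

// A prepared statement keeps a torrent of small writes cheap.
const insertEvent =
  'INSERT INTO user_events (user_id, event_time, event_type) VALUES (?, ?, ?)';

function recordEvent(userId, eventType) {
  return client.execute(insertEvent, [userId, new Date(), eventType], {
    prepare: true,
  });
}
```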

On the flip side, if you are expecting to write infrequently but read a lot, as in the case of contact information, MongoDB does have an out-of-the-box advantage here. However, as read load increases, so does read latency in MongoDB.

MongoDB will also give you an advantage when it comes to complex, dynamic queries on existing datasets. Mongo allows you to think less about the structure of your data up-front and decide how you want to retrieve that data later.
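
For example, here is a sketch using the official MongoDB Node.js driver; the collection and fields are made up, and the point is that the query shape is decided at read time, not at write time.

```javascript
const { MongoClient } = require('mongodb');

async function findContacts() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const contacts = client.db('crm').collection('contacts');

  // Filter and sort on fields that were never planned for up-front;
  // no schema migration required.
  const results = await contacts
    .find({ 'address.state': 'CO', lastName: /^Sm/ })
    .sort({ createdAt: -1 })
    .limit(20)
    .toArray();

  await client.close();
  return results;
}
```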

Large data objects are not really the forte of either of these databases. However, they both have options that allow for chunking of large objects. With Cassandra you have Astyanax. In Mongo you have the option of going with GridFS. I have not personally used either. However, I have heard and read good things about both.
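
As I said, I have not used these myself, but based on GridFS’s documented API, a sketch of chunked storage with the Node.js driver’s GridFSBucket might look like this (the database name and file path are hypothetical).

```javascript
const fs = require('fs');
const { MongoClient, GridFSBucket } = require('mongodb');

async function storeLargeFile(path) {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const bucket = new GridFSBucket(client.db('files'));

  // GridFS splits the stream into ~255KB chunks behind the scenes.
  await new Promise((resolve, reject) => {
    fs.createReadStream(path)
      .pipe(bucket.openUploadStream(path))
      .on('finish', resolve)
      .on('error', reject);
  });

  await client.close();
}
```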

Lastly, if true ACID-compliant transactions are what you are looking for, you probably don’t want a NoSQL solution to begin with and should probably go back and read my post To Relate or Not. That said, if you are willing to loosen the reins a bit on strict ACIDity, either of these solutions can provide you with a pretty high level of data consistency. And MongoDB does provide atomic transactions at the document level. [See here]
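
A quick sketch of what that document-level atomicity buys you (the collection and field names here are mine, not anything standard):

```javascript
const { MongoClient } = require('mongodb');

async function reserveItem(sku) {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const items = client.db('shop').collection('items');

  // The filter check and the update are applied atomically to the one
  // matching document, so two concurrent buyers cannot both take the
  // last unit.
  const result = await items.updateOne(
    { sku, quantity: { $gt: 0 } },
    { $inc: { quantity: -1 } }
  );

  await client.close();
  return result.modifiedCount === 1; // false if nothing was available
}
```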

Other considerations

As I mentioned previously, while the type of data you are storing and the patterns of usage will and should be your first consideration when choosing a NoSQL solution, there really are other considerations you need to account for. Just a few examples:

  • What are your requirements for availability?
  • Do you anticipate requiring multi-DC or multi-region replication?
  • What is your plan for maintaining your data solution(s)?

I can say from experience, and the experience of my close colleagues, that when it comes to high availability, nothing currently beats Cassandra. And it is the only solution that I have come across that allows for relatively seamless cross-data-center replication of clusters. (Other solutions, like Riak, provide this at a cost.)
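
In Cassandra’s case, cross-data-center replication is largely a matter of keyspace configuration. A sketch using the Node.js driver (the data center names and replication factors here are examples, not recommendations):

```javascript
const cassandra = require('cassandra-driver');

const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'us_east',
});

// Each data center keeps its own replicas; the cluster handles the
// cross-DC replication for you.
const createKeyspace = `
  CREATE KEYSPACE IF NOT EXISTS app
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'us_east': 3,
    'eu_west': 3
  }`;

client.execute(createKeyspace)
  .then(() => console.log('keyspace replicating across both DCs'));
```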

One often overlooked aspect of this whole picture is the cost of maintaining your NoSQL solution. If you are just looking at a few servers in one data center or AZ, this may not be much of an issue. As you begin scaling out, you will find this becoming more and more of a burden on your team. I can say that the maintenance costs of a MongoDB cluster are likely to escalate at a much greater pace. And, if you decide that you need to scale to multiple data centers/regions, this cost can become fairly astronomical. In our case we needed to hire a dedicated team of experts as well as consultants from 10gen. As for Cassandra, we are currently running several clusters cross-region and cross-zone, and these are fairly easily maintained by the development teams. These clusters are closely monitored via various tools and we rarely, very rarely, have any issues that require manual intervention.

My obvious bias

By now, I’m sure you can tell that I feel Cassandra is the superior solution for most any application you plan to implement that requires the benefits of a NoSQL database. That said, I don’t want to discount how great I think MongoDB can be. I use it frequently for quick proofs of concept and small internal applications that will never require the kind of scalability the majority of my work demands. Not surprisingly, I particularly enjoy working with MongoDB when writing in Node.js. They are like peanut butter and jelly. And they make the creation of full-stack applications quick and painless. But watch out if that application turns out to be a big hit!

Summary

To quickly summarize, both MongoDB and Cassandra offer excellent solutions to different problems out of the box. However, I believe that, given the demands of today’s globally distributed world of applications, the best solution for most applications is going to be Cassandra. Yes, there is going to be a bit more up-front work required. Particularly if you are writing a system that is more read-intensive than write-intensive. Out of the box, this is not what Cassandra is designed for. However, with a little thought as to how you design your data, thinking first of how it will be accessed/queried, you can achieve great performance on both reads and writes.
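
To illustrate that query-first mindset, here is a hypothetical sketch (table and column names are mine): the table is shaped around the read you need, so the read touches a single, pre-sorted partition.

```javascript
// “Latest orders for a user” drives the table design: partition by
// user, cluster by time descending.
const createTable = `
  CREATE TABLE IF NOT EXISTS orders_by_user (
    user_id    uuid,
    order_time timestamp,
    order_id   uuid,
    total      decimal,
    PRIMARY KEY ((user_id), order_time)
  ) WITH CLUSTERING ORDER BY (order_time DESC)`;

// The query this table exists to serve; fast at scale because rows are
// already grouped and sorted the way we read them.
const latestOrders = 'SELECT * FROM orders_by_user WHERE user_id = ? LIMIT 10';
```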

Again, all of this is based entirely on my own personal experience. I work in an arena where availability, scalability, and global distribution are paramount. This may not be the case for you. Use my above evaluation tools fairly and choose what fits your needs best. However, I can say that you are unlikely ever to be sorry you chose Cassandra. And you very well may be a hero for doing so.


Thoughts on Gluecon 2014

Just trying to digest all of the great conversation from this year’s Gluecon in Denver. Overall, I’d say it was a great success. Lots of interesting topics covered, not surprisingly, centered on the cloud, big data, DevOps, and APIs. However, the talks went well beyond the standard discussions around high-level concepts, or “Use this cool new tool. It will make you the hit of the party!”

There was certainly a broad range of tools and products on display but, what I found (maybe naively?) was that, for the most part, the talks were very well vetted by the hosts to limit marketing spiel and offer really pertinent content to help us practitioners do our jobs in the best, most open manner possible.

Some particularly strong takeaways for me were the following:

API design is not something you can fudge any longer. It takes serious thought. Real top-down thought. Ahead of time. You must think of how you want your API to be shaped. Meaning, start with your clients. Architects and developers have to start by thinking “If I were a client, with no knowledge of the implementation, how would I expect to interact with it?”

For example, let’s say we have some content, say books, that we want to offer an API for. The first thing you should think is, “Who are the clients of my API likely to be?” Depending on your product, your clients may be web and app developers in your company, or at some other company that wants to use your API to offer content to their customers, or both. Either way, your client is likely to be a web or app developer. So, now that you know this, start designing your API to their needs. If you are lucky enough to know your potential clients, USE THAT ADVANTAGE!
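
A sketch of what that client-first shape might look like for the hypothetical books API (using Express here; the routes and fields are illustrative, not prescriptive):

```javascript
const express = require('express');
const app = express();

// Clients think in terms of books and searching them, so the routes do too.
app.get('/books', (req, res) => {
  // e.g. GET /books?author=Le+Guin&page=2
  res.json({ books: [], page: Number(req.query.page) || 1 });
});

// A stable, client-meaningful identifier, not an internal row id.
app.get('/books/:isbn', (req, res) => {
  res.json({ isbn: req.params.isbn });
});

app.listen(8080);
```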

More on this topic in future posts…

Building for scale has never been easier. Or more challenging. Sound like I’m confused? Well, maybe. But what I mean is that the tools to build highly scalable systems have never been so available to us developers and architects. There was a time, not too long ago, that to build an application to handle the type of load that APIs like those from Twitter, Facebook, and many others are seeing these days, you typically had to overbuild up-front. Over-provision hardware (even if just reserving rack space and creating standard hardware specs to hasten hardware delivery time), shard your database of choice from the get-go (or at least think logically about how you might), build complicated threading and synchronization logic into your code, etc., etc.

Now, while you still need to consider these things up-front, you have choices to ease the burden. Obviously, choosing a hosted cloud solution like AWS, Rackspace, Azure, etc. is, at least in my humble opinion, a no-brainer. At least for most organizations that don’t have the resources of a Google or a Microsoft. With this decision made, you can start focusing on your app. And in that realm there are more choices than ever as well. From brilliant auto-scaling, sharding, replicating databases like Cassandra or Riak (and others), right down to the languages you use. Java 8 comes with new, improved features like CompletableFuture and the new Stream API. Then you have options like Scala, Node.js, etc. Take your pick.

But this plethora of options also leads to the second part of my statement: this has also never been a more challenging time to build scalable apps. First, you have more to evaluate, and thus learn. Don’t get me wrong. The constant change of this field is the reason I got into it in the first place. I thrive on change. But not everybody does. Even on a given, hand-selected team, you are likely to have dissenters and individuals digging their heels in the dirt. Imagine how THIS concept scales to large teams and organizations.

That said, I see this as an exciting time of change and progress for our industry. And I know I/we can’t convince everybody of this. So, get on board or get out of the way!

Deploying applications to the cloud must be quick, repeatable, and predictable. Containers are the future. Learn the concepts. Pick a tool or tools and learn that/them. Then, when (not if) things change, you’ll be better prepared for it. That’s it. (Partially because this is an area I’m admittedly weak in myself.)

API SDKs suck! Ok, so I actually do not buy into this, but it was a very common theme (both sides well represented) at this year’s Gluecon. Thanks mostly to an excellent day-one keynote by John Sheehan of Runscope, “API SDKs Will Ruin Your Life”.

Like I said, I don’t completely agree with this assertion. But, to be honest, I don’t think John does either. However, he and others made some good points. One that hit particularly close to home for me was the double-edged sword of how an SDK abstracts developers away from the actual interface of an API. This abstraction eases adoption by your API’s clients. That is a VERY GOOD thing! However, as John stated, the vast majority of issues that occur with API integrations are “on the wire”. Meaning, more or less, something is wrong with the request or response. However, if you abstract this interaction from your clients, all they know is “my request did not succeed”. More API-savvy developers may take the next step of inspecting the request/response before contacting you. But, if they do, barring an obvious issue like a malformed request or forgetting to pass auth, they will likely just be faced with an unintelligible error message of some sort.

So, my counter to this argument is three-fold. First, document your APIs well. Be it the old-fashioned way, by manually producing help docs, or with something, in my opinion, infinitely better, like Swagger. Just do it! It will save you many headaches in the future. Secondly, back to my first point, design your APIs intelligently with your clients in mind first. If your API is easy to navigate for an average person (test it out on somebody!), it will make the interaction less painful to begin with, so your API may, potentially, need less abstraction by the SDK. Lastly, I say you must strive to make the errors your API returns just as comprehensible as the non-errors. By this I mean things like returning proper error codes and human-readable descriptions. Not just a generic 400 with “Bad Request” or what have you. I know all too well this is hard to do up-front. You can’t predict all the ways requests may fail. But, if you try, you can think of the more common ones and handle them more elegantly. You are likely coding defensively against them to prevent failures on your end anyway. For those that arise after the fact, adapt. That is why you have that rapid, repeatable deploy process mentioned above.
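
A sketch of that last point, again with Express (the error shape is my own convention, not a standard):

```javascript
const express = require('express');
const app = express();

app.get('/books/:isbn', (req, res) => {
  // Validate the common failure modes you can predict, and say plainly
  // what went wrong and how to fix it.
  if (!/^\d{13}$/.test(req.params.isbn)) {
    return res.status(400).json({
      code: 'INVALID_ISBN',
      message: 'ISBN must be a 13-digit number, e.g. 9780143111580',
    });
  }
  // ...look up the book; a miss gets a 404 with the same error shape.
  res.json({ isbn: req.params.isbn });
});

app.listen(8080);
```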

Summary

Ok, so I have rambled on waaaayyyy too long and have not even come close to covering the above topics, let alone concepts that piqued my interest but need more research on my part to speak to, like cluster management with YARN or Mesos. But suffice it to say, this is one of the most relevant, content-packed conferences going for the more technical audience. If you missed it this year, I highly recommend searching for the content and discussions sure to be posted in the coming days. And see if you can make it next year. It will pay off in spades.

Links

Excellent list of links to this year’s notes and presentations online, provided by James Higginbotham:

http://launchany.com/gluecon-2014/