Yes, 2013 just began, but this post is about my views on the programming languages I have a reasonable amount of professional experience in. It was inspired by what one of my former colleagues had to say a couple of days ago. Note that it is not meant as a rebuttal; it just happens to be the tipping point that got me writing.

I come from a very varied lineage: first few years of C & C++, then lots of perl & php, followed by Java & python. Here is how I feel about those languages now.


Perl

Perl is a much better bash. The shell scripting utilities (not just the shell itself) never managed to operate seamlessly with, say, an FTP resource or an HTTP resource, and never grew tools to work with JSON, XML, YAML and the like. So if I had to write a for loop to fetch a bunch of URLs, parse the JSON and then do a “grep” on it, I’d be coding in perl. The occasional easy access to the DB doesn’t hurt either.

However, the runtime isn’t fast enough, and the general maintainability of large programs turns out to be more complex than necessary. One of the emotional reasons I gave up on the language was my loss of faith in the community, given how long they have been dragging their feet on making perl 6 available.


PHP

PHP as a full-blown programming language is plain horrible. What was originally meant to be just a templating language got so many things glued onto it via duct tape that it is very hard to find an underlying philosophy besides “someone added a feature in isolation overnight and never looked back”.


Python

Python really has turned out to be my favourite general-purpose scripting language. It has very few surprises (try figuring out what expressions evaluate to false in perl and you shall be shocked), thanks to some minimal restrictions like dynamic + strong typing. Not having to worry about integer overflows, having access to generator functions and so on makes it modern enough for me. The maintainability side of things also looks reasonable: code organization aids in retaining sanity, and the “import as” directive provides a nice way to swap in different implementations without going through the rituals of programming to an interface.
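The “import as” swap can be sketched in a few lines. This is a hypothetical illustration — `simplejson` is a real third-party package exposing the same interface as the stdlib `json` module, but the fallback pattern is the point, not the specific modules:

```python
# Swap implementations at import time; call sites never change.
try:
    import simplejson as json  # drop-in third-party implementation, if installed
except ImportError:
    import json                # stdlib fallback with the same interface

def load_config(text):
    # Callers don't know or care which implementation is bound to `json`.
    return json.loads(text)

print(load_config('{"debug": true}'))  # -> {'debug': True}
```

No interface declaration, no factory: the name binding itself is the seam.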

The scary part of python, however, is that it is the most insecure language I’ve seen to date. Entire functions or even libraries can be monkey patched. What this means is that someone could replace the function that writes to a local file with one that wipes your hard disk, and you can’t prevent it. And then there is the fact that there are no constants; you are supposed to be a good child and not try to update a “constant”.
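A minimal sketch of the monkey-patching hazard. `parse` is a hypothetical function; the point is that rebinding `json.loads` affects every caller in the process, and the language offers no way to forbid it:

```python
import json

def parse(text):
    # Looks up json.loads at call time, so it sees any later rebinding.
    return json.loads(text)

# Any code that runs first can rebind the attribute process-wide:
json.loads = lambda text: {"owned": True}

print(parse('{"debug": true}'))  # -> {'owned': True}, not the real parse

# Likewise there are no constants; this is perfectly legal:
import math
math.pi = 3
```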


C

C was brilliant and awesome for almost three decades, but it just doesn’t belong to this age. It is an extremely fragile language to program in. Contrary to popular opinion, the lack of automatic garbage collection is not what I consider the source of fragility. Anyone who has spent time understanding the actual language standard and the contracts of the standard libraries knows that undefined behaviour is the norm and not the exception.

One source of fragility is function signatures. The language was designed in days when programmers always followed the gentleman’s code of honour and never changed a signature once it was published. Not honouring this code results in silent memory corruption that surfaces neither immediately when it happens nor even in the same code locality where it occurs. A lot of folks might write this problem off as “fire the programmer”, but it is not that simple when you are relying on an intricate chain of, say, fifty different libraries. A modern gvim has more dependencies than that.

A more credible and painful version of the problem is managing struct definitions. Re-ordering fields is surely asking for trouble. Adding a new field is also asking for trouble. Yet these two operations, unlike deleting a field or redeclaring a type, are semantically safe as far as existing code is concerned. Doing either of these safe operations still requires you to recompile almost every piece of code that uses the struct, because the memory layout has changed.

The unit of compilation in almost all languages, C included (pun intended), is the file, not the application. A general classification of the problem is that far too many assumptions and commitments are made in the compilation phase that really should be deferred to either the linking phase or the loading phase.

The lack of easy access to stack traces for non-fatal errors (usually accomplished by exception propagation plus catching & logging at higher levels) is another non-modern aspect of the language. Again, this decision was fine decades ago, when there was no expectation of using multiple vendors’ libraries in your code, being open about it, and saying “hey, I used someone else’s code and that code has issues, so go bug them”. Open source didn’t exist in the 1970s, so it was fine back in the day.

People love C because it does exactly what you ask it to, and you can ask it to do quite a lot of esoteric things. While all of that is great, it has rightfully fallen out of vogue, as it always requires you to say what needs to be done in excruciating levels of detail. All said & done, it still remains the language of choice if you are concerned about memory layout and want great control over it.


C++

Contrary to popular opinion, writing decent C++ on large projects is actually much harder than writing C programs. A good example of what it takes can be found in Google’s C++ style guide. People have often criticized it for being too restrictive, but this level of clamp-down is actually needed to remain sane in large code bases. If you think this is a one-off situation, have a look at what the KDE guidelines look like. Their bread & butter is the d-ptr (a.k.a. the p-impl idiom), which is clever but painful.

The compiler implementation story was outright shoddy for a decade: each vendor had a very different way of doing things, and even different versions from the same vendor had very different levels of language support. The amount of #ifdef magic needed for reasonably portable source code was horrendous. Enough people made a living just getting code to compile faster, or to compile on more than one platform/vendor.

It was a good learning language, just like C, in that it helped one understand the underlying implementation of an OO language, but I believe its utility now is largely historical.


Java

Java is one of those languages that I despised for the larger part of the previous decade (2000–2009) but am now a lot more open to. People conflate Java the programming language, the JVM runtime, J2EE as defined by Sun, and popular Java-based frameworks & libraries. I am going to stick solely to Java the programming language in this section.

As a language, Java is probably the most well-thought-out & unambiguous language for complex environments. Here is a list of things it supports at a core language level, being the first mainstream language to do so in most cases:

  • Ability to mix trusted & untrusted code in the same application and manage it via security permissions.
  • A credible sandbox implemented at the loader level that lets you have some code run in shared space and some in private space within the same process.
  • A metrics collection framework that lets you expose metrics from any running instance of the program
  • A portable way to extend the language: APT acts as a compiler extension that is supposed to work with any compiler that supports JLS5 or above
  • Reflection that isn’t dog slow
  • Acknowledging concurrency needs from the very first day and having enough language constructs for it (though bulk of the nicer libraries didn’t become a standard until the release of J2SE 5)
  • Super clear definition of the memory model behaviour under concurrency
  • Standardized library artifacts that could be reliably decoded for further inspection
  • Acknowledging the need to have a persistable, byte-stream representation of objects.
  • Support for desktop application development as part of the core library (a big deal back in the day, not so much now)
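The concurrency and memory-model points above can be illustrated with `volatile`, whose visibility guarantees the Java Memory Model defines precisely. A sketch with a hypothetical `StopFlag` class — without `volatile`, the runtime would be free to let the worker cache the old value and spin forever:

```java
public class StopFlag {
    // 'volatile' guarantees the worker thread observes the write made
    // in stop(); the Java Memory Model spells out exactly why.
    private volatile boolean running = true;

    public boolean isRunning() { return running; }
    public void stop() { running = false; }

    public static void main(String[] args) throws InterruptedException {
        StopFlag flag = new StopFlag();
        Thread worker = new Thread(() -> {
            while (flag.isRunning()) {
                Thread.yield();  // spin until the flag flips
            }
            System.out.println("worker stopped");
        });
        worker.start();
        Thread.sleep(50);
        flag.stop();
        worker.join();
    }
}
```

Keywords like `volatile` and `synchronized`, plus `java.lang.Thread`, were in the language from the first release; the richer `java.util.concurrent` libraries arrived with J2SE 5, as noted above.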

Much like how I don’t hate C for manual memory management, I don’t actually like Java for its automatic garbage collection; at least, that doesn’t make it into my top-5 list of reasons. The language has the static type safety of C, the ability to expand your structs in a semantically safe way without a full recompile, and the python-like runtime safety of validating all cross-file signature assumptions on first invocation.

Where it gets a rap on the knuckles is that it is a highly idiomatic world to program in. A lot of these idioms feel like elaborate ceremonies whose sole purpose is to increase SLoC. These idioms have more of a declarative feel as opposed to an imperative one. This is why even C programmers find it annoying, although the real number of lines needed to get almost anything done in C is actually higher than in Java. The problem gets magnified by the plethora of poorly trained programmers, and of frameworks designed to cater to their needs, whose belief is that the more ceremonious the code feels, the better its quality.
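The ceremony complaint is easy to demonstrate. A hypothetical value holder: two fields of actual data wrapped in the customary constructor/getter/`equals`/`hashCode` ritual, none of which says anything the fields didn’t already say:

```java
public class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int getX() { return x; }
    public int getY() { return y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() { return 31 * x + y; }

    public static void main(String[] args) {
        System.out.println(new Point(1, 2).equals(new Point(1, 2)));  // true
    }
}
```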

My take is that if you have to support a vast project and you have a few level headed technical people at the top, then Java isn’t a terrible choice. But it is clearly no substitute for any of the p* languages in areas where they are known to shine.

Sidestep: The JVM

If you are running some sort of interpreted language (anything besides C & C++ in the list above), then you must look at the characteristics of the interpreter in addition to the language to understand the performance traits of a program written in it. While I could say a few things about the interpreters of all the p* languages, I’d just gloss over them by saying none of the default implementations are any good when it comes to performance. They completely miss the baseline that is C.

The JVM in recent years has matured to be scarily fast. While everyone knows the theoretical possibility of interpreters also acting as profilers that can recompile parts of a running program based on the profile data, it has historically proved to be a very hard problem to solve, as most naïve implementations have profiling and recompilation costs that exceed the benefits. The JVM and the CLR seem to be the only two credible runtimes that have a reasonable handle on the JIT compiler.

Going back to the performance baseline that is C, the whole cycle of profiling and recompiling with profiler data is incredibly hard, both in terms of effort and in terms of what it takes to get it right.

All said and done, the JVM imposes two taxes on a running program that remain a cause for concern. Both are a side effect of having to support automatic garbage collection.

  1. The actual cost of running a GC is non-trivial. Getting it to run at the right intervals, and then getting it to back off after partial progress, is still a matter of trial & error even for those well versed in this art
  2. An additional level of memory indirection, happening far more often than in C (but no more than in any other language with automatic GC), is unavoidable. The JVM again shines here by inlining where possible through a technique known as escape analysis

A positive side effect of the GC is that memory compaction can be done much more effectively. jemalloc, relatively speaking, is still in its infancy.

The most unpleasant surprise, however, is that some of the idioms entrenched in the larger java programming community are actually antagonistic to the JVM. Some level of education is needed to retrain programmers so that they don’t disable the JIT.

In conclusion

Nothing new here. There is no one true programming language to rule them all. There is no clear way to answer whether C or Java is faster. And everything else, you know already. This post was merely meant to document how I feel about various languages, and that is what you get here.