Type safety & SLoC – Statistically incorrect

I believe that code maintenance costs grow with SLoC. There are situations where bad code needs more lines of code than good code, and the argument for reduced SLoC is trivial in that case. Interestingly, the same holds true when good code needs more SLoC than bad code: if extra lines are needed to keep things nice & clean, then maintaining that cleanliness requires vigilance over a larger code base.

It is very common to hear about the SLoC needed to achieve the same result in various programming languages. This is then used as a basis for arguments over the superiority or inferiority of languages. Unfortunately, many of these conversations tend to resemble political smear campaigns where jingoism takes precedence over reason. So let us explore what happens in a few popular programming languages. We shall use the following entity to explore the implications.

Entity: Company

name (a string)
number of employees (a positive integer)
annual gross revenue in USD (a floating point number)

C++

Strong typing (default)

#include <string>

struct Company {
  std::string name;
  unsigned employeeCount;
  double revenue;
};

This is about as good as it gets. Note that one little constraint is not enforced: the number of employees should be a positive integer, but unsigned still permits a zero value to be assigned to employeeCount.

Ultra loose typing (uncommon, almost never done)

std::map<std::string, std::shared_ptr<void>> company;

This is almost never done in practice, but it essentially lets you have an open ended struct. You can extend it at will without any recompilation or worries about binary compatibility. However, the big, sharp, bleeding edge that this opens up is that it is now up to the programmer to remember what type of value is associated with which key and then use it consistently. Failure to do so can result in memory corruption.

Java

Strong typing + Encapsulation (default, done the stupid way)

public class Company {
    private String name;
    private int employeeCount;
    private double revenue;
 
    public String getName() {
        return name;
    }
 
    public void setName(String name) {
        this.name = name;
    }
 
    public int getEmployeeCount() {
        return employeeCount;
    }
 
    public void setEmployeeCount(int employeeCount) {
        this.employeeCount = employeeCount;
    }
 
    public double getRevenue() {
        return revenue;
    }
 
    public void setRevenue(double revenue) {
        this.revenue = revenue;
    }
 
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
 
        Company company = (Company) o;
 
        if (employeeCount != company.employeeCount) return false;
        if (Double.compare(company.revenue, revenue) != 0) return false;
        if (!name.equals(company.name)) return false;
 
        return true;
    }
 
    @Override
    public int hashCode() {
        int result;
        long temp;
        result = name.hashCode();
        result = 31 * result + employeeCount;
        temp = revenue != +0.0d ? Double.doubleToLongBits(revenue) : 0L;
        result = 31 * result + (int) (temp ^ (temp >>> 32));
        return result;
    }
}

This has always been the poster child of mindless increase in SLoC. The worst part is that the number of holes in this implementation is staggering. The employee count can be negative. Also, the name of the company can be null (which is different from an empty string), in which case equals and hashCode throw a NullPointerException. More code can be written inside the setters to handle these but that is just an increase in SLoC.

All of this bloat is actually not enforced by the language. It is possible to match the SLoC of the strongly typed C++ example in Java too (albeit with the deficiencies pointed out earlier around poor constraints). The bloat is thrown in by OOP zealots who like to encapsulate the fields of an entity behind accessors. What they do not realize is that the language has powerful constructs that let them have their cake and eat it too.

Strong typing + Encapsulation (default, done the smart way)

The following code is effectively equivalent to the above from a compiler’s perspective (lombok's @Data also generates a toString method) but is far smaller from a maintainer’s perspective.

import lombok.Data;
 
@Data
public class Company {
    private String name;
    private int employeeCount;
    private double revenue;
}

The only change is that we have used lombok, an annotation-based source processing library that tucks away the ugly repetitive code. The tragic part of course is that awareness of such tools is exceptionally poor, and the fact that they are not part of the language means that enormous amounts of bloated code exist.

The “C++ way” that was alluded to earlier involves declaring all the members as public and doing away with the annotation for the accessors. We would still need another lombok annotation (@EqualsAndHashCode) to provide proper object equivalence tests.

Ultra loose typing (not so uncommon)

Map<String, Object> company;

The above construct is also possible and is very close to what we saw in C++. The only difference is that sloppy usage errors raise exceptions (typically a ClassCastException) as opposed to silently corrupting memory. This concept is widespread in the Java ecosystem and goes by the name of context maps.

Scala

Strong typing + encapsulation

Touted as the next Java, Scala has the lombok equivalent built into the language. Here is what the Scala code would look like

class Company {
  var name: String = ""
  var employeeCount: Int = 0
  var revenue: Double = 0.0
}

Accessors are generated behind the scenes that permit manipulation of these values. Much like using lombok with Java, it is possible to selectively override these accessors. Note that a plain class like this does not get generated equality methods; declaring it as a case class does.

JSON

Ultra loose typing (The only way)

Though not a proper programming language by itself, the format is so widely used that it deserves a special mention. Here is how the modelling would happen in JSON

{}

This is as flexible and as uninformative as it gets.
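As a rough illustration of that flexibility (sketched in Python with the standard json module; the sample payloads are made up for this example), all of the following parse without complaint, because JSON itself carries no notion of which shape is a valid company:

```python
import json

# Hypothetical payloads: JSON does not say which one is a "valid" company.
documents = [
    '{}',
    '{"name": "Acme", "employeeCount": 10, "revenue": 1.5e6}',
    '{"name": 42, "employeeCount": "ten", "revenue": null}',
]

# Every document parses; any checking is left to the consumer.
parsed = [json.loads(doc) for doc in documents]
for company in parsed:
    print(type(company).__name__, company)
```

Any validation of names or types has to be layered on top by whoever consumes the data.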

ECMAScript/JavaScript

Ultra loose typing

This is exactly what you saw in the case of JSON

{}

Weak typing, no encapsulation

It is possible to have a rather well defined struct in the language. Here is an example

function Company() {
  this.name = '';
  this.employeeCount = 0;
  this.revenue = 0.0;
}

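While the constructor fixes the initial names of the member variables, the type of value that each can hold is not locked. Strictly speaking the names are not locked either: new properties can still be added to an instance unless it is sealed, for example with Object.seal.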

Weak typing, with encapsulation

function Company() {
  var name;
  var employeeCount;
  var revenue;

  this.getName = function () { return name; };
  this.getEmployeeCount = function () { return employeeCount; };
  this.getRevenue = function () { return revenue; };

  this.setName = function (n) { name = n; };
  this.setEmployeeCount = function (e) { employeeCount = e; };
  this.setRevenue = function (r) { revenue = r; };
}

This looks very similar to the stupid Java way. There probably is a smarter way to achieve the same in JS but I am not aware of it. The more important thing to see is that while the closure fixes the set of encapsulated variables, the actual contents of each one are loosely typed. It is possible to code type-specific checks into the setters above and throw an error (for example a TypeError) when a check fails. The annoyance of course is that such errors only surface at run time, so reporting and handling them is left to the caller.

Python

Ultra loose typing (default)

Python is a strange language in the sense that while it has supported OO concepts from the ground up, a class in Python is essentially completely open ended. Here is what could be declared as a class

class Company(object):
    name = ''
    employeeCount = 0
    revenue = 0.0

However, nothing stops programmers from assigning values to previously nonexistent fields on objects of this class. The class behaves like a map (a.k.a. a dictionary in Python speak) in many ways. There is no requirement to consistently use the same data type for a given field, nor is there any restriction on introducing new members on a per-instance basis on the fly.
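A minimal sketch of this open-endedness, reusing the class from above (the ticker field and the sample values are made up for illustration):

```python
class Company(object):
    name = ''
    employeeCount = 0
    revenue = 0.0


company = Company()

# A field that was never declared on the class can be added on the fly.
company.ticker = 'ACME'

# A declared field can silently change type on this one instance.
company.employeeCount = 'forty-two'

# Per-instance state lives in a plain dictionary, much like a map.
print(company.__dict__)
```

The class-level defaults are untouched; only this one instance has drifted away from the declared shape.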

Weak typing, the wrong way

The following piece of code will ensure that members cannot be dynamically added to objects of a class

class Company(object):
    __slots__ = ["name", "employeeCount", "revenue"]

The reason why this is considered an incorrect way of achieving the goal is that slots were introduced into the language as a memory saving hint. Locking down the possible set of member variables, which is what we need, is merely a side effect. There are a lot of programs that go against the spirit of the language and use the feature for this purpose.
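A short sketch of that side effect in action (the ticker attribute and the sample values are hypothetical):

```python
class Company(object):
    __slots__ = ["name", "employeeCount", "revenue"]


company = Company()
company.name = 'Acme'          # declared slot: allowed
company.employeeCount = 'ten'  # wrong type, but still allowed

try:
    company.ticker = 'ACME'    # not a declared slot: rejected
    rejected = False
except AttributeError:
    rejected = True

print(rejected)
```

The set of names is locked, but the types are as loose as before.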

Weak typing, the right way

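class Company(object):
    # Class-level defaults; an instance shadows them on first assignment.
    name = ''
    employeeCount = 0
    revenue = 0.0

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for unknown names.
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Reject any attribute that is not part of the declared set.
        if name not in ('name', 'employeeCount', 'revenue'):
            raise AttributeError(name)
        object.__setattr__(self, name, value)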
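
A usage sketch of the same idea, trimmed down to the __setattr__ check (the ticker attribute and the values are hypothetical), shows that this restricts the names but still leaves the value types open:

```python
class Company(object):
    name = ''
    employeeCount = 0
    revenue = 0.0

    def __setattr__(self, name, value):
        # Only the attribute name is checked, never the value.
        if name not in ('name', 'employeeCount', 'revenue'):
            raise AttributeError(name)
        object.__setattr__(self, name, value)


company = Company()
company.revenue = 'lots'     # accepted: the name is known, the type is ignored

try:
    company.ticker = 'ACME'  # rejected: unknown name
    rejected = False
except AttributeError:
    rejected = True

print(rejected, company.revenue)
```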

Strong typing (uncommon, almost never done)

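class Company(object):
    # Expected type for each permitted attribute.
    _types = {'name': str, 'employeeCount': int, 'revenue': float}

    def __init__(self):
        self.name = ''
        self.employeeCount = 0
        self.revenue = 0.0

    def __getattr__(self, key):
        # Only called when normal lookup fails, i.e. for unknown names.
        raise AttributeError(key)

    def __setattr__(self, key, value):
        vartype = Company._types.get(key)
        if vartype is None:
            # Unknown attribute name.
            raise AttributeError(key)
        if type(value) is not vartype:
            # Known name, wrong type (exact match, so 1 is not a valid revenue).
            raise TypeError('%s must be of type %s' % (key, vartype.__name__))
        object.__setattr__(self, key, value)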
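
None of the variants so far enforces the original requirement that the number of employees be a positive integer. One possible way to add that, sketched here with a property (this is an illustrative alternative, not the approach used above; the constructor signature and error messages are assumptions):

```python
class Company(object):
    def __init__(self, name, employeeCount, revenue):
        self.name = name
        self.employeeCount = employeeCount  # goes through the setter below
        self.revenue = revenue

    @property
    def employeeCount(self):
        return self._employeeCount

    @employeeCount.setter
    def employeeCount(self, value):
        # Exact type match: this also rejects bool, a subclass of int.
        if type(value) is not int:
            raise TypeError('employeeCount must be an int')
        if value <= 0:
            raise ValueError('employeeCount must be positive')
        self._employeeCount = value


company = Company('Acme', 10, 1.5e6)

# Try a few invalid values and record which error each one triggers.
errors = []
for bad in (0, -3, '10'):
    try:
        company.employeeCount = bad
    except (TypeError, ValueError) as exc:
        errors.append(type(exc).__name__)

print(errors)
```

Every extra guarantee costs a few more lines, which is exactly the trade-off this post is about.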

Concluding remarks

It should be obvious by now that there are multiple ways to model an entity in almost every language. However, the default practice varies widely based on the language of choice. What some people do not realize is that the choice of how much freedom or control to allow is more cultural than linguistic. The SLoC needed to define an entity is more a function of this cultural choice than a fundamental property of the language.

That being said, the language in question does matter to an extent. Almost all of these languages have gone through multiple iterations, and in each of them the sense of what is most elegant & concise has been shaped by its culture. For example, trying to have absolute control in Python feels very unpythonic. Likewise, a completely open ended solution in C++ feels equally alien.

What has been completely ignored in this post is the impact of these stylistic choices on the SLoC and maintainability of the code that uses these object definitions. That shall be another post for another day.