
Programming in the large

In comp.lang.misc Joachim Durchholz was asked to explain some of the issues and tradeoffs when devising languages that would be used for programming in the large. This was his reply:


Hmm… as I said, there are many open research issues here, so let me start with the caveat that neither my selection of issues nor my view of things should be taken as significant. I have also made no attempt to sort the issues by importance.

First thing is that you need good, selective information hiding. Without it, a program that declares N names has O(N^2) possible interactions, most of them unwanted. Information hiding is, of course, a trade-off. Be too liberal and you get an overly weak model that makes the visibility of a name dependent on decisions in other (potentially far-away) modules. Be too restrictive and programming means that you have to “fill out customs forms” whenever you want to “import” declarations for use in your module. A more modern name for information hiding would be “locality”. Code is local if you can understand its effect just by looking at the code itself, without referring to other snippets of code. There are varying degrees of locality, depending on what aspects of the code you’re trying to understand.
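The O(N^2) figure can be made concrete with a little arithmetic. The sketch below counts potential pairwise interactions twice: once with every name visible everywhere, and once with the same names grouped into modules that export only a few names each. The function names and the sample numbers (1000 names, 20 modules, 5 exports per module) are illustrative assumptions, and the counting model is deliberately crude.

```python
def pairs(n: int) -> int:
    """Number of unordered pairs among n names: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def flat_interactions(total_names: int) -> int:
    """No hiding: every name can interact with every other name."""
    return pairs(total_names)


def modular_interactions(modules: int, names: int, exports: int) -> int:
    """Each module declares `names` names and exports `exports` of them.

    Names interact freely inside their own module. Across two modules,
    a pair can interact only if at least one of the two names is exported.
    """
    inside = modules * pairs(names)
    private = names - exports
    # All cross pairs between two modules, minus the private/private pairs.
    across_per_module_pair = names * names - private * private
    across = pairs(modules) * across_per_module_pair
    return inside + across


print(flat_interactions(1000))           # 499500
print(modular_interactions(20, 50, 5))   # 114750
```

Under these assumed numbers, hiding 45 of every 50 names removes roughly three quarters of the possible interactions, and the remaining cross-module ones all go through a name somebody chose to export.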

Second, ease of understanding code is more important than ease of writing it. (Obviously, one cannot entirely neglect ease of writing, since if you make the language hell to write, you’ll end up with nobody writing code in it.) The key is maintenance. Most code is written by large teams, and nobody understands all the code that’s running in a system. It’s important that people who don’t know some particular piece of code can quickly come to grips with it, preferably without having to use information external to the code. (External documentation tends to get out of sync, particularly if there are many developers: some of them are *always* too lazy to keep the docs up to date, and once too many docs are out of sync, other developers neglect the documentation too, because the docs are out of sync anyway.) Note that locality will help maintenance. Understanding code is easier if it can be understood locally.

Maintenance and locality are strong arguments in favor of immutability. The fewer aspects of an object can be changed, the less you have to worry about the execution history. If an object has a two-phase initialization sequence (e.g. this is so in C++ if you need a virtual function during initialization), you have to make sure that the objects are properly initialized; code that gets handed such an object will have to check that it’s initialized (if only in an assert()). This all vanishes if the language makes sure that no object remains uninitialized, and doesn’t force a two-phase initialization on programmers the way C++ does. (In C++, the “wrong” design decision was that objects mutate from the base type to the subtype when the various constructors are run. It’s this kind of far-reaching consequence (IOW non-locality) that makes language design an art.) If you take immutability to its extreme, nothing can ever be changed. If you wish to change the world, you write a function that returns a list of changes and let the run-time system inspect that list and execute the proper actions. If you have an interactive program, you emit a list that has a function pointer at its end; the function gets fed the next input and is expected to generate another action list. If you want to read more: action lists (particularly those that contain functions) are called “IO monads” in the functional literature.
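The action-list idea can be sketched in a few lines. This is a minimal illustration and not how any particular functional language implements it: the names Print, greet and run are hypothetical, and the “function pointer at the end of the list” is modelled as the second element of the returned pair.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Print:
    """An action: a description of output, not the output itself."""
    text: str


# A handler takes the next input and returns the actions to perform
# plus the handler for the following input (None means "stop").
Handler = Callable[[str], "Step"]
Step = Tuple[Tuple[Print, ...], Optional[Handler]]


def greet(line: str) -> Step:
    """Pure function: it changes nothing, it only returns an action list."""
    if line == "quit":
        return (Print("bye"),), None
    return (Print(f"hello, {line}"),), greet


def run(handler: Optional[Handler], inputs, emit=print) -> None:
    """The 'run-time system': the only place where effects happen."""
    for line in inputs:
        if handler is None:
            break
        actions, handler = handler(line)
        for action in actions:
            emit(action.text)


run(greet, ["Ada", "Bob", "quit", "ignored"])
# prints:
# hello, Ada
# hello, Bob
# bye
```

Because greet never performs output itself, it can be understood and tested locally: call it with an input and inspect the list it returns.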

Security is an important issue. A language that invites buffer overflows, as C does, isn’t very suitable in the age of trojans, viruses, and other forms of malware. Capability-based systems are the state of the art here, but there may be other approaches. A capability is an ID that a program can use to access functionality. No ID, no access. Capabilities cannot be “forged”: in a C-based language, this means that every call that uses a capability is checked; in other languages, you simply don’t offer routines that create a capability from binary data. Capabilities can be used to control access both to functionality and to data. You can tie libraries to having the right capability, or (say) the file system. (Personally, I think capabilities should be used only to control data access, but in some languages, libraries are just data, so the distinction can be blurred.)
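A rough sketch of “no ID, no access”, using an object reference as the capability. FileStore, ReadCapability and word_count are hypothetical names made up for this illustration. Note that Python itself does not enforce the scheme (a determined caller can still reach the private attributes), so this shows only the shape of the idea, not a secure implementation.

```python
class FileStore:
    """A toy in-memory file system (hypothetical, for illustration only)."""

    def __init__(self):
        self._files = {"notes.txt": "buy milk", "secret.txt": "launch codes"}

    def reader_for(self, name: str) -> "ReadCapability":
        """Only code that holds the store itself can mint a capability."""
        return ReadCapability(self, name)


class ReadCapability:
    """Holding this object *is* the permission to read one file."""

    def __init__(self, store: FileStore, name: str):
        self._store = store
        self._name = name

    def read(self) -> str:
        return self._store._files[self._name]


def word_count(cap: ReadCapability) -> int:
    """Less trusted code: it gets one capability and no file names."""
    return len(cap.read().split())


store = FileStore()
print(word_count(store.reader_for("notes.txt")))   # 2
```

The point of the design is that word_count takes no path string it could tamper with; it can read exactly the one file it was handed.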

Integration is an important activity. Sticking together two modules should be easy and foolproof. This is a very open issue. The realistic alternatives that I see here are either not having a static type system at all, or having type inference (so that you don’t have to walk through all dependent modules if you change the type of a parameter from Int to Real). Smalltalk does a good job at this, but pays with increased testing overhead. (I also suspect that a typeless approach doesn’t scale well to large projects, but I haven’t worked on large projects in Smalltalk. I’m pretty sure some Smalltalker will disagree *g*.)
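The first alternative (no static type system) and its price can be illustrated with a small sketch. The two functions stand in for separate modules; scale and process are hypothetical names chosen for this example.

```python
# "Module" A: the low-level routine. Widening what it accepts from an
# integer to a real number needs no change anywhere else.
def scale(amount, factor=2):
    return amount * factor


# "Module" B: passes the value through without declaring its type, so it
# does not have to be revisited when the type of the parameter changes.
def process(amount):
    return scale(amount)


print(process(21))     # 42
print(process(1.5))    # 3.0

# The cost: a wrong type is discovered only when the code actually runs,
# so tests have to exercise this path to find the mistake.
try:
    process(None)
except TypeError as exc:
    print("caught at run time:", exc)
```

With type inference, a compiler would work out the parameter types of process from scale, so the intermediate module still would not need editing, and the last call would be rejected before the program runs.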


Copyright © 2003 by Joachim Durchholz
This page was last updated April 9, 2003.
