Indentation as a block delimiter
This is a transcript of a discussion of the merits and demerits of using indentation, rather than delimiters such as begin/end or {}, to mark block structure. The discussion began with a posting by Paladin Mu to the comp.lang.misc newsgroup in May 1994.
Paladin Mu

Maybe it’s just coincidence, but recently I have encountered a lot of languages which recognize blocks by leading blanks instead of the old begin–end method. These languages include ABC, Haskell, and Python. I personally consider it a poor design to make the meaning of a program depend on how many times the spacebar was pressed before each line. I’m curious: what are the advantages of this method? I feel it is a topic that could easily start a flamewar. 🙂 So, only the advantages please, unless you’re sure you have to point out a disadvantage… 🙂

Richard Harter

Yep, it’s a good flamewar topic, although I haven’t seen it in comp.lang.misc. For reasons that are obscure to me, it is the people who object to indentation-controlled blocking who do most of the flaming. Go figure.

Be that as it may, I will run down some of the pros (and cons). A number of remarks will be in the context of Lakota, a scripting language we developed, which does use indentation for blocking.

The first pro argument, which is theoretical, is that using indentation for blocking is “the right thing to do” because of the single-point-of-control criterion for design. For example, if a constant is used in several places, it should be defined only once. In languages which use begin/end or {} for blocking, the structure is shown in two different ways. One, which is used by the compiler or interpreter, is the formal delimiters. The other, which is used by the readers and writers of the code, is the indentation. If we accept that code should be correctly indented, then the begin/end delimiters are superfluous. At best they merely match the visual indentation; at worst they induce errors of interpretation because the visual structure does not match the code as interpreted by the compiler. Although, IMO, this argument is entirely correct, I have to say that it does not have much force, at least with good programming.
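To make the “two representations” point concrete, here is a small sketch in Python, chosen only for illustration. It checks a brace-delimited language by comparing the nesting depth implied by the braces with the nesting depth implied by the indentation, and reports the lines where the two disagree. The function name indent_mismatches is hypothetical. The sketch assumes 4-space indentation, spaces only, and no braces inside strings or comments.

```python
# Hypothetical checker: compares brace depth with indentation depth.
# Assumptions: 4-space indentation, spaces only, no braces in strings/comments.

def indent_mismatches(lines, width=4):
    """Return 1-based numbers of lines whose indentation disagrees with the braces."""
    depth = 0            # brace depth before the current line
    mismatched = []
    for number, line in enumerate(lines, start=1):
        body = line.lstrip(" ")
        if not body:
            continue     # blank lines carry no structure
        # A line starting with closing braces belongs to the enclosing level.
        leading_closes = len(body) - len(body.lstrip("}"))
        expected = max(depth - leading_closes, 0) * width
        actual = len(line) - len(body)
        if actual != expected:
            mismatched.append(number)
        depth += body.count("{") - body.count("}")
    return mismatched


code = [
    "if (x) {",
    "    a();",
    "}",
    "    b();",      # looks like part of the block, but the braces say otherwise
]
print(indent_mismatches(code))  # [4]
```

The compiler reads line 4 as running unconditionally, while a human reader scanning the indentation sees it as part of the if block.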
Problems arising from mismatches between visual blocking and delimiter blocking simply don’t happen very often, and they are usually no more than an annoyance when they do occur. [An exception is PL/I and other languages which allow a single delimiter to close several nested blocks at once.]

A second pro argument, which IMO has much greater import, is that it forces correct coding style. If you *must* indent your code properly to reflect the flow of control, then it is guaranteed that your code will be properly indented. IMO this is highly desirable. Today I think there is a consensus that correctly indented structured code is preferable as a matter of programming style. And one can argue, and I agree, that good coding style should be obligatory by default rather than happenstance.

An argument which is sometimes advanced against indentation as the block control mechanism is that it makes cutting and pasting code less reliable. In C, if I have a block of code {…}, I can copy it into any place where a block is legal without worrying about the nesting level. I do not have the same assurance in a language that uses indentation: I may have to re-indent the pasted code. This argument is a two-edged sword, because what proponents of liberal cutting and pasting are saying is that they want the liberty to safely construct improperly indented code. And, in fact, when you find code that is confusingly indented, the reason is often that it was constructed with liberal use of cut and paste, without proper cleanup afterwards.
A related argument is that ensuring proper indentation is more work. I don’t have much sympathy for this argument. I will grant that one doesn’t want to spend large amounts of labor on “pretty-printing”, but IMO some effort towards writing clean code is warranted.
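In practice, re-indenting a pasted fragment is mostly mechanical. Here is a minimal sketch using the Python standard library’s textwrap.dedent and textwrap.indent. It assumes space-only indentation and a hypothetical helper named paste_at_level.

```python
import textwrap


def paste_at_level(snippet, level, width=4):
    """Re-indent a pasted snippet so its outermost lines sit at `level`.

    textwrap.dedent strips the whitespace common to every line, keeping the
    snippet's internal structure; textwrap.indent then prefixes every
    non-blank line with the new indentation.
    """
    return textwrap.indent(textwrap.dedent(snippet), " " * (level * width))


# A fragment copied from somewhere two levels deep.
snippet = "        if ready:\n            launch()\n"
result = paste_at_level(snippet, 1)
# result == "    if ready:\n        launch()\n"
print(result, end="")
```

Because dedent removes only the whitespace common to every line, the fragment’s internal structure survives the move. A fragment that mixes tabs and spaces would defeat it, since dedent does not treat a tab as equal to any run of spaces.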
Another argument which is sometimes advanced against blocking by indentation is that you can’t apply automated tools to it. The most notable examples are “smart” editors which automatically locate preceding and trailing delimiters. It turns out, however, that most smart editors (e.g. Emacs) can actually deal with indented code quite nicely. Mostly this argument is merely a matter of insisting on using inappropriate tools.

A substantive argument against using indentation is that visual appearance does not necessarily transfer from one context to another. The main problem arises with tabs and spaces. If we permit only spaces (or only tabs) for indentation, there is no problem. If we permit both, then we are at the mercy of the tab settings. [Lakota does permit both and uses default tab settings for resolution; this is probably a design error, although it is very convenient. The issue is under desultory review.] A real problem occurs when we deal with code transferred to (or written on) machines which use word processors. [We move code from UNIX machines to Macintoshes for documentation; we sometimes get surprises.]

A pro argument is that using indentation eliminates an entire class of religious wars, namely the proper placement of the begin/end or {} delimiters. In effect, a suite of meaningless style alternatives, and the arguments pro and con for them, has been eliminated on the grounds that all of them are superfluous.

Another pro argument is that using indentation eliminates trailing end statements. Look at any C program and you will see lines of code that read like

                }
            }
        }
    }
and the like. This is uninformative white space with a vengeance. Block delimiters are much like serifs in letters. One can indeed argue that serifs make the letters more readable; however there are limits – serifs an inch wide are overkill.
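The earlier points about editors and about tabs and spaces can be illustrated with one small sketch. The hypothetical block_end function finds the extent of a block purely from indentation, much as an indentation-aware editor command might (this is not a claim about how Emacs does it). It resolves tabs with Python’s str.expandtabs, so the answer depends on the assumed tab size.

```python
def indent_width(line, tabsize):
    """Visual column of the first non-blank character under a given tab size."""
    expanded = line.expandtabs(tabsize)
    return len(expanded) - len(expanded.lstrip())


def block_end(lines, start, tabsize=8):
    """Return the index just past the block headed by lines[start].

    The block body is every following line indented deeper than the header.
    Blank lines are treated as part of the body (a simplifying assumption).
    """
    header = indent_width(lines[start], tabsize)
    end = start + 1
    while end < len(lines):
        line = lines[end]
        if line.strip() and indent_width(line, tabsize) <= header:
            break
        end += 1
    return end


program = [
    "    loop",      # header indented with four spaces
    "\tbody-1",      # body indented with one tab
    "\tbody-2",
    "after",
]
print(block_end(program, 0, tabsize=8))  # 3: lines 1 and 2 are the body
print(block_end(program, 0, tabsize=4))  # 1: the loop has an empty body
```

With a tab size of 8, the two tab-indented lines form the loop body. With a tab size of 4, the same bytes give the loop an empty body. This is “at the mercy of the tab settings” in miniature.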
As a personal note, when I was first writing significant amounts of Lakota code, I found that I missed the begin/end delimiters. For a week or so I actually put in dummy procedures called begin and end so that I could retain the familiar sequence of trailing end statements. This didn’t last very long; one quickly becomes comfortable doing without. However, it is true that putting in the delimiters is an ingrained habit, and that it takes a period of adjustment to be comfortable doing without them. Sort of like carrying a ten-pound weight around for years and then getting used to not carrying it.

Another advantage of using indentation for block delimiting is that it eliminates the infamous dangling-else problem. In pseudocode this looks like:

    if (foo) if (bar) do-something-weird else bite-the-bag

Does the “else” pair with the first or the second “if”? There are various tactics for resolving the ambiguity. We can adopt an end-if statement. We can insist that “if bodies” be fully delimited. We can adopt a mechanical resolution rule, such as C’s rule that an else pairs with the nearest unmatched if [a source of confusion when the indentation doesn’t match the rule]. The use of indentation for block delimiting, in effect, takes the path of requiring that all blocks be fully delimited.
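Python, one of the languages named at the start of the thread, shows how indentation settles the question: the else pairs with whichever if it is aligned with. The two functions below are made up for illustration and differ only in the indentation of the else.

```python
def else_on_inner_if(foo, bar):
    if foo:
        if bar:
            return "do-something-weird"
        else:                # aligned with `if bar`: pairs with the inner if
            return "bite-the-bag"
    return "nothing"


def else_on_outer_if(foo, bar):
    if foo:
        if bar:
            return "do-something-weird"
    else:                    # aligned with `if foo`: pairs with the outer if
        return "bite-the-bag"
    return "nothing"


print(else_on_inner_if(True, False))   # bite-the-bag
print(else_on_outer_if(True, False))   # nothing
print(else_on_outer_if(False, True))   # bite-the-bag
```

No extra rule is needed. The pairing is exactly what the reader sees.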
Now let’s look at a serious technical issue, namely line length. There are several variants on the blocking-as-indentation theme. Lakota uses a pure strategy: each line is a single command (no semicolon separator or terminator), with no continuation of a statement onto the next line. Given this, and the general consideration that lines longer than a window width are undesirable, Lakota has to be designed so that it is easy to write code in a style where every statement fits on one line. In turn this has a lot of implications for the structure of the language. For example, it implies that deep nesting is seldom necessary. It implies that long compound statements are rare. It implies that composition can be done vertically over multiple statements rather than horizontally in a single statement. In short, adopting a pure strategy has a significant impact on the necessary structure of the language and on the way programs in the language are written; it is not simply an ad hoc device.

There are compromises. The principal compromise is to allow statement continuation to be signalled by an unsatisfied operator. For example:

    a := x +
         y +
         z

If I am not mistaken, Haskell lets you do this. In this case indentation serves both for block delimiting and for breaking up statements. I have to say that I think overloading indentation in this way is a dubious proposition. However, some people love it.

If I recall correctly, Icon makes the statement separator (;) optional. Thus, for example:

    a = b; c = d
    x = y

From what I am told, people who program in Icon find this style quite comfortable.

Another issue is detecting the end of a block. Here Python and Lakota serve as good case examples. In Python (I don’t have my Python manual at hand, so I am going to pseudocode the example) one does this:

    some-kind-of-loop-construct
        statement-1
        .....
        statement-n
    statement-after-the-loop

In Lakota one does this:

    some-kind-of-loop-construct
        statement-1
        .....
        statement-n
        cycle
    statement-after-the-loop

What is going on here? The difference is that Python can “read ahead” to detect the end of a block, and Lakota cannot. Python “compiles” a script file, i.e. it reads the entire script file before executing it [or behaves as though it does]. Lakota is a shell (it can be used as an interactive shell or even as a login shell). By the time we reach “statement-after-the-loop” we have already exited the loop block; the loop is over and done with. Thus we need the “cycle” command to repeat the loop body. Note that this is not a delimiter; it is an actual command. Interactive languages which have begin/end delimiters do not need an explicit cycle statement, because the trailing delimiter serves double duty as the loop-repeat statement. One can construct pro and con arguments about whether this is a “good thing”.

A con argument that is sometimes raised is that block delimiting via indentation makes it more difficult to write a compiler for the language. This is clearly not true for languages which use a pure strategy, since decomposing the program into statements with the appropriate nesting structure is trivial. However, I understand that it is a real problem in languages which overload indentation. It does seem to be the case that almost all languages which use indentation are interpreted; however, this may simply be because the contexts in which indentation is workable are usually contexts in which interpreted languages are appropriate.

Having been on both sides of the fence, i.e. having written significant amounts of code in both styles (and in antique unstructured FORTRAN), I find that the real advantage of indented blocking is that it is simpler to write; the begin/end delimiters are just clutter. The real disadvantage is that it doesn’t deal well with long statements.

This page was last updated June 3, 2001.
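Harter’s claim about the pure strategy (one statement per line, no continuation lines) is that decomposing a program into nested statements is trivial. Here is a minimal Python sketch of a hypothetical parse_blocks function that builds a nested tree from indentation alone. A block is only known to be closed when the next, less-indented line is read. That is the read-ahead a line-at-a-time shell such as Lakota cannot rely on. The sketch assumes space-only indentation and does not reject a dedent to a column that matches no open block.

```python
def parse_blocks(lines):
    """Build a nested (statement, children) tree from indentation alone.

    Assumptions: one statement per line, no continuation lines,
    spaces only, blank lines ignored.
    """
    root = []
    stack = [(-1, root)]     # (indent of the owning statement, its children)
    for line in lines:
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        # A line indented no deeper than an open block closes that block;
        # this is discovered only when the following line is read.
        while indent <= stack[-1][0]:
            stack.pop()
        node = (line.strip(), [])
        stack[-1][1].append(node)
        stack.append((indent, node[1]))
    return root


program = [
    "some-kind-of-loop-construct",
    "    statement-1",
    "    statement-n",
    "statement-after-the-loop",
]
tree = parse_blocks(program)
# tree == [("some-kind-of-loop-construct",
#           [("statement-1", []), ("statement-n", [])]),
#          ("statement-after-the-loop", [])]
print(tree)
```

A single stack of open indentation levels is all the nesting structure requires. Languages that also use indentation for statement continuation need extra rules on top of this.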