home
table of contents
Math & Computers
June 2001
email

Indentation as a block delimiter


This is a transcript of a discussion of the merits and demerits of using indentation, rather than delimiters such as begin/end or {}, to recognize block structure. The discussion began with a posting by Paladin Mu to the comp.lang.misc newsgroup in May 1994.


Paladin Mu

Maybe it's just coincidence, but recently I have encountered a lot of languages which recognize blocks by leading blanks instead of the old begin/end method. These languages include ABC, Haskell, and Python.

I personally consider it a poor design to make the meaning of a program depend on how many times the space bar was pressed before each line. I'm curious about the advantages of this method.

I suspect this topic could easily start a flamewar. :) So, only the advantages please, unless you're sure you have to point out a drawback... :)

Richard Harter

Yep, it's a good flamewar topic, although I haven't seen it in comp.lang.misc. For reasons that are obscure to me it is the people who object to indentation controlled blocking who do most of the flaming. Go figure.

Be that as it may, I will run down some of the pros (and cons). A number of remarks will be in the context of Lakota, a scripting language we developed, which does use indentation for blocking.

The first pro argument, which is theoretical, is that using indentation for blocking is "the right thing to do" because of the single point of control criterion for design. (For example, if a constant is used in several places, it should be defined only once.) In languages which use begin/end or {} for blocking, the structure is shown in two different ways. One, used by the compiler or interpreter, is the formal delimiters. The other, used by the readers and writers of the code, is the indentation. If we accept that code should be correctly indented, then the begin/end delimiters are superfluous. At best they merely match the visual indentation; at worst they induce errors of interpretation because the visual structure does not match the code as interpreted by the compiler.

Although, IMO, this argument is entirely correct, I have to say that it does not have much force, at least with good programming. Problems arising from mismatches of visual blocking and delimiter blocking simply don't happen very often and are usually no more than an annoyance when they do occur. [An exception is PL/I and other languages which allow a single delimiter to close several nested blocks at once.]

A second pro argument, which IMO has much greater import, is that it forces correct coding style. If you *must* indent your code properly to reflect the flow of control, then your code is guaranteed to be properly indented. IMO this is highly desirable. Today I think there is a consensus that correctly indented structured code is preferable as a matter of programming style. And one can argue, and I agree, that good coding style should be obligatory by default rather than happenstance.

Peter da Silva

I don't agree here. It forces a certain indentation, which is only the beginning of good coding style, and even then one can write misleading code. Personally, I use beautifiers during the original code development, but I will deliberately break strict indentation when I feel that is needed to properly describe the semantics of the code or to make blocks of text readable:

        { int tmp_var;
        /* code using tmp_var */
        }

        {
                /* ... */
                {
                        /* ... */
                        {
                                char *a =
"This block of text contains some very long lines. For whatever reason\n"
"it's not something I want to hide off at the top level (perhaps it\n"
"references something in nearby code that makes me want to edit the code\n"
"and the text at the same time).\n";
                        }
                }
        }

In general, I don't buy the argument that "strict indentation" and "good coding style" are equivalent statements, and that's the *strongest* argument for using an indentation-structured language.

Richard Harter

Oh, I don't think that anyone is arguing that strict indentation and good coding style are equivalent statements. Clearly there is much more to good coding style than indented code. The argument is that proper indentation is an element of good coding style and that, in general, it is desirable that good coding style be obligatory.

The point that I feel you neglect is that uniformity of style is, in itself, an important element of good coding style. If you have ten programmers you have ten styles, each with its own layout policies. Some use code beautifiers, some do not. Some are sloppy, some are anal-retentive. Your personal coding style, no matter how good, is not what most other people are doing. And many people do some really awful things.

A second point that I feel you neglect, and which I alluded to earlier, is that using indentation has a price: the rest of the structure of the language has to fit with it. The lesson from your examples is not that indentation should have exceptions, but that in a language like C it would be a bad thing to replace {} by indentation.

Pretty printer for C is an oxymoron.

On the con side it's a source of all sorts of subtle coding errors. For example:
        code
            more code
                even more code
                ... lots of code
             one line of code at a previous level
                and some more code

In a substantial program it's hard to tell where a statement falls, even if one is anal about indentation and has begin/end as an extra check.

Richard Harter

This argument is often advanced. There are a couple of points here. The first is that the frequency of the problem is overestimated. In practice it doesn't come up that often, even in badly designed languages. However it is a live issue and there are things that can be done about it. As part of the language spec you should have alignment rules and nesting analysis. Your example is probably a syntax error, depending on what's in those lines of code.
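To make "alignment rules and nesting analysis" concrete, here is a minimal sketch in Python, one of the indentation languages named at the start of the thread. It assumes spaces-only indentation and a rule modeled on Python's own: a line may indent deeper than the line before it, stay at the same level, or dedent to exactly one of the enclosing levels. (Python itself reports the last case as "unindent does not match any outer indentation level".) The name check_alignment is hypothetical. Applied to the example above, with its columns made relative, it flags the line "at a previous level" because that line's column matches none of the open levels.

```python
def check_alignment(lines: list[str]) -> list[tuple[int, str]]:
    """Report lines whose indentation matches no open block level.

    Hypothetical rule set, modeled on Python's: a line may indent deeper
    than the previous line (opening a block), stay at the same level,
    or dedent to exactly one of the enclosing levels. Spaces only.
    """
    open_levels = [0]                # indentation of each open block, innermost last
    errors = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue                 # blank lines carry no structure
        width = len(line) - len(line.lstrip(" "))
        if width > open_levels[-1]:
            open_levels.append(width)        # a deeper block begins
            continue
        while width < open_levels[-1]:
            open_levels.pop()                # close blocks deeper than this line
        if width != open_levels[-1]:
            errors.append((number, f"indentation {width} matches no enclosing level"))
    return errors


example = [
    "code",
    "    more code",
    "        even more code",
    "        ... lots of code",
    "     one line of code at a previous level",
    "        and some more code",
]
print(check_alignment(example))
# → [(5, 'indentation 5 matches no enclosing level')]
```

A language that adopts such a rule turns the ambiguous line into a syntax error rather than silently guessing which block it belongs to.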

A separate issue is that (all things being equal) it is bad style to write long procedures. Whether it is convenient and natural to write short procedures is not just a function of programmer style; it is also a function of the structure of the language.

I think a certain amount of redundancy is generally a good thing in a language design. I don't like languages with implicit typing of variables either (note that this is NOT the same as a language with untyped variables!).

I think your cut-and-paste and too-much-work arguments fall under the title of "straw men", especially when you have good-quality automatic pretty-printers.

Richard Harter

I have to disagree. I've seen too much bad and buggy code produced by sloppy cut and paste. There are a lot of programmers who have lower standards than you do.

An argument which is sometimes advanced against indentation as the block control mechanism is that it makes cutting and pasting code less reliable. In C, if I have a block of code {...} I can copy it into any place that is legal without worrying about its nesting level. I do not have the same assurance in a language that uses indentation -- I may have to re-indent the pasted code.

This argument is a two-edged sword, because what proponents of liberal cutting and pasting are saying is that they want the liberty to safely construct improperly indented code. And, in fact, when you find code that is confusingly indented, the reason often is that it was constructed with liberal use of cut and paste -- without proper cleanup afterwards.

Barry Margolin

These days many people use editors that are able to fix up the indentation. Thus, they can construct improperly indented code while pasting, but then a single keystroke can fix it up.

Richard Harter

This is a dubious argument. Most people don't do this, either because they don't care to or because they don't know how to get their editor to do it. An argument that says "well, X isn't a problem because you can use tool Y" doesn't mean much if most people don't use tool Y.

This is entirely due to the redundancy that results from using something other than indentation as the true indicator of program structure. The editor can do a syntactic analysis of the program and determine the correct indentation.

Richard Harter

Actually, you can do the same thing from the indentation alone (although it takes a fairly fancy emacs macro). The technique is obvious to the most casual observer and will be left as an exercise for the reader.
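The indentation-only approach can be sketched in a few lines of Python. This is an illustration, not the emacs macro mentioned above: paste_at_level is a hypothetical helper that removes the margin common to all lines of a pasted block with textwrap.dedent and then shifts the block to the target column with textwrap.indent, so the block's internal nesting survives the move. Spaces-only indentation is assumed. What indentation alone cannot tell the editor is which of the open levels the paste belongs at, so the target column is left to the caller.

```python
import textwrap


def paste_at_level(block: str, target_column: int) -> str:
    """Move a pasted block to target_column, keeping its relative indentation.

    Hypothetical helper; assumes the block is indented with spaces only.
    """
    flush_left = textwrap.dedent(block)        # drop the margin common to all lines
    return textwrap.indent(flush_left, " " * target_column)


# A fragment copied from two levels deep, pasted one level deep.
snippet = "        if ready:\n            start()\n        log('done')\n"
print(paste_at_level(snippet, 4), end="")
# →     if ready:
#           start()
#       log('done')
```

The body of the if stays four columns to the right of its header, wherever the block lands.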

A related argument is that ensuring proper indentation is more work. I don't have much sympathy for this argument. I will grant that one doesn't want to spend large amounts of labor on "pretty-printing", but IMO some effort towards writing clean code is warranted.

Barry Margolin

But if you screw up the indentation, there may be no cues to indicate what you actually intended. With redundant structure indicators, the mismatch can indicate where the programmer's intent was not realized. It can often be useful to run a program through a pretty-printer and compare the result with the original. This is particularly useful for recognizing mismatched else-clauses.
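Barry's comparison needs a real pretty-printer, but a cruder version of the same idea fits in a short Python sketch: compute the brace depth of each line of C-like code and flag the lines whose indentation disagrees with it. The function name, the four-space indent width, and the spaces-only assumption are all hypothetical choices. The sketch does not recognize braces inside strings, character literals, or comments, and because it knows nothing about if/else it cannot catch a mismatched else in brace-less code. In the example a closing brace is missing after x();. The first flagged line, line 4, marks where the programmer's intent and the delimiters part company.

```python
def indentation_mismatches(source: str, width: int = 4) -> list[int]:
    """Return 1-based numbers of lines whose indentation disagrees with brace depth.

    Naive sketch: braces in strings, character literals, and comments are
    counted as real braces, and continuation lines are not handled.
    """
    depth = 0
    mismatches = []
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue
        # A line that starts with '}' belongs to the enclosing level.
        expected = depth - 1 if stripped.startswith("}") else depth
        actual = len(line) - len(line.lstrip(" "))
        if actual != expected * width:
            mismatches.append(number)
        depth += stripped.count("{") - stripped.count("}")
    return mismatches


source = """\
int main(void) {
    if (a) {
        x();
    y();
    return 0;
}
"""
print(indentation_mismatches(source))
# → [4, 5, 6]
```
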

Richard Harter

Having worked both sides of the fence, I have to say that this is a bunch of hooey. If as many as 5% of C programmers use pretty-printers I would be very much surprised. In fact I would be surprised if 5% of the code in any language ever goes through a pretty-printer. More to the point, if you screw up the indentation, the evidence is right there in front of you. In my experience this is not a problem.

C is particularly treacherous because of the preprocessor. For example,

        if (x>0) FOO(x);
        else     BAR(x);

looks fine until you read the definitions and see

        #define FOO(x) if (flag) printf("x is okay\n")
        #define BAR(x) error_handler(x)

After expansion the first line becomes if (x>0) if (flag) printf("x is okay\n"); so the else pairs with the inner if (flag) rather than with if (x>0): error_handler(x) runs when x>0 but flag is false, and nothing at all happens when x<=0. This isn't an indentation/{} issue, per se, just an example of how a dangling else can sneak in unexpectedly. [Yes, an experienced programmer who has been bitten a few times wraps those macro calls in braces, which is to the point -- there are a lot of legal practices which are inadvisable.]

Another argument which is sometimes advanced against blocking by indentation is that you can't apply automated tools to it. The most notable examples are "smart" editors which automatically locate the enclosing opening and closing delimiters. It turns out, however, that most smart editors (e.g. emacs) can actually deal with indented code quite nicely. Mostly this argument is merely a matter of insisting on using inappropriate tools.

Barry Margolin

Another class of automated tools that is somewhat harder to use with indentation-based languages is comparison programs. They often have an option to ignore leading whitespace. This way, if a program of the form

    statement1
    ...
    statement3

is changed to

    if (condition) {
        statement1
        ...
        statement3
    }

the difference listing will just show the addition of the conditional, without indicating changes to all the statements contained in the block. Yes, language-specific comparison tools could be implemented, but it's a simple fact that there aren't many of them; economics suggests that there will be many language-specific tools only for popular languages, and that currently means the C family.
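Python's difflib can imitate a comparison program's ignore-leading-whitespace option, which makes the point concrete. In this sketch, diff_ignoring_indent (a hypothetical name) strips leading whitespace before comparing lines. For the brace version of the change, the surviving difference lines are exactly the added if and closing brace, which describe the change completely. For the same change written in Python, the only surviving difference is the new if line. Nothing shows that three statements moved into the block. Ignoring whitespace is unsafe there precisely because the whitespace carries the structure.

```python
import difflib


def diff_ignoring_indent(old: list[str], new: list[str]) -> list[str]:
    """Added ('+ ') and removed ('- ') lines, ignoring leading whitespace."""
    old_stripped = [line.lstrip() for line in old]
    new_stripped = [line.lstrip() for line in new]
    return [
        change
        for change in difflib.ndiff(old_stripped, new_stripped)
        if change.startswith(("+ ", "- "))
    ]


before = ["statement1", "statement2", "statement3"]
with_braces = ["if (condition) {", "    statement1", "    statement2", "    statement3", "}"]
with_indentation = ["if condition:", "    statement1", "    statement2", "    statement3"]

print(diff_ignoring_indent(before, with_braces))       # ['+ if (condition) {', '+ }']
print(diff_ignoring_indent(before, with_indentation))  # ['+ if condition:']
```
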

Richard Harter

This is an argument with some cogency; however, it is more an argument for using the currently most popular language than one about the issue at hand.

A substantive argument against using indentation is that visual appearance does not necessarily transfer from one context to another.

Peter da Silva

No kidding. I dynamically adjust my tab settings as I program to allow for deeper and shallower nesting in certain contexts.

Richard Harter

It sounds like a dubious practice.

The main problem arises with tabs and spaces. If we permit only spaces (or only tabs) for indentation there is no problem. If we permit both, then we are at the mercy of the tab settings. [Lakota does permit both and uses default tab settings for resolution; this is probably a design error although it is very convenient. The issue is under desultory review.] A real problem occurs when we deal with code transferred to (or written on) machines which use word processors. [We move code from UNIX machines to Macintoshes for documentation; we sometimes get surprises.]
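A short Python sketch shows how mixing tabs and spaces makes the structure depend on the tab setting. The built-in str.expandtabs converts tabs to spaces for a given tab width. The hypothetical relative_structure helper records, for each non-blank line, whether it is indented deeper than, level with, or shallower than the line before it. The same three lines give one structure with 8-column tabs and a different one with 4-column tabs. (Python 3 itself rejects source whose meaning depends on the size of a tab, raising TabError.)

```python
def indent_width(line: str, tabsize: int) -> int:
    """Column of the first non-blank character, with tab stops every tabsize columns."""
    expanded = line.expandtabs(tabsize)
    return len(expanded) - len(expanded.lstrip())


def relative_structure(lines: list[str], tabsize: int) -> list[int]:
    """For each line after the first: 1 = deeper, 0 = same level, -1 = shallower."""
    widths = [indent_width(line, tabsize) for line in lines if line.strip()]
    return [(b > a) - (b < a) for a, b in zip(widths, widths[1:])]


# Second line indented with a tab, third with eight spaces.
lines = ["if ready:", "\tstart()", "        log()"]
print(relative_structure(lines, 8))  # [1, 0]: log() is in the same block as start()
print(relative_structure(lines, 4))  # [1, 1]: log() is nested one level deeper
```
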

A pro argument is that using indentation eliminates an entire class of religious wars, namely the proper placement of the begin/end or {} delimiters. In effect, a suite of meaningless style alternatives and the arguments pro and con for them has been eliminated on the grounds that all of them are superfluous.

Another pro argument is that using indentation eliminates trailing end statements. Look at any C program and you will see lines of code that read like

                }
            }
        }
    }

Peter da Silva

Those can be avoided as a matter of style. As a matter of coding style I choose to leave them in. I even PUT extra ones in for single statements so I get these tags in place. This is a matter of coding style choices, not an advantage for indentation structured languages.

Richard Harter

Au contraire. Primus, almost everyone does the same thing. Secundus, the elimination of meaningless style variations is an advantage.

and the like. This is uninformative white space with a vengeance. Block delimiters are much like serifs in letters. One can indeed argue that serifs make the letters more readable; however there are limits - serifs an inch wide are overkill.

Barry Margolin

Of course, there's no reason why each delimiter has to be on its own line; the above is simply the prevailing style. I think it may be because C programmers adopted much of the style of Pascal, PL/I, and Algol, which use keywords (e.g. "begin" and "end") as the delimiters; since these delimiters are longer and look almost like short statements, it seemed appropriate to put them on their own lines.

Richard Harter

[In PL/I, "begin" and "end" are statements.] There is a good reason for putting them on their own lines -- the blocks are visually closed as well as being textually delimited.

Most Lisp programmers put all their close parentheses on the same line. I cringe whenever I see code that looks like:

(let (
      (a 1)
      (b 2)
     )
  (print a)
  (print b)
)

as opposed to the more common

(let ((a 1)
      (b 2))
  (print a)
  (print b))

As a personal note, when I was first writing significant amounts of Lakota code I found that I missed the begin/end delimiters. For a week or so I actually put in dummy procedures called begin and end so that I could retain the familiar sequence of trailing end statements. This didn't last very long -- one quickly becomes comfortable doing without. However, it is true that putting in the delimiters is an ingrained habit and that it takes a period of adjustment to become comfortable doing without. Sort of like carrying a ten-pound weight around for years and then getting used to not carrying it.

Another advantage of using indentation for block delimiting is that it eliminates the infamous dangling-else problem. In pseudocode this looks like:

        if (foo)
                if (bar)
                        do-something-weird
        else bite-the-bag

Does the "else" pair with the first or second "if"? There are various tactics for resolving the ambiguity. We can adopt an end-if statement. We can insist that "if bodies" be fully delimited. We can adopt a mechanical resolution rule [a source of confusion when indentation doesn't match the rule.] The use of indentation for block delimiting, in effect, takes the path of requiring that all blocks be fully delimited.
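In Python, one of the indentation languages named at the start of the thread, this pseudocode cannot be ambiguous: the else pairs with whichever if it is aligned with. The two functions below have hypothetical names, and the pseudocode's actions are turned into return values. They differ only in the indentation of the else, and they behave differently. As written above, with the else under the first if, the pseudocode corresponds to else_on_outer.

```python
def else_on_outer(foo: bool, bar: bool) -> str:
    if foo:
        if bar:
            return "do-something-weird"
    else:                       # aligned with "if foo", so it pairs with it
        return "bite-the-bag"
    return "nothing"


def else_on_inner(foo: bool, bar: bool) -> str:
    if foo:
        if bar:
            return "do-something-weird"
        else:                   # aligned with "if bar", so it pairs with it
            return "bite-the-bag"
    return "nothing"


for foo, bar in [(True, False), (False, False)]:
    print(foo, bar, else_on_outer(foo, bar), else_on_inner(foo, bar))
# → True False nothing bite-the-bag
# → False False bite-the-bag nothing
```
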

Peter da Silva

In practice it's a non-problem. People keep bringing it up, as if it's a major problem, but it's about as significant as the cut-and-paste and too-much-trouble-to-indent ones.

Richard Harter

Agreed that it's a non-problem -- I should have noted that it is more of a theoretical issue than a practical problem. And, of course, I disagree about cut-and-paste and too-much-trouble-to-indent.

Now let's look at a serious technical issue, namely line length. There are several variants on the blocking-as-indentation theme. Lakota uses a pure strategy -- each line is a single command (no semicolon separator or terminator) with no continuation of a statement onto the next line. Given this, and the general consideration that lines longer than a window width are undesirable, Lakota has to be designed so that it is easy to write code in a style where every statement fits on one line. In turn this has a lot of implications for the structure of the language. For example, it implies that deep nesting is seldom necessary. It implies that long compound statements are rare. It implies that composition can be done vertically over multiple statements rather than horizontally in a single statement. In short, adopting a pure strategy has a significant impact on the necessary structure of the language and the way programs in the language are written -- it is not simply an ad hoc device.

There are compromises. The principal compromise is to allow statement continuation to be signalled by an unsatisfied operator, i.e. a line that ends with a binary operator still waiting for its right operand. For example

        a := x +
             y +
             z

If I am not mistaken, Haskell lets you do this. In this case indentation serves both for block delimiting and for breaking up statements. I have to say that I think that overloading indentation in this way is a dubious proposition. However, some people love it.
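The unsatisfied-operator rule can be sketched in Python. join_continuations is a hypothetical helper. The set of operators that leave a statement open is an assumption chosen to match the example above. The sketch relies on the trailing operator alone and ignores how the continuation lines are indented. A statement still open at the end of input is reported as an error.

```python
# Hypothetical set of binary operators that leave a statement unfinished.
OPEN_OPERATORS = ("+", "-", "*", "/", ":=")


def join_continuations(lines: list[str]) -> list[str]:
    """Join physical lines into statements: a line ending in an operator continues."""
    statements = []
    pending = ""
    for line in lines:
        text = line.strip()
        if not text:
            continue
        pending = f"{pending} {text}" if pending else text
        if not pending.endswith(OPEN_OPERATORS):
            statements.append(pending)       # operator satisfied: statement complete
            pending = ""
    if pending:
        raise SyntaxError(f"statement ends with an operator: {pending!r}")
    return statements


print(join_continuations(["a := x +", "     y +", "     z", "b := a"]))
# → ['a := x + y + z', 'b := a']
```
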

If I recall correctly, Icon makes the statement separator (;) optional. Thus, for example,

        a = b; c =d
        x = y

From what I am told, people who program in Icon find this style quite comfortable.

Another issue is detecting the end of a block. Here Python and Lakota will serve as good case examples. In Python (I don't have my Python manual at hand, so I am going to pseudocode the example) one does this:


        some-kind-of-loop-construct
                statement-1
                .....
                statement-n
        statement-after-the-loop


In Lakota one does this:


        some-kind-of-loop-construct
                statement-1
                .....
                statement-n
                cycle
        statement-after-the-loop

What is going on here? The difference is that Python can do a "read ahead" to detect end-of-block and Lakota cannot. Python "compiles" a script file, i.e. it reads the entire script file before executing it [or behaves as though it does.] Lakota is a shell (and can be used as an interactive shell or even as a login shell). When we reach "statement-after-the-loop" we have already exited the loop block; the loop is over and done with. Thus we need the "cycle" command to repeat the loop body. Note that this is not a delimiter; it is an actual command.
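Python's own interactive mode runs into the same read-ahead problem that Lakota solves with cycle. At the interactive prompt, the interpreter cannot know that a compound statement is finished until the next line arrives, so it treats an empty line as the end of the block. The standard library's codeop module exposes this test, the one the interactive console uses: compile_command returns None while the input is a valid but unfinished statement. The block_is_complete wrapper is a hypothetical name, and the loop is the pseudocode above written as actual Python.

```python
import codeop


def block_is_complete(source: str) -> bool:
    """True if Python's interactive compiler would run `source` as it stands.

    codeop.compile_command returns None when the input so far is a valid
    but unfinished statement and raises SyntaxError when it is invalid.
    """
    return codeop.compile_command(source) is not None


loop = "for i in range(3):\n    print(i)"
print(block_is_complete(loop))         # False: more loop body might follow
print(block_is_complete(loop + "\n"))  # True: the empty line closes the loop
```

In effect, the empty line is Python's interactive counterpart of Lakota's cycle. The difference is that it only terminates the block and is not a command.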

Interactive languages which have begin/end delimiters do not need an explicit cycle statement because the trailing delimiter serves double duty as the loop repeat statement. One can construct pro and con arguments about whether this is a "good thing".

A con argument that is sometimes raised is that block delimiting via indentation makes it more difficult to write a compiler for the language. This is clearly not true for languages which use a pure strategy since a statement decomposition with the appropriate nesting structure is trivial. However I understand that it is a real problem in languages which overload indentation. It does seem to be the case that almost all languages which use indentation are interpreted; however this may simply be that contexts in which indentation is workable are usually contexts in which interpreted languages are appropriate.
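For a pure strategy the decomposition really is short. The sketch below uses hypothetical names (Node and parse_blocks) and assumes spaces-only indentation. It treats each non-blank line as one statement and makes it a child of the nearest preceding statement that is indented less, which yields the nesting tree directly. It does not check that a dedent lands on an open level; a real implementation would reject that case.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    children: list["Node"] = field(default_factory=list)


def parse_blocks(lines: list[str]) -> Node:
    """Build a statement tree from indentation, one statement per line."""
    root = Node("<script>")
    open_blocks = [(-1, root)]              # (indentation, node), innermost last
    for line in lines:
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        while indent <= open_blocks[-1][0]:
            open_blocks.pop()               # this line closes blocks at its depth or deeper
        node = Node(line.strip())
        open_blocks[-1][1].children.append(node)
        open_blocks.append((indent, node))  # any deeper line that follows is its body
    return root


script = [
    "some-kind-of-loop-construct",
    "    statement-1",
    "    statement-n",
    "    cycle",
    "statement-after-the-loop",
]
tree = parse_blocks(script)
for stmt in tree.children:
    print(stmt.text, [child.text for child in stmt.children])
# → some-kind-of-loop-construct ['statement-1', 'statement-n', 'cycle']
# → statement-after-the-loop []
```
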

Having been on both sides of the fence, i.e. having written significant amounts of code in both styles (and in antique unstructured FORTRAN), I find that the real advantage of indented blocking is that it is simpler to write -- the begin/end delimiters are just clutter. The real disadvantage is that it doesn't deal well with long statements.


This page was last updated June 3, 2001.
