November 2001

Thoughts on Update as a Software Maintenance Tool


This is an HTML version of a paper I presented May 28, 1981 at the conference on Advances in Software Technology sponsored by the IEEE Computer Society, IEEE Catalog No. 81CH631-1. The paper appeared in PROCEEDINGS OF TRENDS & APPLICATIONS, May 1981, pp 163-171.

The paper is of historic interest because it contains in outline form many of the concepts that underlay the Aide-de-Camp configuration management system (currently marketed by McCabe as truechange.) In some respects the paper presented CM concepts that are still in advance of current practice.

The original paper had some eccentric capitalizations and an inordinate enthusiasm for commas. The capitalizations have been retained; most but not all of the commas have been retained. A few outright typos have been fixed; I hope I have not introduced too many in their place. In addition two columns in the paper were transposed; the correct order was used in this page.


Abstract

Source code update systems are a software technology for maintaining and controlling source code. This paper examines Update systems and experience with them. It begins with one of the better systems, CDC Update. Its use in a large software project is studied. Two hybrid systems which use an editor for altering source code and an Update system for maintaining it are studied. Similar ideas in Editor oriented systems are also looked at. A source code maintenance system which combines ideas from a number of the sources studied is presented. The paper concludes with the observation that, in the course of developing new software tools, we should also be looking at old ideas and tools and at field experience with them.

Introduction

Nowadays it is fashionable to be concerned about software tools. Numerous articles have appeared in the literature about specific tools and about general needs. These articles often present solutions to problems already solved in the past. There are good reasons for this. Most software professionals do not publish nor is their work generally available; hence many good ideas simply get lost. There are many different operating systems; nobody can be familiar with more than a few. Hence most authors write from a background which reflects the characteristics of the systems that they are familiar with. The publication process is biased towards problems encountered in academic settings. As a result major software technologies often simply are not recognized in the literature.

Source code Update is an example of a neglected software technology. This neglect is not too surprising; Update programs are mostly used in card-oriented, batch operating systems. Although these systems are economically important, they are in the backwaters of modern software thought. Moreover, most Update programs are not very good. Some are excellent, however. A case in point is the CDC Update system. In my opinion, it is one of the best generally available software maintenance tools ever developed. It provides convenient solutions for many technical problems and facilitates an integrated systems approach to source code maintenance. This paper gives an overview of CDC Update that is augmented by a case study of a large system maintained using CDC Update.

The general objective of this paper is to look at good ideas for software maintenance tools. Naturally, we cannot cover all of the good ideas in one paper. We can, however, look at a class of methods and ideas. In this paper we are going to look at source code Update systems and at experience with these systems. Our objective is to identify the features and requirements that have proved to be important in practice.

In this paper I draw extensively upon personal experience with large systems. During the early 1970’s, I was a member of a software team that wrote a large, real time FORTRAN program for the CDC 6600. This program, which I will call system A, ultimately ran to 75,000 statements. It was concurrently maintained at two sites that were 7000 miles apart by separate software teams at each site. It is an excellent case example of the problems encountered in creating large systems.

Since then I have used two hybrid systems. In these systems the source code was first altered with an Editor and then maintained with an Update program. Both systems had advantages; neither was completely satisfactory. It is my belief that the future of software maintenance lies with systems that combine the best features of both Editor based systems and Update based systems. Experience with hybrid systems is particularly important; we need to know the problems that occur in practice, which features turn out to be useful, and which turn out to be unimportant.

Update Systems

Update systems are general purpose tools for making incremental changes to source code. They serve the same general purpose as interactive Editors. For each the input is an old file and a set of changes; the output is a new file which reflects the changes. In most editors the set of changes is a stream of interactive edit commands that is not preserved after the edit session. In Update systems the set of changes is a file of update commands that is preserved after the Update run. In this article the file of changes will always be called an “Upfile”. Update is a two-level process; we edit the file of changes rather than the source code. Editors are convenient for altering source code but they make change control difficult because the changes themselves are thrown away. Update systems are not as convenient as editors for altering code. On the other hand they are better for controlling change because the set of changes exists in permanent form. Historically, Update systems have been associated with card-oriented, batch operating systems and Editors have been associated with interactive timesharing systems.

Update systems and Editors refer to source code lines in different ways. Editors make changes immediately; the source code reflects the last change made. Editors use a pointer which points to the current position. To insert code at that position you simply give an insert command. If you want to insert code somewhere else you move the pointer. This scheme works for Editors because the current position is always in view.

This pointer scheme is not suitable for creating Upfiles for Update because you cannot display where you are or what you have just done – the changes do not happen until you actually make the Update run. Hence Update systems never use a position pointer. Instead they require that lines have names and that the Update commands use those names. Instead of saying “insert” we say “insert at line x.” It is important that these line names remain the same throughout the Update run; if they did not you would quickly find yourself in a hopeless muddle. This difference has some important consequences.

Generally speaking, it is difficult to develop parallel changes in Edit systems. For example, suppose that programmers A and B create separate versions of a routine. We could merge the two versions by hand so that each includes the changes that are in the other, but this is an unreliable and time consuming way to do things. Some Edit oriented systems have special tools for this task – programs which generate difference files and merge them. These tools are a great improvement; however they still require a lot of special handling. In good Update systems it is easy to develop changes in parallel. We do not need to generate difference files; we create them. We do not need a merge program to reconcile Upfiles; we can concatenate them because the Upfiles use fixed line names.
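The claim that Upfiles built on fixed line names can simply be concatenated can be illustrated with a small sketch. It is not CDC Update itself; the tuple layout, the function name apply_inserts, and the correction set names FIXA and FIXB are assumptions made purely for illustration.

```python
def apply_inserts(lines, commands):
    """Apply named-line insert commands (a toy model, not CDC Update).

    lines    -- list of (line_name, text) pairs
    commands -- list of (ident, target_name, new_texts); the new text is
                inserted after the line called target_name, and the new
                lines are named ident.1, ident.2, ...
    """
    result = list(lines)
    for ident, target, texts in commands:
        # Locate the target by its fixed name, never by its position.
        index = next(i for i, (name, _) in enumerate(result) if name == target)
        new_lines = [(f"{ident}.{n}", text) for n, text in enumerate(texts, start=1)]
        result[index + 1:index + 1] = new_lines
    return result


base = [
    ("MAIN.1", "      PROGRAM MAIN"),
    ("MAIN.2", "      CALL INIT"),
    ("MAIN.3", "      END"),
]
upfile_a = [("FIXA", "MAIN.1", ["      INTEGER N"])]  # written by programmer A
upfile_b = [("FIXB", "MAIN.2", ["      CALL RUN"])]   # written by programmer B

# Concatenate the two Upfiles in either order; the result is the same
# because neither change refers to a line position that the other moved.
merged_ab = apply_inserts(base, upfile_a + upfile_b)
merged_ba = apply_inserts(base, upfile_b + upfile_a)
```

With position-based editing the second change would have to be re-targeted after the first one shifted the lines; with names no adjustment is needed.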

The CDC Update System

The CDC Update manual [3] gives a complete description of CDC Update. Like most manuals it is most useful if you already have a general understanding of the topic covered. This is particularly true because CDC Update is a powerful and complex system that offers many options. Those who are not acquainted with the system need a simplified overview.

In CDC Update source is maintained in special files called PL files (PL = Program Library.) Basically a CDC PL file is a source code database designed for maintaining a program consisting of many routines. The code in a PL file is divided into segments called decks. Normally the source code is divided so that decks and subroutines correspond one-to-one.

Source code is altered by executing the Update facility. The inputs for the Update run are the old PL file and an Upfile containing the changes to be made. There are two output files from an Update run. One is the updated PL. The other is a source code file called the Compile file; its contents depend on the update mode selected. We can send all decks to the Compile file (full update), none, or just those which were altered in the Update run. The latter option is the normal mode. In this mode the Compile file contains exactly those routines which were changed. This makes it easy to synchronize source code and object code without unnecessary recompilation.

In addition to decks, a PL can also contain source code blocks called Comdecks, which are source code macros without arguments. The macros are invoked by Update commands embedded in the source text. These commands are called Comdeck calls. When a deck containing a Comdeck call is written to the Compile file Update substitutes the Comdeck text for the call statement. (Comdecks can call other comdecks; however this is not normally done.) Whenever a Comdeck is altered all decks calling that Comdeck are written to the Compile file.
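The interaction between Comdecks and the Compile file can be sketched as follows. This is a toy model under stated assumptions: the dictionaries, the function names, and the use of a line beginning with "*CALL " to mark a Comdeck call are illustrative choices, and only direct calls are tracked when deciding which decks to select.

```python
CALL = "*CALL "  # assumed spelling of a Comdeck call line


def called_comdecks(lines):
    """Names of the Comdecks that a deck calls directly."""
    return {ln[len(CALL):].strip() for ln in lines if ln.startswith(CALL)}


def expand(lines, comdecks):
    """Replace every Comdeck call with the text of the Comdeck."""
    out = []
    for ln in lines:
        if ln.startswith(CALL):
            out.extend(expand(comdecks[ln[len(CALL):].strip()], comdecks))
        else:
            out.append(ln)
    return out


def compile_file(decks, comdecks, changed, mode="normal"):
    """Select decks for the Compile file.

    mode "full" sends every deck, "none" sends nothing, and "normal"
    sends decks that were altered or that directly call an altered Comdeck.
    """
    if mode == "full":
        selected = list(decks)
    elif mode == "none":
        selected = []
    else:
        selected = [d for d, lines in decks.items()
                    if d in changed or called_comdecks(lines) & changed]
    return {d: expand(decks[d], comdecks) for d in selected}


comdecks = {"GLOBALS": ["      COMMON /G/ X, Y"]}
decks = {
    "INIT": ["      SUBROUTINE INIT", "*CALL GLOBALS", "      END"],
    "MATH": ["      SUBROUTINE MATH", "      END"],
}
# Altering the Comdeck sends INIT, which calls it, to the Compile file.
out = compile_file(decks, comdecks, changed={"GLOBALS"})
```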

Comdecks are particularly convenient in FORTRAN programs. A typical strategy is to put global data sets in labelled commons with a separate comdeck for each labelled common. This ensures that global data sets are always the same in all routines. Altering a Comdeck sends all decks calling it to the Compile file. Hence all routines using an altered data set will be correctly recompiled. The integrity and consistency of global data sets is an important issue in large programs. This is particularly true for large FORTRAN programs; inconsistent labelled commons can produce rather mysterious execution errors.

CDC Upfiles contain two types of lines – source code lines to be inserted and Update commands. Update commands have a character in column one called the master character. The choice of master character is an option. It is selected when the PL is created but can be overridden in the Update run. The default master character is an asterisk.

Update commands fall into two classes. One class consists of utility commands used for merging PLs, reading the input stream from multiple files, etc. The other class consists of commands directly associated with the sequence of code changes. This sequence is divided into units called correction sets. Each correction set is preceded by an IDENT command line which specifies a name for the correction set. Correction sets can include changes to several decks. This differs from edit systems where each file must be processed separately.

The most common Update commands are the INSERT and DELETE commands. INSERT names a line at which code is to be inserted. It is followed by a stream of source code lines to be inserted. DELETE may have one or two line names. If there is one the named line is deleted. If there are two the named lines and all lines between them are deleted. YANK has the effect of removing the correction set from the history of changes. UNYANK reverses the effect of a previous yank; it brings the correction set back to life.

Deleted lines and yanked correction sets are not removed from the PL. They remain in it but are inactive. (There are also commands to permanently remove lines, correction sets, and decks from a PL.) In principle we can always go back to a previous version of the program by yanking all correction sets subsequent to the desired version. In practice this is rarely done in a single site operation but the capability is important; it means that the PL contains within it all prior versions of the program. Yanks are common in multisite operations because of the increased likelihood of conflicting correction sets.
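One way to picture a PL in which nothing is ever removed is to let every line carry a record of which correction set added it and which correction sets deleted it. The sketch below is an assumed data model for illustration only; it says nothing about the actual layout of a CDC PL, and the names Line and active_lines are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Line:
    name: str            # fixed two-part name, e.g. "MAIN.2"
    text: str
    added_by: str        # deck or correction set that entered the line
    deleted_by: list = field(default_factory=list)  # correction sets that deleted it


def active_lines(pl, yanked=frozenset()):
    """Return the text of the lines that are active once the given
    correction sets are yanked. Inactive lines stay in the PL."""
    out = []
    for ln in pl:
        if ln.added_by in yanked:
            continue  # the line was added by a yanked correction set
        if any(ident not in yanked for ident in ln.deleted_by):
            continue  # the line is deleted by a correction set still in force
        out.append(ln.text)
    return out


pl = [
    Line("MAIN.1", "      X = 1", "MAIN"),
    Line("MAIN.2", "      Y = 2", "MAIN", deleted_by=["FIX1"]),
    Line("FIX1.1", "      Y = 3", "FIX1"),
]

current = active_lines(pl)             # FIX1 in force
previous = active_lines(pl, {"FIX1"})  # FIX1 yanked: the old line is active again
```

In this model, yanking is only a change in how the PL is read, which is why it can be reversed at will.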

Source code lines have a unique, invariant two-part name. The first part is the name of the deck or correction set that entered it into the PL. The second part is a sequence number giving its position among the lines inserted by the deck or correction set that entered it.

Update commands must reference lines in the PL being updated. These lines need not be active – we can insert code at deleted lines. If we do, the new source is inserted where the deleted line was. In general CDC Update does allow conflicts between correction sets; it is possible for two different command lines to insert code at the same line or for two of them to delete the same line. Conflicts are resolved by processing the commands on a first come, first served basis. The order of correction sets (and commands within them) matters if and only if there are line reference conflicts. If there are no conflicts the results from an Update run are independent of the order in which the correction sets are applied.

Code never disappears from an Update PL. It can be surprisingly difficult for people who are used to Editors to grasp this. The point is that Editors destroy their input – the new version replaces the old version. (The replacement process may be buffered by back up versions.) This idea that change implies destruction of the past becomes an unquestioned axiom. The reason that Editors have to destroy old versions is that they have no place to store the changes. CDC Update does; an Update run adds the Upfile to the PL. In effect, a PL file contains the original source code plus all changes in incremental form. The actual structure of the PL is a more complicated scheme; a PL is a reasonably sophisticated structure with a variety of directories.

This structure makes it easy to selectively remove old correction sets. The value of this can be hard to appreciate if you have never experienced its usefulness. The reason for this may be that code change is an incoherent process when an Editor is used; changes made within an edit session need not and typically do not have a logical relationship to each other. However a correction set can and should be a coherent set of logically related changes. Hence it makes sense to think of yanking a correction set.

To summarize: CDC Update has several features which make it a powerful software maintenance tool. The comdeck feature provides a simple, reliable way to maintain global data sets. The Compile file feature provides a convenient, reliable way to maintain consistency of source code and object code. The deck feature allows us to maintain a program as a single integrated structure. It also allows us to make changes which extend over a number of routines. The correction set format allows us to segregate changes into distinct logical units which can be developed independently. The revision history gives us a public registry of changes. It also permits us to remove and restore correction sets at will.

Case Study: Program Management in System A

System A began life with a large, detailed, incomplete set of specifications and a newly recruited software team. It did not have a predefined set of procedures for program maintenance; the software team was free to develop their own. One of the early objectives was to decide how the software being developed was to be maintained. Using Update as the principal tool was inevitable since the system was developed and run on the CDC 6600. The system was maintained at two sites separated by 7000 miles. One site was the master site; the other was subordinate. The master site had final responsibility and authority for the system.

We decided to put System A in a single PL; this worked well for us because CDC Update works best when all code for a program is in one PL. However there are some disadvantages associated with using a single PL. One is the time required for Update runs. Even though Update is quite efficient the time required for an Update run necessarily increases with program size. Another disadvantage is that it is hard to maintain a collection of distinct programs in one PL.

We decided that there should be a single central version of the System. In practice there had to be multiple versions of the system because parallel development was going on. These parallel versions were all represented as differences from the central version, i.e., they were all maintained as Upfiles for the base PL. Not allowing independent permanent parallel versions meant that we could treat the program over time as a linear sequence of versions.

We decided to use three levels of versions of the program. The primary level versions, which were called base versions, served as the main sequence of versions of the System. There was only one base version at a time. Each base version served as a reference for the next one. Once a base version was created it was fixed; no revisions were permitted. Each base version was supposed to be an operational version. The second level versions were intermediate versions between base versions. They contained the sequence of officially approved changes to the system. Ordinarily the only second level version present was the latest one. The third level versions were the development versions. These were transient. They existed while a change to the system was being developed. Second and third level versions were created via Upfiles to the base PL.

The base version was a PL file. The current intermediate version was always represented by both a PL and an Upfile, the latter containing the correction sets that transformed the current BASE PL into the current intermediate version. Temporary versions were represented by Upfiles.

Program maintenance at each site was controlled by one person called the controller. She had more authority and responsibility than that usually given a Librarian. Programmers gave new correction sets to her for inclusion in the central system. She did the following things:

She was the test director. She tested or supervised all tests of the System. She had sole authority to change the System and to create official versions of the System. She was the only one who could change the central Upfile. She had responsibility for sending and receiving versions from the other site. She reconciled conflicting correction sets and ensured that correction sets were properly formatted.

The process of change ran as follows: There was a central register which listed all correction sets under development. The first step was to get a correction set name (there was a rigid naming convention), enter it into the register, and describe its purpose. The programmer then created, developed, and tested the change as a correction set. The programmer could test the change against the BASE PL or against the current version; the BASE PL was usually chosen because it was stable. When the programmer was satisfied that the change was correct he turned it over to the controller.

The controller checked to see if the correction set was properly documented. CDC Update provides for command lines which are comments. We required that all correction sets be internally documented in a set manner. The documentation included the programmer name and the correction set purpose. If more than one deck was affected by the correction set the changes were segregated by deck with comment lines describing the decks affected. Improperly documented sets were rejected out of hand.

New correction sets were checked for conflicts with the current Upfile. These conflicts might be technical, e.g., two correction sets might insert code at the same line, or they might be conflicts of purpose. The controller reconciled existing correction sets and the new correction set, freely altering both the existing correction sets and the correction sets being added. The object was to have consistent, independent correction sets in the Upfile which reflected the intended changes. The reconciled Upfile was then rigorously tested and modified until the new version passed acceptance tests. From time to time the subordinate site would transmit correction sets to the master site; these were treated in the same way as changes initiated locally.

New base versions were created by the master site as follows: The correction sets in the official Upfile were split into two groups, those which were to be transmitted to the other site and those which were not. The Upfile was split into two files, one containing sets to be transmitted and the other the rest. The BASE PL was updated with the transmit Upfile and the updated program was carefully tested. The old BASE PL, the new BASE PL, and the transmit Upfile were sent to the subordinate site. The old BASE PL was replaced by the new BASE PL, the old Upfile was replaced by the residual Upfile, and the new BASE PL was updated with the new Upfile to produce a new current version which was identical with the last current version. The subordinate site made a new BASE PL when it received a transmission from the master site. Their procedure was to install the new BASE PL and then alter their own PL to reconcile it with the new BASE PL, deleting all local correction sets which were now in the BASE PL.

Update made it easy to enforce source and object code consistency. Suppose we have an object code file that was created by compiling the source code in a PL. If we update the PL the compile file will contain only those routines which were altered by the update. We can compile those routines to get an incremental object code file. If we merge the incremental object code file and the original object code file, we get a file containing the object code for the updated PL.
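The incremental procedure described above amounts to overlaying the newly compiled routines on the old object code file. A minimal sketch follows, assuming that an object code file can be modelled as a mapping from routine name to object code; compile_deck is a hypothetical stand-in for the compiler, not a real interface.

```python
def compile_deck(name, source_lines):
    """Hypothetical stand-in for the compiler; returns a fake object module."""
    return f"object({name}:{len(source_lines)} lines)"


def update_object_file(object_file, compile_file):
    """Compile only the routines in the Compile file and merge the
    resulting incremental object code into a copy of the old object file."""
    incremental = {name: compile_deck(name, src) for name, src in compile_file.items()}
    merged = dict(object_file)   # leave the original object file untouched
    merged.update(incremental)   # recompiled routines replace their old versions
    return merged


old_objects = {
    "INIT": "object(INIT:3 lines)",
    "MATH": "object(MATH:2 lines)",
}
# Suppose the Update run altered only INIT, which now has four lines.
changed_decks = {"INIT": ["      SUBROUTINE INIT", "      X = 1", "      Y = 2", "      END"]}
new_objects = update_object_file(old_objects, changed_decks)
```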

In practice we recompiled the entire program whenever a new official version was made, whether it was a new base version or a new intermediate version. Incremental object code was automatically correct only if canned procedures were used correctly. We could not guarantee that they would never be used incorrectly. Since official versions were not made often, it was considered worthwhile to expend extra effort to ensure their correctness.

Global data sets were always placed in comdecks. This ensured that they were the same in all routines, both in the source code and in the object code. Comdecks were not allowed to reference each other because of a technical defect in CDC Update. Suppose that deck A calls comdeck B which in turn calls comdeck C. If comdeck C is altered, deck A does not get written to the Compile file because it does not call comdeck C directly. This defect is regrettable; nested comdecks would provide a convenient mechanism for representing hierarchical data sets.
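The difference between the behaviour described here and the behaviour one would want can be shown with a short sketch. The two dictionaries and the function name are assumptions for illustration; the transitive variant is a possible remedy, not something CDC Update is stated to provide.

```python
def decks_to_recompile(deck_calls, comdeck_calls, changed, transitive):
    """Return the decks written to the Compile file when one comdeck changes.

    deck_calls    -- {deck: set of comdecks it calls directly}
    comdeck_calls -- {comdeck: set of comdecks it calls directly}
    transitive    -- False models the defect (direct calls only);
                     True follows calls through nested comdecks.
    """
    affected = {changed}
    if transitive:
        grew = True
        while grew:
            grew = False
            for comdeck, calls in comdeck_calls.items():
                # A comdeck that calls an affected comdeck is itself affected.
                if comdeck not in affected and calls & affected:
                    affected.add(comdeck)
                    grew = True
    return {deck for deck, calls in deck_calls.items() if calls & affected}


# Deck A calls comdeck B, which in turn calls comdeck C.
deck_calls = {"A": {"B"}}
comdeck_calls = {"B": {"C"}, "C": set()}

direct_only = decks_to_recompile(deck_calls, comdeck_calls, "C", transitive=False)
with_nesting = decks_to_recompile(deck_calls, comdeck_calls, "C", transitive=True)
```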

Case Study: Two Hybrid Systems

CDC Update may be admirable but this does not matter if you are not using a CDC system. In recent years I have been in a project which used CMS on an IBM 370. Many people in our group were familiar with CDC Update and wanted something similar for the IBM system. Since CMS is a time shared conversational system, we decided to use the CMS editor for changing source code and an Update program for maintaining it. In the course of the project we created two such systems.

In the first system we used the IBM Update facility [4]. We wrote a difference file generator which took an old source and an edited source as input files and produced an update correction set as an output file. We also established version conventions similar to those used in system A.

This system worked. We could develop changes in parallel and we could exercise a certain amount of control over source code changes. However it had certain defects which were unavoidable, given the use of IBM Update. The problem is that IBM Update operates directly on source files.

(a) We could not maintain fixed line names over time. (If the line number step between two lines is 1 there is no way to insert a new line between them.) Hence the source files had to be resequenced from time to time. We chose to resequence them each time a new base version was made. This meant there was no built in history of changes and no YANK facility.

(b) Each file had its own list of correction sets: There was no way to construct a single correction set which represented changes to a number of routines in different files.

(c) The system was clumsy. For example each correction set had to be in a separate file. For each source file there had to be a separate file which contained a list of the correction sets to be applied to the source file. The system was also fragile because it depended on rigid line numbering conventions.

The system did have two features that are worth considering for other systems. The first was that the difference file generator automatically created a revision history comment which went into the source file being updated. The second good idea was the works file. This was a separate file which contained a list of changes to be made to the program. Each entry documented a proposed correction set. An entry contained a description of the proposed change, the name of the proposer, the date proposed, the date that action on the proposal was initiated, the date of completion, and the name of the correction set. No changes were made to the program unless they were first entered in the works file.

There were several nice things about this scheme. Program changes were documented in advance. Proposals for program change sets were public and permanent. The works list was useful for setting priorities because it contained a convenient check list of things that had not been done yet.
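The works file entry described above maps naturally onto a small record type. The sketch below is only one possible representation; the class name, field names, and sample entries are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorksEntry:
    description: str                       # what the proposed change is for
    proposer: str
    date_proposed: str
    date_started: Optional[str] = None     # set when action is initiated
    date_completed: Optional[str] = None   # set when the change is finished
    correction_set: Optional[str] = None   # name of the resulting correction set


def outstanding(works_file):
    """The check list of proposals that have not been completed yet."""
    return [entry for entry in works_file if entry.date_completed is None]


# Invented sample entries.
works_file = [
    WorksEntry("Fix overflow in table lookup", "programmer A", "day 1",
               date_started="day 2", date_completed="day 5", correction_set="FIX1"),
    WorksEntry("Add second output format", "programmer B", "day 3"),
]
todo = outstanding(works_file)
```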

Our experience with this system suggested that it would be worthwhile to write an Update system. We created a system which was similar to CDC Update. Source code was maintained in a PL. CDC Update conventions for naming lines were followed. Upfiles were segmented into correction sets. Unlike CDC Update, PL files contained correction set documentation, which consisted of the date, the author, and a one line description for each correction set. Our Update system did not support decks or comdecks – there was a separate PL for each source code file. There was also an Upfile generator. The inputs to this were an old PL and an edited source file. The output was the corresponding Upfile.
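An Upfile generator of the kind described can be approximated with a standard line-difference algorithm. The sketch below uses Python's difflib and is not the program we wrote; the command spellings (*IDENT, *INSERT, *DELETE), the convention that text following a DELETE replaces the deleted lines, and the "<top>" marker for an insertion before the first line are all assumptions made for illustration.

```python
import difflib


def make_upfile(ident, named_lines, edited):
    """Generate a correction set from an old PL deck and an edited source.

    named_lines -- list of (line_name, text) pairs from the old PL
    edited      -- list of text lines from the edited source file
    """
    names = [name for name, _ in named_lines]
    old = [text for _, text in named_lines]
    commands = [f"*IDENT {ident}"]
    matcher = difflib.SequenceMatcher(a=old, b=edited, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag in ("delete", "replace"):
            first, last = names[i1], names[i2 - 1]
            commands.append(f"*DELETE {first}" if first == last
                            else f"*DELETE {first},{last}")
        else:  # tag == "insert": new text goes after the preceding old line
            anchor = names[i1 - 1] if i1 > 0 else "<top>"
            commands.append(f"*INSERT {anchor}")
        commands.extend(edited[j1:j2])  # empty for a pure delete
    return commands


old_deck = [("SUB.1", "A"), ("SUB.2", "B"), ("SUB.3", "C")]
edited_source = ["A", "X", "C", "D"]
upfile = make_upfile("FIX7", old_deck, edited_source)
```

Because the commands refer to the old line names rather than to positions, the generated correction set can be combined with others developed in parallel.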

This system was very smooth in practice. We could edit a routine and then execute a single CMS command to generate the Upfile and update the routine. We could recover prior versions and could yank old correction sets at will. Sequence fields contained line names after the fashion of CDC Update. (This is surprisingly useful; it allows us to determine, for each line, when and why it came into existence.)

The major defect of this system is that it is file oriented. We must maintain a multiplicity of files and we cannot extend a correction set across routines in different files. There were good reasons for not having a deck structure in a PL; maintaining multiple routines in a single PL is a tricky and potentially expensive proposition. In retrospect, however, the price of more sophisticated file management ought to have been paid. It would have been good if the automatic commenting of the first system had been included. It would have also been nice to have included the works file concept.

These systems are interesting because they try to combine the two major approaches to source code maintenance. There are a number of things that we can learn from experience with these systems. The foremost is the enormous importance of smooth operating procedures. The golden rule for software maintenance support is that it should place no burden on the programmer and should offer him no way to go wrong. We may not be able to do this in practice but it is surprising how much trouble can be created by seemingly minor glitches in operating procedures.

There are intrinsic problems with the notion of converting edit sessions into update correction sets. The root of the matter is that an edit session is a temporal event whereas a good correction set is a conceptual unit. A good correction set should be a collection of source code changes which implement a stated change in the program function; it should be coherent and well defined. Edit sessions tend not to have this character of logical coherence; during a single edit session we may make a number of changes reflecting a number of different things to be done. It is not yet clear how this problem should be resolved.

There is also the problem of determining when a correction set should be made. When we make changes with an Editor we often go through several passes before we are satisfied that the change is made correctly. The danger is that we will make a correction set prematurely. What we need is a facility for reworking a correction set.

Similar Ideas in Editor Oriented Systems

We have illustrated the use of Update systems for handling many of the recurring tasks in software maintenance. There have been efforts to develop similar facilities in Editor oriented systems. Let us look at some of these ideas.

UNIX has a difference file scheme. [1] Source code files are stored in a central repository; each source file is represented by an original version and a sequence of difference files. When a user wants to develop a new version of a source file, he gets the latest version from the central repository. He then has possession of the file, i.e., while he is working on it no one else can “get” it. When he has completed work on the file he returns it to the repository. The repository management software generates the difference file that represents the changes he made and updates the representation of the source file. The user can also “get” a prior version of a file instead of the latest version. Horn’s STEP program [2] is similar.

The scheme used by UNIX is actually a hybrid Update system. It has a number of weaknesses and some strengths. The idea of a central repository is good; like the CDC Update PL, it eliminates the source code file management problem. The correction set mechanism is primitive. There is no provision for selectively yanking correction sets. Line names do not reflect their origin; they are simply resequenced line numbers. (STEP provides annotated listings giving line origins and authors.) Correction sets do not extend across source files. It is hard to develop changes in parallel. The scheme does not, in its own right, include a mechanism for coordinating source code and object code nor does it include an integrated source code macro system.

The basic scheme can be and has been extended by adding additional facilities. For example, Fraser’s MERGE program [5] merges two difference files. The idea is that we can develop changes in parallel, generate their respective difference files, and then merge them to get a composite difference file. The disadvantage of this scheme is that the parallel changes lose their identity in the composite. In Update systems such as CDC Update, this merge function is built into the Update processor.

Object code synchronization has also received attention. Feldman’s MAKE program [6] is an interesting idea. The idea here is to create a file of dependencies. Element A depends on element B if a change in B requires an action on A. For example, the object code for a routine depends on the source code. The inputs to the MAKE program are the dependencies file and the list of changed elements. The output is a file of commands that must be executed to incorporate the changes into the system.

The major weak point of this scheme is that the dependency file must be created and maintained manually. For example, suppose that we change routine A to include data set B. The dependency file must be changed to reflect this; i.e., the dependency file must manually track the system. Furthermore, we have no really reliable way of knowing that the dependency file is right.
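The dependency idea, as described in the text, can be sketched in a few lines. This follows the description given here (a dependencies file plus a list of changed elements in, a list of commands out) and is not the MAKE program itself; the data layout, the function name, and the command strings are assumptions. Dependencies are assumed to be free of cycles.

```python
def build_plan(dependencies, commands, changed):
    """Return the commands needed to incorporate the changed elements.

    dependencies -- {element: [elements it depends on]}
    commands     -- {element: command that rebuilds it}
    changed      -- iterable of elements that were altered
    """
    dirty = set(changed)
    visited = set()
    steps = []

    def visit(element):
        if element in visited:
            return
        visited.add(element)
        sources = dependencies.get(element, [])
        for source in sources:
            visit(source)  # settle the things we depend on first
        if any(source in dirty for source in sources):
            dirty.add(element)
            steps.append(commands[element])

    for element in dependencies:
        visit(element)
    return steps


dependencies = {
    "prog": ["a.o", "b.o"],
    "a.o": ["a.f", "globals"],  # routine A includes the global data set
    "b.o": ["b.f"],
}
commands = {"prog": "link prog", "a.o": "compile a.f", "b.o": "compile b.f"}
steps = build_plan(dependencies, commands, ["globals"])
```

Note that the entry tying a.o to globals had to be written by hand, which is exactly the weakness pointed out above.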

DECsystem10 and DECsystem20 use a method which is fundamentally better. In this system the loader compares the dates of the object code modules and source code modules. If the object code is older, the source code is automatically recompiled. This scheme ensures that the object code used in execution matches the current source code; the enforcement of this match is automatic and certain.
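The date comparison itself is simple. The following Python sketch assumes that file modification times stand in for the module dates; the function name needs_recompile is hypothetical and the sketch says nothing about how the DEC loader is actually implemented.

```python
import os


def needs_recompile(source_path, object_path):
    """True if the object module is missing or older than its source module."""
    if not os.path.exists(object_path):
        return True
    return os.path.getmtime(object_path) < os.path.getmtime(source_path)
```

A loader built on this check would call the compiler for every module where the function returns True before loading the object code.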

Using the date to enforce synchronization is not entirely sound because it assumes that the latest version is the only one of interest. We can remove this defect with the following scheme: Let us suppose that we are dealing with multiple versions of a program. Each version is defined by the correction sets included in it. Suppose that every version of the program has a name. Tag each object code module with the corresponding source code version name. It is then easy to determine whether a given object code module matches the corresponding source code module. We can apply the same scheme of automatically recompiling all source code modules that need to be recompiled. Instead of going through a chain of operations with a chance of error at each step in the chain we could simply say RUN VERSION X OF PROGRAM P and let the system take care of making sure that everything was right.
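The version tag scheme can be sketched as follows, in Python. The routine names, the dictionary of tags, and the function names are assumptions made for the illustration; the assignment that updates the tag stands in for an actual compilation.

```python
def modules_to_recompile(routines, version, object_tags):
    """Routines whose object module is missing or tagged with another version."""
    return [r for r in routines if object_tags.get(r) != version]


def run_version(routines, version, object_tags):
    """Bring every object module up to the requested version.

    Returns the list of routines that had to be recompiled.
    """
    recompiled = modules_to_recompile(routines, version, object_tags)
    for routine in recompiled:
        # Stand-in for compiling the routine's source as it exists in `version`.
        object_tags[routine] = version
    return recompiled


# Object modules on hand, each tagged with the source version it was built from.
tags = {"INIT": "X", "SOLVE": "W"}
print(run_version(["INIT", "SOLVE", "PRINT"], "X", tags))
```

Unlike the date comparison, this check does not care which version is the latest; asking for an older version simply recompiles whatever is not already tagged with that version's name.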

Language Dependence

General purpose software maintenance tools are generally language independent. Originally I accepted this as an obvious and natural requirement. However I have come to believe that a good software maintenance system should have language dependent features. In particular there are three areas where this is so – source code formatting, source code listing, and global data set management.

Source code format is not a critical issue in FORTRAN programs, which are usually written in a flat, unindented style. In block structured languages, however, it is customary to indent the code. Every time we alter the block structure we alter the indentation of the lines within the affected blocks. This creates a problem for systems which track source code changes because they will treat a change in the indentation level of a line as a source code change.

In my opinion block structured languages should be handled as follows: The software maintenance system should include source code formatting modules for each language supported by the system. Source code is stored in the central repository in unindented form. When a user gets a routine from the repository it is automatically indented. When it is returned the indentation is stripped off and differences are generated from the unindented versions. As a matter of practical convenience the Editor should have a format command which correctly formats the current work file.
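A small Python sketch shows why this works. It assumes a toy block structured language in which a line beginning with BEGIN opens a block and a line beginning with END closes one; the function names and the four-space indent are likewise assumptions, and a real formatting module would have to understand the grammar of its language.

```python
import difflib

INDENT = "    "


def _keyword(text):
    """First word of a line, upper-cased, without a trailing semicolon."""
    words = text.split()
    return words[0].upper().rstrip(";") if words else ""


def strip_indentation(lines):
    """Form in which source is stored in the repository."""
    return [line.strip() for line in lines]


def indent(lines):
    """Form in which source is handed to the user."""
    out, depth = [], 0
    for line in lines:
        text = line.strip()
        if _keyword(text) == "END":
            depth = max(depth - 1, 0)
        out.append(INDENT * depth + text)
        if _keyword(text) == "BEGIN":
            depth += 1
    return out


def changed_lines(old, new):
    """Lines added or deleted between two stored (unindented) versions."""
    return [d for d in difflib.ndiff(old, new) if d[:2] in ("+ ", "- ")]


stored = ["X = 1", "Y = 2"]
# The user wraps both statements in a new block, which re-indents them.
edited = ["BEGIN", "    X = 1", "    Y = 2", "END"]
print(changed_lines(stored, strip_indentation(edited)))
```

Because the differences are taken between the unindented forms, only the two new block delimiters are recorded as changes; the re-indented statements inside the block are not.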

I contend that source code listings should be produced by the software maintenance system. The listing format might run as follows: The page heading includes, among other things, the version name. The source code is preceded by a revision history that gives, for each correction set applied to the routine, the correction set name, its author and date, and a brief description of its purpose. The source code could also be preceded by a list of all global data sets accessed by the routine. The source code is correctly indented, using the appropriate formatting module. Each source line is annotated with the author, date, and name of the correction set that originated the line.
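One possible shape for such a listing is sketched below in Python. The record layouts, the column widths, and the sample routine are assumptions made for the illustration; indentation and the list of global data sets are left out to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrectionSet:
    name: str
    author: str
    date: str
    purpose: str


@dataclass(frozen=True)
class SourceLine:
    text: str
    origin: str  # name of the correction set that introduced the line


def listing(routine, version, lines, correction_sets):
    """Return the listing as a list of text lines."""
    by_name = {c.name: c for c in correction_sets}
    out = [f"{routine}    VERSION {version}", "", "REVISION HISTORY"]
    for c in correction_sets:
        out.append(f"  {c.name:<8} {c.author:<8} {c.date}  {c.purpose}")
    out.append("")
    for line in lines:
        c = by_name[line.origin]
        # Each source line carries the name, author, and date of its origin.
        out.append(f"{line.text:<40} {c.name:<8} {c.author:<8} {c.date}")
    return out


# Illustrative data only.
sets = [
    CorrectionSet("CS1", "SMITH", "1981-01-05", "Initial version"),
    CorrectionSet("CS2", "JONES", "1981-02-10", "Fix step size"),
]
source = [SourceLine("X = 1", "CS1"), SourceLine("H = 0.5", "CS2")]
print("\n".join(listing("SOLVE", "X", source, sets)))
```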

The third area where the choice of source language is critical is handling global data sets. It seems to me that the only viable way to deal with this problem is to use a preprocessor. The source code, as seen by the user, contains preprocessor statements which signal access to global data sets. When compiler input files are generated the preprocessor does whatever is necessary to add the data set declarations to the source code. For example it would generate a compool directive in a JOVIAL program. In an ANSI standard FORTRAN routine it would put the declaration statements in the correct places. The details vary from language to language. The important point is that each language requires a separate preprocessor module for compiler input file generation.
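The following Python sketch illustrates the idea of one preprocessor module per language. Everything specific in it is an assumption: the *USE statement is an invented preprocessor statement, the ORBIT data set is invented, the FORTRAN module simply assumes that the first line of the routine is its SUBROUTINE statement, and the spelling of the generated JOVIAL directive is a placeholder that has not been checked against any compiler.

```python
DIRECTIVE = "*USE "  # hypothetical preprocessor statement: *USE <data set name>

# Hypothetical global data set definitions, held by the maintenance system.
FORTRAN_DECLARATIONS = {
    "ORBIT": ["      REAL A, E", "      COMMON /ORBIT/ A, E"],
}


def _is_directive(line):
    return line.startswith(DIRECTIVE)


def _name(line):
    return line[len(DIRECTIVE):].strip()


def expand_fortran(lines):
    """Move the declarations for each data set to just after the first line."""
    used = [_name(l) for l in lines if _is_directive(l)]
    body = [l for l in lines if not _is_directive(l)]
    declarations = [d for name in used for d in FORTRAN_DECLARATIONS[name]]
    return body[:1] + declarations + body[1:]


def expand_jovial(lines):
    """Replace each statement in place with a compool directive (placeholder spelling)."""
    return [f"!COMPOOL ('{_name(l)}');" if _is_directive(l) else l for l in lines]


# One module per supported language.
PREPROCESSORS = {"FORTRAN": expand_fortran, "JOVIAL": expand_jovial}

routine = ["      SUBROUTINE STEP", "*USE ORBIT", "      A = A + E", "      END"]
print("\n".join(PREPROCESSORS["FORTRAN"](routine)))
```

The user's source stays the same whichever language module runs; only the generated compiler input differs.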

Further Observations

We have looked at CDC Update, hybrid Editor/Update systems, and Editor based systems which have Update-like features. Our survey leads to the concept of a software maintenance and development system with the following general features:
a. Source code and object code for a program or an entire project is stored in a central repository. This repository can be thought of as a specialized database. The repository also contains additional information used for documenting and controlling the project software.
b. Source code, object code, execution modules, and global data sets are automatically synchronized. Global data set synchronization is handled by preprocessors. Object code and execution modules are synchronized by matching version tags.
c. Source code changes are maintained with an Update system. Changes are divided into correction sets which are the basic unit of change. A single correction set may include changes to a number of routines.
d. The system maintains a hierarchy of versions. Each version is named and is defined by the correction sets included in it. Version integrity is automatically ensured.
e. Changes to source code are made with an Editor. Changes made in edit sessions are automatically converted into difference files. There are facilities for converting raw difference files into “good” correction sets. Documentation of changes is concurrent and is enforced and maintained by the system.
f. The system incorporates modules for formatting source code and source listings. It also includes numerous facilities for producing project software documentation.
g. In general, all procedures are automatic and do not require intervention by the users. At no time do users have to manage source or object files.
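Features c and d, taken together, amount to a simple data model, sketched here in Python. The class and field names are assumptions made for the illustration, as is the choice to let a version inherit the correction sets of its parent and yank individual ones.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    """A named version, defined by the correction sets included in it."""
    name: str
    added: frozenset                      # correction sets introduced here
    parent: Optional["Version"] = None    # position in the version hierarchy
    yanked: frozenset = frozenset()       # inherited correction sets left out

    def correction_sets(self):
        if self.parent is not None:
            inherited = self.parent.correction_sets()
        else:
            inherited = frozenset()
        return (inherited | self.added) - self.yanked


base = Version("BASE", frozenset({"CS1", "CS2"}))
trial = Version("TRIAL", frozenset({"CS3"}), parent=base, yanked=frozenset({"CS2"}))
print(sorted(trial.correction_sets()))
```

Because a version is nothing more than a name and a set of correction set names, defining a new version or yanking a correction set never disturbs the versions that already exist.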

This general picture is fairly clear cut. The actual specification and construction of such a system is a formidable undertaking, of course. We have looked at some of the main ideas. Experience with existing systems suggests some further useful thoughts.

(1) Recent history is important; ancient history has little value. The ability to recall and examine recent changes is incredibly useful. Quite often we say things like, “Subroutine X used to work and doesn’t anymore. What changes were made to it recently?” The ability to recover previous versions is also important. Sometimes we say things like, “The current version doesn’t work right. We need the output. Let’s bring up an older version that we have confidence in.” In practice there are limits on how far it is profitable to go back. A knowledge of changes that were made some time ago has little value for debugging. In time, old versions lose their value because system enhancements become incorporated into operating procedures.

(2) In existing hybrid systems the customary procedure is to get a file from the repository, edit it, and then return it to the repository. While a file is being altered it must temporarily exist as a file on the user’s account. Temporary files make for file management problems. This need for temporary files could be eliminated if the Editor got the source code directly from the repository and returned it directly to the repository.

(3) The controller played a crucial role in the development of system A. In my opinion this control function should be supported by the system. Correction sets and versions should go through an approval cycle.

(4) The system should be able to handle multiple main programs in the central repository. This requirement raises a number of problems which are beyond the scope of this paper.

(5) The central repository should not be a single file which must be rewritten every time the repository is changed. (This is one of the serious weak points of the CDC Update system.) The best choice is to operate in the same way that a DBMS does; the software manager would have control over a data space and would selectively rewrite sections of it. A less attractive but feasible option is for the central repository to be a collection of files which are under the sole control of the software manager.

Conclusions

We have pictured a rather comprehensive and powerful software maintenance system based upon experience with a variety of Update systems. We have drawn upon a host of ideas from observation of Update systems, such as the idea of a central repository. In addition to specific ideas there is also a general lesson to be learned. If we are going to develop new ideas and software technologies we should first look at the old ones. Furthermore we should look at the experience that users have had with these tools and technologies.

Acknowledgements

I am grateful to Michelle Dalpe for her efforts as typist and to Mary Cole and Thomas Einstein for many valuable and illuminating comments.

References

1. M.J. Rochkind, The Source Code Control System, IEEE Trans. Software Eng., Vol SE-1, No. 4, Dec 1975, pp. 364-370.

2. D. Knight and E. Van Horn, The STEP Programming Library, presented at the 1979 Fall DECUS Conference.

3. Control Data Corp., Update Reference Manual, No. 60342500, 1978.

4. IBM Corp., IBM Virtual Machine Facility, CM Command and Macro Reference, 1979, GC20-1818-2.

5. C.W. Fraser, Maintaining Program Variants by Merging Editor Scripts, Software Practice and Experience, 10 (1980) pp. 817-821.

6. S. Feldman, MAKE – A Program for Maintaining Computer Programs, Software Practice and Experience, Vol 9 (1979) pp. 255-265.

7. Digital Equipment Corp., DECSystem10 Operating Command Manual, 1974, DEC-10-OSCMA-A-D.


This page was last updated November 14, 2001.
