home
table of contents
Comp Sci
January 2004
email

Content Change Management:
What it is and Why it is Needed


by Richard Harter
Introduction
Content areas
Services offered by the CCM
The change set concept
Provenances
Data elements under management
Mechanics of change detection
Major functional capabilities
Conclusion

Introduction

E-commerce sites, especially B2B e-commerce sites, typically expose large quantities of information to customers, who interact with the public interface in rapid, ongoing transactions. Web site content changes dynamically, both in response to customer interaction and because the site operator alters the content.

Just as software change management systems do for source code, website Content Change Management (CCM) systems provide high-level audit, approval, distribution, verification, discovery, reporting, enabling, and disabling of site changes in a granular and reliable fashion. This covers modifications to the complex software applications used with e-commerce sites as well as modifications to the dynamic and static content that drives those sites.

In short, CCM systems provide auditing, reporting, verification, deployment, and rollback of modifications to complex software systems, including e-commerce Web sites. Content Management systems, by contrast, ensure that site content is coherent: for example, that styles are consistent, links are maintained, and tools and scripts mesh properly. Used properly, a Content Management system keeps site content and format consistent with standards. It does not, however, manage the change process itself or deal with the consequences of dynamic change.

Content Change Management is a separate but related function, one that can and should be coordinated with the Content Management system.

Content areas

In addition to the content visible to customers, an e-commerce web site will have a number of auxiliary content areas in which development and testing are done. These auxiliary content areas include workspaces on the developer's desktop, test beds for quality assurance testing, and "sandboxes", in which developers try out new ideas before they are deployed to the production site. These auxiliary areas may be complete copies of the active Web site content area or subsets of the production site, and they may include data elements not present in the principal content area. The Content Change Management system must monitor changes in all of these content areas.

Services offered by the CCM

A CCM provides the following services to the developers and maintainers of Web sites:

  • Site content verification: the ability to verify that the content of the production Web site visible to customers and of any of the auxiliary content areas (such as sandboxes and backup areas) is correct, i.e., is the content that the owner of the web site intended for display. This is a critical capability. Content areas are subject to degradation by hostile attack, by accidental deletion and replacement of files, by hardware accidents, and by damage due to faulty processes or incorrect programming. Verification must be dynamic: it must be possible to determine in real time that a content area is correct, rather than merely that a baseline is correct.
  • Change detection: the ability to correctly and comprehensively identify all changes to the web site, whether to content, templates, formats, or code.
  • Rollback: the ability to return to the state of the web site at an earlier date and time.
  • Change migration: the ability to move a change or set of changes made in one content area to another content area. For example, a set of changes in a developer's workspace will eventually need to be moved to a quality assurance testing site and, from there, to the production web site. In many CCM systems (and in many configuration management systems imported from the software development environment), change migration is achieved by copying whole files and databases. This approach is fundamentally unsatisfactory because it does not track change at the appropriate granularity. In a good CCM system, data elements, including files, are altered only as the specific change requires.
  • Promotion and demotion: Promotion is the process of migrating a change from one content area to a higher-ranked area, typically from a QA test site to the production system. Demotion is the process of removing a change from a specified content area or areas.
  • Incremental deployment: the process of filling a content area with the appropriate content by applying only what differs from what the area already holds.
  • Reporting and analysis: provides fine-grained information about what changes have been made, by whom, at what time, and where those changes have been applied.
  • Platform transparency: the ability to operate on data elements regardless of the hardware and software platform on which they reside. UNIX and Windows NT, to take one pair of platforms, differ in minor conventions such as path name separators, line terminators, and file terminators, and the CCM must accommodate these differences.
  • Version management: the process of accurately specifying and producing selected versions of the content under management. This allows previous revisions of the site's content and software to be re-created.
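The verification and platform-transparency requirements can be illustrated together. The following Python sketch assumes that a content area has been read into a mapping from path to bytes, that the owner's intended content is recorded as a manifest of SHA-256 digests, and that the files are text; the function names (`normalize_path`, `normalized_digest`, `verify_area`) are hypothetical and do not come from any particular CCM product.

```python
import hashlib

def normalize_path(path: str) -> str:
    # Windows uses backslash separators; compare paths in forward-slash form.
    return path.replace("\\", "/")

def normalized_digest(data: bytes) -> str:
    # Intended for text content: map CRLF and bare CR to LF, and drop a trailing
    # DOS end-of-file marker (Ctrl-Z), so equivalent text hashes identically.
    text = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    if text.endswith(b"\x1a"):
        text = text[:-1]
    return hashlib.sha256(text).hexdigest()

def verify_area(area: dict[str, bytes], manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the live content of an area with the digests its owner intended."""
    live = {normalize_path(p): d for p, d in area.items()}
    wanted = {normalize_path(p): h for p, h in manifest.items()}
    return {
        "missing": sorted(p for p in wanted if p not in live),
        "unexpected": sorted(p for p in live if p not in wanted),
        "altered": sorted(p for p in wanted
                          if p in live and normalized_digest(live[p]) != wanted[p]),
    }

intended = {"docs/index.html": normalized_digest(b"<p>Hello</p>\n")}
live = {"docs\\index.html": b"<p>Hello</p>\r\n", "debug.log": b"trace"}
print(verify_area(live, intended))
# {'missing': [], 'unexpected': ['debug.log'], 'altered': []}
```

Normalizing before hashing means a page copied from a Windows workstation to a UNIX server is not reported as altered merely because its line terminators changed, while any real difference in content still is. Binary files would need to be hashed without this normalization.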

The change set concept

Consider the process of change at a website. Change happens as a sequence of transactions. The transactions may be modifications by developers intended to improve the services and functionality of the site, content changes by content contributors such as marketing managers, or changes to the look and feel of the site by user interface (UI) designers.

A Content Change Management system must be able to record and reproduce both the change and the process of change. How can it do this? One approach, often used with databases, is to take a snapshot at regular intervals and keep a log of the transactions that occur between snapshots. This approach has obvious shortcomings: it consumes disk space, and re-creating a particular state of the database takes time, because the transaction log must be "replayed" against the baseline. Most importantly, however, it provides no model for migrating changes from one content area to another, which is essential for distributed and parallel development.

A superior approach is the change set concept, in which all changes associated with a specific purpose (for example, updating corporate boilerplate in a set of web pages or templates) are captured in a "change set," and each change set is managed independently. Suppose, for example, that you need to fix a JavaScript bug and change the template for the same web page. You would create one change set for the bug fix and another for the template modification. You could deploy either or both change sets, and it would not matter which one you deployed first. If both are deployed and a problem is found with either one, the problem change can be selectively rolled back, leaving the other change in place.
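The JavaScript and template example can be sketched in Python. This is a minimal model under stated assumptions, not a description of any product: content is a mapping from element name to value, each hypothetical `ChangeSet` records the value it expects to find and the value it installs for every element it touches (`None` standing for an absent element), and whole element values stand in for the finer-grained edits a real CCM would track.

```python
from dataclasses import dataclass

@dataclass
class ChangeSet:
    """Hypothetical change set: element -> (value expected before, value after)."""
    name: str
    edits: dict  # None as a value means "element absent" (creation/deletion)

def apply(state: dict, cs: ChangeSet) -> dict:
    """Return a new state with every edit in `cs` applied, or raise and change nothing."""
    for elem, (before, _after) in cs.edits.items():
        if state.get(elem) != before:
            raise ValueError(f"{cs.name}: {elem!r} is not in the expected prior state")
    new_state = dict(state)
    for elem, (_before, after) in cs.edits.items():
        if after is None:
            new_state.pop(elem, None)  # the change set deletes this element
        else:
            new_state[elem] = after
    return new_state

def remove(state: dict, cs: ChangeSet) -> dict:
    """Selective rollback: apply the inverse of `cs`, leaving other changes alone."""
    inverse = ChangeSet(cs.name + " (rollback)",
                        {e: (after, before) for e, (before, after) in cs.edits.items()})
    return apply(state, inverse)

base = {"page.js": "js-v1 (buggy)", "page.tmpl": "layout-A"}
bugfix = ChangeSet("js-bugfix", {"page.js": ("js-v1 (buggy)", "js-v2 (fixed)")})
restyle = ChangeSet("template", {"page.tmpl": ("layout-A", "layout-B")})

# Order does not matter: the two change sets touch different elements.
assert apply(apply(base, bugfix), restyle) == apply(apply(base, restyle), bugfix)

site = apply(apply(base, bugfix), restyle)
print(remove(site, restyle))  # roll back the template change only
# {'page.js': 'js-v2 (fixed)', 'page.tmpl': 'layout-A'}
```

Because `apply` checks every precondition before writing anything, a change set is applied completely or not at all, and removing a change set is simply applying its inverse.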

Every state (version) of the content under management can be represented by a consistent composite change set. Transitions from one state (version) to another can be represented by a change set containing the previous version and the changes that carry the first state into the second.

Change sets can also operate on versions in order to promote or demote the status of other change sets. Thus, we can say "Add change sets 4, 5, and 6 to version V1". In other words, version specifications are part of the content.

Why is the change set concept powerful? Because change sets can be applied and removed independently of one another. Change sets containing bug fixes can be applied first to a test version of the web site and then, in an auditable manner, to one or more production versions. To go from one state to another, we need only apply the change sets that differ between the two states, and we need only alter the elements that are actually affected. In other words, change sets automatically provide incremental updating and rollback with the ACID properties of a database transaction:

  • Atomic transactions (all or nothing)
  • Consistent results
  • Isolated (independent of other transactions)
  • Durable (effect is permanent)
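If a version is identified with the set of change sets it contains, moving a content area from one version to another reduces to two set differences. The sketch below assumes integer change-set identifiers that increase over time and change sets that are independent of one another, as in the bug-fix and template example above; `migration_plan` is a hypothetical name.

```python
def migration_plan(current: set[int], target: set[int]) -> tuple[list[int], list[int]]:
    """Change sets to remove and to apply to move a content area between versions."""
    to_remove = sorted(current - target, reverse=True)  # newest first (ids assumed to grow over time)
    to_apply = sorted(target - current)                 # oldest first
    return to_remove, to_apply

# A version specification is itself just data: the set of change sets it contains.
versions = {"V1": {1, 2, 3}}
versions["V1"] = versions["V1"] | {4, 5, 6}  # "Add change sets 4, 5, and 6 to version V1"
qa_area = {1, 2, 4, 7}                       # what the QA site currently holds

print(migration_plan(qa_area, versions["V1"]))
# ([7], [3, 5, 6])
```

Removing in reverse order of creation is a conservative choice for this sketch; with truly independent change sets the order would not matter.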

Once recorded, the essence of a change must be immutable; this safeguards changes and lets the user recover them at any time. When changes are manipulated, change sets ensure that the whole change, and nothing but the change, is treated as a unit. Although no commercial CCM product offers these capabilities today, change sets are in fairly wide use, and it can be anticipated that they will find their way into developers' CCM tool kits.

Change sets are also powerful because changes can be migrated from one version to another simply by activating the change set that represents the change. Finally, change sets are powerful because they automatically capture all of the information needed for audit and reporting, allowing you to know and manipulate all details of the five W's of change: Who, What, When, Where, and Why.

Provenances

A key requirement for Content Change Management is that every change has a provenance. A provenance is information associated with an object that identifies the object's origin and history. The provenance for a change has five major pieces of information, commonly known as the five W's: who, when, what, where, and why.

Who: The "who" is the actual person or agency that made the change. Detection of the "who" can be done automatically; file system managers usually record this information. Knowing who made a change is often the key to establishing why a change was made. It is also important for establishing that a change was actually an authorized change.

When: A CCM should identify when a change was created and when it was applied.

Where: The "where" identifies the places where the change has been applied. For example, has the change been applied both to the QA test site and to all production sites, or only to the test bed? It also records where the change originated.

What: The "what" is the actual substance of the change, e.g., a changed name, the changed value of an attribute, inserted and deleted lines of text in a file, et cetera. The "what" is also automatically available as part of the change detection process.

Why: The "why" is the reason for the change. This is the most problematic component of the provenance because, in general, it cannot be supplied automatically. Sometimes the "why" can be supplied by indirection, i.e., the change is part of a larger change that does have a defined purpose. Quite often, however, the CCM system must interact with the person recording the change to request a description of why the change was made.

With change sets, each of the five W's is captured for all of the data alterations made to achieve a logical change. This produces a consistent, complete audit trail of every change at every point in the system.
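A provenance can be represented as a small record attached to each change set. In this sketch the field names and the `Provenance` class are hypothetical; the rule that a blank "why" is rejected stands in for the CCM prompting the person who recorded the change for a description.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Provenance:
    """Hypothetical record of the five W's for one change set."""
    who: str           # person or agency that made the change
    what: list[str]    # substance of the change, e.g. from change detection
    why: str           # reason; usually has to be supplied by a person
    created: datetime  # when the change was created
    origin: str        # content area where the change originated
    applied: dict[str, datetime] = field(default_factory=dict)  # where -> when applied

    def __post_init__(self) -> None:
        # Stand-in for prompting the person recording the change for a reason.
        if not self.why.strip():
            raise ValueError("a change must record why it was made")

    def record_application(self, area: str, when: datetime) -> None:
        self.applied[area] = when

    def where(self) -> list[str]:
        """Content areas to which the change has been applied."""
        return sorted(self.applied)

utc = timezone.utc
fix = Provenance(
    who="jsmith",
    what=["page.js: guard against empty cart"],
    why="Cart total displayed NaN for empty carts",
    created=datetime(2004, 1, 5, 9, 30, tzinfo=utc),
    origin="dev-workspace",
)
fix.record_application("qa", datetime(2004, 1, 6, 14, 0, tzinfo=utc))
fix.record_application("production", datetime(2004, 1, 7, 8, 0, tzinfo=utc))
print(fix.where())
# ['production', 'qa']
```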

Data elements under management

The major data types for which changes must be captured are:
  • Identifiers: Identifiers (names) are bound to elements. When these change, for example, when a file is re-named, this change must be captured.
  • Attribute/value pairs. Change here is a matter of replacing one value with another. Attribute/value pairs may be stored in a variety of ways; the critical requirement for CCM is that the pair is identified and the replacement is recorded.
  • Changes to flat files. Flat files, such as source code files, consist of a sequence of records (typically lines of ASCII text). Changes to flat files consist of inserts, deletes, and modifications of existing lines.
  • Changes to structured files. Structured files are very common; typical examples include directory trees, structured HTML and XML, and many word processor files. Changes to structured files have two components: changes to the structure itself and changes to the content within nodes. The marker tags that delimit the structure may be drawn from a fixed dictionary or may be variable. The algorithms used for flat-file change detection can be applied to structured files, but the results are not completely satisfactory because they do not identify the structure boundaries.
  • Database content. Much of an e-commerce application and its content is managed in relational databases. Some of this data benefits from CCM, and changes to it must be represented as part of the larger logical change.
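Line-level change detection for flat files can be sketched with Python's standard `difflib` module, used here as a stand-in for whatever comparison algorithm a particular CCM employs. The function name `flat_file_changes` is hypothetical. The HTML lines in the example are treated as flat records, which illustrates the limitation described for structured files: the diff reports changed lines but knows nothing about element boundaries.

```python
import difflib

def flat_file_changes(old: list[str], new: list[str]) -> list[tuple]:
    """List line-level inserts, deletes, and modifications between two versions.

    Each entry is (kind, first old line, end of old range, old lines, new lines).
    """
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    kinds = {"insert": "insert", "delete": "delete", "replace": "modify"}
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((kinds[tag], i1, i2, old[i1:i2], new[j1:j2]))
    return changes

old = ["<h1>Catalog</h1>", "<p>Widgets</p>", "<p>Gadgets</p>"]
new = ["<h1>Catalog</h1>", "<p>Widgets, now 10% off</p>", "<p>Gadgets</p>", "<p>Gizmos</p>"]
for change in flat_file_changes(old, new):
    print(change)
# ('modify', 1, 2, ['<p>Widgets</p>'], ['<p>Widgets, now 10% off</p>'])
# ('insert', 3, 3, [], ['<p>Gizmos</p>'])
```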

Mechanics of change detection

The basic operations on content elements are creation, deletion, and alteration of content. (Binding of identifiers is included under creation and alteration.)

Change management systems use two basic methods for determining when and how change detection occurs: checkout/checkin and update.

Checkout/checkin is an explicit process in which a person making a change "checks out" elements, typically files, changes them, and checks them back in. A checked-out element also serves as notice to co-workers that a change to that element is in progress. Change detection is done when the elements are checked back in, by comparing their contents at checkout with their contents at checkin.

The checkout/checkin method is commonly used in source code control systems. However, it is not satisfactory for e-commerce web sites, because it imposes a cumbersome process on the people making changes. This process may be acceptable, even desirable, for professional programmers, but it is not always viable for a heterogeneous mix of web content developers and the varied tools they use. As a result, it is difficult to capture all content changes using the checkout/checkin method.

In the update method, change detection is done when a checkpoint (or milestone or snapshot) is reached. Determination of when a checkpoint is reached may be automatic, manual, or a combination of both automatic and manual determination. Comparison is done between the current contents in a selected content area and a reference version of the content area. Typically the reference version is the state of the selected content area as it was at the time of the last checkpoint.
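The update method can be sketched as a comparison of two snapshots, each mapping element paths to content digests, classified into the basic operations of creation, deletion, and alteration. The names `snapshot` and `detect_changes` are hypothetical, and the sketch assumes that the content area fits in memory as a path-to-bytes mapping.

```python
import hashlib

def snapshot(area: dict[str, bytes]) -> dict[str, str]:
    """Record a checkpoint: a SHA-256 digest of every element in the content area."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in area.items()}

def detect_changes(reference: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Classify differences between the reference checkpoint and the current snapshot."""
    return {
        "created": sorted(current.keys() - reference.keys()),
        "deleted": sorted(reference.keys() - current.keys()),
        "altered": sorted(p for p in current.keys() & reference.keys()
                          if current[p] != reference[p]),
    }

last_checkpoint = snapshot({"index.html": b"<p>v1</p>", "old.css": b"body{}", "logo.png": b"\x89PNG"})
now = snapshot({"index.html": b"<p>v2</p>", "logo.png": b"\x89PNG", "new.css": b"body{}"})
print(detect_changes(last_checkpoint, now))
# {'created': ['new.css'], 'deleted': ['old.css'], 'altered': ['index.html']}
```

In this simple form a renamed file appears as one deletion plus one creation; matching digests across the two lists would be one way to recover the rebinding of the identifier.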

Major functional capabilities

A good Content Change Management system has several major functional capabilities critical to the successful deployment of a web site. These include change migration, distributed authoring, parallel development, auditing, reporting, and a selective rollback capability.

Change migration refers to the movement of changes made in one content area to another and the movement of changes included in one version to another version. Without a CCM system it is usually difficult and sometimes impossible to determine the exact scope of a change and to make all alterations and only those alterations in some other scope or at another time.

Any large site will have many authors contributing to the content. Authors and application developers often need to work collaboratively even though they work in geographically and organizationally distinct sites. A good CCM system must therefore accept input from a variety of distributed sources and correctly and transparently determine and apply the changes from each source.

Parallel development often goes hand in hand with distributed authoring. An e-commerce Web site is not static; the services and features that it offers are constantly evolving. Development of these new services and features necessarily takes place off-line and is usually done in the context of a defined baseline that rapidly becomes out of date. To support this, the CCM system must provide the ability both to create branch versions and to merge the features developed in those branches back into the main line.
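Merging a branch back into the main line can be planned at the change-set level. The sketch below assumes that each change set is summarized by the set of element identifiers it touches, and that a branch change set conflicts with the main line only when it touches an element the main line has changed since the branch was created; a real CCM might instead merge within an element, line by line. `plan_merge` is a hypothetical name.

```python
def plan_merge(main_since_branch: dict[str, set[str]],
               branch: dict[str, set[str]]) -> tuple[list[str], list[str]]:
    """Split branch change sets into those that merge cleanly and those that conflict.

    Each argument maps a change-set name to the element ids that change set touches.
    """
    touched_on_main = set().union(*main_since_branch.values())
    clean, conflicting = [], []
    for name, elements in branch.items():
        if elements & touched_on_main:
            conflicting.append(name)  # needs a manual or finer-grained merge
        else:
            clean.append(name)        # can be applied to the main line as-is
    return clean, conflicting

main_line = {"hotfix-12": {"cart.js"}, "promo-banner": {"home.html"}}
feature_branch = {"wishlist": {"wishlist.js", "account.html"},
                  "cart-redesign": {"cart.js", "cart.tmpl"}}
print(plan_merge(main_line, feature_branch))
# (['wishlist'], ['cart-redesign'])
```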

An example of the necessity of parallel development and distributed authoring capabilities is the handoff of an e-commerce web site from a third party site developer to the customer, followed by outsourced (third party) site enhancements made to the deployed site at the customer's instruction, in tandem with the development of the next major site release by the original site developer. CCM helps to ensure that changes originating from all parties are accounted for in successive releases of the e-commerce site.

Auditing and reporting are essential features of a CCM system. Without audit trails, the cost of determining and repairing faults in an e-commerce Web site can be enormous. For many applications of CCM, such as pharmaceutical content publishing, audit and approval capabilities are essential to minimize legal liabilities. In an increasingly outsourced economy, audit capabilities play a crucial role in minimizing the finger pointing and breach-of-contract suits that often arise from a misunderstanding of who made what changes, and when.

Checkpoint and rollback facilities make it possible to periodically capture the state of the web site and later return to that state. It is also possible to selectively roll back (remove) specific change sets. Thus, if the site has undergone many modifications and one of them is then discovered to be faulty, the web site manager can either roll back the entire web site to a time before the faulty change was made or simply remove the problem change.

In some cases selective rollback is essential, for example when deployed changes include schema modifications to back-end databases and it would be problematic to roll back the web site after the database has processed transactions in the new schema's format. The ability to discover change activities and selectively roll back or roll forward (aided by change reporting) can mean the difference between minutes and days in restoring correct behavior to content and applications.

Conclusion

Content Change Management is essential in the modern online e-commerce environment. Without it, many necessary operations become awkward or even impossible. The versioning capabilities found in standalone systems such as CVS or in today's Content Management solutions are only a partial answer. A full-fledged Content Change Management system is needed for the successful management of critical e-commerce operations.


This page was last updated January 1, 2004.
