Content Change Management:
What it is and Why it is Needed
by Richard Harter
Introduction
Content areas
Services offered by the CCM
The change set concept
Provenances
Data elements under management
Mechanics of change detection
Major functional capabilities
Conclusion
Introduction
E-commerce, and especially B2B e-commerce, sites typically expose large
quantities of information to the customer, who interacts with the public
interface in rapid ongoing transactions. Web site content changes
dynamically, both in response to customer interaction and through
alterations made by the site operator.
Just as software change management systems do with source code, website
Content Change Management [CCM] systems provide high level audit,
approval, distribution, verification, discovery, reporting, enabling, and
disabling of site changes in a granular and reliable fashion. This
includes modifications made to complex software applications used in
conjunction with e-commerce sites, as well as modifications to the dynamic
and static content that drives those sites.
CCM systems provide auditing, reporting, verification, deployment, and
rollback of modifications made to complex software systems, including
e-commerce Web sites. In contrast, Content Management systems are used to
ensure that site content is coherent: for example, that styles are
consistent, links are maintained properly, and that tools and scripts mesh
properly. Proper use of Content Management systems can ensure that site
content and format follow consistent standards. However, they do not in
their own right manage the change process and deal with the consequences
of dynamic change.
Content Change Management is a separate but related function, one that can
and should be coordinated with the Content Management system.
Content areas
In addition to the content visible to customers, an e-commerce web site
will have a number of auxiliary content areas in which development and
testing is done. These auxiliary content areas include workspaces on the
developer’s desktop, test beds for quality assurance testing, and
“sandboxes”, in which developers try out new ideas before they are
deployed to the production site. These auxiliary areas may be complete
copies of the active Web site content area or subsets of the production
site, and they may include data elements not present in the principal
content area. The Content Change Management system must monitor change in
all of these content areas.
Services offered by the CCM
A CCM provides the following services to the developers and maintainers of
Web sites:
- Site content verification: the ability to verify that the content of
the production Web site visible to customers and any of the auxiliary
content areas (the sandboxes and backup areas) is correct, i.e., is the
content that the owner of the web site intended for display. This is a
critical capability. Content areas are subject to degradation by hostile
attack, by accidental deletion and replacement of files, by hardware
accidents, and by damage due to faulty processes or incorrect programming.
Verification must be dynamic, that is, it must be possible to determine
that a content area is correct in real time rather than determining merely
that a baseline is correct.
- Change detection: the ability to correctly and comprehensively
identify all changes to the web site, whether to content, templates,
formats, or code.
- Rollback: the ability to return to the state of the web site at an
earlier date and time.
- Change migration: allows a change or set of changes made in one
content area to be moved to another content area. For example, a set of
changes in a developer’s workspace will eventually need to be moved to a
quality assurance testing site and, from there, to the production web
site. In many CCM systems (and many configuration management systems
imported from the software development environment) change migration is
achieved by copying whole files and databases. This is a fundamentally
unsatisfactory approach, which fails to allow the appropriate granularity
of change tracking. In good CCM systems, data elements, including files,
are altered to reflect the specific alterations in the change.
- Promotion and demotion: Promotion is the process of migrating a change
from one content area to a higher-ranked area, typically from a QA test
site to the production system. Demotion is the process of removing a
change from a specified content area or areas.
- Incremental deployment: the process of filling a content area with the
appropriate content by applying only the changes that the area lacks.
- Reporting and analysis: provides fine-grained information about what
changes have been made, by whom, at what time, and where those changes
have been applied.
- Platform transparency: the ability to operate on data elements
regardless of the hardware and software platform on which they reside.
UNIX and Windows NT, to take one example of different platforms, have
minor differences in file formats (e.g., path names, line terminators, and
file terminators) that must be accommodated by the CCM.
- Version management: the process of accurately specifying and producing
selected versions of the content under management. This would allow
previous revisions of the software to be re-created.
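As a rough illustration of platform transparency, the sketch below reduces
text elements to a canonical form before they are compared, so that a UNIX
copy and a Windows NT copy of the same page compare as equal. The function
names are hypothetical, and the treatment of Ctrl-Z as a file terminator is
an assumption about older DOS-style text files; binary elements would have
to bypass this step.

```python
import hashlib


def normalize_path(path: str) -> str:
    """Use forward slashes regardless of the originating platform."""
    return path.replace("\\", "/")


def normalize_text(data: bytes) -> bytes:
    """Convert CRLF and bare CR line terminators to LF, drop a trailing Ctrl-Z."""
    data = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    if data.endswith(b"\x1a"):  # assumed DOS-style end-of-file marker
        data = data[:-1]
    return data


def canonical_entry(path: str, data: bytes) -> tuple[str, str]:
    """Return (canonical path, SHA-256 digest of the canonical content)."""
    digest = hashlib.sha256(normalize_text(data)).hexdigest()
    return normalize_path(path), digest


# The same page as it might be stored on each platform (illustrative data).
unix_copy = canonical_entry("site/pages/index.html",
                            b"<h1>Shop</h1>\n<p>Hi</p>\n")
nt_copy = canonical_entry("site\\pages\\index.html",
                          b"<h1>Shop</h1>\r\n<p>Hi</p>\r\n\x1a")
same_element = unix_copy == nt_copy
```

With both copies reduced to the same path and digest, change detection and
verification can treat them as one data element rather than reporting a
spurious difference.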
The change set concept
Consider the process of change at a website. Change happens as a sequence
of transactions. The transactions may be the result of modifications by
developers intended to improve the services and functionality of the site,
they may be content change by content contributors such as marketing
managers, or they may be changes to the look and feel of the site by user
interface (UI) designers.
A Content Change Management system must be able to record and reproduce
both the change and the process of change. How can it do this? One way
to do this, often used with databases, is to take a snapshot on a regular
basis and then keep a log of transactions that occur between snapshots.
There are obvious shortcomings of this approach in the amount of disk
space consumed and in the time it takes to re-create a particular state of
the database by “replaying” the transaction log against the baseline.
Most importantly, however, this procedure does not have a paradigm for
migrating changes from one content area to another – something essential
for distributed and parallel development.
A superior approach is provided by the change set concept, in which all
changes associated with a specific purpose (for example, updating corporate
boilerplate in a set of web pages or templates) are captured in “change
sets,” each of which is managed independently. For example, you might
need to fix a JavaScript bug and make a change to the template for the
same web page. In that case, you would create a change set for the bug
fix and another for the template modification. You could deploy either or
both change sets, and it wouldn’t matter which one you deployed first. If
both are deployed, and a problem is found with either one, the problem
change can be selectively rolled back, leaving the other change in place.
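The example above can be sketched in a few lines. This is only a minimal
model under stated assumptions: a content area is a dictionary of file
texts, an edit is a unique text replacement, and the names Edit, ChangeSet,
apply, and roll_back are hypothetical rather than taken from any product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    path: str
    old: str
    new: str


@dataclass(frozen=True)
class ChangeSet:
    name: str
    purpose: str
    edits: tuple[Edit, ...]


def apply(area: dict[str, str], cs: ChangeSet) -> dict[str, str]:
    """Return a copy of the area with every edit applied, or raise and change nothing."""
    result = dict(area)
    for e in cs.edits:
        if result[e.path].count(e.old) != 1:
            raise ValueError(f"{cs.name}: cannot locate edit in {e.path}")
        result[e.path] = result[e.path].replace(e.old, e.new)
    return result


def roll_back(area: dict[str, str], cs: ChangeSet) -> dict[str, str]:
    """Remove a change set by applying its edits in reverse."""
    inverse = tuple(Edit(e.path, e.new, e.old) for e in reversed(cs.edits))
    return apply(area, ChangeSet(cs.name, cs.purpose, inverse))


site = {"page.html": "<title>Shop</title><script>t = price + tax * qty</script>"}
bug_fix = ChangeSet("cs-1", "fix JavaScript total", (
    Edit("page.html", "price + tax * qty", "(price + tax) * qty"),))
template = ChangeSet("cs-2", "new page title", (
    Edit("page.html", "<title>Shop</title>", "<title>Acme Shop</title>"),))

either_order = apply(apply(site, bug_fix), template)
other_order = apply(apply(site, template), bug_fix)
without_fix = roll_back(either_order, bug_fix)  # template change stays in place
```

Both deployment orders yield the same page, and rolling back the bug fix
leaves exactly the page that the template change alone would have produced.
The sketch works because the two change sets touch different text; handling
overlapping edits is beyond what it attempts.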
Every state (version) of the content under management can be represented
by a consistent composite change set. Transitions from one state
(version) to another can be represented by a change set containing the
previous version and the changes that carry the first state into the
second.
Change sets can also operate on versions in order to promote or demote the
status of other change sets. Thus, we can say “Add change sets 4, 5, and
6 to version V1”. In other words, version specifications are part of the
content.
Why is the change set concept powerful? Because change sets can be
applied and removed independently of one another. Change sets containing
bug fixes can be applied first to a test version of the web site, and
then, in an auditable manner, to one or multiple production versions of
the web site. To go from one state to another we need only apply the
change sets that differ between the two states and we only need to alter
the elements that are actually affected. In other words, change sets
automatically provide incremental updating and rollback capabilities with
the ACID properties of a database transaction:
- Atomic transactions (all or nothing)
- Consistent results
- Isolated (independent of other transactions)
- Durable (effect is permanent)
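One way to picture versions and transitions is to treat a version as nothing
more than the set of change sets it contains. The sketch below assumes that
model; the change set numbers, the touches table, and the function names are
illustrative only.

```python
def transition(current: frozenset[int], target: frozenset[int]) -> tuple[list[int], list[int]]:
    """Return (change sets to apply, change sets to remove) to reach target."""
    return sorted(target - current), sorted(current - target)


def affected_elements(ids: list[int], touches: dict[int, set[str]]) -> list[str]:
    """List only the data elements altered by the given change sets."""
    elements: set[str] = set()
    for cs_id in ids:
        elements |= touches[cs_id]
    return sorted(elements)


# Which data elements each change set alters (hypothetical).
touches = {
    1: {"index.html"}, 2: {"style.css"}, 3: {"cart.js"},
    4: {"cart.js"}, 5: {"page.tmpl"}, 6: {"cart.js", "style.css"},
}

v1 = frozenset({1, 2, 3})
v2 = v1 | {4, 5, 6}          # "Add change sets 4, 5, and 6 to version V1"

to_apply, to_remove = transition(v1, v2)
update_scope = affected_elements(to_apply, touches)

# Rolling a site that holds change sets 1-4 and 6 back to V1.
_, rollback_ids = transition(frozenset({1, 2, 3, 4, 6}), v1)
```

Here the incremental update from V1 to V2 applies change sets 4, 5, and 6
and alters only cart.js, page.tmpl, and style.css; index.html is never
touched. The rollback case removes change sets 4 and 6 and nothing else.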
The essence of change must be immutable, safeguarding changes and allowing
the user to recover changes at any time. When changes are manipulated,
change sets ensure that the whole change, and nothing but the change, is
treated in its entirety. Although no commercial product is available with
these capabilities today, change sets are in fairly wide use, and it can
be anticipated that they will find their way into the CCM tool kits of
developers.
Change sets are powerful because changes can be migrated from one version
to another just by activating the change set representing the change.
Finally, change sets are powerful because they automatically capture all of
the information necessary for audit and reporting purposes, allowing you
to know and manipulate all details of the “five W’s” of change, namely Who,
What, When, Where, and Why.
Provenances
A key requirement for Content Change Management is that every change has a
provenance. A provenance is information associated with an object that
identifies the object’s origin and history. The provenance for a change
has five major pieces of information, commonly known as the five W’s: who,
when, what, where, and why.
Who: The “who” is the actual person or agency that made the change.
Detection of the “who” can be done automatically; file system managers
usually record this information. Knowing who made a change is often the
key to establishing why a change was made. It is also important for
establishing that a change was actually an authorized change.
When: A CCM should identify when a change was created and when it was
applied.
Where: The “where” identifies the places where the change has been
applied. For example, has the change been applied to both the QA test
site and all production sites, or has it been applied only to the test bed? It
also lets you know the origins of the change.
What: The “what” is the actual substance of the change, e.g., a changed
name, the changed value of an attribute, inserted and deleted lines of
text in a file, et cetera. The “what” is also automatically available as
part of the change detection process.
Why: The “why” is the reason for the change. This is the most problematic
component of the provenance because, in general, it cannot be supplied
automatically. Sometimes the “why” can be supplied by indirection, i.e.,
the change is part of a larger change that does have a defined purpose.
Quite often, however, the CCM system must interact with the person
recording the change to request a description of why the change was made.
With change sets, each of the five W’s is captured for all of the data
alterations made to achieve a logical change. This generates a
consistent, complete audit trail of every change at every point in the
system.
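A provenance can be pictured as a small record attached to each change set.
The sketch below is one possible shape, not a prescribed format; the field
names, the Provenance class, and the sample values are assumptions made for
illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Provenance:
    who: str                      # person or agency that made the change
    what: str                     # substance of the change
    created: datetime             # when the change was created
    why: Optional[str] = None     # usually has to be asked for
    applied: dict[str, datetime] = field(default_factory=dict)  # where -> when

    def record_application(self, area: str, when: datetime) -> None:
        """Note that the change was applied to a content area at a given time."""
        self.applied[area] = when

    def missing(self) -> list[str]:
        """Name the W's that cannot be filled in automatically and are still empty."""
        gaps = []
        if not self.why:
            gaps.append("why")
        if not self.applied:
            gaps.append("where")
        return gaps


p = Provenance(
    who="jsmith",
    what="cart.js: 1 line replaced",
    created=datetime(2004, 1, 1, 9, 0, tzinfo=timezone.utc),
)
gaps_before = p.missing()

p.why = "Tax was omitted from the order total"
p.record_application("qa", datetime(2004, 1, 1, 10, 30, tzinfo=timezone.utc))
gaps_after = p.missing()
```

The who, what, and creation time are filled in when the change is detected,
while missing() reports the pieces, typically the why, for which the system
still has to prompt a person.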
Data elements under management
The major data types for which changes must be captured are:
- Identifiers: Identifiers (names) are bound to elements. When these
change, for example, when a file is renamed, this change must be
captured.
- Attribute/value pairs. Change here is a matter of replacement of one
value by another. Attribute/value pairs may be stored in a variety of
ways; the critical factor for CCM is that the pair is identified and that
the replacement be recorded.
- Changes to flat files. Flat files, such as source code files, consist
of a sequence of records (typically lines of ASCII text). Changes to flat
files consist of inserts, deletes, and modifications of existing lines.
- Changes to structured files. Structured files are very common;
typical examples include directory trees, structured HTML and XML, and
many word processor files. Changes to structured files have two
components, changes to the structure itself, and changes to the content
within nodes. The marker tags that delimit the structure may either be
drawn from a fixed dictionary or may be variable. The algorithms used for
flat file change detection can be used for structured files but the
results are not completely satisfactory because the structure boundaries
are not identified.
- Database content. A large part of an e-commerce application and/or
content is managed in relational databases. Some of this data benefits
from CCM, and must be represented as part of the larger activity of a
logical change.
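Two of these data types lend themselves to a short sketch. The code below
uses Python's standard difflib module to report inserted, deleted, and
replaced lines in a flat file, and a plain dictionary comparison to report
replaced attribute values. The function names and sample data are
hypothetical, and difflib stands in here for whatever comparison algorithm
a real CCM would use.

```python
import difflib


def line_changes(old: list[str], new: list[str]) -> list[tuple[str, list[str], list[str]]]:
    """Return (kind, old lines, new lines) for each changed region of a flat file."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # tag is "replace", "delete", or "insert"
            changes.append((tag, old[i1:i2], new[j1:j2]))
    return changes


def attribute_changes(old: dict[str, str], new: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {attribute: (old value, new value)} for each replaced value."""
    return {k: (old[k], new[k]) for k in old.keys() & new.keys() if old[k] != new[k]}


old_page = ["<h1>Shop</h1>", "<p>Old offer</p>", "<p>Contact us</p>"]
new_page = ["<h1>Shop</h1>", "<p>New offer</p>", "<p>Contact us</p>",
            "<p>Free shipping</p>"]
page_delta = line_changes(old_page, new_page)

attr_delta = attribute_changes({"owner": "web", "mode": "0644"},
                               {"owner": "web", "mode": "0755"})
```

For this input the page delta holds one replaced line and one inserted
line, and the attribute delta records that mode went from 0644 to 0755.
Applied to HTML or XML, the same line comparison would still run, but it
would not know where element boundaries fall.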
Mechanics of change detection
The basic operations on content elements are creation, deletion, and
alteration of content. (Binding of identifiers is included under creation
and alteration.)
There are two basic methods that are used in change management systems for
determining when and how change detection occurs. They are
checkout/checkin and update.
Checkout/checkin is an explicit process in which a person making a change
“checks out” elements, typically files, changes them, and checks them back
in. A checked out element also serves as notice to co-workers that a
change is in progress to the element in question. Change detection is
done when the elements are checked back in. Comparison is done between
the content of the elements as it was when they were checked out and
their content as it is when they are checked back in.
The checkout/checkin method is commonly used in source code control
systems. However, it is not satisfactory for e-commerce web sites. It
imposes a cumbersome process upon the people making changes. This process
may be acceptable, even desirable, for professional programmers, but isn’t
always viable in the context of a heterogeneous mixture of web content
developers and the attendant tools of the trade that they utilize. It is
difficult to capture all content changes using the checkout/checkin
method.
In the update method, change detection is done when a checkpoint (or
milestone or snapshot) is reached. Determination of when a checkpoint is
reached may be automatic, manual, or a combination of both automatic and
manual determination. Comparison is done between the current contents in
a selected content area and a reference version of the content area.
Typically the reference version is the state of the selected content area
as it was at the time of the last checkpoint.
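The comparison step of the update method can be sketched as follows. The
sketch assumes that a checkpoint is stored as a table of content digests,
one per element; the function names and sample files are hypothetical.

```python
import hashlib


def snapshot(area: dict[str, bytes]) -> dict[str, str]:
    """Record a checkpoint as {path: SHA-256 digest of content}."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in area.items()}


def detect_changes(reference: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Classify elements as created, deleted, or altered since the reference."""
    shared = reference.keys() & current.keys()
    return {
        "created": sorted(current.keys() - reference.keys()),
        "deleted": sorted(reference.keys() - current.keys()),
        "altered": sorted(p for p in shared if reference[p] != current[p]),
    }


# State of a content area at the last checkpoint and now (illustrative data).
last_checkpoint = snapshot({
    "index.html": b"<h1>Shop</h1>",
    "cart.js": b"function add() {}",
    "old.css": b"body {}",
})
now = snapshot({
    "index.html": b"<h1>Acme Shop</h1>",
    "cart.js": b"function add() {}",
    "new.css": b"body {}",
})

delta = detect_changes(last_checkpoint, now)
```

No action is required of the people making changes; the delta is computed
whenever a checkpoint is declared. In this simple form a renamed file shows
up as a deletion plus a creation, so identifier changes would need separate
tracking.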
Major functional capabilities
A good Content Change Management system has several major functional
capabilities critical to the successful deployment of a web site. These
include change migration, distributed authoring, parallel development,
auditing, reporting, and a selective rollback capability.
Change migration refers to the movement of changes made in one content
area to another and the movement of changes included in one version to
another version. Without a CCM system it is usually difficult and
sometimes impossible to determine the exact scope of a change and to make
all alterations and only those alterations in some other scope or at
another time.
Any large site will have many authors contributing to the content.
Authors and application developers often need to work collaboratively even
though they work in geographically and organizationally distinct sites. A
good CCM of necessity is able to accept input from a variety of
distributed sources and correctly and transparently determine and apply
changes from that source.
Parallel development often goes hand in hand with distributed authoring.
An e-commerce Web site is not static; the services and features that it
offers are constantly evolving. Development of these new services and
features necessarily takes place off-line and is usually done in the
context of a defined baseline that rapidly becomes out of date. To support
this, the CCM system must provide both the ability to create branch
versions and to merge the features developed in the branch versions back
into the main line.
An example of the necessity of parallel development and distributed
authoring capabilities is the handoff of an e-commerce web site from a
third party site developer to the customer, followed by outsourced (third
party) site enhancements made to the deployed site at the customer’s
instruction, in tandem with the development of the next major site release
by the original site developer. CCM helps to ensure that changes
originating from all parties are accounted for in successive releases of
the e-commerce site.
Auditing and reporting are essential features that a CCM system provides.
Without audit trails, the cost of determining and repairing faults in an
e-commerce Web site can be enormous. For many applications of CCM, such as
pharmaceutical content publishing, audit and approval capabilities are
essential to minimize legal liabilities. In an increasingly outsourced
economy, audit capabilities play a crucial role in minimizing the
finger-pointing and breach-of-contract suits that often arise from a
misunderstanding of who made what changes, and when.
Checkpoint and rollback facilities make it possible to periodically
capture the state of the web site and then return to that state. It is
also possible to selectively roll back (remove) specific change sets. Thus,
if the site has undergone many modifications and then it is discovered
that one of those modifications is faulty, the web site manager has a
choice of rolling back the entire web site to a time before the faulty
change was made or simply removing the problem change.
In some cases selective rollback is essential, such as when deployed
changes include schema modifications to backend databases, and it would be
problematic to roll back the web site after the database has processed
transactions in the format of a new schema. The ability to discover
change activities and selectively roll back or roll forward (aided by
change reporting) can mean the difference between minutes and days in
restoring correct behavior to content and applications.
Conclusion
Content Change Management is essential in the modern online e-commerce
environment. Without it, many necessary operations become awkward or even
impossible. The versioning capabilities found in standalone systems such
as CVS or in today’s Content Management solutions are only a partial
answer. A full-fledged Content Change Management system is needed for the
successful management of critical e-commerce operations.
This page was last updated January 1, 2004.