Creating robust handlesA while back I posted an article entitled “A question of programming technique in C” about managing references to something I called bobbles. This lead to a lively discussion. Below is a rewrite that may be clearer. I particularly want to thank Eric Sosman, Ben Bacarisse, and Tim Rentsch for many useful and constructive comments. Suppose we are writing a program in C and that we have a particular kind of object that I will call bobbles. These bobbles are created and destroyed during the course of the execution of the program. In particular bobbles can be destroyed even though references to them still exist. This pattern occurs in dynamic data flow programming, in biological simulations with changing populations of individuals that interact, and in interactive networks where network elements and connections are continually changing. Being good software citizens we want to encapsulate the bobble related code. In C we can do this by creating a module consisting of one or more .c files and an include file, bobble.h. The module contains public functions that users call, and private functions containing encapsulated code. The include file contains prototypes for the public functions. In particular it will include functions to create and delete bobbles. The include file also defines references or handles to bobbles. The issue at hand is how to define references and how to use them to access bobbles. An important consideration here is that a reference can become stale; i.e., it “points” to a bobble that no longer exists. The code must be able to detect that a reference has gone stale and respond appropriately. There are a couple of ways to define references that are commonly used. These are:
The “unique ID” method works, but is cumbersome. I wanted something cheaper and better. The original scheme that I came up with was a struct with two fields that looked like this: struct bobble_ref { struct bobble_s *ptr; size_t seqno; }The idea was that each bobble would get a unique sequence number that it would keep a copy of. Interface routines would look at a bobble_ref and compare the sequence number in the bobble_ref struct with the sequence number in the pointed at bobble_s struct. I had three objections to this scheme. The first is that I don’t like handing out pointers to controlled data. The second is that we could never deallocate storage for bobbles, lest a reference contain a pointer to deallocated storage. The third is that the scheme is intrusive; it stores external data in the object. Instead I came up with the idea of a second structure that I called a locator. The following diagram by Ben Bacarisse illustrates the idea. (Originally “reference” was “descriptor”; a good choice of nomenclature has been a vexing problem.) reference | locator bobble +--------+ | +--------+ +--------+ | ----+-------->| ----+--------->|internal| | id: 34 | | | id: 34 | | data | +--------+ | +--------+ +--------+ | user code | internal to the bobble implementationWhen a bobble is deleted we reset the locator id to an invalid value. This scheme has some important advantages. The first is that we no longer expose the pointer to the bobble. A second is that we can put the locators in an array and put the locator index in the reference struct; the user code no longer sees a pointer. A third is that aren’t committed to keeping the storage for deleted bobbles around. There also is a potential disadvantage; a locator can never be deleted. Ben Bacarisse asked why the id field was needed, i.e., why not just set the locator pointer to 0 when the bobble is deleted. The catch is that this creates a memory leak; we can never reuse a locator because there might be a stale reference pointing to it. Having an id field lets us reuse locators; the upshot is that there must be as many locators as the maximum number of bobbles in existence at any one time. This gives us structs that look like these: struct bobble_reference_s { size_t index; size_t seqno; }; struct bobble_locator_s { union { struct bobble_s * ptr; struct bobble_locator_s * link; } u; size_t seqno; };(The link field is a convenience for maintaining a free list.) The create_bobble function then looks like: struct bobble_reference_s create_bobble(/* args */);Note that create_bobble returns a struct rather than a pointer. Tim Rentsch pointed out a number of improvements on the original proposal. He pointed out that a similar scheme is used in smalltalk with oops and otes (object oriented pointers and object table entries). He suggested that each locator slot have its own sequence number rather using one universal sequence number. This increases the number of possible distinct bobbles over time from 2^32 to 2^64. The scheme is to initialize sequence numbers to 1, don’t change when putting a locator into use, increment them when a bobble is deleted, and discard the locator when the sequence number rolls around to zero. He pointed out that the reference could be packaged into a single 64 bit integer, and, if not, that it should have a typedef. As a final note, one should not confuse bobble deletion with deallocation of the space for a bobble. Deletion deactivates a bobble; there can be no new references to a bobble, but there can be ongoing processing of the deactivated bobble. Deletion cannot be finalized until all active processing ends. Also it should be noted that there are no references to a bobble outside of the bobble module. This page was last updated December 9, 2008. |