Storage management techniques in C

One of the performance issues of concern in C programming is the amount of time spent in allocating and deallocating storage. The percentage of time spent in allocators in actual applications varies widely. In some applications upwards of 25% of the time is spent in the allocator; in many it is 1% or less.[1]

The application type matters – some applications need a lot of temporary storage and others need very little. Mature applications, i.e, ones that have been optimized and polished, usualy spend much less time in allocators. Part of the optimization process is identifying such time sinks and finding work arounds, usually by using variants of the techniques discussed here.

The issue is simple; the more calls that are made to malloc and free, the greater the time cost of memory allocation. What is more, the likelihood of memory leaks goes up with the number of calls to the general purpose allocator.

What should be done to reduce the cost of allocation? In many applications – nothing. In many applications storage management is a minor issue. In general, one shouldn’t use special methods unless (a) the kind of application is one that is known to be allocation intensive, or (b) profiling reveals that memory allocation is an issue.

One approach, one that is vigorously recommended by some, is to add garbage collection to C. Garbage collection simplifies user code and can be faster because it simplifies coding. I dare say that in many applications it is a satisfactory solution. Sad to say, it is not an entirely reliable solution in C programming.

An alternative is to use special purpose techniques. I will list several here with the hope that others will point out others. There are two main strategies used – slab chipping and reuse.

The idea in slab chipping is to allocate a “large” block and then mark off individual buffers by increasing a pointer into the block. The win here is that allocation is cheap and only the slab needs be deallocated. The catch is that all buffers chipped from the slab must be harvestable, i.e., no longer in use at slab deallocation time.

The idea in reuse is to reuse already allocated buffer space. Thus if we create buffer A and dispose of it, and then create buffer B and dispose of it, we can use the space for A and then for B, eliminating the intermediate deallocation and allocation. The win is eliminating many allocation/deallocation pairs. The danger is simultaneous use, i.e., using the space for A and B at the same time.

(1) Multiple copies

Suppose we need m buffers. Instead of doing m allocations of individual buffers we allocate an array of m buffers. This can work very nicely. There is a catch. The array of buffers can only be freed when all of the m buffers are no longer needed.

A natural reaction is to reuse the the buffers by maintaining the “free” buffers in a free list. Free lists are problematic. What can happen is that the space becomes fragmented, i.e., the pointers jump all over the place. When this happens you can run into cache faults.

Another issue is that if you have blooms (an explosion of allocated space that dies of rapidly) you end up with large amounts of dead space that can’t be deallocated.

That said, multiple copies with a free list is simple and (usuallY) fast. In practice it often is neither simpler nor faster than the “complicated” alternatives.

(2) Limited duration space (mark-release)

Suppose that we need buffers, some of which may be resized, and suppose that we know that at some definite point in the future all of the buffers will be stale and can be discarded. One way to handle this is to allocate a slab and allocate using slab chipping. At the definite point in the future the slab is deallocated and all of the stale buffers go away.

Instead of allocating one large slab a good alternative is to allocate smaller blocks (increasing block size each allocation) and record them in a linked list.

This method is sometimes called mark-release. It was used in early Pacal implementations. It is very fast and is simple to implement. The down sides are (a) there is no reuse of space, and (b) the objects in the block(s) do not persist beyond the release.

(3) LIFO storage

Suppose we have some buffers that satisfy the LIFO pattern, i.e., the last buffer allocated is the first one deallocated. These buffers can be allocated from a storage stack. It might seem that special code for this pattern is not needed since the LIFO pattern is embedded in the C language, but this is not the case. The system storage stack holds a congerie of storage objects whereas what is often needed are stacks of particular kinds.

GNU obstacks are an example of a LIFO storage techology. They are used in parsers and compilers to build symbol tables and to process parse trees and syntax trees. In essence, LIFO storage techniques are an elaboration of mark-release allocation; obstacks can be used in mark-release mode.

(4) FIFO storage

Suppose instead that we have buffers that satisfy the FIFO pattern, i.e., the eldest buffer is always the first one deallocated. This pattern can be implemented using a ring buffer. If there is a known bound for the amount of space used by the buffers we don’t even need to record deallocation provided that the size of the ring buffer is large enough.

In the more general case the usual first and last pointers will be needed; likewise a strategy for dealing with overflow is necessary. One way to deal with overflow is to create a new, larger ring buffer, take all new allocations from it, and drain the old ring buffer until it is empty.

The FIFO pattern can be generalized to “almost FIFO”. Instead of always deallocating the eldest, deallocate an old buffer with the proviso that there is an age limit for buffers.

There are obqueue allocators analogous to obstack allocators. The code is a bit more complicated, but the performance is similar. Obqueue allocators are particularly appropriate in producer/consumer scenarios.

(5) Persistent scratch space

Suppose f is a function that uses temporary scratch space that has to come from the heap. Suppose further that f is not recursive; i.e., it does not call itself directly or indirectly. Naive code allocates and deallocates scratch space each time f is called. Instead we could make the scratch space persistent and resizable. (In C persistent is called static.) Then, except for resizing, no allocations or deallocations are needed.

Persistent scratch space can be shared. Suppose f1…fn are n mutually non-recursive functions, i.e., they don’t call each other, either directly or indirectly. Then functions f1…fn can share the same persistent scratch space. Sharing can eliminate scratch space related allocations/deallocations, and does decrease the amount of space used. It may also improve performance because the shared scratch space is more likely to be cache resident.

Persistent scratch space sharing is an elaboration of overlay storage. The principle is the same but the process is simpler. A persistent scratch allocator automates most of the design work of manual overlay allocation.

Notes

In their paper Berger, Zorn, and McKinley took a number of applications and replaced the custom allocation code by malloc/free calls. The average time spent in memory operations was 18% with one application spending 40%. The number of applications was limited and it is arguable whether they were representative.

For what it is worth, their results are consistent with my measurements. Back

References

Reconsidering Custom Memory Allocation
Inside A Storage Allocator
GNU obstacks

This page was last updated January 5, 2010.

Richard Harter’s World

Site map

January 2010

Mathcomp

email