Tracking the median on a sliding window
Our objective here is to specify an algorithm for tracking the median of the data in a sliding window as it moves along an array. It turns out that one can indeed find the median using O(log(log(window_size)) operations per window slide. Whether this is worth doing is open to question. Here are the details.
Le 2m+1 be the window width. We examine the initial window, sort it, and divide it up into segments (bins). The first and last bins are of length m/2, the second and next last bins are of length m/4, and so until adding yet another pair of bins would reduce the bin size to less than log2(m). For example, if m=64, i.e., the window size is 129, the bin sizes would be
32, 16, 8, 8, 1, 8, 8, 16, 32where the bin of length 1 is the median. In what follows we will use the highest values of the low order bins and the lowest values of the high order bins as bin separators, except that the median is the bin separator for the inner most pair of bins.
All of the bins except the inner pair are unstructured; the inner pair are binary heaps, the low order one being a max heap, and the high order one being a min heap.
For each slide of the window the element at the trailing edge of the window is deleted and a new element at the leading edge is added. There are three possible cases:
To add an element we must first determine which bin it belongs in. Since there O(log(m)) bins we can do a binary search for a cost of O(log(log(m)). Elsewise we can search from the outermost bins inward; from the construction the amortized cost will be O(1). The cost of adding the element to its bin is O(1) except for the innermost pair of bins. Since these are heaps of size O(log(m)) the cost of adding elements to them is O(log(log(m)).
Deleting an element in bins other than the innermost pair is O(1). Deleting a bin boundary is an apparent exception; however the amortized cost is O(1). The cost of deleting elements from the innermost bins is again O(log(log(m)).
The cost of deleting the median is O(1).
Thus we see that the cost of adding and deleting elements is at worst O(log(log(m)). In fact, when one considers that the ratio of log2(m)/m goes to 0, the costs are O(1).
In case (1) there is nothing further to be done. In case (2) the median must be added to the the heap of the half that lost an element and the root of the other half's heap is extracted to become the new median. In case (3) the root of the heap of the half that gained an element is extracted to replace the median. In both cases (2) and (3) the heap operations are O(log(log(m)).
There are some special cases to consider. Heaps will regularly become empty. A heap can gain or lose members with equal probability; this creates a drunkard's walk sitution. When a heap becomes empty we can split its neighbour. To do this we must extract the half of the elements in the neighbour bin, place the in the heap bin, and heapify them. The cost of this operation is O(log(m)*log(log(m))). However it takes O(log(m)) moves for the heap to become empty so that the amortized cost of refilling the heap is again O(log(log(m)).
It can also happen that a heap gets too full; this generally occurs at the same time the opposite heap empties. The answer this time is to move surplus elements to its neighbour. This again has an amortized cost of O(log(log(m))
This page was last updated October 14, 2004.