# Fast sorting of almost sorted arrays - I

Introduction
A Lemma
Algorithm one
Algorithm two
Algorithm three
Reducing merge costs
A heap version

### Introduction

On March 6, 2004, Martin V. Cera wrote the following in the comp.theory newsgroup:
Let have an almost sorted array of 'n' integers. Each integer can be as far as 'k' integers from its target position in sorted array. I should suggest an algorithm that sorts this array in time O(n * log k). I tried to modify heap sort, but I am not sure of this modification... Do you have any ideas?
Cera's specification of an array of integers is overly restrictive; the array elements can be of any type that supports an order relationship. There are several approaches that can be taken, including his using a heap.

The problem interested me, and I came up with three variants, all based on a simple but elegant approach. So far as I know this problem is not treated in Knuth, and it is possible that these algorithms do not appear in the literature.

### A preliminary lemma

Suppose we have an adjacent pair of subarrays, S and T, of our original array, that each is sorted, and that each is of length greater than or equal to k. Let S = (A|B) T = (C|D) where B is the upper k elements of S, A the remaining lower elements of S, C is the lower k elements of T, and D is the remaining upper elements of T.

All of the elements of A are less than or equal to all of the elements of B by virtue of S being sorted. Simlarly all of the elements of C are less than or equal to all of the elements of D.

We also have that all of the elements of A are less than all of the elements of C because of the constraint that no element be more that k locations out of order. Similarly all of the elements of B are less than all of the elements of D.

Suppose that we merge B and C in place to get B' and C'. All elements of B' and C' are greater than or equal to all of the elements of A; likewise all of the elements of B' and C' are less than or equal to the all of the elements of D. It follows then that A|B'|C'|D is now sorted.

### Algorithm one

This lemma suggests a quite pretty algorithm. Use divide and conquer. Sort the first n/2 elements and sort the last n/2 elements. Then all elements are in place except possibly for the last k elements of the first half and the first k elements of the last half. Merge these for a cost of k comparisons. Then we have
```	cost(n) = 2*cost(n/2) +2k
```
With a bit of algebra and considering the case n = 4k, we come up with a cost of
```	cost(n) ~= n*log2(4k) -2k
```

### Algorithm two

Algorithm one is recursive; there is a simpler iterative version as follows:
Divide the data into segments of consecutive data, each of length 2k, with a possible remnant segment at the end. Sort each segment. Now merge in place the upper half of each segment with the lower half of the successor segment. By induction the data is sorted.

The cost of segment sorts is n*log(2k); The cost of the merges is n-2k. The total cost is:

```	cost(n) = n*log(2k) + n - 2k = n*log(4k) - 2k
```

### Algorithm three

After having devised algorithms one and two it occurred to me that there is a still simpler version as follows: Divide the array into segments of length k. Sort each segment. Now sequentially merge adjacent pairs of segments in place.

The cost of the sorts declines from n*log2(2k) to n*log2(k). This is than balanced by the increase in merge costs, which go from n-2k to 2n-k. The total cost is

```n*log2(4k) - k
```

### Reducing the cost of merging

In the discussion above it is assumed that the cost of merging two adjacent segments, each of length k, is 2k. The cost of the merges can be significantly reduced as follows: Given two successive segments to be merged, do a binary search on the first segment to find the first element greater than the first element of the second segment. Likewise do a binary search of the second segment to find the first element of the second segment that is less than the last element of the first segment. This trick is generally applicable to merging runs for which there is little overlap.

### A heap based algorithm

Here is an alternative based on the heap idea. We have a separate area of length k to hold the heap. We arrange copies of the first k elements in the heap. The idea is for each successive index, i=0,...,n-k-1, we write the top of the heap into array[i] and add array[i+k] to the heap, all of this being done in such a manner that the heap property is preserved.

The way I see to do this is to remove the top element, sift up, put the new entry into the newly vacated bottom row position, and then sift it up into place. When there are no more new elements we sift out the remaining elements using the standard heapsort. This also is a n*log(k) algorithm.

One naturally asks: Which of these two is more efficient? This is not an easy question to answer.

In the heap method we have two moves for each element, one into the heap and one back into the array. Removing an element from the heap will cost log2(k) moves and log2(k) comparisons. Entering an element have a variable cost, depending on how far up it has to sift. If if it has to sift up p locations then the cost is p moves and p+1 comparisons. Let p_bar be the average number of locations sifted up. Then (ignoring the final sift out of the heap) we have as costs per element:

```moves:       2 + log2(k) + p_bar
comparisons: 3 + log2(k) + p_bar
```
The costs of the sort/merge algorithm are harder to assess because they depend on the sort algorithm used during the sort phase. Since the merge pass requires a 2k auxilliary buffer, we can use a merge sort on the segments for a cost of log2(k) moves and log2(k) moves net cost per element.

The nominal cost of the merge pass is 2k comparisons; however because of the k-constraint the average cost is bounded by 1.5k comparisons for the merge of two segments of length k. This gives us costs per element of

```moves:	        2 + log2(k)
comparisons:  1.5 + log2(k)
```
There may be a small advantage for the sort/merge algorithm; however the actual difference in peformance will depend on the details of the implementation.

This page was last updated March 20, 2004.