This page was completely rewritten in 2009. The version below has been retained for the record. See the page at http://richardhartersworld.com/test/cri/2009/optmerge.html for a better treatment.

Suppose we are given two arrays, x and y, that are sorted in ascending value and that we wish to merge them into an array z. (This is the inner loop operation in merge sorts.) There are four operations that must be done, to wit:

compare current.x current.y move the smaller of the two into current.z advance position.z advance position.[x|y] as neededIn addition we need a loop termination test. The obvious choice is:

compare position.[x|y] end.[x|y](The [x|y] notation denotes the array whose element is moved into z.) When one of the arrays is exhausted the remainder of the other array is copied into z.

In this note we show that the loop termination test can (almost) be eliminated by unrolling the loop. The result is a small but significant gain in performance. Unrolling the loop cannot be done if we use the obvious termination test because that actually is two different tests.

The trick is to cast the loop as a countdown loop. We observe that the inner loop must be gone through at least n times where n = min(length.x,length.y). Accordingly the inner loop can be written in C as

for(;--n >= 0;) *z++ = (*x < *y)? *x++ : *y++;This loop can be unrolled as (again in C)

for (i = n&7;--i>=0;) *z++ = (*x < *y)? *x++ : *y++; for (i = n>>3;--i>=0;) { *z++ = (*x < *y)? *x++ : *y++; *z++ = (*x < *y)? *x++ : *y++; *z++ = (*x < *y)? *x++ : *y++; *z++ = (*x < *y)? *x++ : *y++; *z++ = (*x < *y)? *x++ : *y++; *z++ = (*x < *y)? *x++ : *y++; *z++ = (*x < *y)? *x++ : *y++; *z++ = (*x < *y)? *x++ : *y++; }(A clever compiler will unroll the loop itself.)

In general a pass through the inner loop will not exhaust the arrays. According we need an outer loop that determines the minimum, n, from the remainder of the arrays to be processed. The outer loop looks like

begin loop n = min(length.remainder.x,length.remainder.y) begin if (n .le. 0) // cleanup code to flush the non-empty // array and escape the loop end if // inner loop end loopThe performance improvement over using the obvious termination test is modest but is more than it might seem because of the simplification of the inner loop.

This page was last updated April 12, 2001.