# Array rotation revisited

In the March 2001 issue I wrote an article about exchanging array segments, a.k.a. rotating an array. It turns out that the analysis there was not quite as acute as it might have been. I (and apparently most other analysts) overlooked a key detail. The following is a reanalysis.

Statement of the problem:

Suppose we have an array A containing n equi-spaced elements. The object is to rotate the elements of A by k locations for some k, 0 < k < n. We can express this by

```
A[i] <- A[i+k],   i <  n-k
A[i] <- A[i+k-n], i >= n-k
```
An alternative way to express it is to represent A as the concatenation of two arrays P and Q of lengths k and n-k respectively. Then we have
```
A         = PQ
rotate(A) = QP
```
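As a concrete reference point, the definition can be sketched as an out-of-place rotation in C (the function name is mine):

```c
#include <stddef.h>

/* Out-of-place rotation per the definition above: dst[i] = A[(i+k) mod n].
 * Equivalently, if A = PQ with P of length k, then dst holds QP. */
void rotate_copy(const int *A, int *dst, size_t n, size_t k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = A[(i + k) % n];
}
```

The in-place methods discussed below all compute the same result without the second array.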
There are three major approaches to effecting rotation that I know of; these are:

(1) Move one of the components into a scratch array, move the other component into its final position, and then move the component in the scratch array into its final position.

The cost of this method (excluding loop costs) is:

```
Data moves:  n + min(k, n-k)
Aux storage: min(k, n-k)
```
The catch in assessing costs is that the method lends itself to bulk data moves (via hardware instructions or loop unrolling) so that data move costs are not comparable with those of other methods. Likewise the logic costs are not comparable except for naive implementations.

In short, it should be by far the fastest method of the three, but it does require a substantial amount of auxiliary storage.
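Under the stated assumptions (an int array and a caller-supplied scratch buffer of at least min(k, n-k) elements), method (1) might be sketched as follows in C; the bulk moves map directly onto memcpy/memmove, which is what makes this method fast in practice:

```c
#include <string.h>

/* Method (1): copy the shorter component to a scratch buffer, slide the
 * longer component into place with memmove, then copy the scratch back.
 * `tmp` must hold at least min(k, n-k) ints. */
void rotate_scratch(int *A, size_t n, size_t k, int *tmp)
{
    if (k == 0 || k == n) return;
    size_t nk = n - k;
    if (k <= nk) {                           /* P is the shorter component */
        memcpy(tmp, A, k * sizeof *A);       /* save P                     */
        memmove(A, A + k, nk * sizeof *A);   /* slide Q down               */
        memcpy(A + nk, tmp, k * sizeof *A);  /* P goes after Q             */
    } else {                                 /* Q is the shorter component */
        memcpy(tmp, A + k, nk * sizeof *A);  /* save Q                     */
        memmove(A + nk, A, k * sizeof *A);   /* slide P up                 */
        memcpy(A, tmp, nk * sizeof *A);      /* Q goes before P            */
    }
}
```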

(2) Reverse each component separately and then reverse the entire array. Let r() be the reversal function. Then we have

```
QP = r(r(P)r(Q))
```
Calculating the costs of this method is straightforward. The reversal of an array of length n requires n/2 swaps, and each swap requires 3 data moves, so a single full-length reversal costs 3n/2 data moves. Each element participates in two reversals (one component reversal and the full reversal), so the total number of data moves is 3n. The total number of swaps is n, so the number of loop tests is n/u, where u is the loop unrolling factor.
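A minimal C sketch of the three-reversal method (function names are mine):

```c
#include <stddef.h>

/* Reverse A[lo..hi-1] in place; each swap is 3 data moves. */
static void reverse(int *A, size_t lo, size_t hi)
{
    while (lo + 1 < hi) {
        int t = A[lo]; A[lo] = A[hi - 1]; A[hi - 1] = t;
        lo++; hi--;
    }
}

/* Method (2): QP = r(r(P)r(Q)), where P = A[0..k-1] and Q = A[k..n-1]. */
void rotate_reverse(int *A, size_t n, size_t k)
{
    reverse(A, 0, k);   /* r(P)               */
    reverse(A, k, n);   /* r(Q)               */
    reverse(A, 0, n);   /* r(r(P)r(Q)) = QP   */
}
```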

(3) Treat the rotation as a permutation of the array, and execute each permutation cycle separately. This is the most complicated of the three, potentially requires the fewest operations, and has the most varied implementations.

Let d be the greatest common divisor of n and k. Then the permutation that carries A into rotate(A) has d cycles. For example, suppose that n=14 and k=6, so that d=gcd(14,6)=2. Then the permutation cycles are:

```
(0, 6, 12, 4, 10, 2, 8)
(1, 7, 13, 5, 11, 3, 9)
```
Each cycle is of length n/d (7 in this case). A cycle consists of runs of indices increasing by k. Thus in our example, the runs in the first cycle are
```
(0, 6, 12), (4, 10), and (2, 8)
```
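The cycle structure is easy to verify mechanically. This hypothetical helper (name and signature are mine) walks one cycle of the rotate-by-k permutation:

```c
#include <stddef.h>

/* Walk the permutation cycle of rotate-by-k starting at `start`, storing
 * the visited indices in `cycle` (which must hold at least n/d entries)
 * and returning the cycle length.  Each step adds k mod n, matching
 * A[i] <- A[i+k]. */
size_t trace_cycle(size_t n, size_t k, size_t start, size_t *cycle)
{
    size_t len = 0, i = start;
    do {
        cycle[len++] = i;
        i += k;
        if (i >= n) i -= n;      /* wrap around the end of the array */
    } while (i != start);
    return len;
}
```

For n=14, k=6, start=0 this reproduces the first cycle above, (0, 6, 12, 4, 10, 2, 8).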
When the cycle begins at the start of a run, the length of each run is either ldiv(n,k) or udiv(n,k), where ldiv is integer division rounded down and udiv is integer division rounded up. In our example the lengths are ldiv(14,6)=2 and udiv(14,6)=3. If, however, we start the cycle in the middle or at the end of a run, that run will be broken, e.g.,
```
(8, 0, 6, 12, 4, 10, 2)
```
has two runs of length one, one run of length two, and one run of length three.

When k > n/2 the long runs are descending runs. For example, suppose n=14, k=10. There are two cycles,

```
(13, 9, 5, 1, 11, 7, 3)
(12, 8, 4, 0, 10, 6, 2)
```
In this example there are descending runs of lengths ldiv(n,n-k)=ldiv(14,4)=3 and udiv(n,n-k)=udiv(14,4)=4. In many implementations it is convenient to divide the algorithm into two cases, k < n/2 and k > n/2. In the latter case we replace k by n-k and change the direction of rotation.

There are two special cases, k=0, and k=n/2. If k=0 there is nothing to do. If k=n/2 we can swap the first and last halves. The swap requires 3n/2 data moves and n/2 loop tests; this cannot be improved upon. The cases are:

```
k = 0          Do nothing
0 < k < n/2    Increment = k, ascending runs
k = n/2        Swap first and last halves
n/2 < k < n    Decrement = n-k, descending runs
```
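The k = n/2 special case is trivial enough to show directly; a sketch in C (name mine):

```c
#include <stddef.h>

/* Special case k = n/2 (n even): swap the first and last halves directly.
 * n/2 swaps at 3 data moves each -- 3n/2 data moves in all. */
void rotate_half(int *A, size_t n)
{
    size_t h = n / 2;
    for (size_t i = 0; i < h; i++) {
        int t = A[i]; A[i] = A[i + h]; A[i + h] = t;
    }
}
```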
I will use the term “small slide” for a data move of less than n/2 locations, and the term “big slide” for a data move of more than n/2 locations.

When the runs are ascending, big slides occur when the array index is in the last k elements; similarly, when the runs are descending, big slides occur when the array index is in the first n-k elements. The total number of big slides is therefore k for ascending runs and n-k for descending runs. It follows that in the ascending case the number of big slides in a cycle is k/d, and the number of runs within a cycle is also k/d; in the descending case read n-k in place of k.

When effecting the rotation of a cycle the last big slide is not executed. Instead the saved initial value is used. In other words the sequence of events is:

```
An initial value is saved.
A loop in which k/d runs of small slides alternate
with k/d - 1 big slides.
Array[final index] is filled with the saved value.
```
There are two things to settle in translating the above into code. The first is how to terminate the loop. The simplest way is to determine the final index in advance (it will be the starting index + n - k) and exit when the final index is reached.

The second is how to handle the alternation of small slide runs and big slides. One way is to use a simple loop with a test to determine if the current move is a big slide or a small slide. An example of this method is:

```
index   = first
last    = first + n - k
save    = A[first]
while (index .ne. last) begin
    next     = index + k
    if (next .ge. n) next = next - n
    A[index] = A[next]
    index    = next
end
A[last] = save
```
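In C, one cycle of this simple loop might read as follows (a sketch under the convention that first ranges over 0..d-1, one call per cycle; the function name is mine). It works for any 0 < k < n:

```c
#include <stddef.h>

/* One permutation cycle of rotate-by-k: a single loop with a per-move
 * test for the wraparound (big slide).  Assumes 0 <= first < gcd(n,k). */
void rotate_cycle_simple(int *A, size_t n, size_t k, size_t first)
{
    size_t index = first;
    size_t last  = first + n - k;   /* final index; < n because first < d */
    int    save  = A[first];
    while (index != last) {
        size_t next = index + k;
        if (next >= n) next -= n;   /* big slide wraps around */
        A[index] = A[next];
        index = next;
    }
    A[last] = save;
}
```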
The costs of this code for a single cycle of length n/d are:
```
Data moves:  n/d + 1
Loop tests:  n/d
If tests:    n/d - 1
```
The costs for processing the entire rotation (including a loop over d) are:
```
Data moves:  n + d
Tests:       2n
```
The above code performs two tests per element: a loop test and a test to see whether the move is a big slide or a small slide. Most implementations (including my March 2001 article) use some variant of the above loop. All are less efficient than they need to be.

We can reduce the number of tests per element by putting in an inner loop for the small slide runs. A naive (and slightly erroneous) implementation for ascending runs (k .lt. n/2) looks like this:
```
nk      = n - k
index   = first
last    = first + nk
save    = A[first]
while (index .ne. last) begin
    while (index .lt. nk) begin
        A[index] = A[index + k]
        index    = index + k
    end
    A[index] = A[index - nk]
    index    = index - nk
end
A[last] = save
```
This code is wrong because the actual loop is a “loop and a half”. That is, the final big slide should not be made. Because it is being made, the loop termination test is also wrong. We can fix this in either of two ways. The first is to move a copy of the small slide loop outside the cycle loop. The second is to move the termination test to the middle of the cycle loop. The code for the second alternative looks like this:
```
nk      = n - k
index   = first
last    = first + nk
save    = A[first]
loop begin
    while (index .lt. nk) begin
        A[index] = A[index + k]
        index    = index + k
    end
    if (index .eq. last) exitloop
    A[index] = A[index - nk]
    index    = index - nk
end
A[last] = save
```
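A C rendering of this loop-and-a-half version for one cycle might look like this (again assuming 0 < k < n/2 and first < d; the name is mine):

```c
#include <stddef.h>

/* One cycle with the termination test moved to the middle of the loop,
 * so the final big slide is never made.  Assumes ascending runs
 * (0 < k < n/2) and 0 <= first < gcd(n,k). */
void rotate_cycle_runs(int *A, size_t n, size_t k, size_t first)
{
    size_t nk    = n - k;
    size_t index = first;
    size_t last  = first + nk;
    int    save  = A[first];
    for (;;) {
        while (index < nk) {          /* run of small slides */
            A[index] = A[index + k];
            index += k;
        }
        if (index == last) break;     /* loop-and-a-half exit */
        A[index] = A[index - nk];     /* big slide */
        index -= nk;
    }
    A[last] = save;
}
```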
The inner loop will be executed (n-k)/d times; costs for the loop are
```
Data moves:  (n-k)/d
Loop tests:  n/d
```
The loop termination test will be executed k/d times and the big slide will be executed k/d - 1 times. The total costs for this code are:
```
Data moves:  n + d
Loop tests:  n + k + d
```
The number of loop tests can be reduced by k by using an “until” loop instead of a “while” loop. This eliminates one loop test for each small slide loop.

A similar economy can be employed in the outer loop over the number of cycles. If for some reason it is desirable to avoid computing the gcd, we can count the number of small slide loops executed; when the last cycle is completed this number will be k. Thus our code for ascending runs can look like this:

```
nk      = n - k
first   = 0
count   = 0
until (count .ge. k) begin
    last  = first + nk
    index = first
    save  = A[first]
    loop begin
        count += 1
        until (index .ge. nk) begin
            A[index] = A[index + k]
            index   += k
        end
        if (index .eq. last) exitloop
        A[index] = A[index - nk]
        index   -= nk
    end
    A[last] = save
    first  += 1
end
```
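A C translation of the ascending-runs code might look like this (a sketch; the name is mine, and a do-while stands in for the bottom-tested "until" of the inner loop, which is safe here because the index is always below nk at the start of a run):

```c
#include <stddef.h>

/* Full rotation for ascending runs (0 < k < n/2) without computing
 * gcd(n,k): count the runs of small slides; all cycles are done when
 * k runs have been executed. */
void rotate_ascending(int *A, size_t n, size_t k)
{
    size_t nk    = n - k;
    size_t first = 0;
    size_t count = 0;
    while (count < k) {                 /* one iteration per cycle */
        size_t last  = first + nk;
        size_t index = first;
        int    save  = A[first];
        for (;;) {
            count++;
            do {                        /* run of small slides */
                A[index] = A[index + k];
                index += k;
            } while (index < nk);
            if (index == last) break;
            A[index] = A[index - nk];   /* big slide */
            index -= nk;
        }
        A[last] = save;
        first++;
    }
}
```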
The analogous code for descending runs is:
```
nk      = n - k
first   = n - 1
count   = 0
until (count .ge. nk) begin
    last  = first - k
    index = first
    save  = A[first]
    loop begin
        count += 1
        until (index .lt. nk) begin
            A[index] = A[index - nk]
            index   -= nk
        end
        if (index .eq. last) exitloop
        A[index] = A[index + k]
        index   += k
    end
    A[last] = save
    first  -= 1
end
```
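And the corresponding C sketch for descending runs (name mine, same do-while convention as the ascending version):

```c
#include <stddef.h>

/* Full rotation for descending runs (n/2 < k < n), the mirror image of
 * the ascending version: small slides move by nk = n-k, big slides by k,
 * and the scan proceeds from the top of the array downward.  All cycles
 * are done when n-k runs have been executed. */
void rotate_descending(int *A, size_t n, size_t k)
{
    size_t nk    = n - k;
    size_t first = n - 1;
    size_t count = 0;
    while (count < nk) {                /* one iteration per cycle */
        size_t last  = first - k;
        size_t index = first;
        int    save  = A[first];
        for (;;) {
            count++;
            do {                        /* run of small slides */
                A[index] = A[index - nk];
                index -= nk;
            } while (index >= nk);
            if (index == last) break;
            A[index] = A[index + k];    /* big slide */
            index += k;
        }
        A[last] = save;
        first--;
    }
}
```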
This is clearly the most economical method for rotating an array in terms of operations used, since it requires only n + d data moves and n + d loop tests.