# Algorithms - exchanging segments in an array

0. Statement of the task

Consider the problem of exchanging two segments in an array. We are given an array A of length n with an initial segment P of length m and a terminal segment Q of length n-m so that A = PQ. The task is to replace the contents of A by A' = QP. For example suppose that n=8 and m=3 and that A = {a0,a1,a2,a3,a4,a5,a6,a7}. Then the task is to transform A into {a3,a4,a5,a6,a7,a0,a1,a2} The constraints are that no auxilliary storage other than a fixed number of temporary variables shall be used.

There is a well known algorithm for doing this which I shall call the double reversal algorithm. This algorithm is described briefly. This note also presents an alternate algorithm which I shall call the parallel rotate algorithm. The execution cost of the two algorithms are compared. Both algorithms are O(n). The parallel rotate algorithm is more complex to implement but is more efficient.

1. The double reversal algorithm

There is a simple and elegant algorithm for doing this. Let r(X) be the reversal of X. Then it is immediate that

r(r(X)) = X
r(PQ) = r(Q)r(P)
Accordingly we can do the exchange first by reversing the elements of A and reversing the individual segments (this can be done in either order.)

This is an O(n) algorithm. Since each element must be moved at least once the task requires at least O(n) operations; hence the double reversal algorithm is optimal with respect to the O function.

2. Pseudocode for the double reversal algorithm

```
dcl n int
dcl m int
dcl i,j int
dcl A data[0...n-1]

begin loop i=0;j=m-1; while (i .lt. j)
temp = a[i]
a[i] = a[j]
a[j] = a[i]
i   += 1
j   -= 1
end loop

begin loop i=m;j=n-1; while (i .lt. j)
temp = a[i]
a[i] = a[j]
a[j] = a[i]
i   += 1
j   -= 1
end loop

begin loop i=0;j=n-1; while (i .lt. j)
temp = a[i]
a[i] = a[j]
a[j] = a[i]
i   += 1
j   -= 1
end loop
```

3. Execution costs of the double reversal algorithm

To reverse an array we swap the first and last elements, the second and next to last, and so on. Each swap costs three moves. The loop costs are (a) change two indices as we move through the array and (b) a compare operation. The cost of reversing an array with n elements is:

3n/2 moves
n index changes
n/2 loop tests
The total cost of the double reversal algorithm is
3n moves
2n index changes
n loop tests
4. The parallel rotate algorithm

The fundamental idea in the parallel rotate algorithm is that exchanging segments is a permutation of the elements of the array. Permutations can be represented as a set of cycles. A permutation cycle is a list of indices where each indice is followed by the indice that will replace it. In our little example with n=8, m=3, the permutation is a single cycle (0,3,6,1,4,7,2,5). That is, element 0 is replaced by element 3, and so on until element 5 is finally replaced by element 0.

The permutation that represents segment exchange is particularly simple. It consists of p cycles of length n/p where p is the greatest common divisor of m and n. The initial elements of the p cycles are the integers 0...p-1. Within a cycle each indice i is followed by (i+m modulo m).

The general scheme is to have an outer loop that loops over the cycles and an inner loop that performs the data moves indicated by individual cycles.

5. Pseudocode for the parallel rotate algorithm

```        // Note - the inner loop has been unrolled by a factor
// of 2 to avoid needing a statement "j=k"

dcl n int
dcl m int
dcl n_m int; init n_m = n-m
dcl p,i,j,k int
dcl a data[0...n-1]

p = greatest_common_divisor(n,m)

begin loop i=0...p-1
temp = a[i]
begin loop init j=i
if (j .lt. n_m) k = j + m
else            k = j - n_m
a[j] = a[k]
begin if (k .eq. i)
a[k] = temp
escape
end if
if (k .lt. n_m) j = k + m
else            k = j - n_m
a[k] = a[j]
begin if (j .eq. i)
a[j] = temp
escape
end loop
end loop
```

6. Execution costs for the parallel rotate algorithm

Each cycle has the following costs:

n/p+1 data moves 2n/p compares n/p index changes
Since there are p cycles the total costs are
n+p data moves
2n compares
n index changes
The number of moves is variable; the worst case is when m=n/2. In that case 3n/2 data moves are required. As a general rule p will be small compared to n.

7. Comparison of algorithms

If all operations have equal costs the double reversal algorithm require 6 operations per element and the parallel rotate algorithm requires 4+p/n operations per element. In this case the parallel rotate algorithm will be modestly faster than the double reversal algorithm. Quite commonly the logic variables (e.g., the variables i, j, k) will be register variables; operations on them will be cheap compared to data moves which require memory references. In that case the parallel rotate algorithm will be close to three times as fast as the double reversal algorithm.

Caveat: The parallel rotate algorithm requires the computation of the gcd of n and m. In the worst case the gcd is an O(log n) operation. The usual computation of the gcd uses integer divides; in some machine architectures integer divides are expensive. In such cases an alternative version of gcd should be used.

This page was last updated March 15, 2001.