Inefficient sort algorithms
by Richard Harter

Abstract
How it all started by Richard Harter
Other sorts by various
A convergence proof by Richard Harter and Thomas Marlowe
Inefficient sorts and complexity by Thomas Marlowe
Some remarks on complexity of three other pessimal sorting algorithms by Thomas Marlowe
Another evil sort by Carl Howells
Acknowledgements

## Abstract

This paper is the condensation of a discussion of inefficient sorts in the talk.origins newsgroup. The main focus was on a sort suggested by Richard Harter. Although the topic (and much of the discussion) was apparently frivolous it raised real mathematical issues which were considered in some depth by Thomas Marlowe.

## How it all started

In the talk.origins newsgroup someone made an offhand remark that they didn't see how anybody, no matter how klutzy their programming, could write an O(n^3) sort. I immediately gave a simple example:

```	begin procedure sort(a,n)
begin do i = 1...n
bubblesort(a,i)
end do
end procedure sort
```

I then casually mentioned my favorite inefficient sort:

```	begin procedure sort(a,n)
begin loop
if (sorted(a,n)) return
i = random(n)
j = random(n)
if (i .gt. j) swap(i,j)
if (a(i) .gt. a(j)) swap(a(i),a(j))
end loop
end procedure sort
```

This bit of programming fluff produced a great deal of response; evidently there is something deliciously perverse about deliberate inefficiency. For example there was the following exchange:

Richard Harter:
It is immoral to write bad code even if it is never going to be used.

Does this apply to didactics about inefficient sorts?

Stephen F. Schaffner
Just because it's immoral doesn't mean you shouldn't do it.

Richard Harter:
It's an original sin thingie. Before Adam used  the apple he wrote perfect code. When Eve listened to the serpent  and talked Adam into using the apple  they lost their user account and the ability to write perfect code.

 The sacred accounts refer to eating the apple but this is a bit of confusion which arose because the AI program in THE APPLE would say "EAT ME" when it was not pleased with its input.

 The identity of the serpent is obscure. It is rumored that he is currently running a computer company in Redmond.

 Eve was tempted by designer colors.

Some other sorts Ken Cox mentioned a "real" sort Chandy and Misra's UNITY book, for example, gives the following sorting "program":
```{ || i : 0 <= i < N : A[i],A[i+1] := A[i+1],A[i]  if A[i] > A[i+1] }
```
(As long as there are adjacent elements out of order, swap them.) They later devote an entire chapter to implementations.

Mark VandeWettering accidently simplified my algorithm to get the monkeysort (random swap) algorithm: keep randomly swapping element pairs until the array is sorted. Wade Hines correctly identified Mark's algorithm as the monkeysort algorithm. See the analysis below by Thomas Marlowe for a discussion of the performance of this algorithm.

Thomas Marlowe mentioned several sorts which are even worse:

An algorithm which checks all permutations to see if that rearrangement is correct clearly has WC O (N * N!) running time.

An algorithm which guesses a random permutation and checks has average case complexity O (N * N!), since the chance of guessing the right permutation is 1/N!.

[note: O(N * N!) does not equal O(N!)]

You can add another factor of N to most algorithms by insisting on checking every pair of indices. If you check in lexical order, you cannot conclude until you check the last pair that you really have a sorted list.

Mark VandeWettering also mentioned Tom Duff's sillysort:
```
/*
* The time complexity of this thing is O(n^(a log n))
* for some constant a. This is a multiply and surrender
* algorithm: one that continues multiplying subproblems
* as long as possible until their solution can no longer
* be postponed.
*/
void sillysort(int a[], int i, int j){
int t, m;
for(;i!=j;--j){
m=(i+j)/2;
sillysort(a, i, m);
sillysort(a, m+1, j);
if(a[m]>a[j]){ t=a[m]; a[m]=a[j]; a[j]=t; }
}
}
```
This is a perverse selection sort. In the first iteration of the outer loop the last element of the sorted array is found, in the second the next to last, and so on. In each iteration the maximum element is found by dividing the remaining segment into two parts and comparing the the maximan of the two parts.

Several people mentioned improbable sorts based on analogues with evolution. And finally I mentioned another sort which remains unanalyzed but whose performance is abysmal:

The idea is to give each value a ranking so that the big values have a high ranking and the small ones a low ranking. We then group the values of equal rank and reorder the list so that the groups are in order.

The assignment of ranks is done as follows: The initial value in the list is given a rank of 0. We step through the list. If the current value being looked at is greater than the previous value it is given a rank of one more than the rank of the previous value and, if less, it is given a rank of one less than the rank of the previous value. Here is an example of the ranking and reordering:

```old list: 1, 8, 3, 4, 5, 6, 7, 2, 9
ranking:  0, 1, 0, 1, 2, 3, 4, 3, 4
new list: 1, 3, 8, 4, 5, 6, 2, 7, 9
```

## Convergence

Doubt was raised as to whether the * algorithm would converge. Richard Harter pointed out it was monotonic, i.e., that each cycle of the loop could decrease but never increase the total number of inversions in the list and thus was an instance of the following schema:
Suppose that we have a state S and a metric on S, m(S) .GE. 0, and a suite of operations on S, f0...fn, and we want to apply operations on S sequentially until m(S) = 0. Our code looks something like this:
```	begin loop
select f such that m(f(S)) .LT. m(S)
if no f selected escape loop
S <- f(S)
end loop
```
Now this loop must terminate (on a computer - if m(S) is a value on the reals it could approach a limit point) because m(S) cannot be reduced indefinitely. It will produce the desired result (m(S)=0) if it is the case that whenever m(S) > 0 there is an f such that m(f(S)) < m(S).
Thomas Marlowe separately provided the following proof of (almost certain) termination:

The key measure of the out-of-sorts (sorry!) of a list is the number of inversions: the cardinality of the set of all pairs (i,j) with i < j and A[i] > A [j]. (For simplicity, assume all elements are distinct.) Clearly, any list with zero inversions is sorted.

Any algorithm which

1. Never increases the number of inversions (as measured at the end of a bounded-length computation step) and
2. Has a probability bounded away from 0 of detecting and eliminating an inversion if one exists [it's sufficient that some fixed length sequence has this property, which is the case with most deterministic algorithms], and
3. Has a probability bounded away from 0 of detecting that no inversions exist and terminating
will converge and return a sorted list with probability 1.

## Inefficient sorts and complexity

This provides a sharper bound on the average complexity of the sorting algorithm of the form:

```	while (not sorted) do {
generate a random pair (I,J);
if (I >= J) continue;
if the pair corresponds to an inversion {
swap;
test and return if sorted; }
}
```

NUMBER OF TRIALS

In the worst case (a reverse sorted list) there are N * (N-1)/2 inversions.

With K inversions, any given pair (I,J) has a probability K / N^2 of corresponding to one of them, so an inversion can be expected after N^2 / K trials, and the result decreases the number of inversions by at least 1, and by exactly 1 if I and J are adjacent.

Thus a bound for the total number of trials is

```	SUM_{r=1}^{N*(N-1)/2} N^2/R
= N^2 SUM_{r=1}^{N*(N-1)/2} 1/R
~ N^2 ln (N^2) = O (N^2 lg N)
```
[using the standard result that the partial sums of the harmonic function are essentially the natural log (plus Euler's constant)]

Since each swap can lower the number of inversions by at most N, this is also a lower bound on the expected number of trials in the worst case.

Finally, since the average number of inversions is N * (N-1)/4 (just pair sequences with their reverses), this is in fact also an average bound for random sequences.

COMPLEXITY

The actual average-case complexity is C * T, where C is the expected number of trials, and T is the cost per trial.

In the naive algorithm, the cost of testing for an inversion and swapping is constant (O (1)), while the cost of testing is O (N), giving a total complexity of O (N^3 lg N).

It has already been suggested that we can add a factor of N by checking all N * (N-1)/2 pairs, for O (N^4 lg N). While we could check all N^2 pairs, this would not only result in redundant information, but redundant items of information, and so is probably not credible --- and in any case only buys a constant factor increase in time.

GETTING REALLY UGLY

We can actually do worse, by again generating random pairs and maintaining a bitvector (this time of length N^2, for simplicity of coding). The neat thing here is that we get a third layer of looping, since we keep having to check the bitvector array to see that it is identically 1.

```	while (not sorted) do {
generate a random pair (I, J);
if (I >= J) continue;
if the pair corresponds to an inversion {
swap;
checked = 0;
while (not checked) do {
generate a random pair (S, T);
check if (A[S], A{T}) in correct order;
on failure break; /* exit to outer loop */
/* at this point. may be sorted */
/*	and may be finished checking */
checked [S,T] = 1;
tested = 1;
for (X,Y) = (1,1) to (N,N) do {
tested = tested * Check [X,Y];
}
checked = tested;
}
}
}
```
This is kind of neat because the outer loop has O (N^2 lg N) iterations. For a similar reason, the check loop also has O (N^2 lg N) iterations. Finally, the test loop always takes time N^2.

Thus the total average-case complexity is

```	O (N^6 (lg N)^2)
```
I don't believe you can do any worse than this from a seemingly credible monotone (always improving) swap-based algorithm. (The guess-a-random permutation algorithm can be made to have average case
```	O (N! * N^4 * (lg N)^2) = O(N!)
```
average case by using the checking method proposed above.)

IMPROVING A BAD ALGORITHM

As mentioned in earlier, any monotone algorithm with an expensive termination test can be made faster by simply not always testing. Let the expected number of steps be S, the test cost T, and the generating cost G. The total expected cost is

```	O (S * (T + G))
```
If T = O (G) then there is no improvement.

If T >> G, and S >> T/G, tests are made only every T/G steps. The amortized cost of testing is constant. Note that up to T/G additional steps may be required. The (average-case) complexity is then

```	O ((S + T/G) * (G + 1/(T/G) * T) = O (S * G)

```
If the expected number of steps S is less than T / G, then testing every S steps gives
```	O ((S + S) * (G + 1/S * T)) = O (S * max (G, T/S))
= O (max (S * G, T))
```
Using the random swap approach, this lowers the complexity to
```	O (N^2 lg N) for either the O (N) or O (N^2) test

O (N^4 lg N) for the N^4 lg N testing method.
```
However, this speedup does not apply to either the random permutation method, or the always-swap-random-pairs method, since neither is monotone.

## Some remarks on complexity of three other pessimal sorting algorithms

### 1. Random swap algorithm

Consider the algorithm that swaps arbitrary elements until the array is sorted. (As distinguished from the monotonic random swap algorithm, in which only improving swaps are made.) We can ignore the particular values, and consider that the list is a permutation of 1,,N .

In principle, the expected number of steps starting with a worst-case (reverse sorted) list can be found by looking at the Markov process with N! states (one per permutation).

• The identity permutation is absorbing, since the algorithm terminates. Every other permutation has N*(N-1)/2 + 1 successors: N*(N-1)/2 successors resulting from applying a single transposition (swap), with probability 2/N^2 (since the indices could be generated in either order), and a self-loop resulting from choosing [I,I], with probability 1/N.
• Any permutation not a single swap from the identity has N*(N-1)/2 incoming edges, as does the identity permutation; the others have one fewer incoming edge, since the identity is absorbing.
Note that if we don't stop when the list becomes sorted, then every permutation has equal probability, and starting with a random permutation, we can expect to be in the sorted case exactly 1/N! of the time, suggesting that the expected number of trials until (issorted) is O(N!).

### 2. Reversing subsequence algorithm

Roy Thearle made the following suggestion:

BTW, have you considered that instead of just swapping the two randomly chosen locations if they're out of order, you could invert the whole sequence of items between them (inclusive)?

I think this would not only increase the time required for each swap, but also make the sort less efficient due to reducing the tendency for low items to drift downwards and high items to drift upwards.

In the reversing variant of the monotonic random sort, the algorithm still only makes "improving" moves, but it is no longer true that it is monotonic in the sense of lowering the number of inversions.

Rather, it is monotonic in the following sense, which is probably a key to a more careful analysis (not presented here, my brain isn't up to it): Any operation results in a sequence which is earlier in lexicographic order, and later in reverse lexicographic order, than the original. (The subsequence before the reversal remains the same, and the first element in the reversal becomes smaller. Likewise, the subsequence afterwards remains the same, and the last reversed element becomes larger.)

It is also true that (*) if there is a partition point K (think quicksort) so that all entries before position K are smaller, and all entries after are larger, the algorithm cannot undo that. In particular, once the smallest element gets to be first, or the largest last, they will never move.

Thus the number of steps for the algorithm can be bounded by iterating the number of steps it takes to move one of the extreme elements to the end (necessarily the correct end) of the list. Out of N^2 pairs, there are 4 that will do the right job --- [1,L], [L,1], {H,N], {N,H], where L and H are the indices of the minimal and maximal elements, respectively. The expected number of steps is thus

```
Steps (N) <= N^2/4 + Steps (N-1)
```

Total number of steps <= O (N^3). (As compared to O (N^2 lg N) for the swap variant.)

I suspect, but haven't yet shown, that there is a much smaller chance of forming any other partition point, and that this won't reduce the expected value by more than a constant. Simulation might be the way to go here.

AN ARGUMENT FOR A LOWER COMPLEXITY

There is, surprisingly, a heuristic argument for an O (N^2 lg N) complexity for this variant as well. Consider the set of all permutations, and the action of individual transpositions using the random swap and random reverse algorithms. Then

• Each permutation is either stable for both algorithms for a given proposed transposition (because there is no inversion there), or results in an operation.
• Exactly half of all permutations are stable for a given pair, and only the identity is stable for all pairs.
• Each permutation will be changed by exactly as many pairs as it has inversions (essentially by definition of inversion and of the two algorithms.)
• Each stable permutation under a given transposition is also the result of applying that operation to exactly one other permutation. (For each pair of indices for which a given permutation is stable, applying that operation in any case gives the unique permutation that maps to it, since either the swap or reverse operations are self-inverse.)
• A proposed transposition (I,J) with |I - J| <= 3 has the same result for both methods (there's at most one element in between).
• For a given proposed transposition with |I - J| > 3, the stable and unstable permutations each pair. There are stable permutations (S1, S2) and unstable permutations (U1, U2) so that
• S1 results from U1 under swap, and S2 from U1 under reversal.
• S2 results from U2 under reversal, and S1 from U2 under swap.

(Essentially, U1 and U2 differ by having reversed sequences between positions I and J.)

• Thus for every chain that swap makes shorter (longer), there is corresponding chain that reversal makes shorter (longer). Thus it "stands to reason" that the two algorithms have the same average case behavior.

The reason this is (as currently phrased) at best a heuristic argument is that reversal may be shortening steps that occur on relatively few chains, but lengthening steps on many, or may be making some already short chains somewhat shorter, while making most long chains longer.

I would appreciate anyone who could tighten up this argument, either by showing by exhaustive simulation that the average case complexity is identical for N = 6 and 7, or by extending the pairing argument to chains.

AMORTIZING TEST COST

As with previous remarks, the complexity of testing for sortedness can vary between O (N) and O (N^4 lg N). Prior remarks on amortizing test cost apply, because although this algorithm is not monotonic, it is stable --- additional steps applied to a sorted list (or even a partitioned list) cannot undo progress.

### 3. Heap sort

Just after I finished the post, I had another, more technical candidate for worst swap-based algorithm. I haven't thought through the complexity analysis yet.

Consider heap sort. Think of the entries as in a 0-based array. Recall that:

1. the children of node I are in entries 2*I+1 and 2*I+2,
2. the max-heap property is that all interior nodes are larger than their two children,
3. in the max-heap algorithm, once the heap property holds everywhere, the max is exchanged with the last element and "frozen in place".
```/* basic outline						  */
/* outer loop is iteration of heap/reheap-and-swap		  */
/* loop2: swap random pairs to try to establish heap property	  */
/* loop3: if a pair works, see if heap property is satisfied	  */
/* loop4: check if heap property has been checked everywhere      */

/* note the following (deliberate) inefficiencies		  */
/* (0) generation throws out almost all pairs as irrelevant       */
/* (1) heap property is checked one child at a time		  */
/* (2) heap property is not established or checked on a swap	  */
/* (3) conversely, heap property is checked every time		  */
/*     it works for some pair					  */

for K = N downto 1 do {
/* step 1: try to establish the heap property */
heaped = 0;
declare Bits [K div 2] = {0};
if (last internal node has only one child) /* K is even */
Bits [K div 2] = 1;
while (not heaped) do {
generate two values (I,J) in [1,K];
if (J is not a child of I) continue;
else
if (A[I] < A[J]) {
swap;
/* note this is uglier than vanilla heap sort */
/* because may swap root with left, and then  */
/* new root with right (for example)          */
/* also note this may invalidate		*/
/* the heap property at the parent		*/
if (I != 0)
/* I is the left child of its parent if it's odd */
Bits [I div 2][(I+1) mod 2] = 0;
continue; }
if (A[I] > A[J])
if (J == 2*I + 1) 	/* left child */
Bits [I] = 1;
else			/* right child */
Bits [I] = 1;
/* step 2: test if Bits is entirely 1 */
declare Bits2 [K div 2] = {0};
fix last entry if needed;
while (1) do {
generate two values (S,T) in [1,K];
if (T not a child of S) continue;
else if (T == 2*S + 1)	/* left child */
C = 0;
else				/* right child */
C = 1;
Bits2 [S][C] = 1;
if (Bits [S][C] == 0) break;
else {
/* step 3: test if everyone has been tested */
tested = 1;
for (P,R = 0,0 to (K div 2),2) do {
tested *= Bits2 [P,R]; }
heaped = tested;
}
}
}
}
swap A and A[K-1];
}
```

## An evil sort

Carl Howells contributed this little gem which is O((n ^ 2)!):
```
/*
* This sort was written to have the worst asymptotic
* efficiency that I could come up with.  This was
* accomplished by combining three factors:
*
* 1. Test every possible ordering.  This alone brings
*    sorting to O(n!), but I knew worse was possible.
*
* 2. Instead of generating each possible ordering as
*    efficiently as possible, I generate each possible
*    ordering many times.
*
* 3. In order to prevent accidentally running quickly
*    on lucky input, it doesn't do any shortcutting
*    to ensure that it returns immediately once a
*    correctly sorted ordering is found.
*
* Timing can be described with a combination of three
* recurrance relations:
*
* T(n) = U(n, 0)
* U(m, p) = O(m * p) * U(m - 1, p + 1)
* U(0, p) = O(p);
*
* I haven't solved the recurrance directly, but I know
* that it is bounded from above by O(n ^ (2 * n)).  I'm
* unsure of the lower bound, but I suspect it is
* Omega((n ^ 2)!).
*/

public class EvilSort
{

///////////////////////  Beginning of sort code  ///////////////////////

public static void sort(int [] A)
{
recSort(A, new int, A);
}

private static void recSort(int [] first,
int [] second,
int [] result)
{
if (first.length == 0 && isSorted(second))
{
// base case we're looking for

for (int i = 0; i < second.length; i++)
result[i] = second[i];
}
else
{
for (int i = 0; i < first.length; i++)
{
for (int j = 0; j <= second.length; j++)
{
int [] t1 = new int[first.length - 1];
int [] t2 = new int[second.length + 1];

for (int k = 0; k < t1.length; k++)
{
if (k < i) t1[k] = first[k];
else t1[k] = first[k + 1];
}

for (int k = 0; k < t2.length; k++)
{
if (k < j) t2[k] = second[k];
else if (k == j) t2[k] = first[i];
else t2[k] = second[k - 1];
}

recSort(t1, t2, result);
} // for
} // for
} // if..else

} // recSort

private static boolean isSorted(int [] A)
{
for (int i = A.length - 1; i > 0; i--)
if (A[i] < A[i - 1]) return false;

return true;
} // isSorted

//////////////////////////  End of sort code  //////////////////////////

public static void main(String [] args)
{
// On my system, testing with 7 took about 24 seconds.
// If my calculation that the process is Omega((n ^ 2)!) is
// accurate, and that is the dominating factor at size 7,
// sorting 8 elements would take far longer than my
// lifetime.  Far longer than the sun's lifetime.
// Probably far longer than the universe's lifetime.
int [] A = makeArray(Integer.parseInt(args));

long time = System.currentTimeMillis();
sort(A);

time = System.currentTimeMillis() - time;
System.out.println(time + " ms needed to sort.");

printArray(A);
} // main

private static void printArray(int [] A)
{
for (int i = 0; i < A.length; i++)
System.out.print(A[i] + " ");

System.out.println();
} // printArray

private static int [] makeArray(int size)
{
int [] result = new int[size];

for (int i = 0; i < size; i++)
result[i] = i;

java.util.Random r = new java.util.Random();

for (int i = 0; i < size; i++)
{
int t = r.nextInt(size);
int temp = result[t];
result[t] = result[i];
result[i] = temp;
}

return result;
} // makeArray

} // class Sort
```

## Acknowledgements

Thomas Marlowe is a Professor of Mathematics and Computer Science at Seton Hall University. Richard Harter is the Director of Research at the Concord Research Institute. Dr. James Benham of Montclair State University is responsible for the bitvector idea, and aided with the complexity analysis.

People who contributed thoughts and comments to this paper include Ken Cox, Wade Hines, Stephen F. Schaffner, Mark VandeWettering, and Sebastian Paaske Třrholm. Carl Howells contributed the "evil sort" which may be the worst of the bunch.

This page was last updated May 22, 2006.
It was moved April 30, 2006