ICS 311 #10: Theoretical Limits of Sorting, O(n) Sorts

Outline

Lower Bound for Comparison Sorts
O(n) Sorts

Readings and Screencasts

CLRS 3rd ed. Chapter 8 (all).
Screencasts: 10C (also in Laulima and iTunesU)

Lower Bound for Comparison Sorts

We have been studying sorts in which the only operation that is used to gain information is pairwise comparisons between elements. So far, we have not found a sort faster than O(n lg n).

It turns out that no comparison sort can guarantee a worst-case running time better than O(n lg n).

The proof is an example of a different level of analysis: it reasons about all possible algorithms of a given type for a problem, rather than about one particular algorithm. This is a powerful technique.

Decision Tree Model

A decision tree abstracts the structure of a comparison sort. A given tree represents the comparisons made by a specific sorting algorithm on inputs of a given size. Everything else is abstracted, and we count only comparisons.

Example Decision Tree

For example, here is a decision tree for insertion sort on 3 elements.

Each internal node represents a branch in the algorithm based on the outcome of comparing two elements, which are identified by their original positions in the input. For example, at the nodes labeled "2:3" we are comparing the item that was originally at position 2 with the item originally at position 3, although they may now be in different positions.

Each leaf is labeled with the permutation of original positions that puts the input into sorted order. For example, "⟨2,3,1⟩" means a₂ ≤ a₃ ≤ a₁ (where aᵢ is the element originally at position i): the first input element was the largest and the third was the second largest.

This is just an example of one tree for one sort algorithm on 3 elements. Any given comparison sort has one tree for each n. The tree models all possible execution traces for that algorithm on that input size: a path from the root to a leaf is one computation.
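As a rough illustration (a sketch, not code from CLRS), the insertion-sort tree for 3 elements can be recovered by running an instrumented insertion sort on every ordering of three distinct values and recording each comparison as "i:j" on original positions. The function name `insertion_sort_trace` is hypothetical. Each printed line is one root-to-leaf path, ending at the leaf permutation, so the leaf (2, 3, 1) corresponds to ⟨2,3,1⟩.

```python
from itertools import permutations

def insertion_sort_trace(values):
    """Insertion sort that records each comparison it makes.

    Elements are tracked by their original 1-based positions, so each
    comparison is reported as "i:j" in the style of the decision tree.
    Returns (trace, leaf): trace is a list of (label, outcome) pairs and
    leaf is the tuple of original positions in sorted order.
    """
    a = list(range(1, len(values) + 1))  # original positions, rearranged as we sort
    trace = []
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0:
            # Compare the element originally at position a[i] with the key element.
            greater = values[a[i] - 1] > values[key - 1]
            trace.append((f"{a[i]}:{key}", ">" if greater else "<="))
            if not greater:
                break
            a[i + 1] = a[i]  # shift the larger element right
            i -= 1
        a[i + 1] = key
    return trace, tuple(a)

# One execution trace (root-to-leaf path) per ordering of three distinct values.
for p in permutations([1, 2, 3]):
    trace, leaf = insertion_sort_trace(p)
    path = " ".join(f"{label}({outcome})" for label, outcome in trace)
    print(p, path, "->", leaf)
```

The six runs reach six different leaves, and the longest path makes three comparisons, which is the height of this tree.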

Reasoning over All Possible Decision Trees

We don't have to know the specific structure of the trees to do the following proof. We don't even have to specify the algorithm(s): the proof works for any algorithm that sorts by comparing pairs of keys. We don't need to know what these comparisons are. Here is why:

The root of the tree represents the unpermuted input data.
The leaves of the tree represent the possible permuted (sorted) results.
The branch at each internal node of the tree represents the outcome of a comparison, which determines the direction the computation takes next.
The paths from the root to the leaves represent possible courses that the computation can take: to get from the unsorted data at the root to the sorted result at a leaf, the algorithm must traverse a path from the root to the correct leaf by making a series of comparisons (and permuting the elements as needed).
The length of this path is the number of comparisons the algorithm makes on the given data, which is a lower bound on its running time for that data.
Therefore, if we can derive a lower bound on the height of any such tree, we have a lower bound on the worst-case running time of any comparison sort algorithm.

Proof of Lower Bound

We get our result by showing that the number of leaves in a tree for input size n forces the tree to have height Ω(n lg n). This will be a lower bound on the worst-case running time of any comparison sort algorithm.

There are at least n! leaves because every permutation appears at least once (the algorithm must correctly sort every possible permutation): l ≥ n!
Any binary tree of height h has l ≤ 2^h leaves (Notes #8)
Putting these facts together: n! ≤ l ≤ 2^h or 2^h ≥ n!
Taking logs: h ≥ lg(n!)
Using Stirling's approximation (formula 3.17): n! > (n/e)ⁿ
Substituting into the inequality:
$$
\begin{aligned}
h &\ge \lg\left((n/e)^n\right) \\
  &= n \lg(n/e) \\
  &= n \lg n - n \lg e \\
  &= \Omega(n \lg n).
\end{aligned}
$$

Thus, any decision tree that can reach all n! permutations of n elements has height Ω(n lg n): asymptotically, it cannot be shorter than n lg n.
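To get a feel for the bound, this short Python sketch (helper name `stirling_lower_bound` is hypothetical) compares ⌈lg(n!)⌉, the smallest integer height allowed by 2^h ≥ n!, with the Stirling-based bound n lg n − n lg e and with n lg n itself. For n = 3 it gives ⌈lg 6⌉ = 3, which matches the height of the 3-element insertion-sort tree in the example.

```python
import math

def stirling_lower_bound(n):
    """n lg n - n lg e, the bound derived from n! > (n/e)^n."""
    return n * math.log2(n) - n * math.log2(math.e)

for n in (3, 10, 100, 1000):
    # math.log2 accepts arbitrarily large Python ints, so lg(n!) is exact enough here.
    min_height = math.ceil(math.log2(math.factorial(n)))  # smallest h with 2^h >= n!
    print(n, min_height, round(stirling_lower_bound(n), 1), round(n * math.log2(n), 1))
```

As n grows, ⌈lg(n!)⌉ stays above n lg n − n lg e and its ratio to n lg n slowly approaches 1.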

Each path from the root to a leaf corresponds to the sequence of comparisons made on some input, and the longest path has length h. So for any comparison sort there is always some input that requires Ω(n lg n) comparisons.

There may be some specific paths from the root to a leaf that are shorter. For example, when insertion sort is given sorted data it follows an O(n) path. But to give an o(n lg n) guarantee (i.e., strictly better than O(n lg n)), one must show that every path has length o(n lg n), that is, that the tree height is o(n lg n), and we have just shown that this is impossible because the height is Ω(n lg n).
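The gap between one short path and the height of the tree shows up clearly when comparisons are counted. In this sketch (the helper `insertion_sort_comparisons` is hypothetical), sorted input costs n − 1 comparisons, while reversed input costs n(n − 1)/2, so insertion sort's tree has height Θ(n²) even though one of its paths is only O(n).

```python
def insertion_sort_comparisons(a):
    """Sort a copy of `a` with insertion sort and return the number of key comparisons."""
    a = list(a)
    comparisons = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0:
            comparisons += 1
            if a[i] <= key:  # key is already in place relative to a[i]
                break
            a[i + 1] = a[i]  # shift the larger element right
            i -= 1
        a[i + 1] = key
    return comparisons

n = 1000
print(insertion_sort_comparisons(range(n)))         # sorted input: n - 1 = 999
print(insertion_sort_comparisons(range(n, 0, -1)))  # reversed input: n(n - 1)/2 = 499500
```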

O(n) Sorts

Under some conditions it is possible to sort data without comparing two elements to each other. If we know something about the structure of the data we can sometimes achieve O(n) sorting. Typically these algorithms work by using information about the keys themselves to put them "in their place" without comparisons. We only introduce these algorithms very briefly so you are aware that they exist.

Counting Sort

Assumes (requires) that keys to be sorted are integers in {0, 1, ..., k}.

For each input element x, it determines how many elements are less than x.

Then we can place the element directly in a position that leaves room for the elements below it.

An example ...
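A minimal Python sketch of counting sort, assuming integer keys in {0, 1, ..., k} and 0-based indices: count each key, turn the counts into running totals (the number of elements with key ≤ v), then scan the input right to left, placing each element in the last free slot for its key. The `key` parameter is an illustrative addition so that records can be sorted by an integer field.

```python
def counting_sort(a, k, key=lambda x: x):
    """Stable counting sort of list `a`, whose keys are integers in {0, 1, ..., k}."""
    count = [0] * (k + 1)
    for x in a:                   # count[v] = number of elements with key v
        count[key(x)] += 1
    for v in range(1, k + 1):     # count[v] = number of elements with key <= v
        count[v] += count[v - 1]
    out = [None] * len(a)
    for x in reversed(a):         # right-to-left scan keeps equal keys in input order
        count[key(x)] -= 1
        out[count[key(x)]] = x
    return out

print(counting_sort([2, 5, 3, 0, 2, 3, 0, 3], 5))
# [0, 0, 2, 2, 3, 3, 3, 5]
print(counting_sort([(1, "a"), (0, "b"), (1, "c"), (0, "d")], 1, key=lambda r: r[0]))
# [(0, 'b'), (0, 'd'), (1, 'a'), (1, 'c')]: equal keys keep their original order
```

The running time is one pass over the n elements, one pass over the k + 1 counters, and one more pass over the elements.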

Counting sort is a stable sort, meaning that two elements that are equal under their key will stay in the same order as they were in the original sequence. This is a useful property ...

Counting sort runs in Θ(n + k) time. When k = O(n), as is often the case in practice, this is Θ(n).

Radix Sort

Using a stable sort like counting sort, we can sort from least to most significant digit:

This is how punched card sorters used to work.

The code is trivial, but requires a stable sort and only works on n d-digit numbers in which each digit can take up to k possible values:
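One way to write it in Python, as a sketch assuming non-negative integers with at most d digits in base k (k = 10 by default). Each pass is a stable counting sort on digit i, from least significant (i = 0) to most significant. The helper name `digit` is hypothetical.

```python
def digit(x, i, k):
    """The i-th base-k digit of x, counting from 0 at the least significant end."""
    return (x // k ** i) % k

def radix_sort(a, d, k=10):
    """LSD radix sort of non-negative integers with at most d base-k digits."""
    a = list(a)
    for i in range(d):
        # Stable counting sort on digit i; stability preserves the order from earlier passes.
        count = [0] * k
        for x in a:
            count[digit(x, i, k)] += 1
        for v in range(1, k):
            count[v] += count[v - 1]
        out = [0] * len(a)
        for x in reversed(a):
            count[digit(x, i, k)] -= 1
            out[count[digit(x, i, k)]] = x
        a = out
    return a

print(radix_sort([329, 457, 657, 839, 436, 720, 355], d=3))
# [329, 355, 436, 457, 657, 720, 839]
```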

If the stable sort used is Θ(n + k) time (like counting sort) then RADIX-SORT is Θ(d(n + k)) time.

Bucket Sort

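Bucket Sort assumes the keys lie in the interval [0, 1) (or can be mapped there) and divides that interval into n equal-sized buckets, numbered 0 to n−1, placing each of the n input elements into bucket ⌊n·key⌋. If several elements fall into the same bucket, they are chained in a linked list.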

Then it sorts the chains before concatenating them.

It assumes that the input is drawn from a uniform distribution over [0, 1), so that each chain is expected to be short (constant expected length), which gives a Θ(n) expected running time.
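A minimal Python sketch of bucket sort as described, assuming keys in [0, 1): n buckets, key x goes into bucket ⌊n·x⌋, each bucket is sorted with insertion sort, and the buckets are concatenated in order. Python lists stand in for the linked lists (pushing onto the front with `insert(0, x)`), and the `min(..., n - 1)` guard is a defensive assumption against floating-point rounding.

```python
def bucket_sort(a):
    """Bucket sort for a list of floats in [0, 1)."""
    n = len(a)
    buckets = [[] for _ in range(n)]
    for x in a:
        # int() equals floor for non-negative values; the guard keeps the index <= n - 1.
        buckets[min(int(n * x), n - 1)].insert(0, x)  # "push" onto the front, like chaining
    result = []
    for b in buckets:
        # Insertion sort on each (expected short) bucket.
        for j in range(1, len(b)):
            key = b[j]
            i = j - 1
            while i >= 0 and b[i] > key:
                b[i + 1] = b[i]
                i -= 1
            b[i + 1] = key
        result.extend(b)  # concatenate the buckets in order
    return result

print(bucket_sort([0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]))
# [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94]
```

With this input, bucket 2 holds 0.23, 0.21, 0.26 before it is sorted, because each value is pushed onto the front of its list.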

Example:

The numbers in the input array A are thrown into the buckets in B according to their magnitude. For example, 0.78 is put into bucket 7, which is for keys 0.7 ≤ k < 0.8. Later on, 0.72 maps to the same bucket: like chaining in hash tables, we "push" it onto the beginning of the linked list.

At the end, we sort the lists (B shows the lists after they are sorted; otherwise we would have 0.23, 0.21, 0.26) and then copy the values from the lists back into an array.

But sorting linked lists is awkward, and I am not sure why CLRS's pseudocode and figure imply that one does this. In an alternate implementation, steps 7-9 can be done simultaneously: scan each linked list in order, inserting the values into the array and keeping track of the next free position. Insert the next value at this position and then scan back to find where it belongs, swapping if needed as in insertion sort.

Since the values are already partially sorted, an insertion procedure won't have to scan back very far. For example, suppose 0.78 had been inserted after 0.72. The insertion would only have to scan over one item to put 0.78 in its place, as all values in lists 0..6 are smaller.
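The alternate implementation can be sketched as follows (the function name `bucket_sort_in_array` is hypothetical). Values are distributed into buckets as before; then each chain is scanned in order, and each value is placed at the next free position of the output array and moved back past any larger values. This is insertion sort's inner loop, written with shifts rather than swaps, which has the same effect. Because every value in a lower-numbered bucket is smaller, a scan back stops no later than the last value copied from an earlier bucket.

```python
def bucket_sort_in_array(a):
    """Bucket sort for floats in [0, 1) that merges steps 7-9 into one insertion pass."""
    n = len(a)
    buckets = [[] for _ in range(n)]
    for x in a:
        buckets[min(int(n * x), n - 1)].insert(0, x)  # push onto the front of the chain
    out = [0.0] * n
    free = 0                              # next free position in out
    for b in buckets:
        for x in b:                       # scan each chain in order
            i = free - 1
            while i >= 0 and out[i] > x:  # scan back, shifting larger values right
                out[i + 1] = out[i]
                i -= 1
            out[i + 1] = x
            free += 1
    return out

print(bucket_sort_in_array([0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]))
# [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94]
```

The chains are never sorted separately: the output array is built in sorted order as values are copied into it.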

Comparing the Sorts

You can also compare some of the sorts with these animations (set to 50 elements): http://www.sorting-algorithms.com/. Do the algorithms make more sense now?

Nodari Sitchinava (based on material by Dan Suthers)

Last modified: Wed Feb 19 02:14:38 HST 2014
Images are from the instructor's material for Cormen et al. Introduction to Algorithms, Third Edition, and from Wikipedia commons.