This particular page comes from the chapter that presents standard searching and sorting algorthms. The algorithm shown here is a version of Quicksort.
That argument is a little spurious, but the conclusion is correct. Data can be sorted in Nlg(N) time. There are some algorithms that guarantee an Nlg(N) behaviour. The most popular of the sophisticated sorting algorithms does not guarantee Nlg(N) behaviour; in some cases, e.g. when the data are already more or less sorted, its performance deteriorates to N**2. However, this "Quicksort" algorithm usually runs well.
37, 21, 12, 103, 82, 97, 64, 62, 62, 28, 44, 50, 33, 91then (s)he splits this into:
37, 21, 12, 33, 50, 44, 28,and
62, 62, 64, 97, 82, 103, 91It is up to the next two clerks to sort out these subarrays.
In the example just shown, the clerk knew "intuitively" that the best partitioning value would be around 60 and moved values less than this into the "small" group with the others in the "large" group. Usually, the best partitioning value is not known. The scheme works even if a non-optimal value is used. Suppose that the first clerk picked 37, then the partitions could be:
33, 21, 12, 28and
82, 97, 64, 62, 62, 103, 44, 50, 37, 91The first partition contains the values less than 37, the second contains all values greater than or equal to 37. The partitions are unequal in size, but they can still be passed on to other clerks to process.
The entire scheme for dealing with the data is shown in Figure 13.2. The pattern of partitions shown reflects the non-optimal choice of partitioning values. In this example, it is quite common for the partitioning to separate just one value from a group of several others rather than arranging an even split. This necessitates extra partitioning steps later on. It is these deviations from perfect splitting that reduce the efficiency of Quicksort from the theoretical Nlg(N) optimum.
How should the data be shuffled to get the required groups?
You start by picking the value to be used to partition the data, and for lack of better information you might as well take the first value in your array, in this case 37. Then you initialize two "pointers" - one off the left of the array, and one off the right end of the array:
37, 21, 12, 103, 82, 97, 64, 62, 62, 28, 44, 50, 33, 91 Left Ý RightMove the "right" pointer to the left until its pointing to a data value less than or equal to the chosen partitioning value:
37, 21, 12, 103, 82, 97, 64, 62, 62, 28, 44, 50, 33, 91 Left Ý RightNext, move the "left" pointer to the right until its pointing to a data value greater than or equal to the partitioning value:
37, 21, 12, 103, 82, 97, 64, 62, 62, 28, 44, 50, 33, 91 Left Ý RightYou have now identified a "small" value in the right hand part of the array, and a "large" value in the left hand part of the array. These are in the wrong parts; so exchange them:
33, 21, 12, 103, 82, 97, 64, 62, 62, 28, 44, 50, 37, 91 Left Ý Right
Do it again. First move the right pointer down to a value less than or equal to 37; then move the left pointer up to a value greater than or equal to 37:
33, 21, 12, 103, 82, 97, 64, 62, 62, 28, 44, 50, 37, 91 Left Ý RightOnce again, these data values are out of place, so they should be exchanged:
33, 21, 12, 28, 82, 97, 64, 62, 62, 103, 44, 50, 37, 91 Left Ý Right
Again move the right pointer down to a value less than 37, and move the left pointer up to a greater value:
33, 21, 12, 28, 82, 97, 64, 62, 62, 103, 44, 50, 37, 91 Right Left ÝIn this case, the pointers have crossed; the "right" pointer is to the left of the "left" pointer. When this happens, it means that there weren't any more data values that could be exchanged.
The "clerk" working on this part of the problem has finished; the rest of the work should be passed on to others. The data in the array up to the position of the "right" pointer are those values less than the value that the clerk chose to use to do the partitioning; the other part of the array holds the larger values. These two parts of the array should be passed separately to the next two clerks.
The scheme works. The rules are easy enough to be understood by a bureaucracy of clerks. So they are easy enough to program.
void Quicksort( int d[], int left, int right);The details of the range will be given as the indices of the leftmost (low) and rightmost (high) data elements. The initial call will specify the entire range, so if for example you had an array data[100], the calling program would invoke the Quicksort() function as:
Quicksort(data, 0, 99);Subsequent recursive calls to Quicksort() will specify subranges e.g. Quicksort(data, 0, 45) and Quicksort(data, 46, 99).
The partitioning step that splits an array into two parts (and shuffles data so one part contains the "low" values) is sufficiently complex to merit being a separate function. Like Quicksort() itself, this Partition() function needs to be given the array and "left and right" index values identifying the range that is to be processed:
int Partition( int d[], int left, int right);The Partition() function should return details of where the partition point is located. This "split point" will be the index of the array element such that d[left] ... d[split_point] all contain values less than the value chosen to partition the data values. Function Partition() has to have some way of picking a partitioning value; it can use the value in the leftmost element of the subarray that it is to process.
The code for Quicksort() itself is nice and simple:
void Quicksort( int d[], int left, int right) { if(left < right) { int split_pt = Partition(d,left, right); Quicksort(d, left, split_pt); Quicksort(d, split_pt+1, right); } }If the array length is 1 (left == right), then nothing need be done; an array of one element is sorted. Otherwise, use Partition() to shuffle the data and find a split point and pass the two parts of the split array on to other incarnations of Quicksort().
The Partition() function performs the "pointer movements" described in the algorithm. It actually uses to integer variables (lm = left margin pointer, and rm = right margin pointer). These are intialized "off left" and "off right" of the subarray to be processed. The partitioning value, val, is taken as the leftmost element of the subarray.
int Partition( int d[], int left, int right) { int val =d[left]; int lm = left-1; int rm = right+1;
The movement of the pointers is handled in a "forever" loop. The loop terminates with a return to the caller when the left and right pointers cross.
Inside the for loop, the are two do ... while loops. The first do ... while moves the right margin pointer leftwards until it finds a value less than or equal to val. The second do ... while loop moves the left margin pointer rightwards, stopping when it finds a value greater than or equal to val.
If the left and right margin pointers have crossed, function Partition() should return the position of the right margin pointer. Otherwise, a pair of misplaced values have been found; these should be exchanged to get low values on the left and high values on the right. Then the search for misplaced values can resume.
for(;;) { do rm--; while (d[rm] > val); do lm++; while( d[lm] < val); if(lm < rm) { int tempr = d[rm]; d[rm] = d[lm]; d[lm] = tempr; } else return rm; } }
The partitioning step is breaking up the array. If it worked ideally, it would keep splitting the array into halves. This process has to be repeated until the subarrays have only one element. So, in this respect, it is identical to binary search. The number of splitting steps to get to a subarray of size one will be proportional to lg(N).
In the binary search situation, we only needed to visit one of these subarrays of size one - the one where the desired key was located. Here, we have to visit all of them. There are N of them. It costs at least lg(N) to visit each one. So, the cost is Nlg(N). You can see from Figure 13.2 that the partitioning step frequently fails to split a subarray into halves; instead it produces a big part and a little part. Splitting up the big part then requires more partitioning steps. So, the number of steps needed to get to subarrays of size one is often going to be greater than lg(N); consequently, the cost of the sort is greater than Nlg(N).
In the worst case, every partition step just peels off a single low value leaving all the others in the same subarray. In that case you will have to do one partitioning step for the first element, two to get the second, three for the third, ... up to N-1 for the last. The cost will then be approximately
or 0.5N**2.1 + 2 + 3 + ... + (N-2) + (N-1)
There is one tweak that is usually worthwhile when you have big arrays (tens of thousands). You combine two sort algorithms. Quicksort() is used as the main algorithm, but when the subarrays get small enough you switch to an alternative like selection sort. What is small enough? Probably, a value between 5 and 20 will do.
The following program illustrates this composite sort (Borland users may have to reduce the array sizes or change the "memory model" being used for the project):
#include < iostream.h> #include < stdlib.h> const int kSMALL_ENOUGH = 15; const int kBIG = 50000; // Relying on int = long int data[kBIG]; void SelectionSort(int data[], int left, int right) { for(int i = left; i < right; i++) { int min = i; for(int j=i+1; j <= right; j++) if(data[j] < data[min]) min = j; int temp = data[min]; data[min] = data[i]; data[i] = temp; } } int Partition( int d[], int left, int right) { int val =d[left]; int lm = left-1; int rm = right+1; for(;;) { do rm--; while (d[rm] > val); do lm++; while( d[lm] < val); if(lm < rm) { int tempr = d[rm]; d[rm] = d[lm]; d[lm] = tempr; } else return rm; } } void Quicksort( int d[], int left, int right) { if(left < (right-kSMALL_ENOUGH)) { int split_pt = Partition(d,left, right); Quicksort(d, left, split_pt); Quicksort(d, split_pt+1, right); } else SelectionSort(d, left, right); } int main() { int i; long sum = 0; for(i=0;i < kBIG;i++) sum += data[i] = rand() % 15000; Quicksort(data, 0, kBIG-1); int last = -1; long sum2 = 0; for(i = 0; i < kBIG; i++) if(data[i] < last) { cout << "Oh ....; the data aren't in order" << endl; cout << "Noticed the problem at i = " << i << endl; exit(1); } else { sum2 += last = data[i]; } if(sum != sum2) { cout << "Oh ...; we seem to have changed the data" << endl; } return 0; }Last modified March 1996. Please email questions to nabg@cs.uow.edu.au