QuickSort Dijkstra 3-Way Partitioning: why the extra swapping?

Question Detail: 

Algorithm

Given the algorithm above (taken from the slides (p. 35) of the Coursera course "Algorithms Part I" by Robert Sedgewick and Kevin Wayne), look at the scenario where i is at "X", the following happens:

Scenario: i -> "X", "X" > "P"

1. swap("X", "Z"), gt--;   // the value at i is now "Z", which is still > "P" 2. swap("Z", "Y"), gt--;   // the value at i is now "Y", which is still > "P" 3. swap("Y", "C"), gt--;    // Now we finally get a value at i "C" which is < "P" // Now we can swap values at i and lt, and increrement them 4. swap("P", "C"), i++, lt++; 

Why don't we just decrement gt until gt points to a value that is < the value at lt ("P" in this case), then we swap this value with the value at i. This will save swapping operations.

So if we do that for the scenario mentioned above, we'll do:

1. gt-- 2. gt-- 3. swap("X", "C"), gt--;    // Now we can swap values at i and lt, and increrement them 4. swap("P", "C"), i++, lt++; 

Is this excessive swapping needed for the algorithm? does it improve performance in some way? If it does improve performance, how?

If it doesn't affect performance, please give a proper explanation or a proof as to why this it does not affect performance.

Also, would the second method I mentioned affect performance in any way? please explain why.

P.S. "Affect performance" as used above means either improve/degrade performance.

Asked By : Joshua Kissoon
Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/22389

Answered By : FrankW

In order to know that you may just decrement gt, you have to compare the element at position gt to the pivot.

For illustration, here is the relevant part of the code from the slides you took your illustration from:

while (i <= gt) {    int cmp = a[i].compareTo(v);    if      (cmp < 0) exch(a, lt++, i++);    else if (cmp > 0) exch(a, i, gt--);    else              i++; } 

Your modification would look like this:

while (i <= gt) {    int cmp = a[i].compareTo(v);    if      (cmp < 0) exch(a, lt++, i++);    else if (cmp > 0)     {       while (a[gt].compareTo(v) > 0) gt--;       exch(a, i, gt--);    }    else              i++; } 

If the element is larger, we save a swap, as you illustrated. Furthermore we won't have to compare that element again. If, however, the element is not larger, we will compare it to the pivot twice, once before the swap and once after the swap.

So with your modification we save a swap for each element that is larger than the pivot and on a position that ends up to the right of the pivot, while we introduce an additional comparison for each element in such a position that is not larger than the pivot.

Averaging over all inputs we can expect the number of elements smaller resp. larger than the pivot to be equal. So the performance will depend on the number of elements equal to the pivot (and the cost of a swap vs. a comparison). Your modification will have the better relative performance, the smaller that number is. But 3-way partitioning was developed with inputs with many equal keys in mind, so the original choice seems reasonable.


Taking a second look at the modified algorithm, we note that if we compare an element twice, there is no other comparison happening between these two. So it seems worthwhile to remember the result of the comparison. However, if we want to keep the structure of the code, this means a lot of control structure overhead:

Boolean doComp = true; int cmp; while (i <= gt) {    if (doComp) cmp = a[i].compareTo(v);    doComp = true;    if      (cmp < 0) exch(a, lt++, i++);    else if (cmp > 0)     {       while (cmp = a[gt].compareTo(v) > 0)       {          gt--;          doComp = false;       }       exch(a, i, gt--);    }    else              i++; } 

My gut feeling is that even though it combines the advantages of the previous two versions in terms of number of comparisons and swaps, it will perform worse due to the overhead of the additional control structures.

Not only have we introduced additional operations (checking conditions, maintaining the flag), we also introduced (compared to Dijkstra's version) two additional conditional jumps (for if and while), that will often but irregularly switch between taken and not taken. The latter is a problem on modern processors, since they rely on branch predictions for optimal performance and such branches will lead to a high number of mispredictions. (The branch corresponding to the outer loop is an opposite example: Always predicting that the loop will be entered is wrong only once.)

With these considerations in mind, we can try to optimize the code. The following should perform quite well:

while (i <= gt) {    int cmp = a[i].compareTo(v);    if (cmp > 0)     {       while ((cmp = a[gt].compareTo(v)) > 0) gt--;       exch(a, i, gt--);    }    if (cmp < 0) exch(a, lt++, i++);    else         i++; } 

This version makes as many comparisons as Dijkstra's version and as many swaps as your suggested version without introducing too much overhead: For each comparison we have one branch based on cmp > 0 and possibly one based on cmp < 0. This is the same as for Dijkstra's version except that the role of the two comparisons is exchanged.


So, summing up / answering your final questions explicitly:

  • The additional swaps you point out can be avoided without introducing other costs. So, as you suspected, they degrade performance compared to the final version in this answer.
  • The version you suggest removes the unnecessary swaps, but introduces additional comparisons. So it will too perform worse than the final version in this answer.
  • As I pointed out, the realitve performance of the original and your version will depend on the input at hand and the relative costs of swaps and compares. (Since usually compares are faster than swaps, I'd expect your version to be faster on average.)

No comments

Powered by Blogger.