CS-330: Algorithms

Sorting algorithms pseudo-code

These are my versions of the algorithms, which may be slightly different from the book's.


Input:     Array A[1...n], of elements in arbitrary order; array size n
Output:  Array A[1...n] of the same elements, but in the increasng order (more precisely in non-decreasing order - in case some elements are equal)


for i= 1 to n do
// find min element in A[i...n]
// and put it in the i'th position (i.e. at A[i])

min_index <-- i

//locate min
j= i+1 to n do

if A[j] < A[min_index] then min_index <-- j

//put the min where it belongs
swap( A[i], A[min_index] )


Input:     Array A[1...n], of elements in arbitrary order; array size n
Output:  Array A[1...n] of the same elements, but in the non-decreasing order


for i= 2 to n do
// insert i'th element A[i] into the already sorted A[1...i-1]

// keep swapping the value in A[i] with previous element
// till it "bubbles" down the right place
(i.e. keep swapping till the previos element is smaller)

j <-- i
j>1 AND A[j-1]>A[j] do // when i=1 there is no prev element

swap( A[j], A[j-1] )
j <-- j-1

Both algorithms above use swap() subroutine. Below is the pseudo-code for it. This particular version of swap uses no extra space, but uses the bit-wise XOR operation. Similar results could be obtained with numbers using + and - operators. Try it as an exercise. (And make sure you understand the version below too! ;-)

swap( A, B ):

Input:     two variables A and B, expressed in binary notation
Output:  The same variables, but with their values swapped


A <-- A XOR B
B <-- A XOR B
A <-- A XOR B


It is easier to view this algorithm as sorting separate sections of the array A. Thus the generic input of algorithms above is slightly modified by adding two parameters first and last.

Input:     Array A[1...n], of elements in arbitrary order; also first and last positions 1 < first < last < n
Output:  Array A[1...n] of the same elements, but the elements between first and last (i.e. A[first...last]) are in the non-decreasing order. All elements before first and after last are unchanged (and so the set of elements between first and last is not changed - only sorted).


if (last < first) then RETURN //there's nothing between first & last: 1 element is always sorted

middle <-- |_ (first+last)/2 _|
merge_sort( A, first, middle )
merge_sort( A, middle+1, last )
merge( A[first...middle], A[middle+1...last]; A[first...last] )


This algorithm takes two sorted arrays and combines them into one. The two input arrays are passed by value (i.e. the contents are implicitely copied by the algorithm before it starts).

Input:    Arrays L1[...] and L2[...], each with elements in non-decreasing order; the last element is "dummy", equal to infinity. These arrays are not modified by the algorithm, nor are affected by the changes made within the algorithm, while it runs;
               n= total number of non-dummy elements in L1, L2
Output:  Array A[1...n] of the same elements as in L1 and L2, but all the elements are in the non-decreasing order


// set two pointers at the begining of the array
p1 <-- 1 // pointer for L1
p2 <-- 1 // pointer for L2

for i=1 to n do
// the next - i'th - element of A will be the smaller of the two pointed by p1,p2

if ( L1[p1] < L2[p2 )

A[i] <-- L1[p1]


A[i] <-- L2[p2]


Input:     Array A[first...last], of elements in arbitrary order; first, last
Output:  Array A[first...last] of the same elements, but in the non-decreasing order.
(While the algorithm is recursive - similar to mergesort above - it can be done "in-place" - in contrast to the mergesort)


if first >= last then return;
// Pick a pivot element (value=x)
   // we pick a random element (this way the performance does not
   // depend on the input array - rather it depends only on
the algorithm's coin-tosses);
   // more sophisticated methods for choosing pivot exist

x <-- A[random(first, last)]

// Partition array into parts "<=x", x, ">x"
// assume
random(a,b) returns a random int between a and b

p <-- Partition( A[first...last], x )
        // A[first...p] <=x; A[p]=x
        // A[p+1...last] <=x;

QuickSort( A, first, p-1 )

QuickSort( A, p+1, last)

// Recursively sort the "<=x" and ">x" parts


The following is from Prof. Levin's notes (with my minor changes):

Las-Vegas algorithms, unlike Monte-Carlo, never give wrong answers. Unlucky coin-flips just make them run longer than expected. Quick-Sort is a simple example. It is about as fast as deterministic sorters, but is popular due to its simplicity. It sorts an array a[1..n] of n >2 numbers by choosing in it a random pivot, splitting the remaining array in two by comparing with the pivot, and calling itself recursively on each half.

For easy reference, replace the array entries with their positions 1,...,n in the sorted output (no effect on the algorithm). Denote t(i) the (random) time i is chosen as a pivot. Then i will ever be compared with j iff either t(i) or t(j) is the smallest among t(i),...,t(j). This has 2 out of |j-i|+1 chances. So, the expected number of comparisons is equal to the sum of 2/(1+j-i) for all i,j>i. This is essentially the formula right after (7.3) in the book (pg 158). This formula can eventually be simplified to 2n ln n -O(n) = O(nlgn) with very small constants hidden under O.
Note, that the expectation of the sum of variables is the sum of their expectations (not true, say, for product) - this is used in the derivation.

The above is essentially the same analysis as in the book. The analysis I have done in class is similar but still somewhat different.

Las-Vegas and Monte-Carlo are two types of probabilistic algorithms. We are probably not going to cover them here, but can be understood from the above, the coin-tosses in a Monte-Carlo algorithm can lead to an "error", while in Las-Vegas algorithms they may result only in longer running time. Selecting a random element as a pivot makes QuickSort a Las-Vegas type algorithm. If we pick always the first element as a pivot, then it is a deterministic algorithm, but its performance depends on the distribution of inputs. This illustrates a possibility of trading distribution of inputs and distribution of "internal coin-flips". If you assume that your inputs are determined by an adversary (or are otherwise subject to Murphy Laws to a greater extent than the internal coin-tosses), then you are better off with Las Vegas type algorithms.


First a new data structure: priority queue (PQ). A PQ takes input values and stores them, and when required returns (and removes) the minimum value stored.
More formally, PQ has two methods (some more might be added later): insert(x) and remove_min(). The first inserts x into the PQ, and the second removes the min element which was inserted and not yet removed by previous remove_min operations. Of course, we need a constructor method init which initializes an empty priority queue.

Now, assuming we have such a data structure (independently of how it was implemented), we can write a PQ-sort algorithm:

Input:     Array A[1..n], of elements in arbitrary order; n
Output:  Array A[1..n] of the same elements, but in the non-decreasing order.



// insert all elements into PQ
for i = 1 to n

for i = 1 to n
// remove all elements from PQ in order
A[i] <-- PQ.remove_min()

The above algorithm clearly takes 2n+1 PQ operations. So, if each PQ operation takes O(lg n) steps (as will be the case), then this algorithm sorts in O(n lg n) steps.

PQ implementation: Heap

Heap="nearly balanced and almost complete binary tree"
Balanced property: all the leaves are at either depth d or d+1, and those which are at depth d+1 are to the left of those at depth d.

Heap property: x >= parent(x).

Heap property guarantees PQ: the min element is always at the root.
Balanced property will guarantee that the PQ operations are efficient.

Insert(x). A new element can be added only at one place: as a child of the left most "incomplete" node at depth d (such a node is either a leaf or has only one child). Adding such an element violates heap property. To restore it the new element must "bubble up" along the path to the root, till heap property is restored.

Remove_min(). The min element is easy to find - it is at the root. But if it is removed then the tree "falls apart". So, when it is removed, it is replaced by another element - taken from the right-most position at depth d+1, where it will not be missed. Of course placing it at the root will again violate heap property. So, this element is "bubbled down". This time the bubbling is a bit more delicate - the smaller of the two children must be picked to swap with.


Heap representation: map the heap into array, by listing the element layer by layer, from the top down.

HeapSort is essentially the above PQ-sort with PQ implemented as a heap, plus a number of optimizations. See book for details and code.