Notes on Dynamic Data Structures

Wayne Snyder

CS 341 -- Fall -- 2007

The following notes present the basic algorithms and techniques used in linked lists, including recursion. In particular, the topics we cover are as follows:

  1. Basic (iterative) algorithms on singly-linked lists;
  2. Basic algorithms on doubly-linked lists;
  3. An introduction to recursion; and
  4. Recursive algorithms on SLLs.

These notes are intended to back up your own lecture notes. They are not intended to be a complete coverage of the subject, such as is found in the many textbooks.

 

Iterative Algorithms on Linked Lists

Basic data declarations

All the code below assumes the following declaration:

   const int ERRORITEM = -9999; // Or some other sentinel value....

   class ListNode {
     public:
       int info;
       ListNode * next;
       ListNode(int, ListNode*); 
   };


We assume that to use these algorithms, the following initialization has been done:
   ListNode * head = NULL;   

The constructor for this data type is as follows.

   ListNode::ListNode(int n, ListNode * p) {
      info = n;
      next = p;
   }

This function will simplify a number of the algorithms below. For example, to add a node to the front of an existing list, we could simply write

        head = new ListNode(n, head);
and if we have a pointer p to a node in a list, we can add a new node after p (i.e., between the node *p and the node *(p->next)) by writing:
        p->next = new ListNode(n, p->next);
Note that it can also be used to create simple linked lists in a single C++ statement. For example, to create a list containing the integers 7, 8, and 12, we could write:
 
   head = new ListNode( 7, new ListNode( 8, new ListNode( 12, NULL) ) );
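
As a minimal sketch of these constructor idioms, the following self-contained fragment repeats the declaration above, builds the list 7, 8, 12 in one statement, and then adds a node at the front and another after a given node. The helper names toVector and buildExample are assumptions made for illustration and are not part of the List type used in these notes.

```cpp
#include <cstddef>
#include <vector>

class ListNode {
  public:
    int info;
    ListNode * next;
    ListNode(int n, ListNode * p) : info(n), next(p) {}
};

// Hypothetical helper: copy the items into a vector so the list is easy to inspect.
std::vector<int> toVector(ListNode * head) {
    std::vector<int> items;
    for (ListNode * p = head; p != NULL; p = p->next)
        items.push_back(p->info);
    return items;
}

// Hypothetical helper: build 7 8 12, then add 5 at the front and 10 after the 8.
ListNode * buildExample() {
    ListNode * head = new ListNode(7, new ListNode(8, new ListNode(12, NULL)));
    head = new ListNode(5, head);            // add to the front: 5 7 8 12
    ListNode * p = head->next->next;         // p points to the node holding 8
    p->next = new ListNode(10, p->next);     // add after p: 5 7 8 10 12
    return head;
}
```

Calling toVector( buildExample() ) should give the items 5, 7, 8, 10, 12 in that order.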

Basic Paradigms

Chaining down the list

The basic thing you do with a list is to "chain along" the list, performing some operation at each node (such as counting the number of nodes, or looking for a particular node). This is done with a simple for loop that initializes a pointer to the first node and stops when it reaches NULL. Note that this loop will execute once for each node in the list, pointing p at each node in turn:

        for(ListNode * p = head; p != NULL; p = p->next ) {
            // Do something at each node in the list    
        }

Chaining down the list and stopping at the end or upon some condition

Suppose we wish to find the first node in the list satisfying some condition Cond (e.g., we are looking for a particular item). A simple modification of the previous technique is to add another condition to the loop, so that when the loop ends, p is pointing to either NULL (if no node satisfies the condition), or p is pointing to the first node satisfying the condition:
        ListNode * p;   // declared before the loop so it is still in scope afterwards
        for( p = head; p != NULL && !Cond; p = p->next )
            ;

Note that there is no loop body, as the entire purpose of the loop is to find the node satisfying the condition; also note carefully that you must put the negation of the condition Cond in the loop condition, since when the condition Cond is NOT true we keep looping. Thus, when the for loop stops, we are guaranteed that either (i) p is NULL or (ii) Cond is true. Generally, you must check which of these cases happened.

Also, note that !Cond usually involves an examination of some member of the node pointed to by p, e.g., !Cond is "p->info !=N". Whenever we follow a pointer into a class (using "->") we must ALWAYS check first if p is NULL, or else risk a segmentation fault. However, since && is "lazy" it does not evaluate the second half if the first half is false, and so !Cond can refer to the next or info field inside the node pointed to by p.
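
As a concrete instance of this pattern, the sketch below takes Cond to be "p->info < 0" and then checks which of the two exit cases occurred. The function name firstNegative and the choice of condition are assumptions made for illustration; the declarations are repeated so that the fragment stands alone.

```cpp
#include <cstddef>

const int ERRORITEM = -9999;   // sentinel returned when no node qualifies

class ListNode {
  public:
    int info;
    ListNode * next;
    ListNode(int n, ListNode * p) : info(n), next(p) {}
};

// Returns the first negative item in the list, or ERRORITEM if there is none.
int firstNegative(ListNode * head) {
    ListNode * p;   // declared outside the loop because it is examined afterwards
    // p != NULL is tested first, so p->info is never evaluated on a NULL pointer.
    for (p = head; p != NULL && !(p->info < 0); p = p->next)
        ;
    if (p == NULL)          // case (i): ran off the end of the list
        return ERRORITEM;
    return p->info;         // case (ii): Cond is true at p
}
```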

Chaining down the list, looking ahead

Another possibility is to chain along, but perform some check on the next node in the list; for example, when inserting a number N into a sorted list, you need to check if the next node is bigger than N (if so, you insert N after the current element).

        for(ListNode * p = head; p != NULL; p = p->next ) 
           if ( p->next != NULL && p->next->info has some property ) {
                 // Do something
           }
Note that here, the check for NULL before following a pointer gets a little messy; the next technique does the same thing, but is usually preferable.

Chaining down the list with a trailing pointer

A final paradigm is to chain along, but keep a pointer q trailing p, so that you always have a pointer to the previous element (except for the first time through the loop, when q has no value); this is an alternative to the previous look-ahead technique that solves a basic problem in SLLs: when you chain along, you lose the pointer to the previous node:

        ListNode * q; 
        for(ListNode * p = head; p != NULL; q = p, p = p->next ) {
           // During first iteration, p points to first element and q has no value;
           // thereafter, p points to a node and q points to the previous node
        }
We will use these techniques in the iterative algorithms below.

Algorithms for SLLs

Print the List

This algorithm is a good place to start, as it uses the first chaining-along technique very simply:

     void print() {
        for(ListNode * p = head; p != NULL; p = p->next ) 
           cout << p->info << endl;
     }

(We will leave off the "List::" syntax for simplicity, assuming that these functions are part of the List abstract data type that we have been discussing.)

Finding the length of a list

Another simple application of chaining along:

    int length( ) {
      int i=0;
      for( ListNode * p = head; p != NULL; p = p->next )
         ++i;          
      return( i );
    }

Returning the Nth data item in a list

We can modify the previous algorithm somewhat to create a function that returns the Nth data item or the error sentinel if the list has no Nth item:

    int getItem( int posItem ) {
      int pos = 0;  
      if ( posItem < 0)
         return ERRORITEM;
      ListNode * p;   // declared outside the loop; it is examined after the loop ends
      for( p = head; p != NULL && pos < posItem; p = p->next)
         ++pos;
      // Ran off end (p == NULL) or p points to item (pos == posItem) 
      if( p == NULL )
         return ERRORITEM;
      else // pos == posItem 
         return p->info;
    }

Note carefully that when a loop with a double condition like this exits, the negation of one of the conditions joined by && must hold. Thus, when the loop exits, either p is NULL or pos == posItem.

Looking up an item

This next function is a standard one that uses the double-condition for loop technique; it returns a pointer to the node containing N if it exists, and NULL otherwise:

    ListNode * lookup( int N ) {
       ListNode * p;
       for( p = head; p != NULL && p->info != N; p = p->next )
          ;
       return p; 
    }

Deleting an item in a list

We next show how to delete a node in a list, two different ways; the first uses the lookahead technique:

    void deleteNode( int N ) {
       ListNode * q; 
       if ( head != NULL && head->info == N ) {
          q = head;
          head = head->next;
          delete q; 
          return;
       }
       for( ListNode * p = head; p != NULL; p = p->next ) 
          if( p->next != NULL && p->next->info == N ) {
              q = p->next;             // q is the node to remove
              p->next = q->next;       // route around it
              delete q; 
              return; 
          }
     }

The algorithm can also be written, somewhat more compactly, using the two pointer technique:

    void deleteNode( int N ) {
       ListNode * q = NULL;   // trails p; set at the end of the first iteration
       if ( head != NULL && head->info == N ) {
          q = head;
          head = head->next;
          delete q; 
          return;
       }
       for( ListNode * p = head; p != NULL ; q = p, p = p->next ) 
          if( p->info == N ) {
              q->next = p->next;
              delete p; 
              return;   
          }
     }

We shall see that the recursive version of this algorithm is even simpler!

Inserting an item into a sorted list

We have shown above how easy it is to insert an item into the first position using the ListNode() constructor; inserting into a sorted list is quite similar to the delete algorithm and again uses the two-pointer technique. Unfortunately, there are three cases to consider (two involving the head pointer) and this makes it a little messy. We shall combine two of the cases into one if statement, relying on the fact that the second half of an OR condition is evaluated only if the first half is false:

    void insertSorted( int N ) {
       ListNode * p, * q; 
       if ( head == NULL || N < head->info ) 
          head = new ListNode( N, head );
       else {
          // N >= head->info here, so the loop advances at least once and q is set
          for( p = head; p != NULL && N >= p->info; q = p, p = p->next ) 
             ;
          q->next = new ListNode( N, p );
       }
     }

Again, the recursive version of this algorithm is much simpler still!

Removing the last item in the list

The difficulty here is that we need to have a pointer to the next-to-last node, if it exists; this gives us three cases, for lists of length 0, 1, and more than 1. The iterative version is pretty messy:
void deleteLast( ){

  if ( head == NULL )  // Empty list
    ;
  else if (head->next == NULL) { // Only one Node in list, 
    delete head;                    //  so remove it
    head = NULL;
  }
  else {   // At least two Nodes in list
    ListNode * p, * q;   // p is used after the loop, so declare it here
    for ( p = head; p->next != NULL; q=p, p = p->next ) 
      ;
    q->next = NULL;
    delete p; 
  }
}

 

Reversing a list

This algorithm is probably one of the most difficult for linked lists; it uses three pointers which chain down the list together and rearrange the next pointers so that they point to the predecessor instead of the successor in the list.

     void reverse( ) {
        ListNode *  pt1 = head, * pt2 = NULL, * pt3;
        while ( pt1 != NULL ) {
           pt3 = pt2;         // pt3 follows pt2
           pt2 = pt1;         // pt2 follows pt1
           pt1 = pt1->next;   // pt1 moves to next node
           pt2->next = pt3;   // link pt2 to preceding node
        }
        head = pt2; 
     }
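
To make the pointer movement easier to follow, here is a sketch of the same three-pointer loop written as a free function that takes the head pointer and returns the new head, with a trace for the list 1 2 3 in the comments. The name reverseList and the free-function form are assumptions made for illustration; the notes' own version works on the head member of the List type.

```cpp
#include <cstddef>

class ListNode {
  public:
    int info;
    ListNode * next;
    ListNode(int n, ListNode * p) : info(n), next(p) {}
};

// Reverses the list in place and returns a pointer to the new first node.
ListNode * reverseList(ListNode * head) {
    ListNode * pt1 = head, * pt2 = NULL, * pt3;
    while (pt1 != NULL) {
        pt3 = pt2;         // pt3 follows pt2
        pt2 = pt1;         // pt2 follows pt1
        pt1 = pt1->next;   // pt1 moves to next node
        pt2->next = pt3;   // link pt2 to preceding node
    }
    // Trace for 1 2 3:
    //   after pass 1: pt2 -> 1,           pt1 -> 2 3
    //   after pass 2: pt2 -> 2 1,         pt1 -> 3
    //   after pass 3: pt2 -> 3 2 1,       pt1 == NULL
    return pt2;
}
```

A caller would write head = reverseList( head ); an empty list comes back as NULL because the loop never runs.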


Algorithms for Doubly-Linked Lists

Following are a few algorithms for manipulating doubly-linked lists. The global declarations for this list would be as follows:

   
   class DNode {
     public:
       int data;
       DNode * llink;   // pointer to the previous node
       DNode * rlink;   // pointer to the next node
       DNode(int, DNode*, DNode*);
   };

   DNode * head;


  // Constructor will insert fields into node
    DNode::DNode(int n, DNode * p, DNode * q ) {
      data = n;
      llink = p;
      rlink = q;
    }      
   

Adding an element to the front

This one is straightforward:

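 void addFront( int n ) {

   DNode * p = new DNode( n, NULL, head );  // Note: llink of first node is NULL
   if ( head != NULL )
      head->llink = p;     // Old first node now points back to the new one
   head = p;

}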

Deleting an element from the front

Deleting is almost as simple, except we have the special case of an empty list.

 void deleteFirst(  ){
  DNode * p;

  // If list is empty, do nothing, otherwise....
  if( head != NULL ) {
    p = head;          // Save ptr to first so can delete it
    head = head->rlink; // Reroute head ptr to second
    if (head != NULL )
       head->llink = NULL;  // New first node has no predecessor
    delete p;    
  }
}

Deleting an element from the end

This is similar to the case for singly-linked lists, except for the extra pointer.

  void deleteLast(  ){
  DNode * p; // Will point to last DNode

  if ( head == NULL )  // Empty list
    ;
  else if (head->rlink == NULL) { // Only one DNode in list, 
    delete head;                    //  so remove it
    head = NULL;
  }
  else {   // At least two DNodes in list, find last element
    for ( p = head; p->rlink != NULL; p = p->rlink )
      ;
    // Now p points to last DNode
    p->llink->rlink = NULL;
    delete p; 
  }

}

Deleting an item from the list

When deleting an arbitrary item, we need only one pointer (instead of two, one trailing the other, in singly-linked lists), but we still need to worry about the special case of an empty list and deleting the first element in the list. For the latter, we'll save a little time by calling an already-defined function:

      void deleteItem( int n ) {
        DNode * p; 
        if ( head == NULL )
           ;
        else if ( head->data == n )   // First element
           deleteFirst(); 
        else {  // Find element, if it exists
           for( p=head; p != NULL && p->data != n; p=p->rlink ) // Traverse whole list
              ;
           // Now p pts to n or to NULL
           if ( p != NULL ) { // p pts to node with n
              p->llink->rlink = p->rlink;      // Reroute around p
              if ( p->rlink != NULL )          // p may be the last node
                 p->rlink->llink = p->llink;
              delete p; 
           }
        }
     }

We could rewrite all the algorithms for singly-linked lists for doubly-linked lists; this is fairly mechanical, so we do not list them all here. Just remember to reconnect all the pointers properly. Drawing diagrams helps; if you can't draw the algorithm using a diagram on paper, you won't be able to do it in C++.
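
As one example of reconnecting all the pointers, the following sketch inserts a new node immediately after a given node of a doubly-linked list. The function name insertAfter is an assumption made for illustration, and the declaration is repeated (with public members and an inline constructor) so that the fragment stands alone.

```cpp
#include <cstddef>

class DNode {
  public:
    int data;
    DNode * llink;   // previous node
    DNode * rlink;   // next node
    DNode(int n, DNode * p, DNode * q) : data(n), llink(p), rlink(q) {}
};

// Hypothetical helper: insert n immediately after the node p (p must not be NULL).
void insertAfter(DNode * p, int n) {
    DNode * fresh = new DNode(n, p, p->rlink);  // new node already points both ways
    if (p->rlink != NULL)
        p->rlink->llink = fresh;   // old successor now points back to the new node
    p->rlink = fresh;              // p now points forward to the new node
}
```

Four pointers change in all: the two inside the new node, the rlink of p, and (when p is not the last node) the llink of the old successor.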

Recursive Algorithms

Recursively defined algorithms are a central part of any advanced programming course and occur in almost every aspect of computer science. Although they are difficult to understand initially, after one gets the knack, they are easier to write, debug, and understand than their iterative counterparts. In many cases, the only realistic solution possible for a certain problem is recursive.

Let us examine the definition of the factorial function. We can define the factorial of a number n, notated n!, in two ways:

  1. n! is the product of all the integers between 1 and n, inclusive;
  2. if n = 1, then n! = 1, otherwise n! = n * ( n-1)!
The first definition gives us an explicit way to calculate n! which involves iterating through all the numbers from 1 to n and keeping a running product; it could be expressed in C++ as follows:

   int factorial( int num ) {
      int fact = 1;
   
      for (int i = 1; i <= num; ++i)
         fact = fact * i;

      return(fact);
   }

The second definition of n! is, at first glance, nonsense, because we are defining something in terms of itself. It's like asking someone what the food at a Thai restaurant is like and he tells you, "Well, it's kind of like food from Thailand." Or you look up "penultimate" in the dictionary and it says "just after antepenultimate;" but when you look up "antepenultimate" it's defined as "just before penultimate." Actually our example is not exactly this paradoxical, because we are defining our object, if you look closely, in terms of a slightly different object. That is, n! is defined in terms of ( n-1)!, which has a smaller value before the "!". Also, the definition has a condition: when the value of n is small enough, i.e., 1, the factorial is just given explicitly as 1. Since the recursive part always defines the factorial in terms of the factorial of a smaller number, we must reach 1 eventually. This is the trick which allows us to define mathematical objects in this way. We must define a mathematical function explicitly for some values, and then we can define other values in terms of the function itself, as long as the function will eventually reach one of the explicit values. Let us look at the C++ for this version of the function:

   int factorial( int num ) {
      if ( num == 1 )
         return 1;
      else return num * factorial( num - 1 );
   }

This function has the following standard features of any recursively defined procedure or function:
  1. it has an if or a switch statement;
  2. this if statement tests whether the function input is one of the base cases, i.e., one for which a value is returned explicitly;
  3. if the base case is found, an action is performed or a value is returned which does not involve calling the function again;
  4. if the base case is not found, the function calls itself on an argument which is closer to the base case than the original argument.
When this program runs, the computer has to keep calling this function on increasingly smaller values of n until n equals 1. For example, to find the value of Factorial(4), the computer has to find out the value of Factorial(3); to find this value, it has to know the value of Factorial(2); to get this value, it has to know the value of Factorial(1). But it knows the value of Factorial(1), since we told it that this is 1. Now it can find the value of Factorial(2), etc. all the way back to Factorial(4). It is important to realize that the computation of the C++ function for a given value has to wait until it gets through with all the function calls it makes, even when it calls itself. Thus there will be many different invocations for the same piece of code even though only one of these will actually be executing; the rest will be waiting for the function calls they made to finish. You should try tracing the factorial program above for, say, Factorial(5), to get a feel for the way it works.
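
One way to watch the waiting invocations is to print a line on entry to and exit from each call. The sketch below does this for the recursive definition; the name tracedFactorial and the extra depth parameter (used only to indent the output) are additions made for illustration, and like the original it assumes num is at least 1.

```cpp
#include <iostream>
#include <string>

// Recursive factorial that prints a trace; depth only controls indentation.
int tracedFactorial(int num, int depth = 0) {
    std::string indent(2 * depth, ' ');
    std::cout << indent << "factorial(" << num << ") called" << std::endl;
    int result;
    if (num == 1)
        result = 1;                                          // base case
    else
        result = num * tracedFactorial(num - 1, depth + 1);  // waits for the inner call
    std::cout << indent << "factorial(" << num << ") returns " << result << std::endl;
    return result;
}
```

For tracedFactorial(4), all four "called" lines are printed before any "returns" line, which shows each invocation waiting on the one it started.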

Let's look at another simple recursive algorithm similar to the factorial function:


   int power( int num, int exponent ) {
      if ( exponent == 1 )
         return num;
      else return num * power( num, exponent - 1 );
   }


Here we are determining the value of an integer num raised to a power exponent. We could have written this explicitly by just creating a for loop to multiply num by itself exponent number of times, i.e., 5^4 = 5 * 5 * 5 * 5. But the recursive algorithm says that, for example, 5^4 is just 5 * (5^3), which is just 5 * (5 * (5^2) ), which is just 5 * (5 * (5 * (5^1) ) ), which is just 5 * 5 * 5 * 5. So they really do the same thing in different ways. Note again that the recursive call involves the function calling itself on arguments which get closer to the base case--if you keep subtracting 1 from exponent you will eventually reach 1. Try this algorithm on Power(2, 5).
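
For comparison, here is a sketch of the for-loop version mentioned above next to the recursive one. The name powerIterative is an assumption made for illustration; both functions assume exponent is at least 1, as the recursive base case does.

```cpp
// Recursive version, as in the notes: num^1 is num, num^e is num * num^(e-1).
int power(int num, int exponent) {
    if (exponent == 1)
        return num;
    else
        return num * power(num, exponent - 1);
}

// Iterative counterpart: start from num^1 and multiply exponent - 1 more times.
int powerIterative(int num, int exponent) {
    int result = num;
    for (int i = 1; i < exponent; ++i)
        result = result * num;
    return result;
}
```

Both functions perform exponent - 1 multiplications; the recursive one simply does them on the way back out of the nested calls.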

Another recursive algorithm we could write would be for calculating the Nth Fibonacci number. Recall that the Fibonacci numbers form a series in which the first two values are both 1, and each successive value is the sum of the previous two values:

     1 1 2 3 5 8 13 21 34 55 89 ....
Thus the third Fibonacci number is 2, the seventh is 13, and so on. Note how the definition is phrased: "the first two values are both 1" (an explicit answer is given), "and each successive value is the sum of the previous two values" (the rest are defined in terms of previous values in the series). This is clearly translatable into a recursive algorithm with almost no effort:

   int  fibonacci( int n ) {
      if ( n <= 2 )   // the first two values are both 1
         return 1;
      else return fibonacci(n-2) + fibonacci(n-1);
   }

This is obviously a C++ version of the English definition above, but will it work? After all, it calls itself not once but twice! The base case assures us, however, that this must stop eventually, since we call the function on smaller values each time. It must reach Fibonacci(1) or Fibonacci(2) eventually. In fact, this is not a very efficient way to calculate the Fibonacci numbers, since we must cover the same ground twice to get each number. It does work, however, and is an exact translation of the English definition we started with. In other words, it is a more natural expression of the original problem than an iterative algorithm, because the original definition is recursive. Again, try this on some small values to convince yourself that it works.
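
To see how much ground is covered more than once, the sketch below counts the calls made by the recursive function and sets it beside a simple loop. It numbers the series from 1 and uses the base case n <= 2, so that the first and second values are both 1; the counter fibCalls and the function fibonacciIterative are assumptions added for illustration.

```cpp
long fibCalls = 0;   // hypothetical counter: number of calls to fibonacci()

// Recursive version with the first two values (n = 1 and n = 2) both equal to 1.
int fibonacci(int n) {
    ++fibCalls;
    if (n <= 2)
        return 1;
    else
        return fibonacci(n - 2) + fibonacci(n - 1);
}

// Iterative version: keep only the last two values and step forward.
int fibonacciIterative(int n) {
    int prev = 1, curr = 1;          // values number 1 and 2
    for (int i = 3; i <= n; ++i) {
        int next = prev + curr;
        prev = curr;
        curr = next;
    }
    return curr;
}
```

Computing fibonacci(10) this way takes 109 calls to produce 55, while the loop needs only 8 additions; the gap grows rapidly as n increases.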

A slightly more difficult algorithm which can be written recursively is Euclid's Greatest Common Divisor algorithm, which has pride of place as the oldest recursive algorithm in existence. This ancient Greek mathematician discovered that we can find the largest integer which divides two given integers evenly if we generate a series of values as follows:

  1. write down the two integers;
  2. divide the first by the second and write down the remainder from that division;
  3. if the remainder is 0, then the greatest common divisor is the number immediately to the left of the 0, i.e., the number you divided into the previous number to get a remainder of 0;
  4. if the remainder is not 0, then repeat from step 2 using the last two integers in the list.
For example, starting with the two integers 28 and 18, we would generate the series:
     28  18  10  8  2  0.
Thus 2 is the greatest common divisor of 28 and 18. This method is essentially a recursive algorithm, although it may not be obvious at first. Notice that we perform the same action on each pair of numbers: we divide the first by the second and write down the remainder, then continue with the second number and the remainder just obtained, etc., until we reach 0. The recursive algorithm looks like this:

   int gcd( int num1, int num2 ) {
      if ( (num1 % num2) == 0 )
         return num2;
      else
         return gcd( num2, (num1 % num2) );
   }

This algorithm will implicitly create the list (except for the 0, which just indicates that the previous number divides the number before it evenly) that we showed above if you call gcd(28, 18). It's tricky, but it does nothing really different than the recursive algorithms we have examined so far. It's worth tracing through on some simple input.
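
The implicit list can be made visible by recording each number as the recursion goes along. The sketch below does so; the names gcdTraced and gcdSeries and the vector parameter are additions made for illustration, and both integers are assumed to be positive.

```cpp
#include <vector>

// Same recursion as gcd(), but each non-zero remainder is appended to series.
int gcdTraced(int num1, int num2, std::vector<int> & series) {
    int remainder = num1 % num2;
    if (remainder == 0)
        return num2;                 // num2 divides num1 evenly: it is the gcd
    series.push_back(remainder);     // write down the remainder
    return gcdTraced(num2, remainder, series);
}

// Hypothetical helper: write down the two integers, then let the recursion
// add the remainders; the last entry is the greatest common divisor.
std::vector<int> gcdSeries(int num1, int num2) {
    std::vector<int> series;
    series.push_back(num1);
    series.push_back(num2);
    gcdTraced(num1, num2, series);
    return series;
}
```

Calling gcdSeries(28, 18) should give 28, 18, 10, 8, 2, which is the series shown earlier without the final 0.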

We have presented here a number of recursive functions returning values, but it is important to realize that void functions (not returning values) can be written recursively as well. For example, many sorting algorithms are written recursively.

Other recursive algorithms are presented below for singly-linked lists and for tree structures. These are important algorithms which show the advantages of recursion clearly. For some, like the printing procedures for singly-linked lists, the iterative and recursive versions are of about equal complexity. For others, like the cell deletion algorithm, the difference is more pronounced in favor of the recursive version. For another class of algorithms, such as the tree walk and insertion procedures, the recursive version is really the only reasonable solution. In general, when a data structure is defined recursively (like a tree or a linked list) the most natural algorithms are recursive as well. To use these advanced data structures one must have a firm understanding of recursion. Those interested in pursuing this topic should try writing the recursive algorithms suggested in the notes on singly-linked lists below and should look up other recursive algorithms, such as Quicksort, Mergesort, the Tower of Hanoi, or practically any algorithm which involves trees.

 

Recursive Algorithms on Linked Lists

The recursive algorithms depend on a series of subroutine calls to chain along the list, rather than an explicit for loop. In combination with reference parameters, the recursive versions of most linked-list algorithms are quite concise and elegant, compared with their iterative counterparts. The use of reference parameters allows us to avoid the explicit case for the head of the list. Note that in recursive list algorithms, we must call the function using the head pointer, and have a list pointer as a parameter.

Print the List

This is a simple algorithm, and a good place to start. Recursion allows us flexibility in printing out a list forwards or in reverse (by exchanging the order of the recursive call and the output statement):

      void print( ListNode* p ) {
       if (p != NULL) {
           cout << p->info << endl;
           print( p->next ); 
       }
     }

      void printReverse( ListNode* p ) {
       if (p != NULL) {
           printReverse( p->next );   // print the rest of the list first
           cout << p->info << endl;
       }
     }

     // Example of use:

     print( head );

Finding the length of a list

Another simple recursive function.

    int length( ListNode* p ) {
       if (p == NULL)
          return 0;
       else
          return 1 + length( p->next ); 
     }

Disposing of a list

This is an example of an algorithm which is awkward to do in the iterative case, but exceedingly simple to do in the recursive case:

      void deleteList( ListNode * p ) {
       if( p != NULL ) {
         deleteList( p->next ); 
         delete p; 
       }
     }
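
For comparison, here is a sketch of one possible iterative version. The point to notice is that the next pointer must be saved before the node is deleted, a step the recursive version handles by finishing the recursive call first. The name deleteListIterative, the reference parameter, and the returned count are assumptions made for illustration.

```cpp
#include <cstddef>

class ListNode {
  public:
    int info;
    ListNode * next;
    ListNode(int n, ListNode * p) : info(n), next(p) {}
};

// Deletes every node, leaves head == NULL, and returns how many nodes were freed.
int deleteListIterative(ListNode * & head) {
    int freed = 0;
    while (head != NULL) {
        ListNode * doomed = head;   // remember the node to delete
        head = head->next;          // move on before the node disappears
        delete doomed;
        ++freed;
    }
    return freed;
}
```

Because the parameter is a reference, the caller's pointer is reset to NULL rather than left pointing at freed memory.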

Inserting an item into a sorted list

     void insertInOrder( int item, ListNode* &p ) {
       if( p == NULL || p->info >= item )
          p = new ListNode( item, p ); 
       else 
          insertInOrder( item, p->next ); 
    }     

    // Example of use:
    insertInOrder( 7, head ); 
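
The sketch below shows the reference parameter at work: the same assignment to p updates head when the item belongs at the front and updates some node's next member otherwise, so no special case for the head is needed. The helper buildSorted and the sample data are assumptions made for illustration.

```cpp
#include <cstddef>

class ListNode {
  public:
    int info;
    ListNode * next;
    ListNode(int n, ListNode * p) : info(n), next(p) {}
};

// p is a reference to whichever pointer currently leads to the rest of the
// list: head itself on the first call, some node's next member afterwards.
void insertInOrder(int item, ListNode * & p) {
    if (p == NULL || p->info >= item)
        p = new ListNode(item, p);
    else
        insertInOrder(item, p->next);
}

// Hypothetical helper: build a sorted list from an unsorted array.
ListNode * buildSorted(const int items[], int count) {
    ListNode * head = NULL;
    for (int i = 0; i < count; ++i)
        insertInOrder(items[i], head);
    return head;
}
```

Given the items 7, 3, 9, 3, 1, buildSorted should produce the list 1 3 3 7 9.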

Deleting an item from a list

This algorithm deletes the first occurrence of an item from a list. It uses reference parameters to reduce the number of cases that were necessary in the iterative version. The addition of one more recursive call enables this algorithm to delete all occurrences of the item, by continuing to chain down the list after the item has been found.

     void deleteItem( int item, ListNode* &p ) {
       if( p == NULL )
          ;
       else if ( p->info == item ) {
          ListNode * q = p;
          p = p->next;  // p now refers to the node after the deleted one
          delete q;
          // Include this next line if you want to delete all occurrences of item
          // deleteItem( item, p );
       }
       else 
          deleteItem( item, p->next );
    }     

Deleting the last element in the list

This was a rather messy process in the iterative case; the use of reference parameters and recursion makes it much simpler:
     void deleteLast( ListNode* &p ) {
       if( p == NULL )
          ;
       else if ( p->next == NULL ) {
          ListNode * q = p;
          p = p->next;  // p->next is NULL, so the list now ends here
          delete q;          
       }
       else 
          deleteLast( p->next );
    }     

Appending two lists

Appending two lists is a simple way of creating a single list from two. For this one, we'll only give the recursive algorithm, as we'll assume at this point that you are convinced that recursion is better than iteration. This function adds the second list to the end of the first list:

     void append( ListNode * &p, ListNode * q ) {
        if ( p == NULL)
           p = q;
        else 
           append( p->next, q );
     }

Merging two lists

Here is a more complex function to combine two lists; it simply zips up two lists, taking a node from one, then from the other. The first list in the original call now points to the new list. In this algorithm we will use a different technique from that of reference parameters; in this case it is more natural to return a pointer to the new list through the result of the function call:

     ListNode * zip( ListNode * p, ListNode * q ) {
        if ( p == NULL)
           return q;
        else if ( q == NULL)
           return p;
        else {
           p->next = zip( q, p->next );
           return p;
        }
     }

Example of call:
      head = zip( head, anotherlist );
If head points to 3 4 7 and anotherlist points to 2 5 6 8, then at the end of this call to zip, head will point to 3 2 4 5 7 6 8.

Merging two sorted lists

Here is another more complex function to combine two lists; this one merges nodes from two sorted lists, preserving their order:

     ListNode * merge( ListNode * p, ListNode * q ) {
        if ( p == NULL)
           return q;
        else if ( q == NULL)
           return p;
        else if (p->info < q->info) {
           p->next = merge( q, p->next );
           return p;
        }
        else {
           q->next = merge( p, q->next );
           return q;
        }
     }

Example of call:
      head = merge( head, anotherlist );
If head points to 3 4 7 and anotherlist points to 2 5 6 8, then at the end of this call to merge, head will point to 2 3 4 5 6 7 8.
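
The following self-contained sketch can be used to check a merge of the two example lists. It assumes that every recursive call goes to merge itself (so that the sorted order is preserved all the way down); the helper toVector is an assumption added for inspecting the result.

```cpp
#include <cstddef>
#include <vector>

class ListNode {
  public:
    int info;
    ListNode * next;
    ListNode(int n, ListNode * p) : info(n), next(p) {}
};

// Merges two sorted lists by relinking their nodes; no new nodes are created.
ListNode * merge(ListNode * p, ListNode * q) {
    if (p == NULL)
        return q;
    else if (q == NULL)
        return p;
    else if (p->info < q->info) {
        p->next = merge(q, p->next);   // p's node is smallest, so it goes first
        return p;
    }
    else {
        q->next = merge(p, q->next);   // q's node is smallest (or equal)
        return q;
    }
}

// Hypothetical helper: copy the items into a vector so the list is easy to inspect.
std::vector<int> toVector(ListNode * head) {
    std::vector<int> items;
    for (ListNode * p = head; p != NULL; p = p->next)
        items.push_back(p->info);
    return items;
}
```

Merging 3 4 7 with 2 5 6 8 should give the seven items 2 3 4 5 6 7 8; after the call, the original pointers should no longer be used on their own, since their nodes now belong to the merged list.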

 

Other linked-list algorithms to try.....

Some other recursive algorithms (in increasing order of difficulty) you might want to try writing along the lines of those above are: