Binary Search Tree


  1. Abstract idea of a tree:

    A tree is another data structure that you can use to store pieces of information, or rather, a bunch of elements.

    Here, we'll consider elements that each have a key (that identifies the element) and a value (that is the data for an element), however, we'll ignore the value part for now.

    Here is an example of a tree whose keys are letters:

          tree
          ----
           j    <-- root
         /   \
        f      k  
      /   \      \
     a     h      z    <-- leaves
    

    Tree Vocabulary

    Let's now introduce some vocabulary with our sample tree... The element at the top of the tree is called the root. The elements that are directly under an element are called its children. The element directly above something is called its parent. For example, a is a child of f and f is the parent of a. Finally, elements with no children are called leaves.

    Is k a leaf? Who is its parent?

    How many parents can elements have?


    Aside: If you were to draw the picture above upside down, it would look like a real tree, with the leaves at the top and the root at the bottom...However, we usually draw tree data structures as we've done above.

    Recursive Data Structure

    A tree can be viewed as a recursive data structure. Why? Remember that recursive means that something is defined in terms of itself. Here, this means that trees are made up of subtrees.

    For example, let's look at our tree of letters and examine the part starting at f and everything under it...

          tree
          ----
           j
         /   \
        f      k  
      /   \      \
     a     h      z
    

    Doesn't it look like a tree itself? In this subtree, what is f (recall our vocabulary)?

    What about just z? Doesn't it look like a subtree?

    Binary Trees

    We can talk about trees where the number of children that any element has is limited. In the tree above, no element has more than 2 children. For the rest of this example, we will enforce this to be the case.

    A tree whose elements have at most 2 children is called a binary tree.

    Since each element in a binary tree can have only 2 children, we typically name them the left and right child.

    Ordering of Tree?

    The structure of a tree is hierarchical, meaning that things are ordered above or below other things. For example, the army is hierarchical, with generals above colonels, and colonels above lieutenants, etc.

    Despite the hierarchical order of the structure of the tree, the order enforced on elements in the tree will depend on how we use the tree.

    Binary Search Tree

    The tree that we presented above actually has an enforced order.

    First, remember that the letters in the tree are keys for the elements held in the tree.

    For any element, keys to its left are less than its own key. Also, keys to its right are greater than its own key. E.g.:

          j 
    f<j       k>j
        /   \
      f       k
    

    Note that this is true for every element.


    Aside: What about the equality case, i.e., two elements with the same element? Does the fact that elements have keys suggest an answer?

    A tree with this ordering property AND that is binary is called a binary search tree. Why? Because in order to search for an element (with a specific key) in such a tree, you only need to make a series of binary (i.e., go left or right) decisions.

    For example, to find h starting from the tree's root...

          tree
          ----
           j    <-- root
         /   \
        f      k
      /   \      \
     a     h      z    <-- leaves
    

    1. Key h is less than j, so go left.
    2. Key h is greater than f, so go right.
    3. Found h!

    Moreover, searching in a binary search tree is easier than in an unordered tree since no backtracking is required.


    Note: Backtracking is needed when traversing all the elements in a tree.

  2. Tree Operations:

    Here are the operations that we will concern ourselves with for this binary search tree. You may need others for a particular use of the tree.


    We may want more operations depending on how we'll use the tree.

    Which of the operations need entire elements and which require only keys?

  3. Adding Algorithm (with order preservation):

    Let's consider an algorithm for adding an element to a binary search tree.

    Adding an element requires searching for the proper place to put the new element, so that the binary search order will be preserved.

    We've already seen that no backtracking is needed when searching the tree, so our algorithm can use iteration (looping)--it does not require recursion.

    Here is an outline of such an algorithm (that assumes there is at least one element in the tree):

    Add(tree, element) [iterative]

    while (not done)
      if (element's key < root's key)
        if (no left child of root)
          put new element left of root
          done!
        else
          tree = left subtree
      else if (element's key > root's key)
        if (no right child of root)
          put new element right of root
          done!
        else
          tree = right subtree
      else
        do whatever you do when key is already in tree
        done!
    

    When the algorithm begins, it is given the entire tree.

    As it continues to search, it works it's way to lower and lower subtrees.

  4. Tree implementation in C:

    We want to implement a binary search tree that has the above properties and operations in C. Although we could use an array to implement a tree, we'll use an implementation more akin to a linked list...

    Recall that our trees store elements with both a key and a value. For simplicity, assume the values are just integers (a count, e.g.). The types needed for elements are thus:

    typedef char treeKeyT;
    
    typedef int treeValueT;
    
    typedef struct {
      treeKeyT key;
      treeValueT value;
    } treeElementT;
    

    Now, how will these elements be stored in a tree?

    Like a linked list, elements will be stored in nodes. Furthermore, a node will have to keep track of its element's immediate children. How will a node keep track of the left and right child?

    Answer: Like a linked list, nodes will point to one another in the tree--each node will point to the left and right child's node.

    The type needed for a node is thus:

    typedef struct treeNodeTag {
      treeElementT element;
      struct treeNodeTag *left, *right;
    } treeNodeT;
    

    When there is no left or right child, of course, we'll make the corresponding pointers NULL.

    Now, let's return to our original tree, but view it as if it was made up of these C treeNodeTs...

             -----
             |j  |
             |5  |
             |---|
             | | |
             /---\
            v     v
        -----     -----
        |f  |     |k  |
        |30 |     |13 |
        |---|     |---|
        | | |     |0| |
        /---\     ----\
       v     v         v
    -----   -----      -----
    |a  |   |h  |      |z  |
    |100|   |50 |      |1  |
    |---|   |---|      |---|
    |0|0|   |0|0|      |0|0|
    -----   -----      -----
    

    While these nodes suffice to keep track of all the elements, what do we need to keep track of the whole tree?

    Answer: We need a pointer to the root of the tree!

    Special case: Empty Tree

    What about when the tree is empty. How do we represent an empty tree?

    Answer: There are no nodes, so the pointer to the root should be NULL.

  5. Organization of data types for a tree:

    We have already thought a little about the types needed for our tree. Let's now think about how to organize those types and use them with an ADT/CDT design.

    As usual, we'll put our tree data structure in its own module, creating the source files tree.h and tree.c.

    The types for a key, value and element should be part of the interface in tree.h.

    The type for a node is an implementation detail, and thus, goes in tree.c.

    Finally, we need something that holds all the information needed to keep track of the tree. We saw that this should be a pointer to the node at the root of the tree.

    Since this pointer has to do with the implementation of the tree, we put it in the concrete type, struct treeCDT:

    typedef struct treeCDT {
      treeNodeT *root;
    } treeCDT;
    

    which goes in tree.c.

    In the interface (tree.h), we must fill in what the abstract type is as follows:

    typedef struct treeCDT *treeADT;
    

    Finally, we have:

    tree.h                          tree.c
    ------				------
    				#include "tree.h"
    
    typedef char treeKeyT;		typedef struct treeNodeTag {
    				  treeElementT element;
    typedef int treeValueT;		  struct treeNodeTag *left,
    				                     *right;
    typedef struct {		} treeNodeT;		
      treeKeyT key;		  
      treeValueT value;		
    } treeElementT;		
    
    typedef struct treeCDT		typedef struct treeCDT {
    	*treeADT;		  treeNodeT *root;
    				} treeCDT;
    

  6. Tree functions:

    Now that we've finished the types, let's discuss the tree functions. Based on the tree operations we mentioned above, the tree functions we'll need are:

    For general tree operations:
    • TreeAdd()
    • TreeIsMember()
    • TreePrint()

    Because we are programming in C (setup/cleanup):
    • TreeCreate()
    • TreeDestroy()

    We've briefly discussed the types and functions needed. Before we discuss implementing any functions, take a look at the test program treetest.c to see how the types and functions (from the tree module) will be used.

    Let's now consider what is needed for 2 tree functions: TreeCreate() and TreeAdd().

    Go ahead and implement these functions!

  7. Other functions:

    If desired, you may write the other functions needed for the binary search tree.

    Fill in the prototypes for the rest of the tree functions:

    treeADT     TreeCreate(void);
    return-type TreeDestroy(parameters);
    return-type TreeIsMember(parameters);
    void        TreeAdd(treeADT tree,
                        treeElementT element);
    return-type TreePrint(parameters);
    ...
    

    For functions that refer to elements, which need a complete element and which only need take a key as parameter?

    After writing the prototypes, write definitions for each function.


    Note: Remember to place the prototypes in the interface (tree.h) and function definitions in the implementation (tree.c) of your tree module.

    Test your implementation with the sample program treetest.c that uses elements that are letter/count pairs. A Makefile is also available for the program.


BU CAS CS - Binary Search Tree
Copyright © 1993-2000 by Robert I. Pitts <rip at bu dot edu>. All Rights Reserved.