CS 112 Summer 2, 2017 -- Homework Seven

Due Tuesday, August 1st @ midnight

The submission time for the entire assignment is the submission time of the last file submitted.

Introduction

You must observe the following requirements (for all homework submitted in this course):

Part A: Analytical Problems (10 points)

In these questions we will first explore the binary heap data structure, and then consider the general problem of hashing by considering the two basic paradigms, using a simple hash function. In the first one, involving separate chaining, we will make this realistic by stored (key, value) pairs; the hash function is applied to the key alone, but the entire pair is stored in the node. For the second, involving linear probing, we will simply store the key in the array slot.

Write your solutions in a file hw07.txt with your name, the homework number, and the date.

  1. Insert the following sequence of keys into an intially empty 2-3 tree. You should write all the trees down as you do the transformations (this is what would be expected on an exam) but for the purposes of grading this exercise you can just draw the final tree that results.

               15, 25, 20, 10, 12, 4, 19, 6 
  2. Consider the following 2-3 tree:

    Suppose we count ONLY comparisons between two keys, and NOT checks to see if the key exists, or checks to see if a pointer is null. (i) How many comparisons would necessary to find the key 10? (ii) How about 140? (iii) How about 60? (iv) Which key(s) occurring at leaf nodes would require a minimum number of comparisons? State which and how many comparisons. (v) Which key(s) would require a maximum number of comparisons? State which and how many comparisons. (vi) What is the average number of comparisons to find the keys in this tree (count for all and then divide by the size of the tree).

  3. Let us consider using a hash table for a symbol table storing (key, value) pairs, where the key is used as input to the hash function. Insert the following key-value pairs into a hash table implemented with the technique of separate chaining with an array H of size 7. For each insertion, use the hash function hash(x) = (3 * x % 7) on the key part and store the pair in the bucket at H[ hash(key )]. Assume nodes are defined as

    class Node{ int key; String value; Node next; }.

  4.       insert(4, "apples");
    insert(2, "bread");
    insert(1, "eggs");
    insert(38, "diapers");
    insert(15, "beer");

    (i) Show the hash table after all insertions are performed. (ii) What is the worst-case lookup time (in the number of comparisons), and (iii) what is the average-case lookup (average the number of comparisons over all keys in the table)?

  5. Perform the following operations on the table from (3) and show the table that would result:
  6.       insert(41, "pears");
    insert( 25, "beef" );
    delete( 1 ); // you only need to specify the key to delete
    delete(36);
    insert( 63, "sugar" );
    delete( 2 );
    insert( 21, "chicken" );
    delete( 76 ); // you only need to specify the key to delete
    delete(15);
    insert( 29, "flour" );

    (i) What is the worst-case lookup and (ii) what is the average-case lookup for keys in the table that results?

  7. Give a sequence of keys (just the keys, not pairs) that would all hash to the same location for a table of size 7, creating a worst-case hash table (i.e., start with an empty table, and create a linked list worst-case table).
  8. If we insert M keys into a hash table with N buckets, creating a worst-case table, what is the worst case time for looking up a key? Express in terms of N and M, using O(...) notation.
  9. If we insert M keys into a hash table with N buckets, suppose the hash table arranges itself in the best possible way, what is the worst-case lookup? Express in terms of N and M, using O(...) notation.
  10. Suppose now we have a hash table L using the strategy of linear probing, which just stores integer keys (not key-value pairs). The size of the array is N = 11 and the hash function is hash(k) = 3*k%11. Consider the following series of insertions:
    insert(4);
    insert(2);
    insert(1);
    insert(28); 

    (i) Show L after all insertions were performed. (ii) What is the worst-case time to search for one of the four keys 4, 2, 1, or 28?

  11. Perform the following operations on the table from (8) and (i) show the result:
  12.       insert( 23);
    insert( 63 );
    insert( 19 );
    insert(13);

    (ii) What is the worst-case time for search for one of the keys 4, 2, 1, 28, 23, 63, 19, or 13 in the table that results?

    (iii) What is the average-case time in this table (total the number of comparisons used for searching for each key, and divide by the number of keys, i.e., 8)?

  13. Perform the following operations on the table from (9) and show the table that would result:
  14.       delete( 1 );
    delete(23);
    insert(5);

Part B: Programming Problems (90 points)

Problem B.1: Binary Search Tree Practice (25 points)

For this problem, you must develop the following methods which perform basic tests and operations on binary trees; the template TreeRecursion.java contains a unit test which you must uncomment one test at a time to test your implementation of these methods. The definitions of these kinds of trees are taken from the Wikipedia page on Binary Trees, which contains some more explanations and diagrams.

You may not use any loops in your implementation of these methods, so, yes, they must be recursive.

  // B.1.1:  return the number of nodes in the tree -- just to get you started, already solved in lecture!
  
  private static int size(Node t) {...}
   
  // B.1.2:  print out a string representation of a binary tree using parentheses, infix style (see unit test)
  
  private static String treeToString(Node r) {...}
    
  // B.1.3:  print out a string representation of a binary tree using parentheses, prefix style (see unit test)
  
  private static String treeToString2(Node r) {...}

  // B.1.4: Count the number of leaves in a binary tree

private static int numLeaves(Node r) {...} // B.1.5: make a copy of a binary tree private static Node copy(Node r) { // B.1.6: reverse the tree by exchanging left and right pointers (you will modify the original tree) public static Node reverse(Node r) {...} // B.1.7: return true if the binary tree satisfies the binary search tree property, false otherwise // Hint: write a helper method isBSTHelper(Node r, int lo, int hi) which checks if all items in // the tree are between the lower bound lo and the upper bound hi. If you do not know the // lower bound, use Integer.MIN_VALUE and if you do not know the upper bound use Integer.MAX_VALUE. public static boolean isBST(Node r) {...} // B.1.8: return true if r is a degenerate binary tree, ie., in which all nodes have 0 or 1 child; // a null tree is not degenerate, but a one-node tree is degenerate. public static boolean isDegenerate(Node r) {...} // B.1.9: A perfect binary tree is a perfect triangle; test by checking recursively that both subtrees of each node have same size, OR // find the height and the size and determine if these two quantities have exactly the relationship implied by a perfect tree. public static boolean isPerfect(Node r) {...} // B.1.10: Determine if the two binary trees are structurally identical // You MUST do this by recursion on the structure of the trees, and can not // for example just return treeTostring(r).equals(treeToString(s)).... public static boolean equals(Node r, Node s) {...} I STRONGLY suggest that you use step-wise refinement, developing one method at a time, and verifying that it passes its performance tests before moving onto the next method.

Problem B.2: Index Table and Inverted Index Table-- 55 pts

For this problem, you must develop a variation of a symbol table called an index table (or sometimes just "index"); this is exactly the same as a symbol table, except that it has a list of values associated with it instead of just a single value. Such a data structure is very useful in a wide variety of applications (see the short description here: PDF).

BST Representation of an Index

For this problem, you must, first of all, write an ADT Index.java which stores nodes as a binary search tree, holding String keys with a value which is a list of Strings, stored as a linked list. Thus, you would have two different kinds of nodes, e.g.,

  
  private  class Node {      // Nodes for linked lists of values
    String value;
    Node next;
  }
  
  private class TreeNode {   // Nodes for the binary tree
    String key;
    Node values;             // pointer to linked list of values
    TreeNode left;
    TreeNode right;
  }
    

Thus, for the index table used in the unit test, which records what five people ate at lunch, we might have the following structure:

The template for this program, containing the unit test, is here: Index.java. You must write the following interface methods for this program:

public void insert(String key) {...} 

      Insert a new TreeNode with the given key, and
      a null list of values; if the key is already in the
      tree, do nothing. 
     
  
public void insert(String key, String val) {...}

      If the key is already in the table, add
      val to the list of values; but if val is already
      in the list of values, do nothing; 
      If the key is not in the table, insert a new TreeNode
      and create a linked list for the values, containing
      a single node with val.      
    

public void insert(String key, String[] values) {...}

      The functionality of this is the same as the last,
      except that you must add all the values to the linked list of values;
      you should never insert duplicates into the LL. 


  public String getValues(String key) { ....  }

      Return a String representation of the values associated with
      the key; if the key is not in the table, return null; if
      the list of values is null, return "[]". 
  

  public void delete(String key) {

      Remove the key and all its values from the tree;
      if the key is not in the tree, do nothing.
    
  }

Verify that your code satisfies the unit tests, as usual

.

Inverted Index Tables

You can think of an index table as a representation of a table such as this:

  Pasta Salad Chicken Beef Coke Pepsi Milk
John
X
X
X
Paul
X
X
X
Ringo
X
X
X
George
X
X
X
Wayne

 

The keys are the row headings (the names), and the values are a selection of the column headings. However, it is often useful to be able to go the other way, and have keys which are the column headings, and values which are the row headings. This form of table could be used, for example, to see how popular various foods are (e.g., Pasta was chosen by George, Ringo, and Paul). This is called an Inverted Index Table. Other applications of inverted index tables can be read about here: PDF. Here is an inverted index table corresponding to the same data shown in the table and in the previous figure:

Note that Wayne, alas, does not occur in the inverted table because he was too star-struck to eat a thing at his lunch with the Fab Four.

In order to create an inverted index from an index table, you "simply" have to create another, empty index, then traverse your original tree, both the tree nodes and the linked lists, and insert the appropriate values into the new index: the values become the new keys, and the keys become the new values. This is an interesting application of tree traversal combined with linked list traversal.

You must write the following method as part of the Index class, which constructs an entirely new index based on the ideas just explained, and returns it. Thus, if your original table is called Orders, then you would create a new inverted index table corresponding to Orders by calling Orders.getInvertedIndex().

 
  
  // method to build inverted index
  
  public Index getInvertedIndex() {
    Index NewTable = new Index(); 

     ......

    return NewTable;   
  }
   

Verify that your code satisfies the unit tests, as usual

Problem B.3: A Practical Application of the Index ADT: The IMDB table (10 points)

For this problem, you will fill in the stub to read in values from a text file containing 4110 movies and their casts; the name of the movie is the key, and the names of the cast members are the values. You can see from the template IMDB.java and from the text files movies.txt and moviesTest.java what is required to do. The main complication is how to parse the input file, but this is explained in the code template, and you can examine the text files as well.

Here are the two text files you need for this problem: moviesTest.txt and movies.txt. You can right click on these links and select Download File.... to put them in your project directory, this is the directory of the project you set up for 112 homeworks, e.g., here is mine:

If you put the data file here, then readInFile(....) can find the files as in the main method of the template:

 Index I = new Index();
 readInFile("moviesTest.txt", I);

 // readInFile("movies.txt", I);

 

Verify that your program satisfies the unit tests as usual.

Submission:

Files to submit:

Make sure that all programs are properly commented, neat and readable, all debugging trace statements are turned off or removed, and verify that you correctly pass all the performance tests in the unit tests.