In these questions we will first explore the binary heap data structure, and then consider the general problem of hashing by considering the two basic paradigms, using a simple hash function. In the first one, involving separate chaining, we will make this realistic by stored (key, value) pairs; the hash function is applied to the key alone, but the entire pair is stored in the node. For the second, involving linear probing, we will simply store the key in the array slot.
Write your solutions in a file hw07.txt with your name, the homework number, and the date.
Insert the following sequence of keys into an intially empty 2-3 tree. You should write all the trees down as you do the transformations (this is what would be expected on an exam) but for the purposes of grading this exercise you can just draw the final tree that results.
15, 25, 20, 10, 12, 4, 19, 6
Solution: The tree created (written sideways) would be:
25 20 (15|19) 12 10 6 4
Consider the following 2-3 tree:
Suppose we count ONLY comparisons between two keys, and NOT checks to see if the key exists, or checks to see if a pointer is null. (i) How many comparisons would necessary to find the key 10? (ii) How about 140? (iii) How about 60? (iv) Which key(s) occurring at leaf nodes would require a minimum number of comparisons? State which and how many comparisons. (v) Which key(s) would require a maximum number of comparisons? State which and how many comparisons. (vi) What is the average number of comparisons to find the keys in this tree (count for all and then divide by the size of the tree).
Solution: (i) 3 (ii) 6 (iii) 4 (iv) the minimum number of comparisons occurs with 10 and 30, which each require 3 comparisons (v) 140 requires 6 comparisons (vi) summing in level order: (1+2+2+3+3+4+3+3+4+4+4+4+5+5+6+5)/16 = 58/16 = 3.625.
Let us consider using a hash table for a symbol table storing (key, value) pairs, where the key is used as input to the hash function. Insert the following key-value pairs into a hash table implemented with the technique of separate chaining with an array H of size 7. For each insertion, use the hash function hash(x) = (3 * x % 7) on the key part and store the pair in the bucket at H[ hash(key )]. Assume nodes are defined as
class Node{ int key; String value; Node next; }
.
insert(4, "apples");
insert(2, "bread");
insert(1, "eggs");
insert(38, "diapers");
insert(15, "beer");
(i) Show the hash table after all insertions are performed. (ii) What is the worst-case lookup time (in the number of comparisons), and (iii) what is the average-case lookup (average the number of comparisons over all keys in the table)?
(i) A[0] -> . A[1] -> . A[2] -> [38,"diapers"] -> . A[3] -> [1,"eggs"] -> [15,"beer"] -> . A[4] -> . A[5] -> [4,"apples"] -> . A[6] -> [2,"bread"] -> . (ii) The worst case would be 2 comparisons (to look up 15) and (iii) the average case is (1+1+2+1+1)/5 = 6/5 = 1.2 comparisons
insert(41, "pears");
insert( 25, "beef" );
delete( 1 ); // you only need to specify the key to delete
delete(36);
insert( 63, "sugar" );
delete( 2 );
insert( 21, "chicken" );
delete( 76 ); // you only need to specify the key to delete
delete(15);
insert( 29, "flour" );
(i) What is the worst-case lookup and (ii) what is the average-case lookup for keys in the table that results?
(i) A[0] -> [63,"sugar"] -> [21,"chicken"] -> . A[1] -> . A[2] -> [38,"diapers"] -> . A[3] -> [29,"flour"] -> . A[4] -> [41,"pears"] -> . A[5] -> [4,"apples"] -> [25,"beef"] -> . A[6] -> . (ii) Worst case is 2 comparisons (for keys 21 or 25) and (iii) average case is (1+2+1+1+1+1+2)/7 = 9/7 = 1.286
There are many possibilities, for example, any keys divisible by 7 will all hash to location 0, e.g., 0, 7, 14, 21, etc.
The worst case is linear, i.e., O( M ).
O( M / N ). // Note that M and N are reversed compared to what I did in lecture. // No significance to this, just notation.
insert(4); insert(2); insert(1); insert(28);
(i) Show L after all insertions were performed. (ii) What is the worst-case time to search for one of the four keys 4, 2, 1, or 28?
(i) (blank indicates 0 = never used) A[0]: 0 A[1]: 4 A[2]: 0 A[3]: 1 A[4]: 0 A[5]: 0 A[6]: 2 A[7]: 28 A[8]: 0 A[9]: 0 A[10]: 0 (ii) The worst-case time is 1 comparison, since there were no collisions when doing the insertions.
insert( 23);
insert( 63 );
insert( 19 );
insert(13);
(i) (blank indicates 0 = never used) We show the search for each key after original hash to insertion point: A[0]: 0
A[1]: 4 <= 4
A[2]: 63 <= 63 <= 19
A[3]: 1 <= 1 <= 23 <= 19
A[4]: 23 <= 23 <= 19
A[5]: 19 <= 19
A[6]: 2 <= 2 <= 13
A[7]: 28 <= 28 <= 13
A[8]: 13 <= 13
A[9]: 0
A[10]: 0
(ii) What is the worst-case time for search for one of the keys 4, 2, 1, 28, 23, 63, 19, or 13 in the table that results?
The worst-case time is for key 19, which takes 4 comparisons.
(iii) What is the average-case time in this table (total the number of comparisons used for searching for each key, and divide by the number of keys, i.e., 8)?
The lookup cost for each key is the number of comparisons done when inserting (e.g., 19 took 4, and 13 took 3, etc.); the total is 14, and the average case is 14/8 = 1.75 comparisons.
delete( 1 );
delete(23);
insert(5);
A[0]: 0 // this would be the value 0, indicating "never used"
A[1]: 4
A[2]: 63
A[3]: -1 // this would be the value -1, indicating "used but empty"
A[4]: 5 // this was -1 after deleting 23, but then 5 was inserted here
A[5]: 19
A[6]: 2
A[7]: 28
A[8]: 13
A[9]: 0
A[10]: 0