CS 112 Summer 2, 2020 -- Homework Six

Due Friday, August 14th @ midnight with 24-hour grace period

The submission time for the entire assignment is the submission time of the last file submitted.

Introduction

You must observe the following requirements (for all homework submitted in this course):

Problem: Hash Table Concordance -- 100 pts

For this problem, you will develop a concordance which gives a list of all words occurring in a text, along with the line number(s) where the word occurs. The input will be single String, with newline characters \n at the end of each line. Although a concordance for a book would give page numbers, in our program we will count line numbers for simplicity.

For example, if you wanted to build a concordance of our Haiku from HW 01, you would have a string "Arms folded\nTo the moon\nAmong the cows.\n") and the concordance produced by your code would be as follows:

among: 2
arms: 0                // note that line numbers start at 0
cows: 2
folded: 0
moon: 1
the: 1,2
to: 1

The interface for your program is here: Concordable.java and the template is here: Concordance.java. The template is empty except for the Unit Test at the end.

This is the last homework, so you will be expected to think through what is required here and to use appropriate resources from your previous homeworks, and from lectures and code examples.

The REQUIREMENTS are as follows:

  1. You must pass all the unit tests;
  2. Your code should be neat and readable (remove your debugging code unless it is well-designed and useable in the future);
  3. You must use (a refactoring of) your solution to Index.java from HW 05 to put the entries in order when you print out the concordance (I will post my solution after the deadline has passed for HW 05);
  4. As usual you may not use any Java libraries except as noted at the top of this document.

My SUGGESTIONS are as follows:

  1. Do as much as possible (i.e., build the whole hash table from the string) in the constructor;
  2. When inserting a new entry (word,lineNumber) into the hash table bucket, insert the word in order, but do NOT add a duplicate entry if the word occurs multiple times on the same line, but DO add entries for the same word on separate lines (the first test should make this clear), and add them in order of increasing line numbers;
  3. Use step-wise refinement: Build the hash table first, test it to make sure it is working, then create the index, make sure it works, and then put them together, finally making sure everything is working properly and all tests are satisfied;
  4. Use the industrial-strength hash function sfold for your table; the version given here will work fine if you change the type long to int;
  5. Keep the buckets in the hash table ordered by the key/word and then by the line number; since you are going through the lines of the text in order, you can simply insert entries for the same word ONLY when you find a larger key/word (so new entries for the same word are added after earlier entries);
  6. In order to print out the concordance, since the hash table itself is not in order, you must create a slightly different version of your Index.java program from the last homework, which will store the words and the line numbers; one difference is that you want to keep the line numbers in order, so when you insert in the values linked list, use the "insert in order" method we discussed several times.

Submission:

Files to submit: