CS 112

CS 112 Summer 2, 2020 -- Homework Six

Due Friday, August 14th @ midnight with 24-hour grace period

The submission time for the entire assignment is the submission time of the last file submitted.

Introduction

You must observe the following requirements (for all homework submitted in this course):

All programs should be literate, i.e., easily understandable by a human (say, your grader) and follow the Java Style Guidelines for CS112 posted on the class web site;
You may not use data structure libraries such as ArrayList, since we are learning to write Java from the ground up and you must learn how these libraries are built; however, you are free to use (unless specifically directed otherwise) the basic libraries String, Character, Scanner, and Math;
You may freely use code from the class web site, the textbook, or lecture (unless specifically directed otherwise) as long as you cite the source in your comments; but you may NEVER use code from the web or other students' programs---this will be considered plagiarism and penalized accordingly.
There is only ONE problem for this homework, worth 100 points.

Problem: Hash Table Concordance -- 100 pts

For this problem, you will develop a concordance which gives a list of all words occurring in a text, along with the line number(s) where the word occurs. The input will be single String, with newline characters \n at the end of each line. Although a concordance for a book would give page numbers, in our program we will count line numbers for simplicity.

For example, if you wanted to build a concordance of our Haiku from HW 01, you would have a string "Arms folded\nTo the moon\nAmong the cows.\n") and the concordance produced by your code would be as follows:

among: 2
arms: 0                // note that line numbers start at 0
cows: 2
folded: 0
moon: 1
the: 1,2
to: 1

The interface for your program is here: Concordable.java and the template is here: Concordance.java. The template is empty except for the Unit Test at the end.

This is the last homework, so you will be expected to think through what is required here and to use appropriate resources from your previous homeworks, and from lectures and code examples.

The REQUIREMENTS are as follows:

You must pass all the unit tests;
Your code should be neat and readable (remove your debugging code unless it is well-designed and useable in the future);
You must use (a refactoring of) your solution to Index.java from HW 05 to put the entries in order when you print out the concordance (I will post my solution after the deadline has passed for HW 05);
As usual you may not use any Java libraries except as noted at the top of this document.

My SUGGESTIONS are as follows:

Do as much as possible (i.e., build the whole hash table from the string) in the constructor;
When inserting a new entry (word,lineNumber) into the hash table bucket, insert the word in order, but do NOT add a duplicate entry if the word occurs multiple times on the same line, but DO add entries for the same word on separate lines (the first test should make this clear), and add them in order of increasing line numbers;
Use step-wise refinement: Build the hash table first, test it to make sure it is working, then create the index, make sure it works, and then put them together, finally making sure everything is working properly and all tests are satisfied;
Use the industrial-strength hash function sfold for your table; the version given here will work fine if you change the type long to int;
Keep the buckets in the hash table ordered by the key/word and then by the line number; since you are going through the lines of the text in order, you can simply insert entries for the same word ONLY when you find a larger key/word (so new entries for the same word are added after earlier entries);
In order to print out the concordance, since the hash table itself is not in order, you must create a slightly different version of your Index.java program from the last homework, which will store the words and the line numbers; one difference is that you want to keep the line numbers in order, so when you insert in the values linked list, use the "insert in order" method we discussed several times.

Submission:

Files to submit:

Concordance.java
Index.java (modified suitably from the last version for hw 05)