Problem Set 8

There is only one part to this problem set.
All problems are due by 11:59 p.m. on Sunday, April 13, 2025.

Preliminaries
Problems

Preliminaries

In your work on this assignment, make sure to abide by the collaboration policies of the course.

Don’t forget to use docstrings and to take whatever other steps are needed to create readable code.

If you have questions while working on this assignment, please come to office hours, post them on Piazza, or email cs111-staff@cs.bu.edu.

Make sure to submit your work on Gradescope, following the procedures found at the end of the assignment.

Problems

Creating the necessary folder

Create a subfolder called ps8 within your cs111 folder, and put all of the files for this assignment in that folder.

Problem 1: A `Date` class

50 points; individual-only

Some people have an extraordinary talent to compute (in their heads) the day of the week that any past date fell on. For example, if you tell them that you were born on October 21, 2000, they’ll be able to tell you that you were born on a Saturday!

In this problem, you will create a Date class, from which you will be able to create Date objects that represent a day, month, and year. You will add functionality to this class that will enable Date objects to find the day of the week to which they correspond.

Getting started
Begin by downloading the following file:
ps8pr1.py

Make sure to put the file in your ps8 folder. If your browser doesn’t allow you to specify where the file should be saved, try right-clicking on the link above and choosing Save as... or Save link as..., which should produce a dialog box that allows you to choose the correct folder for the file.

Open that file in Spyder, and you’ll see that we’ve given you the following methods to start:

The __repr__(self) method, which returns a string representation of a Date object. This method will be called when an object of type Date is printed. It can also be tested by simply evaluating an object from the console. This method formats the month, day, and year that represent a Date object into a string of the form 'mm/dd/yyyy' and returns it.
The is_leap_year(self) method, which returns True if the called object is in a leap year, and False otherwise. In other words, when we create a Date object and call its is_leap_year method, the method will return whether that specific Date object falls in a leap year.
The days_in_month(self) method, which returns the number of days in the called object’s month. In other words, when we create a Date object and call its days_in_month method, the method will return the number of days in that Date‘s month.
The copy(self) method, which returns a newly-constructed object of type Date with the same month, day, and year that the called object has. This allows us to create deep copies of Date objects.
The day_name(self) method, which returns a string indicating day of the week on which the called Date object falls: 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', or 'Sunday'.

Important: The provided day_name method won’t work correctly until you have implemented the methods described below.

Your tasks

Below you will add several new methods to the Date class. Be sure to thoroughly test your methods for all cases to ensure that they are correct. In addition, make sure to include a docstring for each method that you write.

Important

If you add test code to your ps8pr1.py file, please put it in one or more separate test functions, which you can then call to do the testing. Having test functions is not required. However, you should not have any test code in the global scope (i.e., outside of a function).

Implement the Date constructor – i.e., the __init__ method. We have given you the header for this method, and you should fill in the body so that it initializes the attributes of the Date object (month, day, and year) to the values that are passed in as parameters.

Don’t forget to use the keyword self when specifying an attribute. For example, when initializing the month attribute, you should write a statement that looks like this:
```
self.month = ...
```
Here is some code you can use to test your constructor:
```
>>> d1 = Date(4, 7, 2025)

>>> d1.month
result: 4

>>> d1.day
result: 7

>>> d1.year
result: 2025
```
Note that we don’t actually use the name __init__ to call the method. Rather, we call it by using the name of the class (Date).

Once you have implemented the constructor, read over the rest of the starter code that we’ve given you. Make sure that you understand how the various methods work.

Try the following interactions in the IPython console to experiment with the __repr__, is_leap_year, and days_in_month methods:

>>> d1 = Date(4, 7, 2025)

# An example of using the __repr__ method. Note that no quotes
# are displayed, even though the function returns a string.
>>> d1
result: 04/07/2025

# Check if d1 is in a leap year.
>>> d1.is_leap_year()
result: False

# Get the number of days in d1's month.
>>> d1.days_in_month()
result: 30

# Create another object named d2
>>> d2 = Date(12, 31, 2024)

>>> d2
result: 12/31/2024

# Check if d2 is in a leap year.
>>> d2.is_leap_year()
result: True

# Get the number of days in d2's month.
>>> d2.days_in_month()
result: 31

Next, try the following examples in the IPython console to illustrate why we will need to override the __eq__ method to change the meaning of the == operator:

>>> d1 = Date(1, 1, 2025)
>>> d2 = d1
>>> d3 = d1.copy()

# Determine the memory addresses to which the variables refer.
>>> id(d1)
result: 430542     # Your memory address may differ.

>>> id(d2)
result: 430542     # d2 is a reference to the same Date that d1 references.

>>> id(d3)
result: 413488     # d3 is a reference to a different Date in memory.

# The == operator tests whether memory addresses are equal.
>>> d1 == d2
result: True       # d1 and d2 have the same memory address.

>>> d1 == d3
result: False      # d1 and d3 have different memory addresses.

Write a method advance_one(self) that changes the called object so that it represents one calendar day after the date that it originally represented.

Notes:
- This method must not return anything. Instead, it should change the value of one or more variables inside the called object.
- Since we are advancing the Date object by one day, self.day will change. Depending on what the value of self.day is, self.month and self.year may also change.
- You can use the provided days_in_month() method to determine how many days are in the Date object’s month. Because you are calling one method from inside another Date method, you should use the self parameter to make the call: self.days_in_month()
Examples:
```
>>> d = Date(12, 31, 2024)

>>> d
result: 12/31/2024

>>> d.advance_one()

>>> d
result: 01/01/2025

>>> d = Date(2, 28, 2024)

>>> d
result: 02/28/2024

>>> d.advance_one()

>>> d
result: 02/29/2024

>>> d.advance_one()

>>> d
result: 03/01/2024

>>> d.advance_one()

>>> d
result: 03/02/2024
```
Write a method __eq__(self, other) that returns True if the called object (self) and the argument (other) represent the same calendar date (i.e., if the have the same values for their day, month, and year attributes). Otherwise, this method should return False.

Recall from lecture that the name __eq__ is a special method name that allows us to override the == operator–replacing the default version of the operator with our own version. In other words, when the == operator is used with Date objects, our new __eq__ method will be invoked!

This method will allow us to use the == operator to see if two Date objects actually represent the same date by testing whether their days, months, and years are the same, instead of testing whether their memory addresses are the same.

After implementing your __eq__ method, try re-executing the following sequence of statements from Task 0:
```
>>> d1 = Date(1, 1, 2025)
>>> d2 = d1
>>> d3 = d1.copy()

# Determine the memory addresses to which the variables refer.
>>> id(d1)
result: 430542     # Your memory address may differ.

>>> id(d2)
result: 430542     # d2 is a reference to the same Date that d1 references.

>>> id(d3)
result: 413488     # d3 is a reference to a different Date in memory.

# The new == operator tests whether the fields have the same values.
>>> d1 == d2
result: True       # Both refer to the same object, so their fields
                   # also have the same values
>>> d1 == d3
result: True       # These variables refer to different objects, but
                   # their fields have the same values.
```
Notice that we now get True when we evaluate d1 == d3. That’s because the new __eq__ method compares the internals of the objects to which d1 and d3 refer, rather than comparing the memory addresses of the objects.

Write a method is_before(self, other) that returns True if the called object represents a calendar date that occurs before the calendar date that is represented by other. If self and other represent the same day, or if self occurs after other, the method should return False.

Note: This method is similar to the __eq__ method that you have written in that you will need to compare the years, months, and days to determine whether the called object (self) comes before other.

Examples:

>>> ny = Date(1, 1, 2025)

>>> d1 = Date(11, 17, 2024)

>>> d2 = Date(4, 7, 2025)

>>> tg = Date(11, 28, 2024)

>>> ny.is_before(d1)
result: False

>>> d1.is_before(ny)
result: True

>>> d1.is_before(d2)
result: True

>>> d2.is_before(d1)
result: False

>>> d1.is_before(tg)     # same month and year
result: True

>>> tg.is_before(d1)
result: False

>>> tg.is_before(tg)     # a Date object with itself
result: False

>>> ny.is_before(d2)     # same year, but different months
result: True

>>> d2.is_before(ny)
result: False

Write a method is_after(self, other) that returns True if the called object (self) represents a calendar date that occurs after the calendar date that is represented by other. If self and other represent the same day, or if self occurs before other, the method should return False.

Note: There are two ways of writing this method. You can either emulate your code for is_before OR you can think about how you could call __eq__ (==) and is_before to make writing this method very simple!

Examples:

>>> ny = Date(1, 1, 2025)

>>> d1 = Date(11, 17, 2024)

>>> d2 = Date(4, 7, 2025)

>>> tg = Date(11, 28, 2024)

>>> ny.is_after(d1)
result: True

>>> d1.is_after(ny)
result: False

>>> d1.is_after(d2)
result: False

>>> d2.is_after(d1)
result: True

>>> d1.is_after(tg)     # same month and year
result: False

>>> tg.is_after(d1)
result: True

>>> tg.is_after(tg)     # a Date object with itself
result: False

>>> ny.is_after(d2)     # same year, but different months
result: False

>>> d2.is_after(ny)
result: True

Write a method days_between(self, other) that returns an integer that represents the number of days between self and other.

Notes:
- This method should not change self or other during its execution.
- The sign of the return value is important! In particular:
  - If self and other represent the same calendar date, this method must return 0.
  - If self is before other, this method must return a negative integer equal to the number of days between the two dates.
  - If self is after other, this method must return a positive integer equal to the number of days between the two dates.
Suggested Approach:
- Since this method should not change the original objects, you should first use the copy method to create deep copies of self and other.
- Then, use is_before or is_after to figure out which date comes first.
- You should use the advance_one method that you have already written to advance the copy of the earlier of the two dates, and you should continue advancing that date until you obtain the correct count of the number of days between the two dates.
  
  Here is some pseudocode for one way of computing the correct count, assuming that date1 is the copy of the earlier date and date2 is the copy of the later date:
```
initialize your counter
while date1 != date2:
    increment your counter
    advance date1 by one day using advance_one
```
  Feel free to transform this pseudocode into Python code as part of your implementation of this method.
- As shown in the pseudocode above, you may find it helpful to use the != operator to test if two Date objects are not equal to each other. When used with Date objects, this operator will call your __eq__ method to compare the internals of the objects, and it will return the opposite of what __eq__ returns.
- Once you know how many days separate the two values, you can again use is_before or is_after to figure out whether the returned result should be positive or negative.
- You should not try to subtract years, months, and days between the two dates. This technique is too prone to mistakes.
Examples:
```
>>> d1 = Date(4, 13, 2025)

>>> d2 = Date(5, 9, 2025)   # our final exam!

>>> d2.days_between(d1)
result: 26

>>> d1.days_between(d2)
result: -26

>>> d1           # Make sure the original objects did not change.
04/13/2025

>>> d2
05/09/2025

# Here are two that pass over a leap day.
>>> d3 = Date(12, 1, 2023)

>>> d4 = Date(3, 15, 2024)

>>> d4.days_between(d3)
result: 105
```
Once all of the above methods are correctly implemented, it should be possible to use the provided day_name method to determine the day of week of any valid Date object.

For example:
```
>>> d = Date(4, 13, 2025)

>>> d.day_name()
result: 'Sunday'

>>> Date(4, 15, 2025).day_name()
result: 'Tuesday'

>>> Date(1, 1, 2100).day_name()
result: 'Friday'

>>> Date(7, 4, 1776).day_name()
result: 'Thursday'
```
If one or more of these examples do not return the expected result, that means that you have a bug in one or more of the methods that you wrote. You should test and debug as needed:
- days_between
- advance_one
- is_before and is_after
- __eq__, since it will be called implicitly when your days_between method uses the == or != operator to compare two Date objects.

Problem 2: Markov text generation

25 points; pair-optional

This is the only problem of the assignment that you may complete with a partner. See the rules for working with a partner on pair-optional problems for details about how this type of collaboration must be structured.

In this problem, you will write a program that is capable of generating meaningful text all by itself! You will accomplish this by implementing what is known as a Markov text-generation algorithm.

Background
English is a language with a lot of structure. Words have a tendency (indeed, an obligation) to appear only in certain sequences. Grammatical rules specify legal combinations of different parts of speech. For example, the phrase “The cat climbs the stairs” obeys a legal word sequence. “Stairs the the climbs cat” does not. Additionally, semantics (the meaning of a word or sentence) further limits possible word combinations. “The stairs climbs the cat” is a perfectly legal sentence, but it doesn’t make much sense and you are very unlikely to encounter this word ordering in practice.

Even without knowing the formal rules of English or the meaning of English words, we can get an idea of which word combinations are legal simply by looking at well-formed English text and noting the combinations of words that tend to occur in practice. Then, based on our observations, we can generate new sentences by randomly selecting words according to commonly occurring sequences of these words. For example, consider the following text:

I love roses and carnations. I hope I get roses for my birthday.

If we start by selecting the word “I”, we notice that “I” may be followed by “love,” “hope,” and “get” with equal probability in this text. We randomly select one of these words to add to our sentence: “I get.” We can repeat this process with the word “get,” necessarily selecting the word “roses” as the next word. Continuing this process could yield the phrase:

I get roses and carnations.

Note that this is a valid English sentence, but not one that we have seen before. Other novel sentences we might have generated include “I love roses for my birthday.” and “I get roses for my birthday.”

More formally, the process used to generate these sentences is called a first-order Markov process. A first-order Markov process is a process in which the state at time t + 1 (i.e., the next word) depends only on the state at time t (i.e., the previous word). In a second-order Markov process, the next word would depend on the two previous words, and so on. Our example above was a first-order process because the choice of the next word depended only on the current word.

Your tasks
Implementing a first-order Markov text generator will involve writing two functions: one to process a file and create a dictionary of legal word transitions, and another to actually generate the new text.

We will consider words to be different even if they only differ by capitalization or punctuation. For example, 'spam', 'Spam', and 'spam!' will all be considered distinct words.

To start, open a new file in Spyder and save it to your ps8 folder using the name ps8pr2.py.

Put all of the code that you write for this problem in that file. Don’t forget to include appropriate comments at the top of the file, and a docstring for each function.

Important

If you add test code to your ps8pr2.py file, please put it in one or more separate test functions, which you can then call to do the testing. Having test functions is not required. However, you should not have any test code in the global scope (i.e., outside of a function).

Write a function create_dictionary(filename) that takes a string representing the name of a text file, and that returns a dictionary of key-value pairs in which:
- each key is a word encountered in the text file
- the corresponding value is a list of words that follow the key word in the text file.
For example, the dictionary produced for the text “I love roses and carnations. I hope I get roses for my birthday.” would include the following key-value pairs, among others:
```
'I': ['love', 'hope', 'get']
'love': ['roses']
'roses': ['and', 'for']
'my': ['birthday.']
# as well as others!
```
Guidelines:
- We discussed this function in lecture, and we strongly encourage you to begin with the starter code that you can find in the coursepack.
- We also wrote a similar function in Task 3 of this week’s lab, and the solutions to that task are posted on the Labs page.
- You should not try to remove the punctuation from the words in the text file.
- The keys of the dictionary must include every word in the file except the sentence-ending words. A sentence-ending word is defined to be any word whose last character is a period ('.'), a question mark ('?'), or an exclamation point ('!'). A sentence-ending word should be included in the lists associated with the words that it follows (i.e., in the value parts of the appropriate key-value pairs), but it must not appear as its own key.
- If a word w1 is followed by another word w2 multiple times in the text file, then w2 must appear multiple times in the list of words associated with w1. This will allow you to capture the frequency with which word combinations appear.
- In addition to the words in the file, the dictionary must include the string $ as a special key referred to as the sentence-start symbol. This symbol will be used when choosing the first word in a sentence. In the dictionary, the list of words associated with the key '$' must include:
  - the first word in the file
  - every word in the file that follows a sentence-ending word.
  Doing this will ensure that the list of words associated with '$' includes all of the words that start a sentence. For example, the dictionary for the text “I scream. You scream. We all scream for ice cream.” would include the following entry for the sentence-start symbol:
```
'$': ['I', 'You', 'We']
```
Testing your code
To test your code, download the following file:
sample.txt

Make sure to put the file in your ps8 folder. If your browser doesn’t allow you to specify where the file should be saved, try right-clicking on the link above and choosing Save as... or Save link as..., which should produce a dialog box that allows you to choose the correct folder for the file.

This sample text file contains the following contents:
```
A B A. A B C. B A C. C C C.
```
Once this file is in place, run your ps8pr2.py in Spyder and test your function from the console:
```
>>> word_dict = create_dictionary('sample.txt')

>>> word_dict
result: {'A': ['B', 'B', 'C.'], 'C': ['C', 'C.'],
'B': ['A.', 'C.', 'A'], '$': ['A', 'A', 'B', 'C']}
```
The order of the keys–or of the elements within a given key’s list of values–may not be the same as what you see above, but the elements of the lists must appear in the quantities shown above for each of the four keys 'A', 'B', 'C', and '$'.

Here are some additional files you can use for testing:
- edited_mission.txt - an edited version of BU’s mission statement, and the dictionary that we derived from it.
- brave.txt - lyrics from the song Brave by Sara Bareilles, and its dictionary.
Here again, the ordering that you obtain for the keys and list elements in the dictionaries may be different. In addition, we have edited the formatting of the dictionaries to make them easier to read.
Write a function generate_text(word_dict, num_words) that takes as parameters a dictionary of word transitions (generated by the create_dictionary function) named word_dict and a positive integer named num_words. The function must use word_dict to generate and print num_words words, with a space after each word. The function must print the words that it generates. It must not return a value.

Guidelines:
- We discussed this function in lecture, and we strongly encourage you to begin with the pseudocode provided in the coursepack and turn it into appropriate Python code.
- The first word must be chosen randomly from the words associated with the sentence-start symbol, '$'. The second word must be chosen randomly from the list of words associated with the first word, etc. When the current word ends in a period ('.'), question mark ('?'), or exclamation point ('!'), the function must detect this and start a new sentence by again choosing a random word from among those associated with '$'.
- Do not include '$' in the output text. It should only be used as an internal marker for your function.
- You must use the random.choice function to choose from the list of possible words. Don’t forget to include import random at the top of your file. Then, if wordlist is the list of possible words at a given point in the generated text, you can do something like the following:
```
next_word = random.choice(wordlist)
```
- Here again, you shouldn’t try to remove or change the punctuation associated with the words, and you don’t need to worry if the generated text doesn’t end with appropriate punctuation. The generated text won’t be perfect, but most of the time it will at least be meaningful!
- If your function encounters a word that doesn’t have any words associated with it in the dictionary, the function must start a new sentence. This situation can occur if the last word in the file used to create the dictionary was unique and did not end with punctuation.
Examples
Here are two examples using the same text file as above. Your output may differ because of the randomness involved in the generation.
```
>>> word_dict = create_dictionary('sample.txt')

>>> generate_text(word_dict, 20)
B C. C C C. C C C C C C C C C C C. C C C. A

>>> generate_text(word_dict, 20)
A B A. C C C. B A B C. A C. B A. C C C C C C.
```
Try some other examples using longer documents containing English words, such as the works of William Shakespeare. In particular, we are providing a text file containing the first act of Romeo and Juliet, along with the files that we provided above in the examples for create_dictionary.

Submitting your work

Submit your ps8pr1.py and ps8pr2.py files under the assignment labeled PS 8 using these steps:

Click on the name of the assignment in the list of assignments. You should see a pop-up window with a box labeled DRAG & DROP. (If you don’t see it, click the Submit or Resubmit button at the bottom of the page.)
Add all four files to the box labeled DRAG & DROP. You can either drag and drop the files from their folder into the box, or you can click on the box itself and browse for the files.
Click the Upload button.
You should see a box saying that your submission was successful. Click the (x) button to close that box.
The Autograder will perform some tests on your files. Once it is done, check the results to ensure that the tests were passed. If one or more of the tests did not pass, the name of that test will be in red, and there should be a message describing the failure. Based on those messages, make any necessary changes. Feel free to ask a staff member for help.

Note: You will not see a complete Autograder score when you submit. That is because additional tests for at least some of the problems will be run later, after the final deadline for the submission has passed. For such problems, it is important to realize that passing all of the initial tests does not necessarily mean that you will ultimately get full credit on the problem. You should always run your own tests to convince yourself that the logic of your solutions is correct.
If needed, use the Resubmit button at the bottom of the page to resubmit your work. Important: Every time that you make a submission, you should submit all of the files for that Gradescope assignment, even if some of them have not changed since your last submission.
Near the top of the page, click on the box labeled Code. Then click on the name of each file to view its contents. Check to make sure that the files contain the code that you want us to grade.

Important

It is your responsibility to ensure that the correct version of every file is on Gradescope before the final deadline. We will not accept any file after the submission window for a given assignment has closed, so please check your submissions carefully using the steps outlined above.
If you are unable to access Gradescope and there is enough time to do so, wait an hour or two and then try again. If you are unable to submit and it is close to the deadline, email your homework before the deadline to cs111-staff@cs.bu.edu

Last updated on April 28, 2025.

Problem Set 8

Preliminaries

Problems

Creating the necessary folder

Problem 1: A Date class

Problem 2: Markov text generation

Submitting your work

Problem 1: A `Date` class