If you don’t see your question here, post it on Piazza or come to office hours! See the links in the navigation bar for both of those options.
When splitting the text in clean_text, what separator character
should we pass in to the split method?
None. Because we want split to split on all whitespace characters
(spaces, tabs, and newlines), you should not pass in a
separator – not even a space. Rather, you should use empty
parentheses when calling split, which will cause it to split
on all whitespace characters.
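To see the difference for yourself, here is a small sketch you can run in the console. The sample string is made up for illustration; it is not from the assignment.

```python
# A made-up string containing a tab, two newlines, and a double space.
text = "the quick\tbrown\n\nfox  jumps"

# Empty parentheses: split on any run of whitespace characters.
words = text.split()
print(words)

# Passing a space: only single spaces count as separators, so the tab and
# newlines stay inside a "word" and the double space yields an empty string.
pieces = text.split(" ")
print(pieces)
```

The first call gives five clean words; the second leaves whitespace inside one of the pieces and also produces an empty string.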
My clean_text function splits the text into a list of words, and
then it uses a loop to clean each word in the list. However, when
I look at the return value, the individual words in the list have
not been changed. What am I doing wrong?
Don’t forget that you can only change the internals of a list if you assign something to one of the positions in the list. For example, consider the following code fragment:
my_words = ['hello', 'world']
for w in my_words:
    w = w.upper()    # changes w, but *not* my_words!
print(my_words)
If you run this code fragment, you’ll see that the contents of
my_words are unchanged. That’s because the assignment inside the
loop changes w, but it doesn’t change the contents of
my_words. In order to change the contents of my_words, we
would need to use an index-based loop.
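For comparison, here is a minimal sketch of what an index-based version of that fragment could look like. It reuses the same toy list and is only an illustration, not the required solution.

```python
my_words = ['hello', 'world']

# Loop over the positions 0, 1, ... instead of over the elements.
for i in range(len(my_words)):
    # Assigning to my_words[i] changes the list itself.
    my_words[i] = my_words[i].upper()

print(my_words)
```

Because the assignment targets a position in the list rather than the loop variable, the list's contents really do change this time.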
An easier approach would be to clean the text BEFORE splitting it!
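A rough sketch of the clean-first idea is below. The function name `clean_text_sketch` and the four punctuation symbols are assumptions chosen for illustration; use the symbols that the assignment specifies.

```python
def clean_text_sketch(txt):
    """Hypothetical sketch: clean the whole string first, then split it."""
    cleaned = txt.lower()
    # Assumed symbols for illustration only, not the assignment's list.
    for symbol in '.,?!':
        cleaned = cleaned.replace(symbol, '')
    # Split once, after all of the cleaning is done.
    return cleaned.split()

result = clean_text_sketch('Hello, world! How are YOU?')
print(result)
```

Since strings are cleaned as a whole before the list is ever created, there is no need to modify individual list elements afterwards.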
Is my clean_text function good enough?
The clean_text function should at least remove the specified
punctuation symbols and make every letter lowercase. Also remember
that clean_text must return a list of words. This means you
should split up the cleaned string inside of clean_text. If you
are unsure of how to do this, check out the word_frequencies
function in the example code online.
These are the minimum requirements. If you have time, you are welcome to take additional steps to further clean the text.
How do I update the words and word_lengths dictionaries?
You should start by reading the pseudocode we’ve given you for
add_string.
Note that the for loop in the pseudocode goes through each word
in a list called word_list that contains all of the words in the
original string. You should complete the body of that loop so
that, for each value of the variable w, it updates the frequency
of w in the self.words dictionary. What are the keys for that
dictionary? How can you correctly update that dictionary in light
of the current word w? You may want to review the
example code from PS 8 for a reminder of how to update
a dictionary.
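As a reminder of the general pattern, here is a small standalone sketch of counting frequencies in a dictionary. The local variable `words` stands in for self.words, and the word list is made up.

```python
word_list = ['the', 'cat', 'saw', 'the', 'dog']
words = {}    # stand-in for self.words

for w in word_list:
    if w not in words:
        # First time we see this word: create its entry.
        words[w] = 1
    else:
        # We've seen it before: increase its count.
        words[w] += 1

print(words)
```

In your method, the same logic would go inside the loop from the pseudocode and operate on the dictionary stored in the object.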
When you update word_lengths, what are the keys in the
word_lengths dictionary? How can you transform a word into a key
in this dictionary? Once you answer these questions, you can add
the code needed to update word_lengths.
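If it helps, here is a sketch that assumes the keys of word_lengths are the lengths of the words (check the assignment to confirm this). The local variable `word_lengths` stands in for the attribute in your object, and the word list is made up.

```python
word_list = ['a', 'big', 'red', 'balloon']
word_lengths = {}    # stand-in for self.word_lengths

for w in word_list:
    key = len(w)    # assumed transformation: a word becomes its length
    # get() returns the current count, or 0 if the key isn't there yet.
    word_lengths[key] = word_lengths.get(key, 0) + 1

print(word_lengths)
```

The only difference from updating the word frequencies is the extra step that turns the word into the right kind of key.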
How do I read from a file?
In lecture, we presented two different ways to read the contents of a file. You can consult the lecture notes from a couple of weeks ago, or check out the example code online. For the problem set, we recommend reading the entire file into a single string and then adding that string to your model.
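Here is a minimal sketch of reading an entire file into one string. So that it can run on its own, it first creates a small throwaway file; the file name `sample.txt` and its contents are made up, and the approach shown in lecture may differ in its details.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('First line.\nSecond line.\n')

# Read the whole file into a single string.
with open(path, 'r', encoding='utf-8') as f:
    text = f.read()

print(repr(text))
```

Once you have the string, you can pass it to the method that adds a string to your model.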
How can I test add_file?
In Spyder, open up a new file. It doesn’t matter what you call it,
but you must save it in your project folder. Add a few
sentences to the file and save it. Suppose you called the
file foo. Try adding the file to a TextModel object:
model = TextModel("Test")    # you can call the model anything you want
model.add_file("foo")        # we want to add the file `foo` to the model
(Note: You should replace "foo" with the full name of the
file that you saved in Spyder. If Spyder gave the file a .py
extension, you should include that .py in the name of the
file. If Spyder gave the file a .txt extension, you should include
that .txt in the name of the file.)
Now try printing the model and the dictionaries that it contains.
Do the right words and frequencies appear in the model? If
everything looks good, then your add_file function should be
fine. If not, it may be an issue in add_file or in any methods
you use inside of the function. You can use debugging print
statements to narrow down the cause of the issue.
Why are my save_model and read_model functions not working
properly?
Go through the test case we give you in the assignment one step at
a time. After you save a model, open one of the dictionary files
using a text editor (such as the editor in Spyder). Are the
correct dictionaries inside of the files? If so, the issue is
likely inside of your read_model function. Remember that after
you read the dictionaries from the appropriate file, you must
store them somewhere in the TextModel object. For example, to
store the word-frequency dictionary, you must do an assignment
that looks something like this:
self.words = ...
where you replace ... with the correct expression.
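To illustrate why that assignment matters, here is a sketch of a save-and-read round trip. The class `TinyModel`, its method names, the file name, and the use of `ast.literal_eval` to turn the saved text back into a dictionary are all assumptions made for this example; follow the approach given in the assignment for your own code.

```python
import ast
import os
import tempfile

class TinyModel:
    """Hypothetical stand-in for TextModel with only a words dictionary."""

    def __init__(self):
        self.words = {}

    def save_words(self, path):
        # Write the dictionary to the file as text.
        with open(path, 'w', encoding='utf-8') as f:
            f.write(str(self.words))

    def read_words(self, path):
        with open(path, 'r', encoding='utf-8') as f:
            d_str = f.read()
        # The key step: store the result in the object, not in a local variable.
        self.words = ast.literal_eval(d_str)

path = os.path.join(tempfile.mkdtemp(), 'tiny_words')

saved = TinyModel()
saved.words = {'love': 25, 'spell': 1}
saved.save_words(path)

loaded = TinyModel()
loaded.read_words(path)
print(loaded.words)
```

If `read_words` assigned the dictionary to a local variable instead of `self.words`, the second object would still print an empty dictionary.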
Last updated on November 21, 2025.