One standard use of a hash table is counting the frequency of words in a file. For this assignment, you will use a hash table to implement a word-frequency counting program.
🚨 Before starting the assignment, watch How to: Setup for a New Assignment.
This tutorial walks you through how to set up a new repository, make commits, and push code to GitHub.
- Your submitted code should be in a new (public) repo on GitHub.
- Your repository should have a minimum of 5 commits.
- Your repo should include a README with the name of your project and a description.
- Create a demo video. The demo should include a walkthrough of your code and a demonstration of your project working.
- [Optional] Upload your video to Google Drive and share a link if Gradescope upload speeds are too slow.
- Submit the link to your repo and demo on Gradescope.
Your program will do the following:
- Count the number of occurrences of each word in the file.
- Print all of the words and their frequencies.
For example, a text file that contains these lines:
I write, erase, rewrite
Erase again, and then
A poppy blooms.
would generate this output:
a: 1
again: 1
and: 1
blooms: 1
erase: 2
i: 1
poppy: 1
rewrite: 1
then: 1
write: 1
Assumptions:
- The starter code handles all of the file I/O and string tokenization discussed below:
  - Words are counted case-insensitively. For example, in the example above, "Erase" and "erase" are counted as the same word.
  - Punctuation is ignored. You can use a delimiter to ignore the following characters: , . ; : - ? !
  - Assume that the input file consists of letter-only words. That is, the file will not contain words with apostrophes, such as isn’t and ‘tis.
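The starter code already reads and tokenizes the file, so you do not need to write this part yourself. The sketch below is a rough illustration of what that step involves and is not necessarily how the starter code does it. It lowercases the text, splits it on whitespace and the listed punctuation with re.split, and tallies the words with collections.Counter. The helper name tokenize and the DELIMITERS pattern are assumptions made for illustration. Counter is only a way to double-check your results, because the assignment asks you to build the counts with your own hash table.

```python
import re
from collections import Counter

# Assumed delimiter set: the assignment's , . ; : - ? ! plus any whitespace.
DELIMITERS = r"[\s,.;:\-?!]+"


def tokenize(text):
    """Lowercase the text and split it into words, dropping empty strings."""
    return [w for w in re.split(DELIMITERS, text.lower()) if w]


sample = "I write, erase, rewrite\nErase again, and then\nA poppy blooms."
counts = Counter(tokenize(sample))  # reference counts, for checking only
for word in sorted(counts):
    print(f"{word}: {counts[word]}")
```

On the example text, this prints the same ten lines as the expected output shown earlier.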
Download the starter code (a zip file), which includes:
- main.py
- HashTable.py
- LinkedList.py
- Node.py
- example.txt
Your goals are:
- Complete the TODOs in HashTable.py:
  - create_arr: Complete the create_arr method in HashTable.py. Each element of the hash table (arr) is a linked list. This method creates an array (list) of a given size and populates each of its elements with a LinkedList object. Note: [LinkedList()] * size does not work, because it fills the list with references to a single shared LinkedList instead of separate ones.
  - hash_func: Complete the hash_func method in HashTable.py. Create your own hash function. A hash function turns each key into an index value that determines where in the list (arr) each key:value pair is stored, so the result must be a valid index between 0 and size - 1.
  - insert: Complete the insert method in HashTable.py. It should insert a key-value pair into the hash table, where the key is the word and the value counts how many times the word has appeared. When inserting a word, first check whether a Node with the same key is already in the table. If it is, increment that Node's count instead of adding a new Node.
  - print_key_values: Complete the print_key_values method in HashTable.py. Traverse every linked list in the table and print the key-value pairs, formatted like the example above.
- Print the total number of distinct words at the beginning of your program.
- Offer the user a prompt to query the exact count of a particular word.
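The sketch below shows why [LinkedList()] * size does not work. It uses a hypothetical minimal LinkedList class, and the starter code's class likely has more to it. List repetition copies the reference, so every slot points at the same bucket object. A list comprehension instead calls the constructor once per slot and produces independent buckets.

```python
class LinkedList:
    """Hypothetical minimal stand-in for the starter code's LinkedList."""

    def __init__(self):
        self.head = None


size = 4

# List repetition copies the *reference*, so every slot points at one object.
shared = [LinkedList()] * size
shared[0].head = "erase"      # mutate the bucket at index 0 ...
print(shared[3].head)         # ... and index 3 sees the change: prints erase

# A comprehension calls LinkedList() once per slot, giving independent buckets.
separate = [LinkedList() for _ in range(size)]
separate[0].head = "erase"
print(separate[3].head)       # prints None
```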
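The assignment leaves the choice of hash function to you. One common option is a polynomial string hash, sketched below under the assumption that keys are lowercase strings. It multiplies the running value by a small constant before adding each character's code point, so the result depends on character order. A plain sum of ord values, by contrast, gives every anagram the same index. The function name hash_word and the multiplier 31 are illustrative choices, not requirements. Reducing modulo size at every step keeps the intermediate numbers small and gives the same result as reducing once at the end.

```python
def hash_word(key: str, size: int) -> int:
    """Polynomial string hash, reduced into the range 0..size-1."""
    h = 0
    for ch in key:
        # Scale the running value, mix in this character, and keep it in range.
        h = (h * 31 + ord(ch)) % size
    return h


words = ["a", "again", "and", "blooms", "erase", "i", "poppy", "rewrite", "then", "write"]
for w in words:
    print(w, "->", hash_word(w, 7))
```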
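The sketch below puts create_arr, hash_func, insert, and print_key_values together into one possible separate-chaining table. It uses hypothetical Node, LinkedList, and HashTable classes written from scratch. The attribute names (head, key, value, next), the default size of 101, and the key_values helper are all assumptions and may not match the starter code. insert walks the chain at the word's index and adds to the count if it finds the key. Otherwise, it puts a new Node at the front of that chain.

```python
class Node:
    """Hypothetical node holding one word (key) and its count (value)."""

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.next = None


class LinkedList:
    """Hypothetical singly linked list used as one bucket (chain)."""

    def __init__(self):
        self.head = None


class HashTable:
    def __init__(self, size=101):
        self.size = size
        self.arr = self.create_arr(size)

    def create_arr(self, size):
        # One independent LinkedList per slot (not [LinkedList()] * size).
        return [LinkedList() for _ in range(size)]

    def hash_func(self, key):
        # Polynomial string hash reduced into the range 0..size-1.
        h = 0
        for ch in key:
            h = (h * 31 + ord(ch)) % self.size
        return h

    def insert(self, key, value=1):
        bucket = self.arr[self.hash_func(key)]
        node = bucket.head
        while node is not None:
            if node.key == key:
                # Word already in the table: add to its existing count.
                node.value += value
                return
            node = node.next
        # New word: put a fresh Node at the front of this bucket's chain.
        new_node = Node(key, value)
        new_node.next = bucket.head
        bucket.head = new_node

    def key_values(self):
        # Collect (key, value) pairs bucket by bucket, chain by chain.
        pairs = []
        for bucket in self.arr:
            node = bucket.head
            while node is not None:
                pairs.append((node.key, node.value))
                node = node.next
        return pairs

    def print_key_values(self):
        for key, value in self.key_values():
            print(f"{key}: {value}")


table = HashTable(7)
for word in "i write erase rewrite erase again and then a poppy blooms".split():
    table.insert(word)
table.print_key_values()
```

Prepending avoids having to track the tail node, and appending would work equally well. This sketch prints in bucket order, so its output will generally not be alphabetical like the example. Sorting table.key_values() before printing would reproduce that order.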
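For the last two goals, you can track the distinct-word count by incrementing a counter in insert whenever a new Node is created. Alternatively, you can count the nodes during a traversal. The sketch below shows one possible prompt loop. The names run_queries, lookup, and read are hypothetical, and read can be swapped out so the loop can be tested without typing. A plain dict with made-up counts stands in for your hash table in the usage lines.

```python
def run_queries(lookup, distinct_count, read=input):
    """Hypothetical driver: report the distinct-word count, then answer queries.

    lookup: callable returning a word's count (0 if the word is absent).
    read:   callable used to get each query; defaults to the built-in input().
    Returns the lines it printed, which makes the loop easy to test.
    """
    lines = [f"Distinct words: {distinct_count}"]
    print(lines[0])
    while True:
        # Normalize the query the same way words were counted (case-insensitive).
        word = read("Enter a word to look up (blank to quit): ").strip().lower()
        if not word:
            break
        line = f"{word}: {lookup(word)}"
        print(line)
        lines.append(line)
    return lines


# Stand-in data and scripted answers in place of a real table and keyboard input.
counts = {"erase": 2, "poppy": 1}
scripted = iter(["Erase", "tree", ""])
run_queries(lambda w: counts.get(w, 0), len(counts), read=lambda prompt: next(scripted))
```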
Coming soon.