COSC 160 Project 1

COSC 160: Data Structures

Project 1
Fall 2021

Due: F 9/17 @ 5 PM ET
9 points

This project involves implementing a Map ADT using an extendable hash table with separate chaining as the collision-handling scheme. For comparison, study Java's Hashtable<K,V> and C++'s unordered_map. For this project, use hashtable.txt to get started, which you can retrieve from cs-class-1 (class-1.cs.georgetown.edu) using the command

cs-class-1% cp ~maloofm/cosc160/p1.zip ./

Hashtable<K,V,H> stores keys of type K with values of type V. It uses the hash-code function long long H<K>::operator()( const K& ) const. (See hasher.txt.) It has a default capacity of 11 and load factor of 0.75 Programmers can specify different capacities, load factors, and hash-code functions using constructors and methods. By default, Hashtable<K,V,H> uses the multiply-and-divide (MAD) method as its compression function.

Hashtable<K,V,H> dynamically changes its capacity based on its current load factor. Hashtable<K,V,H> uses a table of prime numbers that roughly double in size to determine its new capacity. When the hash table reaches its maximum load factor, it increases its capacity based on the next prime number in the table and rehashes its entries. Similarly, when the hash table falls below a specified minimum load factor (default: 0.35), it decreases its capacity using the previous prime number in the table and rehashes its entries. To make rehashing more efficient, the hash table stores hash codes along with the key-value entries.

For the bucket array, Hashtable<K,V,H> uses a dynamically-allocated array of pointers to doubly-linked lists. Entry<K,V> implements a node for a doubly-linked list that stores a key, a value, and the key's hash code, which we can use when rehashing. (See entry.txt.)

To declare an array of pointers to Entry<K,V>, we write:

Entry<K,V> **table;

which defines a so-called double pointer, which is a pointer that points to pointers to Entry<K,V>. To allocate table, we use the statement

table = new Entry*[primes[primeIndex]]];

We must then ensure that the pointers of table point to null. Then at each position of table, we can build a doubly-linked list of nodes. Clearing the hash table entails destructing the collision lists. Destructing the hash table entails clearing the hash table and deallocating the bucket array table.

One design decision you should make is the policy for increasing and decreasing the capacity of the hash table. For example, after clearing a hash table, it will have a load factor of zero. Should the table's capacity decrease to the default capacity or should it stay at its current capacity? In the latter case, since the load factor is now below the minimum threshold, there is a risk that subsequent operations may repeatedly reduce the capacity of the hash table, which would be inefficient and perhaps unnecessary. What control should you give programmers in controlling how the hash table changes its capacity?

As with previous projects, test your implementation of the Hashtable<K,V,H> class in main.cc. We will use our own main function to evaluate your class. You must not modify the class name or the public interface. Our main.cc will include only main.h, so put all necessary includes there. You will need to declare any additional methods or data members. Use the Makefile from hw1 with the appropriate modifications.

Application

Optionally, if you are interested in exploring an application, we use hash tables to check the spelling of words in documents. If we have a dictionary of words, we put the words of the dictionary into a hash table. We then check if each word of a document is in the hash table. If it is not, either the word is not in the dictionary or it is misspelled.

On cs-class-1, there are 479,829 unique words in the file /usr/share/dict/linux.words. (Please do not make local copies of linux.words on cs-class-1. Your program can read the file directly. You can, however, copy linux.words to your own computer for development and testing.)

For a document, I put Hamilton's Federalist Papers No. 1 in the zip file that I distributed with the assignment. (See f1.txt.) You need to remove the punctuation if a word has it. You can write your own routine to strip punctuation marks from the end of words. You can also do this with the erase-remove idiom using ispunct(unsigned char), remove_if defined in algorithm, and string::erase.

You may be unfamiliar with the declaration [](unsigned char x){return std::isspace(x);}. It is a lambda expression, which defines a nameless function. You can simply substitute isspace for ispunct. You can also define a named function to pass in as the third parameter to remove_if.

Instructions for Electronic Submission: In a file named HONOR, please include the statement:

In accordance with the class policies and Georgetown's Honor Code,
I certify that, with the exceptions of the class resources and those
items noted below, I have neither given nor received any assistance
on this project.

<name>
<netID>

Include this file in your zip file submit.zip.

Submit p1 exactly like you submitted hw1. Make sure you remove all debugging output before submitting.

Plan B

If Autolab is down, upload your zip file to Canvas.