COSC 2010: Data Structures

Project 1
Fall 2023

Due: W 9/15 @ 5 PM ET
9 points

This project involves implementing a Map ADT using an extendable hash table with separate chaining as the collision-handling scheme. For comparison, study Java's Hashtable<K,V> and C++'s unordered_map. For this project, use hashtable.txt to get started, which you can retrieve from cs-class (class-1.cs.georgetown.edu) using the command

cs-class-1% cp ~maloofm/cosc2010/p1.zip ./

Hashtable<K,V,H> stores keys of type K with values of type V. It uses the hash-code function long long H<K>::operator()( const K& ) const. (See hasher.txt.) It has a default capacity of 11 and maximum load factor of 0.75 Programmers can specify different capacities, load factors, and hash-code functions using constructors and methods. By default, Hashtable<K,V,H> uses the multiply-and-divide (MAD) method as its compression function.

Hashtable<K,V,H> dynamically changes its capacity based on its current load factor. Hashtable<K,V,H> uses a table of prime numbers that roughly double in size to determine its new capacity. When the hash table exceeds its maximum load factor, it increases its capacity based on the next prime number in the table and rehashes its entries. Similarly, when the hash table falls below a specified minimum load factor (default: 0.35), it decreases its capacity using the previous prime number in the table and rehashes its entries. To make rehashing more efficient, the hash table stores hash codes along with the key-value entries.

For the bucket array, Hashtable<K,V,H> uses a dynamically-allocated array of pointers to doubly-linked lists. Entry<K,V> implements a node for a doubly-linked list that stores a key, a value, and the key's hash code, which we can use when rehashing. (See entry.txt.)

To declare an array of pointers to Entry<K,V>, we write:

Entry<K,V> **table;
which defines a so-called double pointer, which is a pointer that points to pointers to Entry<K,V>. To allocate table, we use the statements
table = new Entry*[primes[primeIndex]]];
We must then ensure that the pointers of table point to null. Then at each position of table, we can build a doubly-linked list of entries. Clearing the hash table entails destructing the collision lists. Destructing the hash table entails clearing the hash table and deallocating the bucket array table.

One design decision you should make is the policy for increasing and decreasing the capacity of the hash table. For example, after clearing a hash table, it will have a load factor of zero. Should the table's capacity decrease to the default capacity or should it stay at its current capacity? In the latter case, since the load factor is now below the minimum threshold, there is a risk that subsequent operations may repeatedly reduce the capacity of the hash table, which would be inefficient and perhaps unnecessary. What control should you give programmers in controlling how the hash table changes its capacity?

Test your implementation of the Hashtable<K,V,H> class in main.cc. We will use our own main function to evaluate your implementation. You must not modify the class names or the public interfaces. Our main.cc will include only main.h, so put all necessary includes there. You will need to declare additional methods and data members. Use the Makefile included in the zip file for this project while making the appropriate modifications.

Instructions for Electronic Submission: In a file named HONOR, please include the statement:

In accordance with the class policies and Georgetown's Honor System,
I certify that, with the exceptions of the class resources and those
items noted below, I have neither given nor received any assistance
on this project.

Name
NetID
Include this file in your zip file submit.zip.

Submit p1 exactly like you submitted p0. Make sure you remove all debugging output before submitting.

Plan B

If Autolab is down, upload your zip file to Canvas.

Copyright © 2023 Mark Maloof. All Rights Reserved. This material may not be published, broadcast, rewritten, or redistributed.