ANLY 558 -- Advanced Algorithms for Analytics

Spring 2021

Georgetown University

Prof. Justin Thaler

Tuesdays, 9:00 am - 11:30 am

Email: justin.thaler at georgetown dot edu

Course Overview

Description

This course covers algorithmic techniques for solving different types of data science problems. Topics include algorithm design paradigms (divide-and-conquer, greedy, dynamic programming), graph algorithms (shortest paths, minimum spanning trees), computational intractability and NP-completeness, fundamental techniques for processing massive data sets (streaming and approximation algorithms), and basics of supervised learning. You will learn to design, analyze (via theorems and proofs), and implement fundamental data structures and algorithms. This course will also provide the algorithmic background essential for further study of computer science topics.

In more detail, from this course you will learn the basic language and tools for algorithm analysis, as well as several specific problems and general paradigms for algorithm design. We will focus on the theoretical and mathematical aspects in class and on the homework assignments. But because one gains a deeper understanding of algorithms from actually implementing them, the course will include a substantial programming component. Large programming assignments can be done (but do not have to be done!) in pairs. More details will be available when the first programming assignment is given.

We will be covering a great deal of material in this class. I expect the course to be challenging, both in terms of the workload and the difficulty of the material. You should be prepared to do a lot of work outside of class. The payoff will be that you will learn a lot of both useful and interesting things.

Prerequisites

The formal prerequisites for this course are ANLY-555 and Python.

Some mathematical maturity also will be expected; students should have some idea of what constitutes a mathematical proof and how to write one. Some knowledge of basic probability will also be helpful.

Reference Material

There is no required course textbook. We will primarily be following lecture notes from various sources, all linked to on this web page. The schedule and lecture notes that we will be using are posted below (they are, however, likely to evolve over the course of the semester).

Grading

Grades in the course will be based on problem sets/quizzes (80%) and participation (20%).

Collaboration Policy and Academic Honesty

I would like to emphasize the rules on working with others on homework assignments. For specified (programming-intensive) assignments, you may work in pairs. Each group may discuss the problem set with at most one other group, but looking at another group's code is strictly forbidden.

For any assignment with a programming component, you may not use code found online without my express permission. If you want to use code you find online, you must ask me first.

For problem sets, limited collaboration in planning and thinking through solutions to homework problems is allowed, but no collaboration is allowed in writing up solutions. That is, you are allowed to work with other students currently taking the course in discussing, brainstorming, and verbally walking through solutions to homework problems. But when you are through talking, you must write up your solutions independently and may not check them against each other. There may be no passing of homework papers between collaborators; nor is it permissible for one person simply to tell another the answer.

If you collaborate with any other students in the course in the planning and design of solutions to homework problems, then you should give their names on your homework papers.

Violation of these rules may be grounds for giving no credit for a homework paper and also for serious disciplinary action. Severe punishments will apply, so please do not violate these rules.

Lateness Policy for Problem Sets

All assignments will be due at the beginning of class on the appropriate day. Assignments will not be accepted late with the exception of medical emergencies or similar exceptional circumstances that must be discussed in advance with the instructor. Please remember it is better to turn in an incomplete assignment rather than no assignment! All assignments should be turned in electronically, as pdf or text files. You should have a scanner available, or be familiar with LaTeX, or otherwise be ready to deal with turning in pdfs of mathemetical work.

Course Logistics

Office Hours

After class or by appointment.

Title IX: Sexual Misconduct and Sexual Harassment

Georgetown University and its faculty are committed to supporting survivors of sexual misconduct, including relationship violence and sexual assault. Please note that university policy requires faculty members to report any disclosures about sexual misconduct to the Title IX Coordinator, whose role is to coordinate the University’s response to sexual misconduct. For details of University resources, consult Georgetown University's Sexual Misconduct Reference Guide.

Schedule

Below is the tentative schedule for the course. I will modify it as the semester progresses.

Class NumberDateDescription Readings
Algorithm Design Paradigms
11/26Overview of Algorithm Design, Grade-School Algorithm For Multiplication, Computing Fibonacci Numbers, Models of Computation, Induction, Big-Oh Notation Mitzenmacher's lecture notes 1 and lecture notes 2. Also read Suresh Venkatasubramanian's musings on the relationship between algorithm design and machine learning.
22/2 Recurrence relations. Binary Search. Divide and Conquer Algorithms: Merge Sort, Fast Integer Multiplication. Strassen's Algorithm for Matrix Multiplication. Mitzenmacher's lecture notes.
32/9 Dynamic Programming (String Reconstruction, Edit Distance). Mitzenmacher's lecture notes.
Graph Algorithms
42/16Intro to Graphs, Adjacency Lists and Adjacency Arrays, Breadth-First Search. DFS and topological sorting. Demaine's notes and Mitzenmacher's notes
52/23Dijkstra's Algorithm for Single-Source Shortest Paths. Mitzenmacher's lecture notes.
63/2 Minimum Spanning Trees---Prim's and Kruskal's Algorithms. Floyd-Warshall (All-Pairs Shortest Paths). Mitzenmacher's lecture notes on Prim's and Kruskal's algorithms. For the union-find data structure, see these notes.
Reductions and NP-Completeness
73/9 Reductions. Mitzenmacher's lecture notes For the reduction from maximum bipartite matching to max-flow, see these slides.
83/16 NP, NP-completeness, some NP-complete problems Mitzenmacher's lecture notes
Advanced Topics
93/23 Approximation algorithms for intractable problems. Fingerprinting and the power of randomness. Mitzenmacher's lecture notes . For fingerprinting see these notes.
104/6 Intro to streaming algorithms. Finding frequent items and point queries in insert-only streams. Tools from probability theory. Lectures 1 and 2 of Amit's notes
114/13 Sampling Algorithms for Insert-Only Streams: Approximate Medians, Reservoir sampling, Itemset Frequency Estimation. Estimating Distinct Elements in Insert-Only Streams. Andrew's slides
124/20 Answering Point Queries for turnstile streams: Count-Min and Count Sketch. Linear Sketches, Johnson-Lindenstrauss, Random Projections for Dimensionality Reduction. Lecture 4 of Amit's notes
134/27 Probably Approximately Correct (PAC) learning. Occam's razor. TBD
145/4 TBDTBD