ANLY 550 - Structures and Algorithms for Analytics

Spring 2019

Georgetown University

Prof. Justin Thaler

Monday and Wednesday, 3:30-4:45 in ICC 120

Email: justin.thaler at georgetown dot edu

Course Overview

Description

This course covers algorithmic techniques for solving different types of data science problems. It will cover Big O notation, data structures (arrays, stacks, queues, lists, trees, heaps, graphs), sorting and searching (binary search trees, hash tables), and algorithmic paradigms for efficient problem solving (divide and conquer, recursion, greedy algorithms, dynamic programming, etc.). It will include both theory and practice. You will learn to design, analyze, and implement fundamental data structures and algorithms. This course will provide the algorithmic background essential for further study of computer science topics.

In more detail, from this course you will learn the basic language and tools for algorithm analysis, as well as several specific problems and general paradigms for algorithm design. We will focus on the theoretical and mathematical aspects in class and on the homework assignments. But because one gains a deeper understanding of algorithms from actually implementing them, the course will include a substantial programming component. Large programming assignments can be done (but do not have to be done!) in pairs. More details will be available when the first programming assignment is given. We will be covering a great deal of material in this class. I expect the course to be challenging, both in terms of the workload and the difficulty of the material. You should be prepared to do a lot of work outside of class. The payoff will be that you will learn a lot of both useful and interesting things.

Prerequisites

The formal prerequisites for this course are ANLY-501 and ANLY-511.

The expected skills are as follows. Students should be able to program in a standard programming language (C, C++, Java, Python, etc.). Some mathematical maturity also will be expected; students should have some idea of what constitutes a mathematical proof and how to write one. Some knowledge of basic probability will also be helpful.

Reference Material

There is no required course textbook. We will primarily be following lecture notes from Michael Mitzenmacher's Data Structures and Algorithms class, CS 124, at Harvard. We will, however, omit some of the more specialized topics from that class and instead spend more time on basic data structures that CS 124 covers very quickly. There will also be some overlap with Eric Demaine's 2011 Algorithm's course at MIT. The schedule and lecture notes that we will be using are posted below (they are, however, likely to evolve over the course of the semester).

Nonetheless, students are highly encouraged to purchase and study a more formal reference on algorithm design. I personally recommend Algorithm Design by Kleinberg and Tardos. It is available in paperback form online for less than $30. Another popular reference is Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein.

Grading

Grades in the course will be based on roughly biweekly problem sets, a midterm, and a final. Two of the problem sets will be involved programming assignments. The programming problem sets will count for 20% of the grade, the non-programming problem sets 30%, the midterm 20%, and the final 30%.

Collaboration Policy and Academic Honesty

I would like to emphasize the rules on working with others on homework assignments. For the programming assignments, you may work in pairs. Each group may to discuss the problem set with at most one other group, but looking at another group's code is strictly forbidden.

For any assignment with a programming component, you may not use code found online without my express permission. If you want to use code you find online, you must ask me first.

For problem sets, limited collaboration in planning and thinking through solutions to homework problems is allowed, but no collaboration is allowed in writing up solutions. You are allowed to work with at most 3 other students currently taking the course in discussing, brainstorming, and verbally walking through solutions to homework problems. But when you are through talking, you must write up your solutions independently and may not check them against each other. There may be no passing of homework papers between collaborators; nor is it permissible for one person simply to tell another the answer.

If you collaborate with any other students in the course in the planning and design of solutions to homework problems, then you should give their names on your homework papers.

Violation of these rules may be grounds for giving no credit for a homework paper and also for serious disciplinary action. Severe punishments will apply, so please do not violate these rules.

Lateness Policy for Problem Sets

All assignments will be due at the beginning of class on the appropriate day. Assignments will not be accepted late with the exception of medical emergencies or similar exceptional circumstances that must be discussed in advance with the instructor. Please remember it is better to turn in an incomplete assignment rather than no assignment! All assignments should be turned in electronically, as pdf or text files. You should have a scanner available, or be familiar with LaTeX, or otherwise be ready to deal with turning in pdfs of mathemetical work. (LaTeX is highly, highly recommended)

Course Logistics

Office Hours

After class or by appointment. My office is Saint Mary's Hall, Room 354

Title IX: Sexual Misconduct and Sexual Harassment

Georgetown University and its faculty are committed to supporting survivors of sexual misconduct, including relationship violence and sexual assault. Please note that university policy requires faculty members to report any disclosures about sexual misconduct to the Title IX Coordinator, whose role is to coordinate the University’s response to sexual misconduct. For details of University resources, consult Georgetown University's Sexual Misconduct Reference Guide.

Schedule

Below is the tentative schedule for the course. I will modify it as the semester progresses.

Class NumberDateDescription Readings
Algorithmic Thinking, Sorting, Basic Data Structures
11/9Overview of Algorithm Design, Grade-School Algorithm For Multiplication, Computing Fibonacci Numbers, Models of Computation Mitzenmacher's lecture notes. Also read Suresh Venkatasubramanian's musings on the relationship between algorithm design and machine learning.
21/14Finish Fibonacci Numbers. Induction, Big-Oh Notation.Mitzenmacher's lecture notes.
31/16 More Big-Oh notation. Recurrence relations. Binary Search. Mitzenmacher's lecture notes.
41/23 Divide and Conquer Algorithms: Merge Sort. Also, insertion sort. For Insertion Sort and Merge Sort, see Eric Demaine's slides.
51/28 More divide and conquer: Fast Integer Multiplication. Strassen's Algorithm for Matrix Multiplication. For Integer and Matrix Multiplication, see Mitzenmacher's lecture notes.
61/30 Data Structures. Linked Lists, Stacks, Queues, Priority Queues, Heaps. Eric Demaine's slides
72/4 More data strutures, heap sort. Eric Demaine's slides
Graphs
82/6Intro to Graphs, Adjacency Lists and Adjacency Arrays, Breadth-First Search. Demaine's notes and Mitzenmacher's notes
92/11Depth-First Search, Topological Sorting Mitzenmacher's lecture notes.
102/13Djikstra's Algorithm for Single-Source Shortest Paths. Mitzenmacher's lecture notes.
112/19 Note this is a Tuesday (official university makeup class). Bellman-Ford Algorithm for Single-Source Shortest Paths. Demaine's lecture notes.
122/20 Minimum Spanning Trees. Prim's and Kruskal's Algorithms. Mitzenmacher's lecture notes on Prim's and Kruskal's algorithms. For the union-find data structure, see these notes.
132/25Wrap up of MST algorithms.
Algorithmic Paradigms: Greedy, Divide-And-Conquer, Dynamic Programming
142/27 (More) Greedy Algorithms: Horn Formulae. Mitzenmacher's lecture notes.
153/11 Start of Dynamic Programming (String Reconsuction). Mitzenmacher's lecture notes.
163/13 Problem Set Review
173/18 Midterm Review
183/20 Midterm Exam Tentative. Official midterm date will be determined later
193/25More Dynamic Programming: Edit Distance. Mitzenmacher's lecture notes.
203/27More Dynamic Programming Examples: Floyd-Warshall (All-Pairs Shortest Paths). Mitzenmacher's lecture notes. For the example about Woody the Woodcutter covered at the end of lecture, see Problem 1 of these (very sketchy!) notes.
Reductions and NP-Completeness
214/1 Reductions. Mitzenmacher's lecture notes
224/3 Review of Midterm
234/8 More Reductions Mitzenmacher's lecture notes. For the reduction from maximum bipartite matching to max-flow, see these slides.
244/10 NP, NP-completeness, some NP-complete problems Mitzenmacher's lecture notes
254/15 Wrap up NP-Completeness
264/17 Problem Set Review
Dictionary Data Structures
274/24Binary Search Trees. Hash Tables: Chaining, Linear Probing, Cuckoo Hashing, The Power of Two Choices. Justin's handwritten notes.
284/29 Review for final.