ANLY 550 - Structures and Algorithms for Analytics

Spring 2017

Georgetown University

Prof. Justin Thaler

Tuesday and Thursday, 9:30 to 10:45

Maguire 103

Email: justin.thaler at georgetown dot edu

Course Overview

Description

This course covers algorithmic techniques for solving different types of data science problems. It will cover Big O notation, data structures (arrays, stacks, queues, lists, trees, heaps, graphs), sorting and searching (binary search trees, hash tables), and algorithmic paradigms for efficient problem solving (divide and conquer, recursion, greedy algorithms, dynamic programming, etc.). It will include both theory and practice. You will learn to design, analyze, and implement fundamental data structures and algorithms. This course will provide the algorithmic background essential for further study of computer science topics.

In more detail, from this course you will learn the basic language and tools for algorithm analysis, as well as several specific problems and general paradigms for algorithm design. We will focus on the theoretical and mathematical aspects in class and on the homework assignments. But because one gains a deeper understanding of algorithms from actually implementing them, the course will include a substantial programming component. Large programming assignments can be done (but do not have to be done!) in pairs. More details will be available when the first programming assignment is given. We will be covering a great deal of material in this class. I expect the course to be challenging, both in terms of the workload and the difficulty of the material. You should be prepared to do a lot of work outside of class. The payoff will be that you will learn a lot of both useful and interesting things.

Prerequisites

The formal prerequisites for this course are ANLY-501 and ANLY-511.

The expected skills are as follows. Students should be able to program in a standard programming language (C, C++, Java, Python, etc.). Some mathematical maturity also will be expected; students should have some idea of what constitutes a mathematical proof and how to write one. Some knowledge of basic probability will also be helpful.

Reference Material

There is no required course textbook. We will primarily be following lecture notes from Michael Mitzenmacher's Data Structures and Algorithms class, CS 124, at Harvard. We will, however, omit some of the more specialized topics from that class and instead spend more time on basic data structures that CS 124 covers very quickly. There will also be some overlap with Eric Demaine's 2011 Algorithm's course at MIT. The schedule and lecture notes that we will be using are posted below (they are, however, likely to evolve over the course of the semester).

Nonetheless, students are highly encouraged to purchase and study a more formal reference on algorithm design. I personally recommend Algorithm Design by Kleinberg and Tardos. It is available in paperback form online for less than $30. Another popular reference is Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein.

Grading

Grades in the course will be based on roughly biweekly problem sets, a midterm, and a final. Two of the problem sets will be involved programming assignments. The programming problem sets will count for 20% of the grade, the non-programming problem sets 30%, the midterm 20%, and the final 30%.

Collaboration Policy and Academic Honesty

I would like to emphasize the rules on working with others on homework assignments. For the programming assignments, you may work in pairs. Each group may to discuss the problem set with at most one other group, but looking at another group's code is strictly forbidden.

For any assignment with a programming component, you may not use code found online without my express permission. If you want to use code you find online, you must ask me first.

For problem sets, limited collaboration in planning and thinking through solutions to homework problems is allowed, but no collaboration is allowed in writing up solutions. You are allowed to work with at most 3 other students currently taking the course in discussing, brainstorming, and verbally walking through solutions to homework problems. But when you are through talking, you must write up your solutions independently and may not check them against each other. There may be no passing of homework papers between collaborators; nor is it permissible for one person simply to tell another the answer.

If you collaborate with any other students in the course in the planning and design of solutions to homework problems, then you should give their names on your homework papers.

Violation of these rules may be grounds for giving no credit for a homework paper and also for serious disciplinary action. Severe punishments will apply, so please do not violate these rules.

Lateness Policy for Problem Sets

All assignments will be due at the beginning of class on the appropriate day. Assignments will not be accepted late with the exception of medical emergencies or similar exceptional circumstances that must be discussed in advance with the instructor. Please remember it is better to turn in an incomplete assignment rather than no assignment! All assignments should be turned in electronically, as pdf or text files. You should have a scanner available, or be familiar with LaTeX, or otherwise be ready to deal with turning in pdfs of mathemetical work. (LaTeX is highly, highly recommended)

Course Logistics

Office Hours

After class or by appointment. The location of my office hours may vary. I will frequently use SMH 342B, but I may occasionally use Car Barn 312-B.

Schedule

Below is the tentative schedule for the course. I will modify it as the semester progresses.

Class NumberDateDescription Readings
Algorithmic Thinking, Sorting, Basic Data Structures
11/12Overview of Algorithm Design, Grade-School Algorithm For Multiplication, Computing Fibonacci Numbers, Models of Computation Mitzenmacher's lecture notes. Also read Suresh Venkatasubramanian's musings on the relationship between algorithm design and machine learning.
21/17Induction, Big O Notation, Start of Sorting: Binary Search, Recurrence Relations.Mitzenmacher's lecture notes.
31/19 Insertion Sort. Divide and Conquer Algorithms: Merge Sort and Integer Multiplication. Problem Set 1 out For Insertion Sort and Merge Sort, see For Insertion Sort, see Eric Demaine's slides. For Integer Multiplication, see Mitzenmacher's lecture notes.
41/24 Start of Data Structures. Linked Lists, Stacks, Queues, Priority Queues, Heaps, and Heap Sort. Eric Demaine's slides
Graphs
51/26Intro to Graphs, Adjacency Lists and Adjacency Arrays, Breadth-First Search. Demaine's notes and Mitzenmacher's notes
61/31Depth-First Search, Topological Sorting, Strongly Connected Components Mitzenmacher's lecture notes.
72/2Djikstra's Algorithm for Single-Source Shortest Paths. Problem Set 1 due, Problem Set 2 out Mitzenmacher's lecture notes.
82/7Bellman-Ford Algorithm for Single-Source Shortest Paths. Demaine's lecture notes.
92/9Minimum Spanning Trees. Prim's and Kruskal's Algorithms. Mitzenmacher's lecture notes on Prim's and Kruskal's algorithms. For the union-find data structure, see these notes.
Algorithmic Paradigms: Greedy, Divide-And-Conquer, Dynamic Programming
102/14(More) Greedy Algorithms: Wrap up of MST algorithms, Horn Formulae. Mitzenmacher's lecture notes.
112/16(More) Greedy Algorithms: Approximating Set Cover, Huffman Coding. Problem Set 2 due. Programming Assignment 1 out Mitzenmacher's lecture notes.
122/21(More) Divide-And-Conquer Algorithms: Strassen's Algorithm for Matrix Multiplication. Start of Dynamic Programming (String Reconsuction). Mitzenmacher's lecture notes.
132/23Dynamic Programming: Edit Distance. Mitzenmacher's lecture notes.
142/28More Dynamic Programming Examples: Floyd-Warshall (All-Pairs Shortest Paths). Travelling Salesman Problem. Pseudo-polynomial time algorithm for Subset Sum. Mitzenmacher's lecture notes.
Dictionary Data Structures
153/2Hash Tables: Chaining, Linear Probing, Cuckoo Hashing, The Power of Two Choices. Programming Assignment 1 due. Problem Set 3 out. Justin's handwritten notes.
(More) Randomized Algorithms
163/14 Bloom filters, Pattern Matching via Fingerprinting Mitzenmacher's lecture notes
173/16 Midterm Review
183/21 Midterm Exam
193/23 No class (instructor travel) Programming Assignment 2 out. Problem Set 3 due 3/25.
203/28 Pairwise Independent and Universal Hash Families, Fingerprinting. Freivald's Algorithm for Verifying Matrix Multiplication. Sanjeev Arora's lecture notes
213/30 Pattern Matching (Karp-Rabin). Mitzenmacher's lecture notes
Advanced Topics
224/4 Midterm Review
234/6 NP-Completeness Part 1. Programming Assignment 2 due. Problem Set 4 out. Mitzenmacher's lecture notes
244/11 NP-Completeness Part 2. Mitzenmacher's lecture notes
254/18 Linear Programming and Network Flows Part 1 Mitzenmacher's lecture notes
264/20 Linear Programming and Network Flows Part 2 Mitzenmacher's lecture notes
274/25 TBD. Problem Set 4 due. TBD.
284/27 TBD. TBD