This course covers algorithmic techniques for solving different types of data science problems. It will cover Big O notation, data structures (arrays, stacks, queues, lists, trees, heaps, graphs), sorting and searching (binary search trees, hash tables), and algorithmic paradigms for efficient problem solving (divide and conquer, recursion, greedy algorithms, dynamic programming, etc.). It will include both theory and practice. You will learn to design, analyze, and implement fundamental data structures and algorithms. This course will provide the algorithmic background essential for further study of computer science topics.In more detail, from this course you will learn the basic language and tools for algorithm analysis, as well as several specific problems and general paradigms for algorithm design. We will focus on the theoretical and mathematical aspects in class and on the homework assignments. But because one gains a deeper understanding of algorithms from actually implementing them, the course will include a substantial programming component. Large programming assignments can be done (but do not have to be done!) in pairs. More details will be available when the first programming assignment is given. We will be covering a great deal of material in this class. I expect the course to be challenging, both in terms of the workload and the difficulty of the material. You should be prepared to do a lot of work outside of class. The payoff will be that you will learn a lot of both useful and interesting things.
The formal prerequisites for this course are ANLY-501 and ANLY-511.
The expected skills are as follows. Students should be able to program in a standard programming language (C, C++, Java, Python, etc.). Some mathematical maturity also will be expected; students should have some idea of what constitutes a mathematical proof and how to write one. Some knowledge of basic probability will also be helpful.
There is no required course textbook. We will primarily be following lecture notes from Michael Mitzenmacher's Data Structures and Algorithms class, CS 124, at Harvard. We will, however, omit some of the more specialized topics from that class and instead spend more time on basic data structures that CS 124 covers very quickly. There will also be some overlap with Eric Demaine's 2011 Algorithm's course at MIT. The schedule and lecture notes that we will be using are posted below (they are, however, likely to evolve over the course of the semester).
Nonetheless, students are highly encouraged to purchase and study a more formal reference on algorithm design. I personally recommend Algorithm Design by Kleinberg and Tardos. It is available in paperback form online for less than $30. Another popular reference is Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein.
Grades in the course will be based on roughly biweekly problem sets, a midterm, and a final. Two of the problem sets will be involved programming assignments. The programming problem sets will count for 20% of the grade, the non-programming problem sets 30%, the midterm 20%, and the final 30%.
I would like to emphasize the rules on working with others on homework assignments. For the programming assignments, you may work in pairs. Each group may to discuss the problem set with at most one other group, but looking at another group's code is strictly forbidden.
For any assignment with a programming component, you may not use code found online without my express permission. If you want to use code you find online, you must ask me first.
For problem sets, limited collaboration in planning and thinking through solutions to homework problems is allowed, but no collaboration is allowed in writing up solutions. You are allowed to work with at most 3 other students currently taking the course in discussing, brainstorming, and verbally walking through solutions to homework problems. But when you are through talking, you must write up your solutions independently and may not check them against each other. There may be no passing of homework papers between collaborators; nor is it permissible for one person simply to tell another the answer.
If you collaborate with any other students in the course in the planning and design of solutions to homework problems, then you should give their names on your homework papers.
Violation of these rules may be grounds for giving no credit for a homework paper and also for serious disciplinary action. Severe punishments will apply, so please do not violate these rules.
After class or by appointment. The location of my office hours may vary. I will frequently use SMH 342B, but I may occasionally use Car Barn 312-B.
Below is the tentative schedule for the course. I will modify it as the semester progresses.
|Algorithmic Thinking, Sorting, Basic Data Structures|
|1||1/12||Overview of Algorithm Design, Grade-School Algorithm For Multiplication, Computing Fibonacci Numbers, Models of Computation||Mitzenmacher's lecture notes. Also read Suresh Venkatasubramanian's musings on the relationship between algorithm design and machine learning.|
|2||1/17||Induction, Big O Notation, Start of Sorting: Binary Search, Recurrence Relations.||Mitzenmacher's lecture notes.|
|3||1/19||Insertion Sort. Divide and Conquer Algorithms: Merge Sort and Integer Multiplication. Problem Set 1 out||For Insertion Sort and Merge Sort, see For Insertion Sort, see Eric Demaine's slides. For Integer Multiplication, see Mitzenmacher's lecture notes.|
|4||1/24||Start of Data Structures. Linked Lists, Stacks, Queues, Priority Queues, Heaps, and Heap Sort.||Eric Demaine's slides|
|5||1/26||Intro to Graphs, Adjacency Lists and Adjacency Arrays, Breadth-First Search.||Demaine's notes and Mitzenmacher's notes|
|6||1/31||Depth-First Search, Topological Sorting, Strongly Connected Components||Mitzenmacher's lecture notes.|
|7||2/2||Djikstra's Algorithm for Single-Source Shortest Paths. Problem Set 1 due, Problem Set 2 out||Mitzenmacher's lecture notes.|
|8||2/7||Bellman-Ford Algorithm for Single-Source Shortest Paths.||Demaine's lecture notes.|
|9||2/9||Minimum Spanning Trees. Prim's and Kruskal's Algorithms.||Mitzenmacher's lecture notes on Prim's and Kruskal's algorithms. For the union-find data structure, see these notes.|
|Algorithmic Paradigms: Greedy, Divide-And-Conquer, Dynamic Programming|
|10||2/14||(More) Greedy Algorithms: Wrap up of MST algorithms, Horn Formulae.||Mitzenmacher's lecture notes.|
|11||2/16||(More) Greedy Algorithms: Approximating Set Cover, Huffman Coding. Problem Set 2 due. Programming Assignment 1 out||Mitzenmacher's lecture notes.|
|12||2/21||(More) Divide-And-Conquer Algorithms: Strassen's Algorithm for Matrix Multiplication. Start of Dynamic Programming (String Reconsuction).||Mitzenmacher's lecture notes.|
|13||2/23||Dynamic Programming: Edit Distance.||Mitzenmacher's lecture notes.|
|14||2/28||More Dynamic Programming Examples: Floyd-Warshall (All-Pairs Shortest Paths). Travelling Salesman Problem. Pseudo-polynomial time algorithm for Subset Sum.||Mitzenmacher's lecture notes.|
|Dictionary Data Structures|
|15||3/2||Hash Tables: Chaining, Linear Probing, Cuckoo Hashing, The Power of Two Choices. Programming Assignment 1 due. Problem Set 3 out.||Justin's handwritten notes.|
|(More) Randomized Algorithms|
|16||3/14||Bloom filters, Pattern Matching via Fingerprinting||Mitzenmacher's lecture notes|
|19||3/23||Document similarity. Problem Set 3 due. Programming Assignment 2 out.||Mitzenmacher's lecture notes|
|20||3/28||Primality Testing||Mitzenmacher's lecture notes|
|21||3/30||NP-completeness.||Mitzenmacher's lecture notes|
|22||4/4||Linear Programming||Mitzenmacher's lecture notes|
|23||4/6||Linear Programming and Network Flows Programming Assignment 2 due. Problem Set 4 out.||Mitzenmacher's lecture notes|
|26||4/20||TBD. Problem Set 4 due.||TBD|