This course covers algorithmic techniques for solving different types of data science problems. It will cover Big O notation, data structures (arrays, stacks, queues, lists, trees, heaps, graphs), sorting and searching (binary search trees, hash tables), and algorithmic paradigms for efficient problem solving (divide and conquer, recursion, greedy algorithms, dynamic programming, etc.). It will include both theory and practice. You will learn to design, analyze, and implement fundamental data structures and algorithms. This course will provide the algorithmic background essential for further study of computer science topics.
In more detail, from this course you will learn the basic language and tools for algorithm analysis, as well as several specific problems and general paradigms for algorithm design. We will focus on the theoretical and mathematical aspects in class and on the homework assignments. But because one gains a deeper understanding of algorithms from actually implementing them, the course will include a substantial programming component. Large programming assignments can be done (but do not have to be done!) in pairs. More details will be available when the first programming assignment is given. We will be covering a great deal of material in this class. I expect the course to be challenging, both in terms of the workload and the difficulty of the material. You should be prepared to do a lot of work outside of class. The payoff will be that you will learn a lot of both useful and interesting things.
The formal prerequisites for this course are ANLY-501 and ANLY-511.
The expected skills are as follows. Students should be able to program in a standard programming language (C, C++, Java, Python, etc.). Some mathematical maturity also will be expected; students should have some idea of what constitutes a mathematical proof and how to write one. Some knowledge of basic probability will also be helpful.
There is no required course textbook. We will primarily be following lecture notes from Michael Mitzenmacher's Data Structures and Algorithms class, CS 124, at Harvard. We will, however, omit some of the more specialized topics from that class and instead spend more time on basic data structures that CS 124 covers very quickly. There will also be some overlap with Erik Demaine's 2011 Algorithm's course at MIT. The schedule and lecture notes that we will be using are posted below (they are, however, likely to evolve over the course of the semester).
Nonetheless, students are highly encouraged to purchase and study a more formal reference on algorithm design. I personally recommend Algorithm Design by Kleinberg and Tardos. It is available in paperback form online for less than $30. Another popular reference is Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein.
Grades in the course will be based on roughly biweekly problem sets, a midterm, and a final. Two of the problem sets will be involved programming assignments. The programming problem sets will count for 20% of the grade, the non-programming problem sets 30%, the midterm 20%, and the final 30%.
I would like to emphasize the rules on working with others on homework assignments. For the programming assignments, you may work in pairs. Each group may to discuss the problem set with at most one other group, but looking at another group's code is strictly forbidden.
For any assignment with a programming component, you may not use code found online without my express permission. If you want to use code you find online, you must ask me first.
For problem sets, limited collaboration in planning and thinking through solutions to homework problems is allowed, but no collaboration is allowed in writing up solutions. You are allowed to work with at most 3 other students currently taking the course in discussing, brainstorming, and verbally walking through solutions to homework problems. But when you are through talking, you must write up your solutions independently and may not check them against each other. There may be no passing of homework papers between collaborators; nor is it permissible for one person simply to tell another the answer.
If you collaborate with any other students in the course in the planning and design of solutions to homework problems, then you should give their names on your homework papers.
Violation of these rules may be grounds for giving no credit for a homework paper and also for serious disciplinary action. Severe punishments will apply, so please do not violate these rules.
After class or by appointment. My office is Saint Mary's Hall, Room 354
Below is the tentative schedule for the course. I will modify it as the semester progresses.