Spring 2015

- Announcements
- Where, When, Who
- Description
- Learning Goals
- Policies
- Assignments and Grading
- Materials: Readings, Videos, and Links
- Schedule
- Other Interesting Links

- 4/15/15: Posted room for final exam: REI 284.
- 4/1/15: Posted p5.
- 3/2/15: Posted p4.
- 2/13/15: Posted p3.
- 1/30/15: Posted p2.
- 1/15/15: Collapsed the Lecture and Project Forums into a single forum.
- 1/13/15: Posted BDT Slides.
- 1/7/15: Posted p1.
- 1/6/15: Created forums on Bb for discussion and questions about projects and lecture material.

Class Time: MW 3:30 PM – 4:45 PM

Classroom: REI 284

Instructor: Mark Maloof

Office: 325 St. Mary's Hall

Mailbox: 329A St. Mary's Hall

Office Hours: In-person (325 STM): TR 11:00 AM–12:00 PM; online: M 10:30–11:30 AM and W 3:00–4:00 PM; or by appointment. Send me an email to get the Zoom link for online office hours.

This graduate lecture surveys the major research areas of machine learning. Through traditional lectures, programming projects, paper presentations, and research projects, students learn (1) to understand the foundations of machine learning, (2) to comprehend, analyze, and critique papers from the primary literature, (3) to replicate studies described in the primary literature, and (4) to design, conduct, and present their own studies. The course compares and contrasts machine learning with related endeavors, such as statistical learning, pattern classification, data mining, and information retrieval. Topics include Bayesian decision theory, instance-based approaches, Bayesian methods, decision trees, rule induction, density estimation, linear classifiers, neural networks, support vector machines, ensemble methods, learning theory, evaluation, and applications. Time permitting, additional topics include genetic algorithms, unsupervised learning, semi-supervised learning, outlier detection, sequence learning, and reinforcement learning.

Prerequisites: Ideally, students will have taken undergraduate courses in computer science through data structures, but at the very least, students must be able to implement trees and graphs using a high-level programming language, such as C++, Java, Python, or Ruby. Students should also have taken undergraduate courses in mathematics, such as calculus, linear algebra, and probability and statistics.

Primary Text:

*Machine Learning: The Art and Science of Algorithms that Make Sense of Data*, by Peter Flach

By the end of the semester, students will be able to:

- explain the main foundations of machine learning
- understand and design object-oriented systems for machine learning
- analyze algorithms for machine learning in time and space
- implement methods of machine learning using a high-level programming language
- comprehend, analyze, and critique scholarly papers from the primary literature
- replicate studies described in the primary literature
- design and conduct empirical studies

My course policies are designed to supplement the CS Department's Honor Policy. Unless stated otherwise when I distribute an assignment, the following is the default for all assignments for this course. I've developed my policies from past teaching experiences and from the CS Department's Honor Code at George Mason University.

I am obligated to refer all suspected cases of academic dishonesty by graduate students to the dean of the Graduate School. If you have any questions about these policies or how they apply, please discuss such concerns with me during class, during office hours, or by e-mail.

In my experience, students at Georgetown do honest work. The small percentage of students who have submitted someone else's work as their own did so because they did not manage their time wisely.

Students must follow proper scholarly practice for all submitted work, whether graded or ungraded and whether a draft or final version of a proposal, paper, or program. We must acknowledge our reliance on the work of others through citation.

Students may be quite adept at and knowledgeable about citing and quoting material from traditional sources, such as books and articles. Typically, we do not have to cite facts, common math formulae, or expressions of our own ideas, observations, interpretations, and analyses. However, new graduate students in computer science may not realize that formulae, theorems, proofs, algorithms, and programs can require the same treatment as any other form of expression.

For convenience, you do not need to cite the course materials, conversations with me, or information you obtain from class lectures and discussions. If you are unsure about what requires citation or what constitutes proper scholarly practice, please ask me during class, during office hours, or by e-mail.

I design my courses and assignments so students have what they need to complete the assignments individually without consulting outside resources. I determine the size of and credit for assignments based on the assumption that the work for them is the result of individual effort using only the course resources and materials. Students who use outside resources to complete assignments may not be eligible for full credit. Students who do not acknowledge their use of outside resources to complete assignments may be in violation of my course policies and the university's policies on academic integrity.

The following list details acceptable and unacceptable practices:

- You can:
  - obtain assistance in understanding course materials (textbooks, lecture notes, assignments);
  - obtain assistance in learning to use the computing facilities;
  - obtain assistance in learning to use special features of a programming language's implementation;
  - obtain assistance in determining the syntactic correctness of a particular programming language statement or construct;
  - obtain an explanation of a particular syntactic error;
  - obtain explanations of compilation or run-time error messages.

- You can obtain assistance only from me and the teaching assistants:
  - in designing the data structures and algorithms used in your solution;
  - in modifying the design of an algorithm or data structure determined to be faulty;
  - in implementing your algorithm or data structure in a programming language;
  - in correcting a faulty implementation of your algorithm or data structure;
  - in determining the semantic correctness of your program;
  - in designing an experimental study and interpreting its results.

- You cannot:
  - show or give a copy of your work in any amount or form to another student;
  - see or receive a copy of someone else's work in any amount or form;
  - attempt to gain access to files other than your own or those that I designate and authorize;
  - inspect or retain in your possession another student's work, whether it was given to you by another student, it was found after another student discarded his or her work, or it accidentally came into your possession;
  - collaborate in any way with someone else in the design, implementation, or logical revision of an algorithm;
  - use or present as your own any algorithm, data structure, or implementation that is not of your own or of my design, or which is not part of the course's required reading. If you modify any procedure which is presented in the course's texts that is not specifically mentioned in class or covered in reading assignments, then a citation with page number must be given;
  - incorporate code written by others (such as can be found on the Internet).

Policies dealing with late projects, cell phones, attendance, and inclement weather.

- You should submit all assignments on time. For late projects, there will be a 1% deduction for each minute after the deadline. In the real world, it won't be your grade that decreases. It'll be your stock price.
- If you use your laptop for development, you should keep a backup of your projects on a university or department machine (e.g., cs-class).
- You must take the final exam with the section and during the period designated by the Registrar.
- It is my job to maintain a constructive learning environment for everyone. Students bringing cell phones to class must either set the phone to vibrate, turn the ringer volume off, or turn the phone off completely.
- Attendance is not required, but there is a correlation between attendance and grade. It's positive.
- It is fine if you must leave class early, arrive late, or leave the room to answer your phone, but you should do so in a manner that does not disturb your fellow students.
- In the case of inclement weather that results in the university's closure, we will meet virtually during normal class times using Blackboard Collaborate.
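
The late-project deduction in the first policy above is simple arithmetic. The following is a minimal sketch, not the instructor's actual grading code. It assumes the deduction is 1% of the project's maximum points per whole minute late and that a score cannot fall below zero; the policy states neither detail, and the name `applyLatePenalty` is hypothetical.

```cpp
#include <algorithm>

// Hypothetical helper: project score after the late deduction.
// Assumptions (not stated in the policy): the deduction is 1% of the
// project's maximum points per whole minute late, floored at zero.
double applyLatePenalty(double earnedPoints, double maxPoints, int minutesLate) {
    if (minutesLate <= 0) {
        return earnedPoints;  // on time: no deduction
    }
    double deduction = 0.01 * maxPoints * minutesLate;  // 1% per minute
    return std::max(0.0, earnedPoints - deduction);
}
```

Under these assumptions, a 10-point project submitted 30 minutes late loses 3 points, and a project submitted 100 or more minutes late earns nothing.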

- Programming Projects, 50%
  - Project 1, assigned W 1/7, due R 1/29, 10 points
  - Project 2, assigned W 1/28, due ~~R 2/12~~ Su 2/15, 10 points
  - Project 3, assigned W 2/11, due Su 3/1, 10 points
  - Project 4, assigned M 3/2, due W 4/1, 10 points
  - Project 5, assigned M 3/30, due M 4/27, 10 points
- Midterm Exam, W 3/4, 20%
- Final Exam, Sa 5/2, 4–6 PM, 30%, REI 284

    String Grades::getLetterGrade()
    {
      if (grade >= 94) return "A";
      else if (grade >= 90) return "A-";
      else if (grade >= 87) return "B+";
      else if (grade >= 84) return "B";
      else if (grade >= 80) return "B-";
      else if (grade >= 67) return "C";
      else return "F";
    } // Grades::getLetterGrade
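
The grading scheme above can be combined into one calculation. The following is a minimal sketch with several assumptions: each 10-point project counts for 10% of the course grade, so the five projects total 50 points; exam scores are percentages out of 100; and no rounding is applied before the letter grade is assigned. The free functions `courseGrade` and `letterGrade` are hypothetical stand-ins for the `Grades` class, whose other members are not shown.

```cpp
#include <string>

// Hypothetical free-function version of Grades::getLetterGrade(),
// taking the numeric grade as a parameter instead of reading a member.
std::string letterGrade(double grade) {
    if (grade >= 94) return "A";
    if (grade >= 90) return "A-";
    if (grade >= 87) return "B+";
    if (grade >= 84) return "B";
    if (grade >= 80) return "B-";
    if (grade >= 67) return "C";
    return "F";
}

// Weighted course grade on a 0-100 scale.
// Assumptions: projectPoints is the total earned out of 50 (five projects,
// 10 points each); midtermPct and finalPct are exam scores out of 100.
double courseGrade(double projectPoints, double midtermPct, double finalPct) {
    return projectPoints        // projects: 50% of the grade
         + 0.20 * midtermPct    // midterm exam: 20%
         + 0.30 * finalPct;     // final exam: 30%
}
```

For example, under these assumptions 45 project points, 85 on the midterm, and 90 on the final give 45 + 17 + 27 = 89, which `letterGrade` maps to a B+.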

- Perez-Hernandez, D. (28 March 2014). Taking notes by hand benefits recall, researchers find. *The Chronicle of Higher Education*. (Read if interested.)
- Mueller, P. A. and Oppenheimer, D. M. (2014). The pen is mightier than the keyboard: Advantages of longhand over laptop note taking. *Psychological Science*, 25(6):1159–1168. (Read if interested.)
- Flach, P. (2012). *Machine learning: The art and science of algorithms that make sense of data*. Cambridge University Press, Cambridge.
- Mitchell, T. M. (1997). *Machine learning*. McGraw-Hill, New York, NY.
- Murphy, K. P. (2012). *Machine learning: A probabilistic perspective* [electronic resource]. MIT Press, Cambridge, MA.
- Gomes, L. (20 Oct 2014). Machine-learning maestro Michael Jordan on the delusions of Big Data and other huge engineering efforts. *IEEE Spectrum*.
- Domingos, P. (2012). A few useful things to know about machine learning. *Communications of the ACM*, 55(10):78–87.
- Duda, R. O., Hart, P. E., and Stork, D. G. (2000). *Pattern classification*. John Wiley & Sons, New York, NY.
- Slides: Bayesian Decision Theory.
- Provost, F., Fawcett, T., and Kohavi, R. (1998). The case against accuracy estimation for comparing induction algorithms. In *Proceedings of the Fifteenth International Conference on Machine Learning*, 445–453. Morgan Kaufmann, San Francisco, CA.
- Fawcett, T. (2006). An introduction to ROC analysis. *Pattern Recognition Letters*, 27(8):861–874.
- Hearst, M. A., et al. (1998). Support vector machines. *IEEE Intelligent Systems and their Applications*, 13(4):19–28.
- Müller, K.-R., et al. (2001). An introduction to kernel-based learning algorithms. *IEEE Transactions on Neural Networks*, 12(2):181–201.
- Yuan, G.-X., Ho, C.-H., and Lin, C.-J. (2012). Recent advances of large-scale linear classification. *Proceedings of the IEEE*, 100(9):2584–2603.

- Introduction: Definitions, Areas, History, Paradigms
- Bayesian Decision Theory
- Instance-based learning: *k*-NN, *kd*-trees
- Probabilistic learning: MLE, Bayes' Theorem, MAP, naive Bayes, Bayesian naive Bayes
- Density estimation: Parametric, Non-parametric, Bayesian
- Evaluation: Train/Test Methodologies, Measures, ROC Analysis
- Decision Trees: ID3, C4.5, Stumps, VFDT
- Rule Learning: Ripper, OneR
- Midterm Exam
- Neural Networks: Linear classifiers, Perceptron
- Neural Networks: Multilayer networks, Back-propagation
- Support Vector Machines: Perceptron, Dual representation
- Support Vector Machines: Margins, Kernels, Training, SMO
- Ensemble Methods: Bagging, Boosting
- Ensemble Methods: Random Forests, Voting, Weighting
- Hidden Variables: *k*-means, Expectation-Maximization

- Project Grading Sheet
- Blackboard (Bb)
- Discussion Board/Forum
- Google Scholar
- Article: Meet Cepheus, the virtually unbeatable poker-playing computer
- Article: Heads-up limit hold'em poker is solved
- Survey: Research Leaders on Data Mining, Data Science, and Big Data key trends, top papers

*Copyright © 2019 Mark Maloof. All Rights Reserved. This material may not be published, broadcast, rewritten, or redistributed.*