COSC-575: Machine Learning

Fall 2018

Announcements
Where, When, Who
Description
Learning Goals
Policies
Schedule
Assignments and Grading
Materials: Readings, Videos, and Links
Other Interesting Links

Announcements

12/13/18: Posted room for final exam: Car Barn 203.

11/20/18: Posted p5.
11/6/18: Posted p4.
10/15/18: Posted p3.
10/9/18: Changed the due date for p2 to 10/17.
9/26/18: Posted p2.
9/4/18: Updated the classroom to WAL 491
9/4/18: Updated the classroom to ICC 108
8/30/18: Posted p1.
8/29/18: Updated the classroom to WAL 498
8/26/18: Set exam and assignment dates
4/2/18: Created this Web page.

Where, When, Who

Class Time: TR 9:30–10:45 AM

Classroom: ~~ICC 119~~ ~~WAL 498~~ ~~ICC 108~~ WAL 491

Instructor: Mark Maloof

Office: 325 St. Mary's Hall

Mailbox: 329A St. Mary's Hall

Office Hours: None for 24–25 academic year.

Description

This graduate lecture surveys the major research areas of machine learning focusing on classification. Through traditional lectures and programming projects, students learn (1) to understand the foundations of machine learning, (2) to design and implement methods of machine learning, (3) to evaluate methods of machine learning, and (4) to conduct empirical evaluations of multiple methods of machine learning. The course compares and contrasts machine learning with related endeavors, such as statistical learning, pattern classification, data mining, and information retrieval. Topics include Bayesian decision theory, instance-based approaches, Bayesian methods, decision trees, rule induction, density estimation, linear classifiers, support vector machines, neural networks, ensemble methods, learning theory, evaluation, and applications. Students complete five programming projects using Java. There are midterm and final exams.

Prerequisites: Students should have taken undergraduate courses in computer science through data structures; at the very least, students must be able to implement trees and graphs in a high-level object-oriented programming language. Students should have also taken undergraduate courses in mathematics, such as calculus, linear algebra, and probability and statistics. Students will use Java to complete the projects for this course.

Primary Text:

Machine Learning: The Art and Science of Algorithms that Make Sense of Data, by Peter Flach [ WWW | CUP | Amazon ]

Learning Goals

By the end of the semester, students will be able to:

explain the main foundations of machine learning
understand and design object-oriented systems for machine learning
implement methods of machine learning using a high-level programming language
conduct performance evaluations of methods of machine learning
design and conduct empirical studies

Policies

My course policies are designed to supplement the CS Department's Honor Policy. Unless stated otherwise when I distribute an assignment, the following is the default for all assignments for this course. I've developed my policies from past teaching experiences and from the CS Department's Honor Code at George Mason University.

I am obligated to refer all suspected cases of academic dishonesty by master's students to the Honor Council. I am obligated to refer all suspected cases of academic dishonesty by doctoral students to the dean of the Graduate School. If you have any questions about these policies or how they apply, please discuss such concerns with me during class, during office hours, or by e-mail.

In my experience, students at Georgetown do honest work. The small percentage of students who have submitted someone else's work as their own did so because they did not manage their time wisely.

Students must follow proper scholarly practice for all submitted work, whether graded or ungraded and whether a draft or final version of a proposal, paper, or program. We must acknowledge our reliance on the work of others through citation.

Students may be quite adept at and knowledgeable about citing and quoting material from traditional sources, such as books and articles. Typically, we do not have to cite facts, common math formulae, or expressions of our own ideas, observations, interpretations, and analyses. However, new graduate students in computer science may not realize that formulae, theorems, proofs, algorithms, and programs can require the same treatment as any other form of expression.

For convenience, you do not need to cite the course materials, conversations with me, or information you obtain from class lectures and discussions. If you are unsure about what requires citation or what constitutes proper scholarly practice, please ask me during class, during office hours, or by e-mail.

I design my courses and assignments so students have what they need to complete the assignments individually without consulting outside resources. I determine the size of and credit for assignments based on the assumption that the work for them is the result of individual effort using only the course resources and materials. Students who use outside resources to complete assignments may not be eligible for full credit. Students who do not acknowledge their use of outside resources to complete assignments may be in violation of my course policies and the university's policies on academic integrity.

The following list details acceptable and unacceptable practices:

You can:
- obtain assistance in understanding course materials (textbooks, lecture notes, assignments);
- obtain assistance in learning to use the computing facilities;
- obtain assistance in learning to use special features of a programming language's implementation;
- obtain assistance in determining the syntactic correctness of a particular programming language statement or construct;
- obtain an explanation of a particular syntactic error;
- obtain explanations of compilation or run-time error messages.
You can obtain assistance only from me and the teaching assistants:
- in designing the data structures and algorithms used in your solution;
- in modifying the design of an algorithm or data structure determined to be faulty;
- in implementing your algorithm or data structure in a programming language;
- in correcting a faulty implementation of your algorithm or data structure;
- in determining the semantic correctness of your program;
- in designing an experimental study and interpreting its results.
You cannot:
- show or give a copy of your work in any amount or form to another student;
- see or receive a copy of someone else's work in any amount or form;
- attempt to gain access to files other than your own or those that I designate and authorize;
- attempt to reverse engineer routines used for automatic grading;
- inspect or retain in your possession another student's work, whether it was given to you by another student, it was found after another student discarded his or her work, or it accidentally came into your possession;
- collaborate in any way with someone else in the design, implementation, or logical revision of an algorithm;
- use or present as your own any algorithm, data structure, or implementation that is not of your own or of my design, or which is not part of the course's required reading. If you modify any procedure which is presented in the course's texts that is not specifically mentioned in class or covered in reading assignments, then a citation with page number must be given;
- incorporate code written by others (such as can be found on the Internet).

Policies dealing with logistics:

You have permission to occasionally take digital photos of the material on the board for your personal use, but you cannot post these photos on the Internet. It is important to understand that all of the course material is covered by copyright, either mine or someone else's.
You should submit all assignments on time. For late projects, there will be a 1% deduction for each minute after the deadline. In the real world, it won't be your grade that decreases. It'll be your stock price.
I grant extensions only to students who have documentation for a medical issue, a family emergency, or an accommodation from the Academic Resource Center. In the cases of a medical issue or a family emergency, it would be best to coordinate with your advising dean since these situations often affect your work in all of your classes.
If you use your laptop for development, you should keep a backup of your projects on a university or department machine (e.g., cs-class) so I can verify that you completed the work for an assignment before the deadline.
You must take the final exam with the section and during the period designated by the Registrar.
It is my job to maintain a constructive learning environment for everyone. Please silence your cell phone and other electronic devices.
If you must miss class, be sure to get the lecture notes from a classmate.
It is fine if you must leave class early, arrive late, or leave the room to answer your phone, but you should do so in a manner that does not disturb your fellow students.
In the case of inclement weather that results in the university's closure, we will meet virtually during normal class times using Zoom.

Schedule

Introduction: Definitions, Areas, History, Paradigms
Bayesian Decision Theory
Instance-based learning: k-NN, kd-trees
Probabilistic learning: MLE, Bayes' Theorem, MAP, naive Bayes, Bayesian naive Bayes
Density estimation: Parametric, Non-parametric, Bayesian
Evaluation: Train/Test Methodologies, Measures, ROC Analysis
Decision Trees: ID3, C4.5, Stumps, VFDT
Rule Learning: Ripper, OneR
Midterm Exam
Neural Networks: Linear classifiers, Perceptron
Neural Networks: Multilayer networks, Back-propagation
Support Vector Machines: Perceptron, Dual representation
Support Vector Machines: Margins, Kernels, Training, SMO
Ensemble Methods: Bagging, Boosting
Ensemble Methods: Random Forests, Voting, Weighting
Hidden Variables: k-means, Expectation-Maximization

Assignments and Grading

Programming Projects, 50%

Project 1, assigned R 8/30, due R 9/27 @ 5 PM, 10 points
Project 2, assigned R 9/27, due ~~M 10/15~~ W 10/17 @ 5 PM, 10 points
Project 3, assigned M 10/15, due T 11/6 @ 5 PM, 10 points
Project 4, assigned T 11/6, due W 11/21 @ 11:59 PM, 10 points
Project 5, assigned W 11/21, due M 12/10 @ 11:59 PM, 10 points

Midterm Exam, R 10/18, 20%
Final Exam, R 12/20 9–11am in CBN 203, 30%

String Grades::getLetterGrade()
{
  if (grade >= 94)
    return "A";
  else if (grade >= 90)
    return "A-";
  else if (grade >= 87)
    return "B+";
  else if (grade >= 84)
    return "B";
  else if (grade >= 80)
    return "B-";
  else if (grade >= 67)
    return "C";
  else
    return "F";
} // Grades::getLetterGrade
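
As a rough illustration of how the weights and the scale above combine, here is a minimal Java sketch. The class name `CourseGrade`, its method names, and the sample scores are hypothetical. It assumes each component is expressed as a percentage from 0 to 100 and that no rounding is applied before the cutoffs, neither of which the syllabus states.

```java
// Hypothetical helper for illustration only; not part of the course code.
final class CourseGrade {

    // Weights from this section: projects 50%, midterm 20%, final exam 30%.
    // Assumption: each argument is a percentage in [0, 100].
    static double weighted(double projects, double midterm, double finalExam) {
        return 0.50 * projects + 0.20 * midterm + 0.30 * finalExam;
    }

    // Same cutoffs as the scale above, with the grade passed as a parameter.
    // Assumption: the grade is compared as-is, without rounding.
    static String letter(double grade) {
        if (grade >= 94) return "A";
        if (grade >= 90) return "A-";
        if (grade >= 87) return "B+";
        if (grade >= 84) return "B";
        if (grade >= 80) return "B-";
        if (grade >= 67) return "C";
        return "F";
    }

    public static void main(String[] args) {
        // Made-up scores: projects 92, midterm 85, final exam 88.
        double grade = weighted(92.0, 85.0, 88.0);
        System.out.println(grade + " maps to " + letter(grade));
    }
}
```

With these sample scores the weighted grade falls between 87 and 90, so the sketch reports a B+.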

I use automatic grading routines to assign an initial grade for the projects. It is important to emphasize that the grade you obtain from Autolab is an initial grade and may not be your final grade. There are many important aspects of a program that are difficult or impossible to assess using automatic grading routines. For example, automatic grading routines cannot determine if you have written proper documentation. They cannot easily assess if an implementation of an operation is optimally efficient. As a consequence, I start with the initial grade you obtain from Autolab and take further deductions if necessary.

For complete implementations, I use the following distribution as a guide:

40%: Compiles against the autograder
20%: Executes without failure using the autograder
40%: Autograder unit tests
10%: Internal documentation, if required
- Purpose of classes and methods explained in documentation comments
- Purpose, range, and meaning of identifiers explained, where needed
- Complex flow of control explained

10%: Style and formatting
- Nested indentation for loops and conditionals
- All class, method, and function headers emphasized
- Comments set off from code
- Vertical alignment of comments, where appropriate
- White space between blocks of code (and comments)
- Mnemonic identifier names

80%: Algorithm and Implementation
- Correct implementation of the operations
- Proper object-oriented design and implementation
- Appropriate error checking and diagnostic messages
- Appropriate data and object types
- Correct, clean, organized output format

Notice that an implementation consisting entirely of method stubs would obtain an initial grade of 60%. Such an implementation is incomplete, and would be subject to further deductions based on the effort required to implement the required operations.
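
The 60% figure follows from the autograder portions listed earlier: 40% for compiling plus 20% for executing, with nothing earned from the unit tests. The Java sketch below shows that arithmetic. The class `InitialGrade` and its method are hypothetical. It assumes that unit-test credit scales linearly with the fraction of tests passed and that a program that fails an earlier stage earns nothing for later stages; the syllabus does not say how Autolab actually awards partial credit.

```java
// Hypothetical sketch of the initial (autograder) grade; not Autolab's actual logic.
final class InitialGrade {

    // Portions from the syllabus: 40% compiles, 20% executes, 40% unit tests.
    // Assumption: unit-test credit is proportional to the fraction of tests passed.
    static double initialGrade(boolean compiles, boolean executes,
                               int testsPassed, int testsTotal) {
        if (!compiles) {
            return 0.0;            // assumption: nothing later can be earned
        }
        double grade = 40.0;
        if (!executes) {
            return grade;          // assumption: unit tests cannot pass
        }
        grade += 20.0;
        if (testsTotal > 0) {
            grade += 40.0 * testsPassed / testsTotal;
        }
        return grade;
    }

    public static void main(String[] args) {
        // Method stubs only: compiles and runs, but passes no unit tests.
        System.out.println(initialGrade(true, true, 0, 25));   // 60.0
        // Complete implementation passing every test.
        System.out.println(initialGrade(true, true, 25, 25));  // 100.0
    }
}
```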

In addition to the above, the following deductions may be taken if applicable:

20%: Does not compile on cs-class
1–10%: Incomplete or improper submission
1–5%: My effort for fixing any minor issue
1–5%: Inefficiently implemented routine
1% per minute: Late deduction
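
To make the late deduction concrete, here is a minimal Java sketch of the 1%-per-minute rule. The class `LatePenalty` and its methods are hypothetical. It assumes the deduction is taken in percentage points of the project grade, that only whole elapsed minutes count, and that a grade cannot drop below zero; the syllabus specifies none of these details.

```java
import java.time.Duration;
import java.time.LocalDateTime;

// Hypothetical sketch of the late deduction; not the actual grading routine.
final class LatePenalty {

    // Whole minutes elapsed after the deadline (assumption: partial minutes are dropped).
    static long minutesLate(LocalDateTime deadline, LocalDateTime submitted) {
        if (!submitted.isAfter(deadline)) {
            return 0;
        }
        return Duration.between(deadline, submitted).toMinutes();
    }

    // One percentage point per minute late, floored at zero (both assumptions).
    static double afterLateDeduction(double grade, long minutesLate) {
        return Math.max(0.0, grade - minutesLate);
    }

    public static void main(String[] args) {
        // Project 1 deadline from the syllabus: R 9/27 @ 5 PM.
        LocalDateTime deadline = LocalDateTime.of(2018, 9, 27, 17, 0);
        // Made-up submission time, 12 minutes and 30 seconds late.
        LocalDateTime submitted = LocalDateTime.of(2018, 9, 27, 17, 12, 30);

        long late = minutesLate(deadline, submitted);
        System.out.println(late);                            // 12
        System.out.println(afterLateDeduction(90.0, late));  // 78.0
    }
}
```

Under these assumptions, a submission more than 100 minutes late earns no credit regardless of its quality.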

Materials: Readings, Videos, and Links

Autolab, for submitting projects
Piazza, for online discussion
Canvas, for document distribution and for submitting projects if Autolab is down

Perez-Hernandez, D. (28 March 2014). Taking notes by hand benefits recall, researchers find. The Chronicle of Higher Education. (Read if interested.)
Mueller, P. A. and Oppenheimer, D. M. (2014). The pen is mightier than the keyboard: Advantages of longhand over laptop note taking. Psychological Science, 25(6):1159–1168. (Read if interested.)
Flach, P. (2012). Machine learning: The art and science of algorithms that make sense of data. Cambridge University Press, Cambridge.
Mitchell, T. M. (1997). Machine learning. McGraw-Hill, New York, NY.
Murphy, K. P. (2012). Machine learning: A probabilistic perspective [electronic resource]. MIT Press, Cambridge, MA.
Gomes, L. (20 Oct 2014). Machine-learning maestro Michael Jordan on the delusions of Big Data and other huge engineering efforts. IEEE Spectrum.
Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM 55(10): 78–87.
Duda, R. O., Hart, P. E., and Stork, D. G. (2000). Pattern classification. John Wiley & Sons, New York, NY.
Slides: Bayesian Decision Theory.
JASON (2017). Perspectives on Research in Artificial Intelligence and Artificial General Intelligence Relevant to DoD. Technical Report JSR-16-Task-003. The MITRE Corporation, 7515 Colshire Drive, McLean, VA 22102-7508.
Provost, F., Fawcett, T., and Kohavi, R. (1998). The case against accuracy estimation for comparing induction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, 445–453. Morgan Kaufmann, San Francisco, CA.
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters 27(8): 859–928.
Slides: Kernel-Density Estimation.
Domingos, P. and Hulten, G. (2000). Mining high-speed data streams. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 71–80. ACM Press, New York, NY.
Cohen, W. W. (1995). Fast effective rule induction. In Proceedings of the Twelfth International Conference on Machine Learning, 115–123. Morgan Kaufmann, San Francisco, CA.
Goodfellow, I., Bengio, Y. and Courville, A. (2017). Deep feedforward networks. In Deep Learning. MIT Press, Cambridge, MA.
Rojas, R. (1996). The backpropagation algorithm. In Neural Networks. Springer, Berlin-Heidelberg.
Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, 1097–1105. Curran Associates, Inc., Red Hook, NY.
Montavon, G., Orr, G. B., and Müller, K.-R. (2012). Neural networks: Tricks of the Trade, 2nd Edition. Lecture Notes in Computer Science, Volume 7700. Springer, Berlin-Heidelberg.
LeCun, Y.A., Bottou, L., Orr, G. B., and Müller, K.-R. (2012). Efficient BackProp. In Neural networks: Tricks of the Trade, 9–48. Lecture Notes in Computer Science, Volume 7700. Springer, Berlin-Heidelberg.
Hearst, M.A., et al. (1998). Support vector machines. IEEE Intelligent Systems and their Applications 13(4): 19–28.
Müller, K.-R., et al. (2001). An introduction to kernel-based learning algorithms. IEEE Transactions on Neural Networks 12(2): 181–201.
