COSC-288: Introduction to Machine Learning

Spring 2016

Announcements
Where, When, Who
Description
Learning Goals
Policies
Assignments and Grading
Materials: Readings, Videos, and Links
Schedule
Other Interesting Links

Announcements

4/20/16: Room for final exam: REI 281 (F 5/6 12:30–2:30 PM).

4/15/16: Posted p5.
3/23/16: Posted p4.
3/20/16: Posted a new version of soybean with the ambiguous example on line 538 removed: soybean-rm538.mff.
2/28/16: Posted p3.
2/13/16: Posted p2.
2/11/16: Posted a Makefile and Makefile screencasts in Other Interesting Links.
2/9/16: Extended the deadline for p1 to Sa 2/13.
1/12/16: Turned on the Bb site, created the forum for general discussion on the Discussion Board, and posted p1.
10/6/15: Posted this Web page.

Where, When, Who

Class Time: MW 11:00 AM – 12:15 PM

Classroom: REI 281

Instructor: Mark Maloof

Office: 325 St. Mary's Hall

Mailbox: 329A St. Mary's Hall

Office Hours: In person (325 STM): TR 11:00–12:30 PM; Online: M 10:30–11:30 AM, W 3:00–4:00 PM (and by appointment).

Description

This undergraduate course surveys the major research areas of machine learning. Through traditional lectures and programming projects, students learn (1) to understand the foundations of machine learning, (2) to implement methods of machine learning in a high-level programming language, (3) to comprehend papers from the primary literature, and (4) to design and conduct their own studies. The course compares and contrasts machine learning with related endeavors, such as statistical learning, pattern classification, data mining, and information retrieval. Topics include instance-based approaches, naive Bayes, decision trees, rule induction, linear classifiers, neural networks, support vector machines, ensemble methods, evaluation, and applications.

Prerequisite: Data Structures (COSC-160).

Primary Text:

Machine Learning: The Art and Science of Algorithms that Make Sense of Data, by Peter Flach

Learning Goals

By the end of the semester, students will be able to:

explain the main foundations of machine learning
understand and design object-oriented systems for machine learning
analyze algorithms for machine learning in time and space
implement methods of machine learning using a high-level programming language
comprehend scholarly papers from the primary literature
design and conduct empirical studies

Policies

My course policies are designed to supplement the University's Undergraduate Honor System and the CS Department's Honor Policy. Unless stated otherwise when I distribute an assignment, the following is the default for all assignments for this course. I've developed my policies from past teaching experiences and from the CS Department's Honor Code at George Mason University.

I am obligated to refer all suspected cases of academic dishonesty by undergraduate students to Georgetown's Honor Council. If you have any questions about these policies or how they apply, please discuss such concerns with me during class, during office hours, or by e-mail.

In my experience, students at Georgetown do honest work. The small percentage of students who have submitted someone else's work as their own did so because they did not manage their time wisely.

Students must follow proper scholarly practice for all submitted work, whether graded or ungraded and whether a draft or final version of a proposal, paper, or program. We must acknowledge our reliance on the work of others through citation.

Students may be quite adept at and knowledgeable about citing and quoting material from traditional sources, such as books and articles. Typically, we do not have to cite facts, common math formulae, or expressions of our own ideas, observations, interpretations, and analyses. However, students new to computer science may not realize that formulae, theorems, proofs, algorithms, and programs can require the same treatment as any other form of expression.

For convenience, you do not need to cite the course materials, conversations with me, or information you obtain from class lectures and discussions. If you are unsure about what requires citation or what constitutes proper scholarly practice, please ask me during class, during office hours, or by e-mail.

I design my courses and assignments so students have what they need to complete the assignments individually without consulting outside resources. I determine the size of and credit for assignments based on the assumption that the work for them is the result of individual effort using only the course resources and materials. Students who use outside resources to complete assignments may not be eligible for full credit. Students who do not acknowledge their use of outside resources to complete assignments may be in violation of my course policies and the university's policies on academic integrity.

The following list details acceptable and unacceptable practices:

You can:
- obtain assistance in understanding course materials (textbooks, lecture notes, assignments);
- obtain assistance in learning to use the computing facilities;
- obtain assistance in learning to use special features of a programming language's implementation;
- obtain assistance in determining the syntactic correctness of a particular programming language statement or construct;
- obtain an explanation of a particular syntactic error;
- obtain explanations of compilation or run-time error messages.
You can obtain assistance only from me and the teaching assistants:
- in designing the data structures and algorithms used in your solution;
- in modifying the design of an algorithm or data structure determined to be faulty;
- in implementing your algorithm or data structure in a programming language;
- in correcting a faulty implementation of your algorithm or data structure;
- in determining the semantic correctness of your program;
- in designing an experimental study and interpreting its results.
You cannot:
- show or give a copy of your work in any amount or form to another student;
- see or receive a copy of someone else's work in any amount or form;
- attempt to gain access to files other than your own or those that I designate and authorize;
- attempt to reverse engineer routines for automatic grading;
- inspect or retain in your possession another student's work, whether it was given to you by another student, it was found after another student discarded his or her work, or it accidentally came into your possession;
- collaborate in any way with someone else in the design, implementation, or logical revision of an algorithm;
- use or present as your own any algorithm, data structure, or implementation that is not of your own or of my design, or that is not part of the course materials;
- incorporate code written by others (such as can be found on the Internet).

Policies dealing with logistics:

You have permission to occasionally take digital photos of the material on the board for your personal use, but you cannot post these photos on the Internet. It is important to understand that all of the course material is covered by copyright, either mine or someone else's.
You should submit all assignments on time. For late projects, there will be a 1% deduction for each minute after the deadline. In the real world, it won't be your grade that decreases. It'll be your stock price.
I grant extensions only to students who have documentation for a medical issue, a family emergency, or an accommodation from the Academic Resource Center. In the cases of a medical issue or a family emergency, it would be best to coordinate with your advising dean since these situations often affect your work in all of your classes.
If you use your laptop for development, you should keep a backup of your projects on a university or department machine (e.g., cs-class) so I can verify that you completed the work for an assignment before the deadline.
You must take the final exam with the section and during the period designated by the Registrar.
It is my job to maintain a constructive learning environment for everyone. Please silence your cell phone and other electronic devices.
If you must miss class, be sure to get the lecture notes from a classmate.
It is fine if you must leave class early, arrive late, or leave the room to answer your phone, but you should do so in a manner that does not disturb your fellow students.
In the case of inclement weather that results in the university's closure, we will meet virtually during normal class times using Zoom.
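
As a rough illustration of the late-project deduction described above, the following C++ sketch computes a score after the penalty. It assumes the deduction is one percent of the earned score for each whole minute past the deadline and that a score never drops below zero. The function name applyLatePenalty and both assumptions are hypothetical and are not part of the stated policy.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper, not part of the course materials.
// Assumption: each whole minute late removes 1% of the earned score,
// and the result is clamped so it never goes below zero.
double applyLatePenalty(double earnedPoints, int minutesLate)
{
  assert(earnedPoints >= 0.0 && minutesLate >= 0);
  // Fraction of the score that survives the deduction.
  double remaining = std::max(0.0, 1.0 - 0.01 * minutesLate);
  return earnedPoints * remaining;
} // applyLatePenalty
```

Under these assumptions, a 10-point project submitted 30 minutes late would keep about 7 points, and one submitted 100 or more minutes late would keep none.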

Assignments and Grading

Programming Projects, 50%

Project 1, assigned W 1/13, due Sa 2/13 (extended from R 2/11), 10 points
Project 2, assigned R 2/11, due M 2/29, 10 points
Project 3, assigned M 2/29, due W 3/23, 10 points
Project 4, assigned W 3/23, due F 4/15, 10 points
Project 5, assigned W 4/13, due M 5/2, 10 points

Midterm Exam, W 3/2, 20%
Final Exam, F 5/6 12:30–2:30 PM, 30%, REI 281

String Grades::getLetterGrade()
{
  if (grade >= 94)
    return "A";
  else if (grade >= 90)
    return "A-";
  else if (grade >= 87)
    return "B+";
  else if (grade >= 84)
    return "B";
  else if (grade >= 80)
    return "B-";
  else if (grade >= 77)
    return "C+";
  else if (grade >= 74)
    return "C";
  else if (grade >= 70)
    return "C-";
  else if (grade >= 67)
    return "D+";
  else if (grade >= 64)
    return "D";
  else
    return "F";
} // Grades::getLetterGrade
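
To show how the weights listed above might combine into the number that the scale consumes, here is a minimal C++ sketch. It assumes that the five 10-point projects map point-for-point onto the 50% project weight and that each exam is scored from 0 to 100. The function name computeCourseGrade and its parameters are hypothetical and do not come from the course materials.

```cpp
#include <cassert>

// Hypothetical helper, not part of the course materials.
// Assumptions:
//   projectPoints  - total points earned on the five projects (0 to 50)
//   midtermPercent - midterm exam score on a 0 to 100 scale (weight 20%)
//   finalPercent   - final exam score on a 0 to 100 scale (weight 30%)
double computeCourseGrade(double projectPoints, double midtermPercent,
                          double finalPercent)
{
  assert(projectPoints >= 0.0 && projectPoints <= 50.0);
  assert(midtermPercent >= 0.0 && midtermPercent <= 100.0);
  assert(finalPercent >= 0.0 && finalPercent <= 100.0);
  // Projects contribute their points directly; exams are scaled by weight.
  return projectPoints + 0.20 * midtermPercent + 0.30 * finalPercent;
} // computeCourseGrade
```

For example, 45 project points, 80 on the midterm, and 90 on the final would give 45 + 16 + 27 = 88, which falls in the "B+" range of the scale above.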

Materials: Readings, Videos, and Links

Perez-Hernandez, D. (28 March 2014). Taking notes by hand benefits recall, researchers find. The Chronicle of Higher Education. (Read if interested.)
Mueller, P. A. and Oppenheimer, D. M. (2014). The pen is mightier than the keyboard: Advantages of longhand over laptop note taking. Psychological Science, 25(6):1159–1168. (Read if interested.)
Flach, P. (2012). Machine learning: The art and science of algorithms that make sense of data. Cambridge University Press, Cambridge.
Mitchell, T. M. (1997). Machine learning. McGraw-Hill, New York, NY.
Murphy, K. P. (2012). Machine learning: A probabilistic perspective [electronic resource]. MIT Press, Cambridge, MA.
Gomes, L. (20 Oct 2014). Machine-learning maestro Michael Jordan on the delusions of Big Data and other huge engineering efforts. IEEE Spectrum.
Ramirez, E., et al. (2016). Big data: A tool for inclusion or exclusion? Understanding the issues. FTC Report. Federal Trade Commission, Washington, DC.
Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM 55(10): 78–87.
Provost, F., Fawcett, T., and Kohavi, R. (1998). The case against accuracy estimation for comparing induction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, 445–453. Morgan Kaufmann, San Francisco, CA.
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters 27(8): 861–874.
Hearst, M.A., et al. (1998). Support vector machines. IEEE Intelligent Systems and their Applications 13(4): 19–28.

Schedule

Introduction: Definitions, Areas, History, Paradigms
Instance-based learning: k-NN
Probabilistic learning: MLE, Bayes' Theorem, naive Bayes
Evaluation: Train/Test Methodologies, Measures, ROC Analysis
Decision Trees: ID3, C4.5, Stumps
Rule Learning: Ripper, OneR
Midterm Exam
Neural Networks: Linear classifiers, Perceptron
Neural Networks: Multilayer networks, Back-propagation
Support Vector Machines: Perceptron, Dual representation
Support Vector Machines: Margins, Kernels, Training, SMO
Ensemble Methods: Bagging, Boosting
Ensemble Methods: Random Forests, Voting, Weighting
