COSC-575: Machine Learning

Spring 2015

Contents

Announcements

Where, When, Who

Class Time: MW 3:30 PM – 4:45 PM
Classroom: REI 284
   
Instructor: Mark Maloof
Office: 325 St. Mary's Hall
Mailbox: 329A St. Mary's Hall
Office Hours: In-person (325 STM): TR 11:00 AM–12:00 PM; online: M 10:30–11:30 AM and W 3:00–4:00 PM; or by appointment. Send me an email to get the Zoom link for online office hours.

Description

This graduate lecture surveys the major research areas of machine learning. Through traditional lectures, programming projects, paper presentations, and research projects, students learn (1) to understand the foundations of machine learning, (2) to comprehend, analyze, and critique papers from the primary literature, (3) to replicate studies described in the primary literature, and (4) to design, conduct, and present their own studies. The course compares and contrasts machine learning with related endeavors, such as statistical learning, pattern classification, data mining, and information retrieval. Topics include Bayesian decision theory, instance-based approaches, Bayesian methods, decision trees, rule induction, density estimation, linear classifiers, neural networks, support vector machines, ensemble methods, learning theory, evaluation, and applications. Time permitting, additional topics include genetic algorithms, unsupervised learning, semi-supervised learning, outlier detection, sequence learning, and reinforcement learning.

Prerequisites: Ideally, students will have taken undergraduate courses in computer science through data structures, but at the very least, students must be able to implement trees and graphs using a high-level programming language, such as C++, Java, Python, or Ruby. Students should also have taken undergraduate courses in mathematics, such as calculus, linear algebra, and probability and statistics.

Primary Text:

Learning Goals

By the end of the semester, students will be able to:

Policies

My course policies are designed to supplement the CS Department's Honor Policy. Unless stated otherwise when I distribute an assignment, the following is the default for all assignments for this course. I've developed my policies from past teaching experiences and from the CS Department's Honor Code at George Mason University.

I am obligated to refer all suspected cases of academic dishonesty by graduate students to the dean of the Graduate School. If you have any questions about these policies or how they apply, please discuss such concerns with me during class, during office hours, or by e-mail.

In my experience, students at Georgetown do honest work. The small percentage of students who have submitted someone else's work as their own did so because they did not manage their time wisely.

Students must follow proper scholarly practice for all submitted work, whether graded or ungraded and whether a draft or final version of a proposal, paper, or program. We must acknowledge our reliance on the work of others through citation.

Students may be quite adept at and knowledgeable about citing and quoting material from traditional sources, such as books and articles. Typically, we do not have to cite facts, common math formulae, or expressions of our own ideas, observations, interpretations, and analyses. However, new graduate students in computer science may not realize that formulae, theorems, proofs, algorithms, and programs can require the same treatment as any other form of expression.

For convenience, you do not need to cite the course materials, conversations with me, or information you obtain from class lectures and discussions. If you are unsure about what requires citation or what constitutes proper scholarly practice, please ask me during class, during office hours, or by e-mail.

I design my courses and assignments so students have what they need to complete the assignments individually without consulting outside resources. I determine the size of and credit for assignments based on the assumption that the work for them is the result of individual effort using only the course resources and materials. Students who use outside resources to complete assignments may not be eligible for full credit. Students who do not acknowledge their use of outside resources to complete assignments may be in violation of my course policies and the university's policies on academic integrity.

The following list details acceptable and unacceptable practices:

Policies dealing with late projects, cell phones, attendance, and inclement weather.

Assignments and Grading

std::string Grades::getLetterGrade()
{
  if (grade >= 94)
    return "A";
  else if (grade >= 90)
    return "A-";
  else if (grade >= 87)
    return "B+";
  else if (grade >= 84)
    return "B";
  else if (grade >= 80)
    return "B-";
  else if (grade >= 67)
    return "C";
  else
    return "F";
} // Grades::getLetterGrade
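
The member function above depends on a Grades class and a grade field that this page does not show. As a minimal self-contained sketch of the same scale, the cutoffs can be written as a free function. The name letterGrade, its parameter, and the use of double for the score are assumptions made for illustration; only the cutoffs and letters come from the fragment above.

```cpp
#include <string>

// Hypothetical free-function version of the grading scale shown above.
// The cutoffs and letters are copied from the page; the function name,
// the parameter, and the choice of double are assumptions of this sketch.
std::string letterGrade(double score)
{
  // Cutoffs are checked from highest to lowest, so the first match wins.
  if (score >= 94) return "A";
  if (score >= 90) return "A-";
  if (score >= 87) return "B+";
  if (score >= 84) return "B";
  if (score >= 80) return "B-";
  if (score >= 67) return "C";
  return "F";
}
```

In this sketch no rounding is applied, so letterGrade(89.9) returns "B+". Whether the course rounds scores before applying the scale is not stated here.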

Materials: Readings, Videos, and Links

Schedule

  1. Introduction: Definitions, Areas, History, Paradigms
  2. Bayesian Decision Theory
  3. Instance-based Learning: k-NN, kd-trees
  4. Probabilistic Learning: MLE, Bayes' Theorem, MAP, naive Bayes, Bayesian naive Bayes
  5. Density Estimation: Parametric, Non-parametric, Bayesian
  6. Evaluation: Train/Test Methodologies, Measures, ROC Analysis
  7. Decision Trees: ID3, C4.5, Stumps, VFDT
  8. Rule Learning: Ripper, OneR
  9. Midterm Exam
  10. Neural Networks: Linear classifiers, Perceptron
  11. Neural Networks: Multilayer networks, Back-propagation
  12. Support Vector Machines: Perceptron, Dual representation
  13. Support Vector Machines: Margins, Kernels, Training, SMO
  14. Ensemble Methods: Bagging, Boosting
  15. Ensemble Methods: Random Forests, Voting, Weighting
  16. Hidden Variables: k-means, Expectation-Maximization

Other Interesting Links

Copyright © 2019 Mark Maloof. All Rights Reserved. This material may not be published, broadcast, rewritten, or redistributed.