COSC-288: Introduction to Machine Learning

Spring 2021

Announcements

Where, When, Who

Class Time: TR 11 AM – 12:15 PM ET
Classroom: Ha!
   
Instructor: Mark Maloof
Office: 325 St. Mary's Hall
Mailbox: 329A St. Mary's Hall
Office Hours: In-person (325 STM): TR 11:00 AM–12:00 PM; online: M 10:30–11:30 AM and W 3:00–4:00 PM; or by appointment. Send me an email to get the Zoom link for online office hours.

Description

This undergraduate course surveys the major research areas of machine learning, focusing on classification. Through traditional lectures and programming projects, students learn (1) to understand the foundations of machine learning, (2) to design and implement methods of machine learning, (3) to evaluate methods of machine learning, and (4) to conduct empirical evaluations of multiple methods of machine learning. The course compares and contrasts machine learning with related endeavors, such as statistical learning, pattern classification, data mining, and information retrieval. Topics include instance-based approaches, naive Bayes, decision trees, rule induction, linear classifiers, support vector machines, neural networks, ensemble methods, evaluation, and applications. Students complete five programming projects using Java. There are midterm and final exams.

Prerequisites: Advanced Programming (COSC-150) and Data Structures (COSC-160).

Primary Text:

Learning Goals

By the end of the semester, students will be able to:

Policies

My course policies are designed to supplement the University's Undergraduate Honor System and the CS Department's Honor Policy. Unless stated otherwise when I distribute an assignment, the following is the default for all assignments for this course. I have developed my policies from past teaching experiences and from the CS Department's Honor Code at George Mason University.

I am obligated to refer all suspected cases of academic dishonesty by undergraduate and master's students to Georgetown's Honor Council. I am obligated to refer all suspected cases of academic dishonesty by doctoral students to the dean of the Graduate School. If you have any questions about these policies or how they apply, please discuss such concerns with me during class, during office hours, by e-mail, or using the class's discussion board.

In my experience, students at Georgetown do honest work. The very small percentage of students who have submitted someone else's work as their own did so because they did not manage their time wisely.

Students must always follow proper scholarly practice for all submitted work, whether graded or ungraded and whether a draft or final version of a proposal, paper, program, or problem set. As scholars, we must acknowledge our reliance on the work of others through citation. You can never submit someone else's work as your own without proper attribution. The University assumes that students learned how to properly cite material at their previous institutions. If this is not the case, please let me know.

Indeed, students may be quite adept at and knowledgeable about citing and quoting material from traditional sources, such as books and articles. Typically, we do not have to cite facts, common math formulae, or expressions of our own ideas, observations, interpretations, and analyses. However, students new to computer science may not realize that theorems, proofs, algorithms, programs, and code fragments may require the same treatment as any other form of expression.

For convenience, you do not need to cite the course materials, which are the syllabus, sources linked from the syllabus, the course textbook, the class lectures and discussion, posts on the discussion board, and conversations with me and the course assistants. You must, however, cite the use of any resource that is not part of the course materials. Note that “the textbook” does not extend to the textbook's Web site, its contents, lecture slides, solution manuals, code repositories, or any other material related to the textbook. The syllabus links to any such material pertinent for the class. If you are unsure about what requires citation or what constitutes proper scholarly practice, please ask me during class, during office hours, by e-mail, or using the discussion board.

I design my courses and assignments so students have what they need to complete the assignments individually without consulting outside resources. I determine the size of and credit for assignments based on the assumption that the work for them is the result of individual effort using only the course resources and materials. Students who use outside resources to complete assignments may not be eligible for full credit. Students who do not acknowledge their use of outside resources to complete assignments may be in violation of my course policies and the university's policies on academic integrity.

The materials that I create and use for my courses (“Course Materials”) are my intellectual property. You may not disseminate or reproduce them in any form for public distribution (e.g., sale, exchange, etc.) without my written permission. Course Materials include all written or electronic documents and materials that I provide, including but not limited to syllabi, current and past assessments and their solutions (e.g., exams, homeworks, projects, problem sets, etc.), and presentations such as lectures, videos, slides, etc. Course Materials may only be used by students enrolled in the course for academic (i.e., course-related) purposes. Furthermore, your solutions to assessments are derivative works of my copyrighted material and are therefore subject to that protection because they necessarily incorporate my protected expression. Consequently, you may not further disseminate or reproduce in any form for distribution (e.g., uploading to websites, sale, exchange, etc.) your solutions to assessments.

Published course readings (book chapters, articles, reports, etc.) available in Canvas are copyrighted material. I make these works available to students through licensed databases or fair use. They are protected by copyright law, and may not be further disseminated or reproduced in any form for distribution (e.g., uploading to websites, sale, exchange, etc.) without permission of the copyright owner.

You can find more information about intellectual property and copyright here: https://www.library.georgetown.edu/copyright. You can find more information about computer acceptable use policy and intellectual property here: https://security.georgetown.edu/it-policies-procedures/computer-systems-aup.

Copyright issues aside, if you post your solutions to assessments on the Internet, it is possible that students in future classes will find your solutions and submit them as their own work without attribution. Naturally, students who do so violate my course policies, and the Honor Council will with high probability find them in violation and sanction them. The Honor Council may also find you in violation because you facilitated cheating and violated copyright law and the Web site's terms of service.

I understand that it is often important for securing a job or an internship for students to provide prospective employers with a portfolio of their work. I recommend that students devise a scheme for doing so that does not violate copyright law, does not violate the terms of service of the site on which they have posted material protected by copyright, and does not facilitate cheating.

The following list details acceptable and unacceptable practices:

Policies dealing with logistics:

For flexibility, each student has two Get-Out-Of-Jail-Free cards (virtual cards, of course) and 36 grace hours. Students can use their goojf cards to avoid a homework without a grading penalty, obtain one additional submission to Autolab for a project, or avoid answering one question on the midterm or final exam. Students can use their grace hours to obtain extensions on homeworks, projects, and the midterm exam. Grace hours can not be used for an extension on the final project (p5) or the final exam because of university rules and grading deadlines.

To use the cards or hours for an assignment, students must post a private message on Piazza before 4:30 PM ET on the day the assignment is due. There can be only one request of each type per assignment. For example, students can not use two cards for a project, and they can not make two requests to use their grace hours on an assignment. They can, however, use one card and their grace hours on a project or the midterm exam.

For grace hours, students must specify the number of hours they will use. If they submit their project beyond the new deadline, it will be late and the usual deduction will apply. Once a request to use a goojf card or grace hours has been made, it can not be retracted. Unused submissions and hours can not be reclaimed. Cards and hours can not be transferred to other students. Note that cards and grace hours are intended for use in normal circumstances. Students who experience extraordinary setbacks, such as medical or family emergencies, may qualify for an extension without using their grace hours. Students in these situations should contact the instructor.
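
As an illustration only, the grace-hour arithmetic described above can be sketched in Java. This is a minimal sketch under stated assumptions, not the bookkeeping actually used for the course: the class name GraceHours, its methods, and the due date in main are hypothetical. Only the 36-hour total, the rule that requested hours can not be reclaimed, and the rule that a submission after the new deadline is late come from the policy.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical helper that tracks one student's grace hours.
final class GraceHours {
    // Each student starts with 36 grace hours, per the policy.
    public static final int TOTAL_GRACE_HOURS = 36;

    private int remaining = TOTAL_GRACE_HOURS;

    // Deducts the requested hours and returns the extended deadline.
    // Once deducted, hours are never returned, mirroring the rule that
    // requests can not be retracted and unused hours can not be reclaimed.
    public ZonedDateTime extend(ZonedDateTime deadline, int hours) {
        if (hours <= 0 || hours > remaining) {
            throw new IllegalArgumentException(
                "Invalid request for " + hours + " hours; " + remaining + " remain");
        }
        remaining -= hours;
        return deadline.plusHours(hours);
    }

    public int remaining() {
        return remaining;
    }

    // A submission made after the extended deadline is late.
    public static boolean isLate(ZonedDateTime submitted, ZonedDateTime newDeadline) {
        return submitted.isAfter(newDeadline);
    }

    public static void main(String[] args) {
        ZoneId et = ZoneId.of("America/New_York");
        // Hypothetical due date, chosen only for illustration.
        ZonedDateTime due = ZonedDateTime.of(2021, 3, 1, 17, 0, 0, 0, et);

        GraceHours account = new GraceHours();
        ZonedDateTime extended = account.extend(due, 12);

        System.out.println("New deadline: " + extended);
        System.out.println("Hours remaining: " + account.remaining()); // 24
        System.out.println("Late? " + isLate(extended.plusMinutes(5), extended)); // true
    }
}
```

The sketch omits the other rules, such as the 4:30 PM ET request cutoff and the limit of one request of each type per assignment.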

Schedule

  1. Introduction: Definitions, Areas, History, Paradigms
  2. Instance-based learning: k-NN
  3. Probabilistic learning: MLE, Bayes' Theorem, naive Bayes
  4. Evaluation: Train/Test Methodologies, Measures, ROC Analysis
  5. Decision Trees: ID3, C4.5, Stumps
  6. Rule Learning: Ripper, OneR
  7. Midterm Exam
  8. Neural Networks: Linear classifiers, Perceptron
  9. Neural Networks: Multilayer networks, Back-propagation
  10. Support Vector Machines: Perceptron, Dual representation
  11. Support Vector Machines: Margins, Kernels, Training, SMO
  12. Ensemble Methods: Bagging, Boosting
  13. Ensemble Methods: Random Forests, Voting, Weighting
  14. Ethics, Bias, and Fairness

Assignments and Grading

String Grades::getLetterGrade()
{
  if (grade >= 94)
    return "A";
  else if (grade >= 90)
    return "A-";
  else if (grade >= 87)
    return "B+";
  else if (grade >= 84)
    return "B";
  else if (grade >= 80)
    return "B-";
  else if (grade >= 77)
    return "C+";
  else if (grade >= 74)
    return "C";
  else if (grade >= 70)
    return "C-";
  else if (grade >= 67)
    return "D+";
  else if (grade >= 64)
    return "D";
  else
    return "F";
} // Grades::getLetterGrade

I use automatic grading routines to assign an initial grade for the projects. It is important to emphasize that the grade you obtain from Autolab is an initial grade and may not be your final grade. There are many important aspects of a program that are difficult or impossible to assess using automatic grading routines. For example, automatic grading routines can not determine if you have written proper documentation. They can not easily assess if an implementation of an operation is optimally efficient. As a consequence, I start with the initial grade you obtain from Autolab and take further deductions if necessary.

For complete implementations, I use the following distribution as a guide:

Notice that an implementation consisting entirely of method stubs would obtain an initial grade of 60%. Such an implementation is incomplete and would be subject to further deductions based on the effort required to implement the required operations.

In addition to the above, the following deductions may be taken if applicable:

Materials: Readings, Videos, and Links

Other Interesting Links

Copyright © 2019 Mark Maloof. All Rights Reserved. This material may not be published, broadcast, rewritten, or redistributed.