Synopsis

Algorithms for Natural Language Processing ("ANLP")
Lectures: MW 3:30-4:45, Reiss 283

Human language technologies increasingly help us to communicate with computers and with each other. But every human language is extraordinarily complex, and the diversity seen in languages of the world is massive. Natural language processing (NLP) seeks to formalize and unpack different aspects of a language so computers can approximate human-like language abilities. In this course, we will examine the building blocks that underlie a human language such as English (or Japanese, Arabic, Tamil, or Navajo), and fundamental algorithms for analyzing those building blocks in text data, with an emphasis on the structure and meaning of words and sentences. Students will implement a variety of core algorithms for both rule-based and machine learning methods, and learn how to use computational linguistic datasets such as lexicons and treebanks. Text processing applications such as machine translation, information retrieval, and dialogue systems will be introduced as well.

This course is designed for undergraduates who are comfortable with the basics of discrete probability and possess solid programming skills, including the ability to use basic data structures and familiarity with regular expressions. COSC-160: Data Structures is the prerequisite for CS students, and LING-001 is the prerequisite for Linguistics students. Students who are new to programming or need a refresher are directed to LING-362: Introduction to NLP. The languages of instruction will be English and Python.

Credits: 3

(For administrative reasons, registration is divided into two sections: LING-272 and COSC-272. The content and requirements within the course do not differ by registration section.)

Course Staff and Office Hours

Nathan Schneider
nathan.schneider@georgetown.edu
Office Hours: Tuesdays (starting Sep. 26), 5:00-6:00 in Poulton 254, or by appointment

TA: Harry Eldridge
hme7@georgetown.edu
Office Hours: Wednesdays (starting Sep. 6), 12:00-1:00 in the basement of St. Mary's

Textbook

You do not need to purchase a textbook. We will use PDF drafts from Speech and Language Processing (3rd ed.) by Dan Jurafsky and James H. Martin ("SLP").

Assessments

The primary assessments in this course are several focused homework assignments (25%), a substantial team project (25%), the final exam (25%), and the midterm exam (15%). The remaining 10% will be determined by participation.
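
As a rough illustration of how these weights combine, here is a minimal Python sketch. The weights come from the paragraph above; the component names, the 0-100 score scale, and the example scores are assumptions made for illustration only, and this is not an official grade calculator.

```python
# Weights as stated in the Assessments section.
# The component names are illustrative, not official.
WEIGHTS = {
    "homework": 0.25,
    "project": 0.25,
    "final_exam": 0.25,
    "midterm_exam": 0.15,
    "participation": 0.10,
}


def overall_grade(scores):
    """Combine per-component scores (assumed to be on a 0-100 scale)
    into a single weighted total."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError("missing components: {}".format(sorted(missing)))
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for one student.
example = {
    "homework": 90,
    "project": 85,
    "final_exam": 80,
    "midterm_exam": 70,
    "participation": 100,
}
print(round(overall_grade(example), 2))
```

The sketch avoids f-strings so that it also runs on Python 3.5.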

Homework Assignments (25%)

These are focused exercises that reinforce the concepts and algorithms presented in lecture. They will involve some Python coding, some data analysis, and answering some written questions.

Exams (40%)

Both exams are listed in the schedule: the midterm takes place during a class period and the final is scheduled per university policy. The final will be cumulative, but will place greater emphasis on material covered in the second half of the course. Calculators and computers are not permitted during either exam.

Team Project (25%)

A substantial interdisciplinary team project, defined and executed by the team members with guidance from the course staff, will serve as an opportunity to put the ideas from the course into practice. Details are forthcoming.

Participation (10%)

This credit comes from activities such as:

  • Short pop quizzes in class.
  • Speaking up in class.
  • Meeting individually with the instructor between the first and second class to gauge expectations.
  • Giving a small presentation in one class session (details TBA).

Communication

A public course website will be maintained with the syllabus, schedule, and lecture slides: http://people.cs.georgetown.edu/cosc572/ a.k.a. http://tiny.cc/enlp

The Canvas platform will be used to host content accessible only to members of the course. It provides a discussion forum, a way to submit coursework, and other tools. Log into Canvas at https://georgetown.instructure.com/ using your NetID. Students are automatically added when they are enrolled in the course. (The course name in Canvas is displayed as "COSC-572-01", but it actually includes all sections of the course, including LING-572.)

The Canvas discussion forum is the recommended virtual venue for asking and answering course-related questions. Instructors will monitor the forum and post replies from time to time, but we cannot promise immediate attention to every question.

The most direct way to contact instructors is through email.

Computing Resources

Students will be granted remote login (SSH) access to a Unix server with the Anaconda distribution of Python 3.5 installed. If you wish to run Python code on your own machine, you are strongly encouraged to install the same Anaconda distribution to avoid compatibility hassles. It is available for Windows, Mac OS X, and Linux machines.

Attendance and Late Policy

In general, students are expected to attend all classes and to complete all assignments on time. Absences may have an adverse effect on grades in a course, up to and including failure.

That being said, we understand that circumstances may arise preventing you from attending class. Please email the instructors ASAP to communicate any expected absences. For example, inform us at the beginning of the semester about planned religious observances or athletic travel.

Late assignments are subject to a penalty of 25% of the grade for each day past the deadline. At the discretion of the instructors, a deadline may be adjusted for a student if there are special circumstances communicated to the instructors well in advance. 11th-hour requests for an extension to an assignment are unlikely to be granted absent truly exceptional circumstances.
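
The policy above does not spell out how partial days are counted or whether the penalty is taken from the earned score, so the following Python sketch is only one possible reading. It assumes that every started 24-hour period after the deadline counts as a full day, that each day removes 25% of the earned score, and that the result cannot drop below zero. The function name, parameters, and dates are hypothetical; the instructors' actual calculation may differ.

```python
import math
from datetime import datetime

SECONDS_PER_DAY = 24 * 60 * 60


def late_adjusted_score(score, deadline, submitted, penalty_per_day=0.25):
    """Apply a per-day late penalty to an earned score.

    Assumptions (not stated in the syllabus): any started 24-hour
    period counts as a full day late, and the score is floored at 0.
    """
    seconds_late = (submitted - deadline).total_seconds()
    if seconds_late <= 0:
        return score  # on time: no penalty
    days_late = math.ceil(seconds_late / SECONDS_PER_DAY)
    return max(0.0, score * (1 - penalty_per_day * days_late))


# Hypothetical deadline and submission times.
deadline = datetime(2017, 9, 20, 23, 59)
print(late_adjusted_score(90, deadline, datetime(2017, 9, 21, 1, 59)))
```

Under these assumptions, a submission two hours late counts as one day late, and a submission four or more days late receives no credit.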

Students who miss multiple classes due to prolonged illness should seek medical care and provide documentation of such to the Dean’s Office, which will communicate with the student’s professors. A prolonged absence may necessitate the student’s withdrawal from the course or from the University for the semester.

Academic Integrity

In this course, you will be asked to work at times individually and at times in a group. Exams should be completed entirely on your own. Exam questions should not be posted online or disclosed to other students who have not yet taken the exam (and the same goes for official answers to homework assignments).

For homework assignments, you are expected to write code/text and perform analyses yourself unless directed otherwise. I.e., don't copy solutions from other students or share yours with them. But you are encouraged to discuss concepts and implementation stumbling blocks with fellow students, within reason. The online discussion forum and office hours are good opportunities for this.

Part of treating others with respect is giving appropriate credit for ideas and scholarly works (including code). If you consult with other students on an assignment, report this in the work that you turn in. If in your code you use a library or implementation from another source, indicate that as well (minimally by including a URL in a comment). Instructors reserve the right to request an oral explanation of answers.

Course projects are intended to be highly collaborative, and the final project writeup should include a synopsis of who has contributed what. Version control software should be used in development for the final project. (Version control can also be used for homework assignments, provided that you ensure that other students do not have access to your solutions.)

In research writing, it is important to give credit to other research that provides specific foundations to your work, as well as to published work that is closely related. If you discuss ideas/information from a publication, be sure to cite it; if you reuse the specific phrasing of other work, use quotation marks. Knowing when and how to give credit can be tricky at times, so when in doubt, ask!

Notice Regarding Sexual Misconduct

Please know that as a faculty member I am committed to supporting survivors of sexual misconduct, including relationship violence, sexual harassment and sexual assault. University policy also requires me to report any disclosures about sexual misconduct to the Title IX Coordinator, whose role is to coordinate the University’s response to sexual misconduct.

Georgetown has a number of fully confidential professional resources who can provide support and assistance to survivors of sexual assault and other forms of sexual misconduct. These resources include:

Jen Schweer, MA, LPC
Associate Director of Health Education Services for Sexual Assault Response and Prevention
(202) 687-0323
jls242@georgetown.edu

Erica Shirley, Trauma Specialist
Counseling and Psychiatric Services (CAPS)
(202) 687-6985
els54@georgetown.edu

More information about campus resources and reporting sexual misconduct can be found at http://sexualassault.georgetown.edu.