Synopsis

Empirical Methods in Natural Language Processing ("ENLP")
Lectures: Tu/Th 11:00am-12:15pm (Eastern Time) @ White-Gravenor 211

Systems of communication that come naturally to humans are thoroughly unnatural for computers. For truly robust information technologies, we need to teach computers to unpack our language. Natural language processing (NLP) technologies facilitate semi-intelligent artificial processing of human language text. In particular, techniques for analyzing the grammar and meaning of words and sentences can be used as components within applications such as web search, question answering, and machine translation.

This course introduces fundamental NLP concepts and algorithms, emphasizing the marriage of linguistic corpus resources with statistical and machine learning methods. As such, the course combines elements of linguistics, computer science, and data science. Coursework will consist of lectures, programming assignments (in Python), and a final team project. The course is intended for students who are already comfortable with programming and have some familiarity with probability theory.

Credits: 3

Prerequisites: Linguistics students are advised to complete Intro to NLP (LING-362) before enrolling in this course.

(For administrative reasons, registration is divided into two sections: LING-572 and COSC-572. The content and requirements within the course do not differ by registration section.)

Course Staff and Office Hours

Nathan Schneider
nathan.schneider@georgetown.edu
Office Hours: Tuesdays, 4-5 in STM 315H or via Zoom

TAs:

  • Tatsuya Aoyama
    ta571@georgetown.edu
  • Shabnam Behzad
    sb1796@georgetown.edu
  • TA Office Hours: Thursdays, 2-3 in STM G36 (B floor) unless announced otherwise

Course Modality & Health Precautions

Course sessions are expected to take place in person unless there is a change in university policy. Your attendance is important for the class to operate smoothly, including discussions and activities (the policy regarding absences is discussed below).

Please familiarize yourself with Georgetown's policies for being on campus and health resources, starting with this FAQ. Of particular note:

  • Consider wearing a mask in class even if not required, especially during periods of high transmission of COVID and other respiratory illnesses on campus.
  • If you are sick, do not come to class. If you have slight symptoms (e.g., the sniffles) and it may be something contagious, but you are feeling well enough to participate, notify the instructors that you wish to join class via Zoom. (Also get tested.)

Textbook

You do not need to purchase a textbook. We will use PDF drafts from Speech and Language Processing (3rd ed.) by Dan Jurafsky and James H. Martin ("SLP").

Assessments

The primary assessments in this course are several focused homework assignments (40%), a substantial team project (30%), and graded quizzes (15%). The remaining 15% will be determined based on participation.
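As a rough illustration of how these weights combine, here is a minimal Python sketch. The function name and the sample scores are hypothetical, and this is not the official grade calculation; it only shows the arithmetic implied by the percentages above.

```python
# Category weights as stated in the syllabus; they sum to 100%.
WEIGHTS = {
    "homework": 0.40,
    "project": 0.30,
    "quizzes": 0.15,
    "participation": 0.15,
}


def weighted_course_score(scores):
    """Combine per-category percentages (0-100) into one overall percentage.

    `scores` maps each category name in WEIGHTS to a percentage.
    This helper is hypothetical and for illustration only.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError("missing categories: " + ", ".join(sorted(missing)))
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up example scores, not real data.
example = {"homework": 90, "project": 80, "quizzes": 70, "participation": 100}
overall = weighted_course_score(example)
print(round(overall, 2))
```

With the made-up scores above, the contributions are 36 (homework), 24 (project), 10.5 (quizzes), and 15 (participation) percentage points.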

Homework Assignments (40%)

These are focused exercises that reinforce the concepts and algorithms presented in lecture. They will involve some Python coding, some data analysis, and answering some written questions.

Graded Quizzes (15%)

Instead of full-fledged exams, this semester there will be two smaller graded quizzes assigned outside of class time. These will assess knowledge of core algorithmic, linguistic, and mathematical concepts. Any quizzes given in class will be for practice (see Participation).

Team Project (30%)

A substantial interdisciplinary team project, defined and executed by the team members with guidance from the course staff, will serve as an opportunity to put the ideas from the course into practice. Details are forthcoming.

Participation (15%)

This credit comes from activities such as:

  • Short pop quizzes in class.
  • Speaking up in class.
  • Giving a small presentation in one class session (details TBA).

Communication

A public course website will be maintained with the syllabus, schedule, and lecture slides: http://people.cs.georgetown.edu/cosc572/

The Canvas platform will be used to host content accessible only to members of the course. It provides a discussion forum, a way to submit coursework, and other tools. Log into Canvas at https://georgetown.instructure.com/ using your NetID. Students are automatically added when they are enrolled in the course. (The course name in Canvas is displayed as "COSC-572-01", but it actually includes all sections of the course, including LING-572.)

Some course meetings will take place virtually on Zoom, using the link provided in Canvas. Some of these sessions may be recorded for use within the course. These recordings should not be shared beyond the course without permission.

The Canvas discussion forum is the recommended virtual venue for asking and answering course-related questions. Instructors will monitor the forum and post replies from time to time, but we cannot promise immediate attention to every question.

The most direct way to contact instructors is through email. For most inquiries, including the main instructor and TAs in the email will elicit the fastest response.

Computing Resources

Students will be granted remote login (SSH) access to a Unix server with the Anaconda distribution of Python 3.5 installed. If you wish to run Python code on your own machine, you are strongly encouraged to install the same Anaconda distribution to avoid compatibility hassles. It is available for Windows, Mac OS X, and Linux machines.
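If you run code on your own machine, a quick check like the following can tell you whether your interpreter matches the course version. This is only a sketch: the function name is hypothetical, and it compares major and minor version numbers only, not the Anaconda distribution or installed packages.

```python
import sys

# Python version named in the syllabus (major, minor).
COURSE_PYTHON = (3, 5)


def matches_course_python(version_info=sys.version_info):
    """Return True if the major.minor version equals the course version.

    Hypothetical helper for illustration; it does not check which
    distribution (e.g., Anaconda) or which packages are installed.
    """
    return tuple(version_info[:2]) == COURSE_PYTHON


if __name__ == "__main__":
    found = "{}.{}".format(sys.version_info[0], sys.version_info[1])
    wanted = "{}.{}".format(*COURSE_PYTHON)
    if matches_course_python():
        print("Python " + found + " matches the course version.")
    else:
        print("Python " + found + " differs from the course version " + wanted + ".")
```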

Attendance and Late Policy

In general, students are expected to attend all classes and to complete all assignments on time. Absences may have an adverse effect on grades in a course, up to and including failure.

That being said, we understand that circumstances may arise preventing you from attending class. In light of COVID, we will make every effort to accommodate student needs that are communicated to the instructors, for example, providing for asynchronous participation in the course by students whose circumstances prevent synchronous attendance over Zoom. Please email the instructors ASAP to communicate any expected absences. For example, inform us at the beginning of the semester about planned religious observances or athletic travel.

Late assignments are subject to a penalty of 15% of the grade for each day past the deadline. At the discretion of the instructors, a deadline may be adjusted for a student if there are special circumstances communicated to the instructors well in advance. 11th-hour requests for an extension to an assignment are unlikely to be granted absent truly exceptional circumstances.
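To make the late penalty concrete, here is a small Python sketch of one possible reading of the policy. It rests on assumptions the syllabus does not spell out: that each started 24-hour period counts as a full day, that the 15% is taken from the earned score and accumulates linearly, and that the result cannot drop below zero. The function name is hypothetical; ask the instructors if the exact rule matters for your situation.

```python
import math

# Penalty per day late, as stated in the syllabus.
LATE_PENALTY_PER_DAY = 0.15


def late_adjusted_score(earned, hours_late):
    """Apply the late penalty to an earned score.

    Assumptions (not confirmed by the syllabus): any started 24-hour
    period counts as a whole day, the penalty is a linear fraction of
    the earned score, and the adjusted score is floored at zero.
    """
    if hours_late <= 0:
        return earned
    days_late = math.ceil(hours_late / 24)
    remaining_fraction = max(0.0, 1.0 - LATE_PENALTY_PER_DAY * days_late)
    return earned * remaining_fraction


# Made-up example: a score of 80 submitted 30 hours late (counted as 2 days).
print(round(late_adjusted_score(80, 30), 2))
```

Under these assumptions, an assignment submitted seven or more days late would receive no credit, since seven days of penalty exceeds 100%.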

Students who miss multiple classes due to prolonged illness should seek medical care and provide documentation of such to the Dean’s Office, which will communicate with the student’s professors. A prolonged absence may necessitate the student’s withdrawal from the course or from the University for the semester.


Academic Integrity

In this course, you will be asked to participate at times as an individual and at times working in a group. Exams should be completed entirely on your own. Exam questions should not be posted online or disclosed to other students who have not yet taken the exam (and the same goes for official answers to homework assignments).

For homework assignments, you are expected to write code/text and perform analyses yourself unless directed otherwise. That is, don't copy solutions from other students or share yours with them. You are, however, encouraged to discuss concepts and implementation stumbling blocks with fellow students, within reason. The online discussion forum and office hours are good opportunities for this.

Part of treating others with respect is giving appropriate credit for ideas and scholarly works (including code). If you consult with other students on an assignment, report this in the work that you turn in. If in your code you use a library or implementation from another source, indicate that as well (minimally by including a URL in a comment). Do not generate new content with prompt-based AI tools like ChatGPT or GitHub Copilot without permission from instructors unless specifically allowed by the assignment. (Using, for example, Grammarly as a language aid is OK.) Instructors reserve the right to request an oral explanation of answers.

Course projects are intended to be highly collaborative, and the final project writeup should include a synopsis of who has contributed what. Version control software should be used in development for the final project. (Version control can also be used for homework assignments, provided that you ensure that other students do not have access to your solutions.)

In research writing, it is important to give credit to other research that provides specific foundations to your work, as well as to published work that is closely related. If you discuss ideas/information from a publication, be sure to cite it; if you reuse the specific phrasing of other work, use quotation marks. Knowing when and how to give credit can be tricky at times, so when in doubt, ask!


Title IX/Sexual Misconduct

Georgetown University and its faculty are committed to supporting survivors and those impacted by sexual misconduct, which includes sexual assault, sexual harassment, relationship violence, and stalking. Georgetown requires faculty members, unless otherwise designated as confidential, to report all disclosures of sexual misconduct to the University Title IX Coordinator or a Deputy Title IX Coordinator. If you disclose an incident of sexual misconduct to a professor in or outside of the classroom (with the exception of disclosures in papers), that faculty member must report the incident to the Title IX Coordinator or a Deputy Title IX Coordinator. The coordinator will, in turn, reach out to the student to provide support, resources, and the option to meet. [Please note that the student is not required to meet with the Title IX coordinator.] More information about reporting options and resources can be found on the Sexual Misconduct Website: https://sexualassault.georgetown.edu/resourcecenter

If you would prefer to speak to someone confidentially, Georgetown has a number of fully confidential professional resources that can provide support and assistance. These resources include:

  • Health Education Services for Sexual Assault Response and Prevention: confidential email sarp@georgetown.edu
  • Counseling and Psychiatric Services (CAPS): 202.687.6985 or after hours, call (833) 960-3006 to reach Fonemed, a telehealth service; individuals may ask for the on-call CAPS clinician
