Synopsis
Empirical Methods in Natural Language Processing ("ENLP")
Lectures: MW 3:30-4:45, White-Gravenor 213
Systems of communication that come naturally to humans are thoroughly unnatural for computers. For truly robust information technologies, we need to teach computers to unpack our language. Natural language processing (NLP) technologies facilitate semi-intelligent artificial processing of human language text. In particular, techniques for analyzing the grammar and meaning of words and sentences can be used as components within applications such as web search, question answering, and machine translation.
This course introduces fundamental NLP concepts and algorithms, emphasizing the marriage of linguistic corpus resources with statistical and machine learning methods. As such, the course combines elements of linguistics, computer science, and data science. Coursework will consist of lectures, programming assignments (in Python), and a final team project. The course is intended for students who are already comfortable with programming and have some familiarity with probability theory.
Credits: 3
Prerequisites: Linguistics students are advised to complete Intro to NLP (LING-362) before enrolling in this course.
(For administrative reasons, registration is divided into three sections: LING-572, COSC-572-01, and COSC-572-02. The content and requirements within the course do not differ by registration section.)
Course Staff and Office Hours
Nathan Schneider
nathan.schneider@georgetown.edu
New Office Hours: Mondays, 5:00-6:00 in 226 Poulton Hall, or by appointment
TA: James Maguire
jrm346@georgetown.edu
New Office Hours: Fridays, 1:00-2:00 in 220 Poulton Hall
Textbook
The required textbook is Speech and Language Processing (2nd ed.) by Dan Jurafsky and James H. Martin ("SLP"). Copies have been ordered through the campus bookstore. Supplementary readings will come from the draft 3rd ed. chapters and other online sources.
Assessments
The primary assessments in this course are several focused homework assignments (25%), a substantial team project (25%), the final exam (25%), and the midterm exam (15%). The remaining 10% of the overall grade will be based on participation.
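As a rough illustration of how the weights above combine, here is a minimal sketch in Python. It assumes each component is scored on a 0-100 scale and that the overall grade is a plain weighted average; the syllabus does not specify rounding or letter-grade cutoffs, so none are modeled. The function name and the example scores are hypothetical.

```python
# Component weights as stated in the syllabus (fractions of the overall grade).
WEIGHTS = {
    "homework": 0.25,
    "project": 0.25,
    "final_exam": 0.25,
    "midterm_exam": 0.15,
    "participation": 0.10,
}


def overall_grade(scores):
    """Return the weighted average of component scores (each assumed 0-100).

    Assumption: a simple weighted average; rounding and letter-grade
    cutoffs are not specified in the syllabus and are not modeled here.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError("missing scores for: {}".format(sorted(missing)))
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example_scores = {
    "homework": 90,
    "project": 85,
    "final_exam": 80,
    "midterm_exam": 70,
    "participation": 100,
}
print(overall_grade(example_scores))
```

With these made-up scores, the exams together contribute 40% of the result, matching the breakdown in the sections that follow.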
Homework Assignments (25%)
These are focused exercises that reinforce the concepts and algorithms presented in lecture. They will involve some Python coding, some data analysis, and answering some written questions.
Exams (40%)
Both exams are listed in the schedule: the midterm takes place during a class period and the final is scheduled per university policy. The final will be cumulative, but will place greater emphasis on material covered in the second half of the course. Calculators and computers are not permitted during either exam.
Team Project (25%)
A substantial interdisciplinary team project, defined and executed by the team members with guidance from the course staff, will serve as an opportunity to put the ideas from the course into practice. Details are forthcoming.
Participation (10%)
This credit comes from activities such as:
- Short pop quizzes in class.
- Speaking up in class.
- Meeting individually with the instructor between the first and second class to gauge expectations.
- Giving a small presentation in one class session (details TBA).
Communication
A public course website will be maintained with the syllabus, schedule, and lecture slides: http://people.cs.georgetown.edu/cosc572/ a.k.a. http://tiny.cc/enlp
The Canvas platform will be used to host content accessible only to members of the course. It provides a discussion forum, a way to submit coursework, and other tools. Log into Canvas at https://georgetown.instructure.com/ using your NetID. Students are automatically added when they are enrolled in the course. (The course name in Canvas is displayed as "COSC-572-01", but it actually includes all sections of the course, including LING-572.)
The Canvas discussion forum is the recommended virtual venue for asking and answering course-related questions. Instructors will monitor the forum and post replies from time to time, but we cannot promise immediate attention to every question.
The most direct way to contact instructors is through email.
Computing Resources
Students will be granted remote login (SSH) access to a Unix server with the Anaconda distribution of Python 3.5 installed. If you wish to run Python code on your own machine, you are strongly encouraged to install the same Anaconda distribution to avoid compatibility hassles. It is available for Windows, Mac OS X, and Linux machines.
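If you work on your own machine, a quick check like the following sketch can confirm that your interpreter matches the course version. The function name is hypothetical, and the check only compares the major and minor version; it does not verify that the distribution is Anaconda. It avoids f-strings, which are not available in Python 3.5.

```python
import sys


def matches_course_python(version_info=sys.version_info, expected=(3, 5)):
    """Return True if the major.minor version equals the expected one."""
    return tuple(version_info[:2]) == expected


# Report the running interpreter's version and whether it matches.
print("Running Python {}.{}".format(*sys.version_info[:2]))
print("Matches course version:", matches_course_python())
```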
Attendance and Late Policy
In general, students are expected to attend all classes and to complete all assignments on time. Absences may have an adverse effect on grades in a course, up to and including failure.
That being said, we understand that circumstances may arise preventing you from attending class. Please email the instructors ASAP to communicate any expected absences. For example, inform us at the beginning of the semester about planned religious observances or athletic travel.
At the discretion of the instructors, a deadline may be adjusted for a student if there are special circumstances communicated to the instructors well in advance. 11th-hour requests for an extension to an assignment are unlikely to be granted absent truly exceptional circumstances.
Students who miss multiple classes due to prolonged illness should seek medical care and provide documentation of such to the Dean’s Office, which will communicate with the student’s professors. A prolonged absence may necessitate the student’s withdrawal from the course or from the University for the semester.
More information and resources:
- http://bulletin.georgetown.edu/regulation/standards
- http://academicsupport.georgetown.edu/
- http://studenthealth.georgetown.edu/mental-health
Academic Integrity
In this course, you will be asked to work sometimes as an individual and sometimes as part of a group. Exams should be completed entirely on your own. Exam questions should not be posted online or disclosed to other students who have not yet taken the exam (and the same goes for official answers to homework assignments).
For homework assignments, you are expected to write code/text and perform analyses yourself unless directed otherwise. I.e., don't copy solutions from other students or share yours with them. But you are encouraged to discuss concepts and implementation stumbling blocks with fellow students, within reason. The online discussion forum and office hours are good opportunities for this.
Part of treating others with respect is giving appropriate credit for ideas and scholarly works (including code). If you consult with other students on an assignment, report this in the work that you turn in. If in your code you use a library or implementation from another source, indicate that as well (minimally by including a URL in a comment). Instructors reserve the right to request an oral explanation of answers.
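For instance, attribution in a code comment might look like the sketch below. The URL is a placeholder rather than a real source, and the function is a made-up example, not part of any assignment.

```python
import re

# Adapted from: https://example.com/tokenizer-snippet
# (placeholder URL for illustration; cite the actual page you consulted)
# Discussed edge cases with a classmate (name them in your submission).


def simple_tokenize(text):
    """Split text into lowercase tokens of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())
```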
Course projects are intended to be highly collaborative, and the final project writeup should include a synopsis of who has contributed what. Version control software should be used in development for the final project. (Version control can also be used for homework assignments, provided that you ensure that other students do not have access to your solutions.)
In research writing, it is important to give credit to other research that provides specific foundations to your work, as well as to published work that is closely related. If you discuss ideas/information from a publication, be sure to cite it; if you reuse the specific phrasing of other work, use quotation marks. Knowing when and how to give credit can be tricky at times, so when in doubt, ask!
Notice Regarding Sexual Misconduct
Please know that as a faculty member I am committed to supporting survivors of sexual misconduct, including relationship violence, sexual harassment and sexual assault. University policy also requires me to report any disclosures about sexual misconduct to the Title IX Coordinator, whose role is to coordinate the University’s response to sexual misconduct.
Georgetown has a number of fully confidential professional resources who can provide support and assistance to survivors of sexual assault and other forms of sexual misconduct. These resources include:
Jen Schweer, MA, LPC
Associate Director of Health Education Services for Sexual Assault Response and Prevention
(202) 687-0323
jls242@georgetown.edu
Erica Shirley, Trauma Specialist
Counseling and Psychiatric Services (CAPS)
(202) 687-6985
els54@georgetown.edu
More information about campus resources and reporting sexual misconduct can be found at http://sexualassault.georgetown.edu.