Human language technologies increasingly help us communicate with computers and with each other. But every human language is extraordinarily complex, and the diversity among the world's languages is vast. Natural language processing (NLP) seeks to formalize and unpack different aspects of a language so computers can approximate human-like language abilities. In this course, we will examine the building blocks that underlie a human language such as English (or Japanese, Arabic, Tamil, or Navajo), and fundamental algorithms for analyzing those building blocks in text data, with an emphasis on the structure and meaning of words and sentences. Students will implement a variety of core algorithms for both rule-based and machine learning methods, and learn how to use computational linguistic datasets such as lexicons and treebanks. Text processing applications such as machine translation, information retrieval, and dialogue systems will be introduced as well.

This course is designed for undergraduates who are comfortable with the basics of discrete probability and possess solid programming skills, including the ability to use basic data structures and familiarity with regular expressions. COSC-160: Data Structures is the prerequisite for CS students, and LING-001 is the prerequisite for Linguistics students. Students who are new to programming or need a refresher are directed to LING-362: Introduction to NLP. The languages of instruction will be English and Python.